ZHENG 

a test on non-intrusive proof-of-humanhood methods


motivation:

As sybil attacks become more and more prevalent in the web3 space, finding a non-intrusive way of proving
the humanhood of a specific account becomes an important issue. The two existing solutions that I am aware
of require either your retina data or a video of yourself. This defeats the purpose of anonymity, and having
different accounts for different purposes should definitely be supported. Moreover, a more liquid way of
proving humanhood means we are not tied to a specific account, so we can transfer our
proofs even when an account is compromised.



project origin:

ZHENG comes from the Chinese character 证, which stands for “prove”. This echoes the theme of this project
since we are trying to understand the best non-intrusive ways to prove a person’s humanhood.



ideal system:

Building on the premise that humans are best at identifying other humans, we want to create a purely on-chain
solution that tackles the humanhood verification / sybil attack problem through DAO governance. Instead of
solving proof of unique humanhood, we want to use proofs that are more liquid and transferable
between accounts. Moreover, we believe that having multiple accounts for different purposes should be allowed
for privacy and anonymity, as long as their number does not exceed a threshold that could challenge the safety
of the network. We also want the system to be friendly to new users in the blockchain space. Lastly, we want
to make sure the governing body is self-sustaining and rewards those who participate in the system.
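To make the shape of such a system a bit more concrete, here is a minimal off-chain sketch in Python of what a
DAO-governed registry with transferable proofs could look like. The class name HumanhoodRegistry, the quorum rule,
and the three-account cap are all assumptions made for this sketch, not parts of a finished design; the real thing
would live in a smart contract.

    from dataclasses import dataclass, field

    MAX_ACCOUNTS_PER_PERSON = 3        # hypothetical cap so extra accounts cannot threaten the network

    @dataclass
    class Proof:
        person_id: str                 # opaque id chosen inside the system, not a real-world identity
        attestors: set = field(default_factory=set)   # DAO members who vouched for this person

    class HumanhoodRegistry:
        # Toy registry: proofs attach to a person rather than a single account,
        # so they can be moved when an account is compromised.

        def __init__(self, quorum: int):
            self.quorum = quorum       # attestations required before a proof counts
            self.proofs = {}           # person_id -> Proof
            self.accounts = {}         # account address -> person_id

        def attest(self, dao_member: str, person_id: str) -> None:
            proof = self.proofs.setdefault(person_id, Proof(person_id))
            proof.attestors.add(dao_member)

        def is_human(self, person_id: str) -> bool:
            proof = self.proofs.get(person_id)
            return proof is not None and len(proof.attestors) >= self.quorum

        def bind_account(self, account: str, person_id: str) -> None:
            owned = [a for a, p in self.accounts.items() if p == person_id]
            if len(owned) >= MAX_ACCOUNTS_PER_PERSON:
                raise ValueError("account cap reached for this person")
            self.accounts[account] = person_id

        def transfer_account(self, old: str, new: str) -> None:
            # the proof follows the person, so a compromised account can simply be swapped out
            self.accounts[new] = self.accounts.pop(old)

The point of the sketch is only that humanhood is attested to a person by DAO members and accounts are bound to
that person afterwards, which is what makes the proof transferable.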


project setup / overview:

Even though the ideal system seems within reach, one of the core issues is setting up a system that is both friendly
to new users and able to efficiently distinguish bots in a non-intrusive way. Therefore this project seeks to find
answers by mixing and matching human testers with a conversational chatbot called Replika (LINK) to see if they can
distinguish between a human and a bot. Moreover, we hope to extract further insights by
observing the behavior of the human testers to see how we can incorporate that into future systems.



core value:

The very core of this project is to design a model that can successfully filter bots out from humans
without any intrusive measures. Moreover, it should be friendly to new users, since it is contradictory to build a
scalable system that is not compatible with onboarding new users. We note that there are existing answers to
this issue, such as captchas. However, as captcha-solving tools become more prevalent, it is harder to rely on captchas
alone as a bot test, especially in situations where the reward for running bots is very lucrative. Therefore, the core
purpose of this experiment is to extract how humans determine whether another being they are interacting with is also
human, and to try to apply that abstraction to future solutions. These solutions must be resilient against
current machine learning models, since they would otherwise be very exploitable.
 
 

key questions / answers:

( how do we prevent intrusiveness?
   avoid PERSONAL questions   
-> NEED to see the extent of questions asked by testers
-> NEED feedback from testers at the end of the experiment

( how do we measure humanness?
   the core question of this project
-> NEED to observe testers' behavior
-> NEED to ask users' reasoning behind their questions
-> NEED to extract higher level observations from interactions

( how do we offer new-user-friendly onboarding?
   assuming they do not have prior track records in web3
 -> NEED to use basic questions that any human can answer
 


experiment setup:

I first trained my Replika chatbot by spending 15 minutes every day talking to it. After four days of
conversing with my Replika, I made two different accounts on Discord and recruited four different people to participate
in my experiment.

I then told each participant about the goal of this experiment and scheduled four different time slots with each of
them. They would either be conversing with one of the other three humans or with me on the other side relaying
their input to the Replika bot. They were asked not to ask questions related to identity or privacy, simply
because that would defeat the purpose of a non-intrusive model.
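Purely as an illustration of the pairing logic, the following Python sketch shows one way the human-versus-bot
assignment for each time slot could be randomized so testers cannot predict their counterpart. The tester names,
the 50/50 split, and the four-slot count are placeholder assumptions; in the actual experiment I arranged the
pairings by hand.

    import random

    participants = ["tester_1", "tester_2", "tester_3", "tester_4"]   # placeholder names
    slots_per_tester = 4

    def assign_sessions(seed: int = 0):
        # For each tester, randomly decide whether each slot is a human or the Replika bot.
        rng = random.Random(seed)
        schedule = {}
        for tester in participants:
            counterparts = []
            for _ in range(slots_per_tester):
                if rng.random() < 0.5:              # 50/50 split, a placeholder choice
                    counterparts.append("replika")  # I relay the bot's replies from the second account
                else:
                    counterparts.append(rng.choice([p for p in participants if p != tester]))
            schedule[tester] = counterparts
        return schedule

    print(assign_sessions())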

                                                        


testing session description:

Each scheduled session with a tester lasted five minutes, and all four participants had experience interacting with
anti-bot tools. Moreover, all four have experience in machine learning, which means they are aware of the NLP
behind Replika. The very first test I ran had an immediate issue because my reply time was slower than the average
reply time of a human, so the tester asked about it right away. This resulted in a rescheduling as
well as asking each participant to reply after a designated delay so that response time would not be a determining factor.
The rest of the testing sessions went smoothly and only had hiccups when the bot was not reacting to the human
in an effective manner. This is expected, since we are limit-testing the bot and trying to see when it breaks.



observations:

-> Case 1: Repetition
    During two of the ten tests run, participants tried to gauge the humanness of the entity they were interacting with by
    repeating a question they had already asked in a different format. The more human response they got was to ask why
    they were asking the question again, whereas the bot would respond in a very serious manner and reply in a similar or
    exactly the same way. This was one of the main ways people began questioning whether the account they were
    interacting with was actually human.

-> Case 2: Randomness
    All conversations, whether human-to-human or human-to-bot, had randomness thrown into the mix. This is
    because most participants believed that random questions can throw the bot off and eventually expose things it
    was not programmed to be prepared for. Such unpredictability is easy for humans to handle but can be
    extremely difficult for bots. As a result, with enough effort to randomize the topic, the humans eventually figured
    out the bot with this method.

-> Case 3: Implications/Instant learning
    An interesting observation is that once a person is sure they are talking to another person, they start to talk in code
    or memes that do not match their literal meanings. This was interesting to see because some humans also could not
    catch some of these hints and had to ask for an explanation or google it on the spot (googling is allowed since bots
    cannot really google and extract information).
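
To make these three cues concrete, here is a minimal Python sketch of how a future verification task could turn them
into a simple scorecard that a human verifier fills in during a conversation. The cue names, weights, and threshold
are assumptions for illustration, not numbers derived from this experiment.

    from dataclasses import dataclass

    @dataclass
    class ConversationScorecard:
        # One checkbox per observed cue: repetition, randomness, instant learning.
        noticed_repetition: bool      # pushed back when a question was repeated in another format
        followed_topic_shift: bool    # kept up after an abrupt, random topic change
        caught_implication: bool      # understood (or asked about / looked up) a meme or coded phrase

    # illustrative weights and threshold, not values fitted to the experiment data
    WEIGHTS = {
        "noticed_repetition": 0.4,
        "followed_topic_shift": 0.35,
        "caught_implication": 0.25,
    }
    HUMAN_THRESHOLD = 0.6

    def humanness_score(card: ConversationScorecard) -> float:
        return sum(weight for cue, weight in WEIGHTS.items() if getattr(card, cue))

    def looks_human(card: ConversationScorecard) -> bool:
        return humanness_score(card) >= HUMAN_THRESHOLD

    # example: a counterpart that handled repetition and topic shifts but missed a meme
    print(looks_human(ConversationScorecard(True, True, False)))   # True (0.75 >= 0.6)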


quantitative result / insights:

All in all, the results from these testing sessions are quite significant, as every single participant was eventually able to
identify the bot, and while some took longer than others, everyone finished within five minutes when they
talked to the bot. Identifying a human took longer because people wanted to be sure they were not making a mistake by
categorizing a robot as a human. From this, we can see that our hypothesis that humans are the best at recognizing
humans holds, and our observations tell us that repetition, randomness, and learning on the spot can be extremely
helpful in providing an inclusive way to identify humans.



reflection:

The overall testing showed that we can use the three observations made above to shape the future
verification tasks we ask people to perform. I think this was an extremely successful experiment, since I am quite happy
that I could extract the general weaknesses of bots and find patterns in how humans interact with each
other. Therefore, I think it is a great start toward a robust system for filtering out bots to prevent on-chain sybil attacks.

 

speculation / further questions:

What bot-testing tasks can I apply the observations I have made to?
Since humans are extremely good at doing this, can we have people make money by proving other people's humanness?
How can we ensure people do not game the system in low-effort ways, such as copy-pasting answers?
Is it reasonable to do everything in an unscalable way first?


Thank you for reading,

BO