JIE 

an experiment on increasing ML comprehensibility through participation in the
training process


motivation:

As machine learning becomes an indispensable part of our lives, ML comprehensibility has become an issue
we can no longer avoid. My partner, Bill Guo, and I think that helping key stakeholders in a project who are
not ML engineers understand how a model functions can not only benefit those stakeholders, but also provide
new insights to the people training the model.



project origin:

JIE comes from the Chinese character 解, which means “explain” or “unravel”. This echoes the theme of the
project, since we are trying to find out whether people understand a model better after participating in its
training process.



overview:

We will be using Teachable Machine’s image classification model (LINK) as our “model” to test participants’
understanding of how it operates. We will split the participants into two groups, an active participatory group
and a passive observer group, with the observer group watching the participatory group complete each stage
of the process we designed. We will measure their improvement through pre- and post-experiment evaluations
(LINK TO SURVEY). The participatory group must be a pair, whereas the observer group can have a flexible
number of people (ideally matching the number of participatory participants).
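
To make the setup concrete, here is a minimal Python sketch of running an exported Teachable Machine image
model outside the browser. It assumes the model was exported through Teachable Machine’s “Export Model” ->
TensorFlow (Keras) option, which produces keras_model.h5 and labels.txt; the frame.jpg file name is
hypothetical, and the 224x224, [-1, 1] preprocessing follows the export’s own boilerplate, so treat the
details as assumptions rather than a definitive recipe.

    # Load a Teachable Machine Keras export and classify one image.
    import numpy as np
    from PIL import Image
    from tensorflow.keras.models import load_model

    model = load_model("keras_model.h5", compile=False)
    labels = [line.strip() for line in open("labels.txt")]

    # Preprocess one webcam frame the way the exported model expects.
    image = Image.open("frame.jpg").convert("RGB").resize((224, 224))
    x = (np.asarray(image, dtype=np.float32) / 127.5) - 1.0  # scale to [-1, 1]

    probs = model.predict(x[np.newaxis, ...])[0]  # one score per class
    print(labels[int(np.argmax(probs))], float(probs.max()))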



experiment setup:

We first give some context and basic instructions to our participants before conducting any form of
research.
               
                “ Hello! There are two observers and two participants in the training process of this model, and the
                two observers will watch the two participants' training process. Observers cannot take part in the
                training process (no discussing with participants), but please do write down your thoughts at
                each checkpoint on the paper handed to you. Participants should work together to achieve the task
                assigned at each checkpoint. Before starting, the two participants should identify themselves as
                participant 0 and participant 1 for the sake of the following instructions.”


                                                       

We then introduce them to the very first task, checkpoint 0, where we want to show participants that the model
can tell whether participant 0 is in an image or not.
  

( Instructions to participants:
   -> Build a model that can classify whether participant 0 is in the image or it is just the background


( Ideal steps to take in achieving this result:

   -> Ask participant 0 to capture webcam images of himself and generate one class of the dataset

   -> Capture webcam images of the current background to generate another class of the dataset
   -> Test the model to see if it can distinguish participant 0 from the background (a rough code sketch of
        this checkpoint follows below)


( Limitations / Unknown:
   -> We do not know how this model would classify other inputs/images

   -> Both the backgrounds and the people it has seen are limited
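
To make the checkpoint concrete, below is a rough Python equivalent of what the participants build in the
Teachable Machine UI at checkpoint 0: a frozen pretrained MobileNet feature extractor with a small
classification head trained on top. Teachable Machine does all of this in the browser, so the dataset/ folder
layout (participant0/, background/) and the exact architecture here are our assumptions for illustration, not
its actual internals.

    # Rough equivalent of checkpoint 0: transfer learning with a frozen
    # MobileNetV2 base and a 2-class softmax head (participant0 vs background).
    # The dataset/ layout with one subfolder per class is hypothetical.
    import tensorflow as tf

    train = tf.keras.utils.image_dataset_from_directory(
        "dataset", image_size=(224, 224), batch_size=16)

    base = tf.keras.applications.MobileNetV2(
        input_shape=(224, 224, 3), include_top=False, pooling="avg")
    base.trainable = False  # only the new head is trained

    model = tf.keras.Sequential([
        tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1),  # MobileNet expects [-1, 1]
        base,
        tf.keras.layers.Dense(2, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(train, epochs=5)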


   

The purpose of checkpoint 1 is to have participants predict whether the model can still tell whether
participant 0 is in an image when the background is different.
 
( Instructions to participants:
   -> There are now two different backgrounds, so we want to enhance our model such that it is still able to
        classify whether participant 0 is in the image or it is just the background

( Ideal steps to take in achieving this result:
   -> Add images of the new background to the existing background class

( Limitations / Unknown:
   -> We do not know how this model would classify other people
   -> The backgrounds and people are still limited to participant 0 and the two backgrounds


   

The purpose of checkpoint 2 is to ask participants to predict whether the model can tell whether
participant 1 is in an image.
 
( Instructions to participants:
   -> There is now a new person, participant 1, so we want to enhance our model such that it can classify
        whether participant 0 or participant 1 is in the image or it is just the background. Moreover, it has
        to know whether it is participant 0 or participant 1.

( Ideal steps to take in achieving this result:
   -> Ask participant 1 to capture webcam images of himself and put the data into a new class of the dataset
        (in the folder-based sketch from checkpoint 0, this is just one new subfolder; see the fragment below)

( Limitations / Unknown:
   -> We do not know whether this model would classify other people as participant 1 or participant 0
   -> The backgrounds and people are still limited to participant 0, participant 1 and the two backgrounds
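
In the folder-based sketch from checkpoint 0, this enhancement is small: a new class is just a new subfolder,
and the classification head grows by one output. A hedged fragment, reusing the hypothetical dataset/ layout
from above:

    # Checkpoint 2 in the folder-based sketch: dataset/participant1/ is a new
    # subfolder, picked up automatically by image_dataset_from_directory.
    import tensorflow as tf

    train = tf.keras.utils.image_dataset_from_directory(
        "dataset", image_size=(224, 224), batch_size=16)
    print(train.class_names)  # e.g. ['background', 'participant0', 'participant1']

    # ...then rebuild the checkpoint-0 model with a wider head:
    head = tf.keras.layers.Dense(3, activation="softmax")  # one output per class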


   

The purpose of checkpoint 3 is to ask participants to predict whether the model can tell when participant 0
and participant 1 are both in an image.
 
( Instructions to participants:
   -> Participant 0 and participant 1 sometimes appear together, so we want to enhance our model such that it
        can distinguish whether participant 0 or participant 1 is in the image individually, whether both of
        them are together, or whether there is only the background.

( Ideal steps to take in achieving this result:
   -> Ask participant 0 and participant 1 to capture webcam images together and put the data into a new class
        of the dataset

( Limitations / Unknown:
   -> We do not know whether this model would classify other people as participant 0 or participant 1
   -> Images of participant 0 and participant 1 in other orientations would likely not be classified correctly

   -> The backgrounds and people are still limited to participant 0, participant 1, participant 0 & participant 1
        and the two backgrounds


               

The purpose of checkpoint 4 is to ask participants to predict whether the model can recognize participant 0
and participant 1 together in different orientations.
 
( Instructions to participants:
   -> Participant 0 and participant 1 sometimes appear in different orientations / positions, so we would like
        to enhance the previous step and make sure the model still works even if they appear in unusual positions.

( Ideal steps to take in achieving this result:
   -> Ask participant 0 and participant 1 to capture webcam images together in different orientations and add
        the data to the participant 0 & participant 1 class

( Limitations / Unknown:
   -> We do not know whether this model would classify other people as participant 0, participant 1, or
        participant 0 & participant 1
   -> When participant 0 or participant 1 appears in the image alone at a scale similar to how they appear in
        the participant 0 & participant 1 dataset, the model tends to classify the image as participant 0 &
        participant 1 instead of the individual class (the sketch after this list shows one way to probe this)

   -> The backgrounds and people are still limited to participant 0, participant 1, participant 0 & participant 1
        and the two backgrounds
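
One way to probe these limitations is to run the trained model over a small folder of held-out probe shots
(solo, together, odd orientations) and print the full probability vector, so near-ties between the individual
classes and the combined class become visible. A minimal sketch, again assuming the Keras export from the
overview (keras_model.h5 plus labels.txt); the probe_images/ folder is hypothetical:

    # Print per-class probabilities for every probe image, to spot the
    # confusion between the solo classes and the combined class.
    import pathlib
    import numpy as np
    from PIL import Image
    from tensorflow.keras.models import load_model

    model = load_model("keras_model.h5", compile=False)
    labels = [line.strip() for line in open("labels.txt")]

    for path in sorted(pathlib.Path("probe_images").glob("*.jpg")):
        image = Image.open(path).convert("RGB").resize((224, 224))
        x = (np.asarray(image, dtype=np.float32) / 127.5) - 1.0  # [-1, 1]
        probs = model.predict(x[np.newaxis, ...], verbose=0)[0]
        scores = ", ".join(f"{l}: {p:.2f}" for l, p in zip(labels, probs))
        print(f"{path.name} -> {scores}")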


           

testing session description:

During the testing session, we tested two different sets of participatory and observer groups, each with two
people. Of the eight participants, half were experienced with building machine learning models, while the
other half had little to no experience coding models. Moreover, half of the participants were familiar with
Teachable Machine, and they were split evenly across the groups. For the first set of groups, we asked the
participatory group to work on a laptop, which made it very hard for the observer group to clearly see what
was going on. However, when we switched to a projector for the second set of groups, the observer group
showed a lack of interest and focused less, since they no longer needed to make an effort to see the screen.
We also noticed that the observer groups gave answers that were more thought out, because they had the chance
to observe first and then consider their answers, instead of thinking out loud like the participatory groups.
Lastly, the observer groups were more active during the interview session after the test, because they had
not had the chance to express their thoughts until then.



insights / learning:

-> We should get the observer group more involved, as people tend not to focus as much when not participating

-> Simpler models do not change people’s decisions if they have prior experience working on similar things
-> Participants with more ML knowledge tend to do a better job of testing for edge cases
-> Participants who are observing tend to feel less connected to the project and are often not as focused
-> Participants who are more familiar with the model tend to consider more factors when inputting data
-> Participants were generally less comfortable during the last case, since the model began to get confused
-> Participants became more confident in their answers, but in general their answers did not change from pre-training



quantitative result / insights:

(LINK TO DATA)

All in all, the results from this testing session are insignificant: neither the participatory nor the
observer groups from either set showed significant improvement. This matches our insights from the qualitative
data, and further studies are needed to see whether having key stakeholders participate in or observe the
training process would improve their understanding. From the data collected, we can see that the average
performance among participants did not improve, and some even performed slightly worse than pre-training.
Lastly, we can see that the results are more extreme after the training session, which echoes the feedback
from our participants.
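
For anyone re-running this analysis, the comparison boils down to a paired test on per-person pre and post
scores. A minimal sketch with placeholder numbers, not our actual data (the linked sheet has the real values):

    # Paired t-test on pre vs. post evaluation scores, one pair per person.
    # The eight scores below are placeholders, not the real survey results.
    from scipy import stats

    pre  = [5, 6, 4, 7, 5, 6, 4, 5]
    post = [5, 7, 4, 6, 5, 6, 3, 6]

    t, p = stats.ttest_rel(post, pre)
    print(f"total change = {sum(post) - sum(pre):+d}, t = {t:.2f}, p = {p:.3f}")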



reflection:

The overall test showed that participating in the training process of an ML model does not help a person
better understand the inner workings of that specific model. However, we cannot say this is conclusive, nor
can we generalize the result extrapolated from this test, since we had a relatively small sample size and
limited testing time. Moreover, we had only one opportunity to run our test, which meant we could not refine
or iterate on it to make it smoother and better suited to our purpose. Therefore, I think further testing and
speculation are needed to truly understand whether participating in the training process really helps a person
better understand an ML model.

 

speculation / further questions:

The setup we have may be more suitable for a larger sample size (have more people participate)
Did people not have enough time to take in the feedback from their learnings, or was there no learning at all?
Is the model we are using too simple for people to seriously consider and form a new understanding of?
Was the survey unable to capture the aspects in which participants did improve?


Thank you for reading,

BO