For my final in Quantified Humanists I have been interested in exploring the data layers of the world as they are perceived by machines.
My project aims to raise my awareness of what machines see, through a minimally invasive display. For a long time I have been interested in computer vision, machine learning and AI (in the complete science fiction sense where humans and machines are coexisting). There are a lot of opinions on this topic, from the dystopia of the singularity to utopian futures where our senses are enhanced by machines (you can read more about the decision process that brought me to this topic here).
I am deeply fascinated by the utopia/dystopia scenario and the many possible futures that could happen. However, there is really no way I can know how to feel about it from just reading, talking or imagining it. The only way is to begin experimenting with feeling it on my own body. I decided that it would be a great topic for my Quantified Humanists final – what would it be like to live with an AI, to teach an AI? What kind of data might I give it, what kind of experiences and feelings might I have in return?
The project is LIVE here: https://qh-ml-test-app.herokuapp.com/
It runs best in the Chrome browser on your computer (remember to turn on the sound). It has also been tested in the Safari browser on an iPhone 8 (unfortunately with no sound yet).
The visualization of the data I have gathered is live here: https://qh-ml-test-app.herokuapp.com/brain.
What is Tomodachi?
Tomodachi (which means "friend" in Japanese) is the name I have given the project. I imagine it as a sort of friend you bring with you. You teach it things. It teaches you things. You treat each other nicely and learn from each other.
Tomodachi is a website that runs a machine learning algorithm on whatever I point my phone camera at. Whenever Tomo is more than 70% sure that it has recognized an object, it speaks the name of the object out loud and adds it to a list, thus accumulating a list of objects for that particular location and time.
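The 70% rule can be sketched as a small function. This is a minimal sketch, not Tomo's actual code: it assumes each classifier result is an object with a `label` and a `confidence` between 0 and 1 (field names differ between ml5.js versions), and `confidentLabel` is a name made up for illustration.

```javascript
// Assumed shape of one classifier result: { label: string, confidence: number in [0, 1] }.
const CONFIDENCE_THRESHOLD = 0.7; // "more than 70% sure"

// Hypothetical helper: returns the label of the best result if it clears
// the threshold, otherwise null (nothing is spoken or logged).
function confidentLabel(results, threshold = CONFIDENCE_THRESHOLD) {
  if (!Array.isArray(results) || results.length === 0) return null;
  // Pick the highest-confidence result rather than assuming the array is sorted.
  const top = results.reduce((best, r) => (r.confidence > best.confidence ? r : best));
  return top.confidence > threshold ? top.label : null;
}
```

Returning null for anything at or below the threshold keeps the "speak it and add it to the list" step in one place: the caller only acts when it gets a label back.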
I then rate how correct/accurate I think Tomo was, and how it made me feel. I have based the logged feelings loosely on Plutchik’s wheel of emotion, since this is a great tool for beginning to understand how I am feeling about something. I can adjust these as I log more data and figure out which feelings are more present than others. I have already done so and split “Joy” into “Happy” and “Laugh”, because I have observed that Tomo doesn’t make me happy (in the sense of making my day better) but very often makes me laugh – most often because it is horribly inaccurate or inappropriate (almost like it is telling a joke). It is quite funny when it sees things on people or objects that aren’t there. Read more about this under findings.
How does it work?
Tomo is implemented in Express (following this tutorial) and p5.js. It is hosted on Heroku. You have to log in to submit data, and to see the visualization, so I am the only one who can see it.
Tomo uses a readily available machine learning algorithm in JavaScript to obtain information about objects in the world: https://ml5js.org/docs/ImageClassifier. The code I wrote checks whether the object has already been seen at the current location, and if not, it adds it to the list of objects.
Tomo then reads the data out loud to me using the p5 speech library: http://ability.nyu.edu/p5.js-speech/. The library doesn’t work on mobile yet, but runs perfectly in Chrome on a PC.
Tomo adds contextual descriptors (location, date and time) using the p5 geoLocation library: https://github.com/bmoren/p5.geolocation
The feeling descriptors (how correct the detection was and how I felt about it) are sliders in p5.
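The "already seen at this location" check could look roughly like this. It is a sketch under assumptions, not the code running on the site: `locationKey` and `createObjectLog` are hypothetical names, and treating coordinates rounded to three decimals as "the same location" is my own guess at a reasonable bucket size.

```javascript
// Hypothetical helper: bucket coordinates so nearby readings count as one location.
// Rounding to 3 decimals is an assumption, not something the project specifies.
function locationKey(lat, lon, decimals = 3) {
  return `${lat.toFixed(decimals)},${lon.toFixed(decimals)}`;
}

// Keeps one set of labels per location bucket.
function createObjectLog() {
  const seen = new Map(); // locationKey -> Set of labels

  return {
    // Returns true if the label is new for this location, false if it was already listed.
    add(lat, lon, label) {
      const key = locationKey(lat, lon);
      if (!seen.has(key)) seen.set(key, new Set());
      const labels = seen.get(key);
      if (labels.has(label)) return false;
      labels.add(label);
      return true;
    },
    // Lists the labels recorded for a location, in the order they were first seen.
    labelsAt(lat, lon) {
      const labels = seen.get(locationKey(lat, lon));
      return labels ? [...labels] : [];
    },
  };
}
```

Because `add` reports whether the label was new, the same call can decide whether an object gets appended to the on-screen list.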
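Put together, one submitted data point might be shaped something like this. This is only an illustrative sketch: the field names, the 0 to 100 slider range and the `buildEntry` helper are assumptions, and the real schema may differ.

```javascript
// Hypothetical helper: clamp a slider reading to an assumed 0..100 range.
function clampSlider(value, min = 0, max = 100) {
  return Math.min(max, Math.max(min, value));
}

// Builds one log entry from the detected labels, the context and the slider ratings.
// All field names here are assumptions made for illustration.
function buildEntry({ labels, lat, lon, correctness, feelings, now = new Date() }) {
  const clampedFeelings = {};
  for (const [name, value] of Object.entries(feelings)) {
    clampedFeelings[name] = clampSlider(value);
  }
  return {
    labels: [...labels],                   // objects Tomo listed for this place and time
    location: { lat, lon },                // from the geolocation reading
    timestamp: now.toISOString(),          // date and time in one ISO 8601 string
    correctness: clampSlider(correctness), // how accurate I thought Tomo was
    feelings: clampedFeelings,             // e.g. { happy: 10, laugh: 90 }
  };
}
```

Storing date and time as a single ISO string keeps entries easy to sort and to group by day later, when building the visualization.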
In the future, as Machine Learning development permits, I want Tomo to be able to do a lot more for me. Like a personal Siri. But to answer my questions about what it is like to live with and train my own personal AI, this is a great starting point.
Tomodachi has been live for 8 days now, and this is a collection of my findings so far:
I laughed very hard when a sizable labrador was called chihuahua by Tomo. I did not expect to laugh this much. It has come to a point where if it is wrong I almost feel like I have been told a joke.
I have had my awareness expanded around what is possible to do with machine learning currently. I have found this to be an interesting way to get familiar with what a training data set must look like.
I did not expect to learn so many new words. I have to look up words sometimes because I don’t know what Tomo has seen, and then I cannot rate how well it did.
I have found that if I treat Tomo as a person, I think of it as a person. It is a bit like Iron Man naming his computer Jarvis and the way he talks to it…
Even if Tomo is inaccurate in the data collection, you can get an idea of what I was doing (see below).
I am still very much working on the visualization of the data, since I want it to feel like you are stepping into Tomo’s brain and trying to see what it is seeing… based on the location, time, and words.
I am using Google Street View as a context to visualize the location, with the words currently shown as a list at the top and the feelings as a bar chart. I like the aesthetic direction of an AR overlay on the real world, but I have some work to do before it becomes pretty.
In the first you get an idea of a summertime activity: the hat (Tomo couldn’t decide whether it was a sombrero or a cowboy hat – it was a straw hat), the sunscreen (which wasn’t there but could have been!), the beer bottle (it was a wine bottle). The cleaver is just creepy. The ocarina is one of Tomo’s favorites; it sees ocarinas everywhere. From this info, it can actually be surmised that I was probably at a picnic.
In the second: a stove, a bookcase, a soap dispenser (all correct), a salt shaker (actually a coat rack) and a waffle iron (that was actually a blanket, but I can see the similarity in pattern). From this info, it can actually be surmised that I am indoors, and that it is probably a living space.
I will continue to work on the data visualization part for the next two weeks, since I am using it in my Open Source Cinema class. I will add more than just words, so Tomo has representations of the words it has seen, either as 3D models or as a generative painting.
I will continue beyond this, as I add more capabilities to the computer vision algorithm and more data is gathered. First, I need to gather many more data points in this way before I can decide what to add next.
However, one limitation that I do want to work on is that Tomo can only see objects. I want it to be able to see people and recognize expressions. It is rude when it labels hair a “wig” almost every single time.
I will also add functionality around the most common words that Tomo gets wrong, so that it will learn what they are not supposed to look like, possibly by taking images of the things it gets wrong. Tomo sees cleavers (after looking up that word I find this very creepy!) everywhere, and salt shakers (not so creepy, but when I am showing it a coat rack, that is very wrong). I will begin to dive into the training data of the particular ML algorithms I am using and see if I can understand why it gets it wrong.
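Finding the words Tomo gets wrong most often could start with a simple count over the logged ratings. The sketch below is speculative: it assumes a correctness rating per label on a 0 to 100 scale and counts anything rated under 50 as wrong, and `mostMisrecognized` is a hypothetical name. None of this exists in the project yet.

```javascript
// Assumed input: an array of { label: string, correctness: number (0..100) }.
// Returns the labels most often rated as wrong, most frequent first.
function mostMisrecognized(ratings, { wrongBelow = 50, limit = 5 } = {}) {
  const counts = new Map();
  for (const { label, correctness } of ratings) {
    if (correctness < wrongBelow) {
      counts.set(label, (counts.get(label) || 0) + 1);
    }
  }
  return [...counts.entries()]
    // Sort by count descending, then alphabetically so ties are stable.
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, limit)
    .map(([label, count]) => ({ label, count }));
}
```

The resulting short list is a natural place to start when deciding which objects to photograph as counter-examples.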