Tay, Twitter Bots, and the Value Alignment Problem

Recently, Microsoft launched a bot on Twitter that learns to speak from anyone who speaks to it. The results were disastrous on multiple levels:

Trolls turned Tay, Microsoft’s fun millennial AI bot, into a genocidal maniac

First, let’s look at some of the many reasons why it’s a bad thing this happened:

  • On a purely business level, this is a PR disaster for Microsoft. In the present-day culture of instant outrage, this was the perfect news story. The headline “Microsoft Builds Racist Robot” is a guaranteed clickthrough. It makes Microsoft look either evil, negligent, or incompetent.
  • On a user experience level, this bot makes wide swaths of the population feel excluded and attacked. That’s simply bad UX.
  • On an infosec level, it has wide-open attack vectors. The most glaring one: anyone could get it to tweet arbitrary text by prefixing it with the phrase “repeat after me.” It’s about the simplest injection attack imaginable.
  • On a social level, hate speech already has enough of a platform. This bot was turned into an amplifier for the most deplorable parts of humanity.
    • Even if it were just some pranksters from 4Chan messing with the bot as a joke, it has the unintended side-effect of making hateful, fringe viewpoints visible beyond their proportional representation in society.
    • It was promoted by Microsoft, one of the largest corporate presences in the world by any measure.
    • It was on Twitter — a platform not only with millions of users, but one that’s closely watched by every news agency and blog to be amplified to audiences worldwide.

As bots and other self-sustaining agents become more prevalent in day-to-day life, they absolutely need to deal with these issues.

Why did this happen? How can we avoid this?

For something more clear-cut, let’s take a look at a similar snafu that happened with Google Photos last summer:

Google Photos Mistakenly Labels Black People ‘Gorillas’

This algorithm did not have the knowledge of context, history, and racial issues that a human would have. It was simply working with a collection of training data and statistical models. In essence, it was matching new input against its knowledge of old input and producing the most probable output. As with all statistical modeling, it has some error rate, and just as it might sometimes mislabel a chair as a stool, it mislabeled this input as well.

It’s tempting to say that algorithms are neutral.

They are not.

Machine learning algorithms, by definition, are biased. They have to be. If they were neutral, they would have no better results than flipping a coin. They have to have bias built into them. What builds this bias into the statistical models? Training data and those who design the algorithm. And, as much as we in the software industry would like to believe otherwise, both of those things have complicated relationships with the real world.

Data is not a perfect representation of the real world. A dataset is highly dependent on the choices, conscious and unconscious, made by those who collected the data. If your training data seems comprehensive (e.g. every photo indexed by Google Image Search), that’s when you have to be careful.

How do you know you have enough photos of dark-skinned people to be able to distinguish them from animals that they’ve been historically associated with as a means of oppression and dehumanization? If your test data is equally biased, you can’t — not until it blows up out in the real world when a real-life dark-skinned person tries to use it. This is especially true if your team of computer scientists, data scientists, and software engineers is full of people who don’t have social experience with these issues. That brings us back to Tay.

Here is a breakdown of Tay’s failures in the context of a larger culture where these issues are generally not visible:

The Ongoing Lessons of Tay

Particularly, take a look at this quote:

A long time ago, I observed that there are hundreds of NLP papers on sentiment classification, and less than a dozen on automatically identifying online harassment. This is how the NLP community has chosen to prioritize its goals. I believe we are all complicit in this, and I am embarrassed and ashamed.

This is a consequence of the free market. There is a business demand for sentiment analysis tools (to classify customer reviews of products as positive or negative, for example), but no demand for anti-harassment technology. Research with an immediate business impact is prioritized over research with long-term social and business (PR) consequences. The skeptical response is: “Why is this bad in the long run? Why not let the free market take care of it? If ethical algorithm design ever becomes a business priority, the market will address it automatically.”

I’m not convinced this is true.

This line of thinking follows the ideology of utilitarian ethics, which has many problems of its own. For example, take a look at this article. You can justify a lot of morally unsound behavior and decisions with utilitarianism.

Another reason we should not always let market forces rule public goods (like society’s body of research and publicly available algorithms) is that the market is a short-sighted force. As humans, we should have more of an interest in our long-term survival. Here are some areas where the free market has failed, is failing, or will fail us:

  • environmental concerns
  • sustainable energy usage for the long term
  • market bubbles and crashes, ruining individual lives
  • child labor
  • investment in space travel for us to become a multi-planetary species to reduce the chances of annihilation

The free market has worked mostly well for us until now. However, its lack of focus on the long term is troubling, especially now that we live in such an abstract, accelerating world. Each individual has far-reaching powers unimaginable to anyone even half a century ago. We are inching ever closer to creating algorithms that have a significant impact on our day-to-day lives. This brings us to the Value Alignment Problem.

Here is the Arbital page for the Value Alignment Problem. In essence, it asks: How do we design systems (particularly self-sufficient software systems such as AGI) that are motivated to do their best to help humanity? How do we align their values with those possessed by the best of our species (for a well-thought-out definition of “best”)?

In the far (but not too far) future, this issue will suddenly become an emergency if not dealt with now. The Machine Intelligence Research Institute (MIRI) is starting to tackle some of these problems, but the free market is not.

The free market is not set up to deal with issues like the Value Alignment Problem. It needs to be solved by forces outside the market. Government is the most obvious candidate, but a government run by the governed often has trouble solving large, abstract problems. Maybe we need more organizations like MIRI. Maybe we need more individuals willing to get involved in civic hacking, even as just a hobby. I don’t know what the solution is, but I do know the market will have nothing to do with it until it’s too late.

Let’s get back to Tay. What should the Tay team have done differently?

Tay is a relatively simple Twitter bot. Twitter already has a tight-knit, conscientious community of botmakers, all of whom deal with ethical questions pretty well. The easiest thing in the world for Microsoft to do would have been to look into prior art before creating a Twitter bot. Here is an article containing interviews with some of the more prominent botmakers:

How to Make a Bot That Isn’t Racist

Microsoft’s engineers failed to do their due diligence before launching Tay, and this failing points to much larger issues that we are all about to face.


Feature Hashing, or the “hashing trick”

Feature hashing, or the “hashing trick,” is a clever method of dimensionality reduction that uses some of the important properties of a good hash function to do some otherwise heavy lifting in NLP. Here is a good blog post covering the fundamentals of how and why the hashing trick works on large, sparse sets of vectors:

Hashing Language

Feature hashing is an elegant solution to the otherwise hairy problem of fighting the curse of dimensionality. It turned out to be extremely useful for a project I’m currently working on for a course at Columbia: Computational Models of Social Meaning.

Scikit-Learn has an implementation of the hashing trick if you’d like to read more about it.
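To make the idea concrete, here is a minimal, self-contained sketch of the hashing trick. The bucket count, signing scheme, and function name are my own illustrative choices; scikit-learn’s FeatureHasher and HashingVectorizer are the production-grade versions:

```python
import hashlib

def hash_features(tokens, n_buckets=16):
    """Map a variable-length list of tokens to a fixed-size vector
    by hashing each token to a bucket (the 'hashing trick').
    No vocabulary dictionary is ever stored."""
    vec = [0] * n_buckets
    for tok in tokens:
        digest = hashlib.md5(tok.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        index = h % n_buckets               # which bucket this feature lands in
        sign = 1 if (h >> 63) & 1 else -1   # sign bit reduces collision bias
        vec[index] += sign
    return vec

v = hash_features("the quick brown fox".split())
print(len(v))  # → 16: vector length is fixed regardless of vocabulary size
```

The key point is that two documents hashed with the same function land in the same fixed-dimensional space, so you never need to hold a growing vocabulary in memory; the price is occasional collisions, which the signed update helps cancel out in expectation.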

A hidden gem in Manning and Schutze: what to call 4+-grams?


I’m a longtime fan of Chris Manning and Hinrich Schutze’s “Foundations of Statistical Natural Language Processing” — I’ve learned from it, I’ve taught from it, and I still find myself thumbing through it from time to time. Last week, I wrote a blog post on SXSW titles that involved looking at n-grams of different lengths, including unigrams, bigrams, trigrams and … well, what do we call the next one up? Manning and Schutze devoted an entire paragraph to it on page 193 which I absolutely love and thought would be fun to share for those who haven’t seen it.

Before continuing with model-building, let us pause for a brief interlude on naming. The cases of n-gram language models that people usually use are for n=2,3,4, and these alternatives are usually referred to as a bigram, a trigram, and a four-gram model, respectively. Revealing this will surely be enough to…

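Whatever we end up calling them, n-grams of any length are easy to generate. A minimal sketch (the function name is my own):

```python
def ngrams(tokens, n):
    """Return all n-grams (as tuples) from a token sequence,
    sliding a window of width n across it."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

words = "the cat sat on the mat".split()
print(ngrams(words, 2))  # bigrams
print(ngrams(words, 4))  # four-grams (or tetragrams, or quadrigrams...)
```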

Detecting Social Power in Written Dialog

This semester, I’ll be working on a project at Columbia CCLS with Prof. Owen Rambow and Vinod Prabhakaran. It falls under the broad category of discourse analysis. Specifically, we’ll be looking into detecting displays of power from email threads and discussion boards. Here’s some prior work that explains the subject better:

Written Dialog and Social Power: Manifestations of Different Types of Power in Dialog Behavior

Extracting social meaning from text analysis is an interesting subject, and I’m excited to get started on it.

Filler words and function words

Today, I found this interesting article on NPR:

Our Use Of Little Words Can, Uh, Reveal Hidden Interests

Here’s a short excerpt:

“When two people are paying close attention, they use language in the same way,” he says. “And it’s one of these things that humans do automatically.”

Pennebaker has counted words to better understand lots of things. He’s looked at lying, at leadership, at who will recover from trauma.

Here is Prof. Pennebaker’s web page, discussing some of the details of his findings:

The World of Words

An excerpt:

Style-related words can also reveal basic social and personality processes, including:

  • Lying vs telling the truth. When people tell the truth, they are more likely to use 1st person singular pronouns. They also use more exclusive words like except, but, without, excluding. Words such as these indicate that a person is making a distinction between what they did do and what they didn’t do. Liars have a problem with such complex ideas.
  • Dominance in a conversation. Analyze the relative use of the word “I” between two speakers in an interaction. Usually, the higher status speaker will use fewer “I” words.
  • Social bonding after a trauma. In the days and weeks after a cultural upheaval, people become more self-less (less use of “I”) and more oriented towards others (increased use of “we”).
  • Depression and suicide-proneness. Public figures speaking in press conferences and published poets in their poetry use more 1st person singular when they are depressed or prone to suicide.
  • Testosterone levels. In two case studies, it was found that when people’s testosterone levels increased rapidly, they dropped in their use of references to other people.
  • Basic self-reported personality dimensions. Multiple studies are now showing that style-related words do much better than chance at distinguishing people who are high or low in the Big Five dimensions of personality: neuroticism, extraversion, openness, agreeableness, and conscientiousness.
  • Consumer patterns. By knowing people’s linguistic styles, we are able to predict (at reasonable rates) their music and radio station preferences, liking for various consumer goods, car preferences, etc.
  • And much, much more.

And finally, here’s a link to the paper published in The Journal of Language and Social Psychology:

Um . . . Who Like Says You Know: Filler Word Use as a Function of Age, Gender, and Personality

I find it fascinating that they were able to extract this information without using any complicated analysis of syntax, as far as I can tell.
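As far as I can tell, this kind of analysis really is just counting words against hand-built category lists. A minimal sketch in that spirit (the pronoun list and function name are my own, not LIWC’s actual dictionaries):

```python
import re

# A tiny stand-in for LIWC's 1st-person-singular category
FIRST_PERSON_SINGULAR = {"i", "me", "my", "mine", "myself"}

def self_reference_rate(text):
    """Percentage of words that are 1st-person-singular pronouns,
    in the spirit of LIWC's 'Self-references' dimension."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in FIRST_PERSON_SINGULAR)
    return 100.0 * hits / len(words)

print(self_reference_rate("I think my plan worked, and I am glad."))
```

No parsing, no syntax trees — just tokenize, look each word up in a category set, and report the percentage.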

I played with the free-to-use, public version of LIWC. It seems this gives you some results of the analysis, without drawing any conclusions from it. I fed it the “I Have A Dream” speech by Martin Luther King, Jr. Here were my results:

Details of Writer: 34 year old Male
Date/Time: 1 September 2014, 2:43 pm

LIWC Dimension                 Your data   Personal texts   Formal texts
Self-references (I, me, my)      4.08          11.4             4.2
Social words                     6.58           9.5             8.0
Positive emotions                3.74           2.7             2.6
Negative emotions                0.79           2.6             1.6
Overall cognitive words          2.27           7.8             5.4
Articles (a, an, the)            8.50           5.0             7.2
Big words (> 6 letters)         18.03          13.1            19.6

The text you submitted was 882 words in length.

The numbers don’t have units, so I’m not sure how I’m supposed to interpret them. Nonetheless, it’s interesting to compare the contents of the speech to “personal” and “formal” texts in relative terms, I suppose.

I looked around on the Internet, and found a Reddit comment referring to the work of Fairclough, Van Dijk and Wodak.

Here’s one article about Critical Discourse Analysis, the category this type of study falls under: Teun A. Van Dijk – Critical Discourse Analysis. From that article:

Critical analysis of conversation is very different from an analysis of news reports in the press or of lessons and teaching at school.

This might be a reason why “I Have A Dream” was not a good example to use to play with the LIWC tool.