Wednesday, May 23, 2012

I'll bet you like ice cream.



Do you like ice cream? I'll predict that if you care enough to mention ice cream via Twitter, you're probably in favor of it, even moderately enthusiastic.

Am I just guessing? Not entirely. I used a visualization system developed by my colleague Chris Healey to produce the result at the top of this post. When I typed in "ice cream", the system retrieved a few hundred recent tweets containing that term and generated a visualization of their emotional content, or sentiment. Try it yourself, with your own keywords, on Chris's tweet viz Web page.

The visualization integrates a lot of information (details here), but I'll concentrate on the basics: roughly speaking, the green circles are for tweets that express "positive" sentiment, and the blue circles are for tweets expressing "negative" sentiment. Sentiment is inferred from the words in the tweet. For example, "ice cream and good sex!" (the text of an actual tweet, with relevant words in bold) contains "good" and "sex". Very nice. But how could ice cream be bad? Well, some people who eat ice cream when they're feeling down might tweet about it. And one tweet says it's "dangerous even in an ice cream shop! Robbery yesterday..."

We'll need a bit of theory to understand how the circles are laid out. James Russell, a psychologist at Boston College, has proposed a conceptual framework for understanding emotion [PDF]. (There have been many attempts to formalize what we know about emotion.) In Russell's framework, one important factor is valence, which ranges from unpleasant to pleasant. (This is actually what I meant by "negative" and "positive" above.) The horizontal placement of a circle is an indication of the unpleasantness or pleasantness expressed in a tweet. And the vertical placement? That's another factor in Russell's framework: arousal, which ranges from being nearly comatose to being very, very excited. So the circles toward the top are excited, with happy tweets on the right and stressed-out tweets on the left, and at the bottom they're kind of... meh. We don't see a lot of the latter; tweeting takes some effort, after all.

It's surprisingly hard to find topics where tweets contain uniformly pleasant or unpleasant sentiment. Not even for the keyword "funeral":


Some people apparently like funerals! But if you hovered your mouse pointer over different green circles to read the tweets, you'd find that some Twitter users have the word "funeral" in their names, and sometimes they tweet about happy things. (Why not filter out names? We'd lose information. For example, a query on "obama" might then ignore tweets containing only @BarackObama, which we probably do want to see.) Also, you'll occasionally find something along these lines: "Nice to see my loving family at an event other than a funeral." A loving family is a pleasant thing to have.


This last example suggests that an automated analysis isn't as smart as a person in extracting meaning or sentiment. A human being, for example, might reasonably judge that a tweet about "an event other than a funeral" isn't really about funerals. Processing natural language, beyond the level of individual words, remains a very hard problem.


I'm writing about Chris's tweet viz system for a couple of reasons. First, it's cool. Half a billion people across the world use Twitter, and 340 million tweets are posted each day. (I'm quoting one of my students, Shishir Kakaraddi, who just completed an M.S. thesis in the general area of tweet summarization.) We need good tools for making sense of all this data. Second, the project is a nice example of how research can drive software development. It's not just about what people might like to see in a visualization of Twitter data; the design of the visualization draws on psychological models of visual perception (for example, in the color choices) and of emotion (in the analysis and display of sentiment information). By building and testing systems like these, we can learn new and interesting things.


No comments:

Post a Comment