Darrell Huff's 1954 book, How to Lie with Statistics, is a classic introduction to statistics. It's very informal; I first read it when I was a teenager. I remember one passage in part because I had no idea what it was about:
For further evidence go back to 1936 and the Literary Digest's famed fiasco. The ten million telephone and Digest subscribers who assured the editors of the doomed magazine that it would be Landon 370, Roosevelt 161 came from the list that had accurately predicted the 1932 election. How could there be bias in a list already so tested?

I can't go back to 1936 in my memory--it was 30 years before I was born--but I can look up what the doomed magazine published on October 31, 1936.
Well, the great battle of the ballots in the poll of 10 million voters, scattered throughout the forty-eight states of the Union, is now finished, and in the table below we record the figures received up to the hour of going to press.

Their bottom line (literally) was this:
In studying the table of the voters from all of the States printed below, please remember that we make no claims at this time for their absolute accuracy.
The text of the article is more circumspect than later accounts have it. The Literary Digest published the numbers but, as above, was careful to say that they might not be "accurate" (today we'd probably say "predictive"). We can assume that their data were carefully processed. Still, the table gives Landon a strong lead over Roosevelt, 57.1% to 42.9% in the popular vote, and it indicates that Landon would take 370 out of 531 electoral votes. (If you're not familiar with American politics, it's the electoral vote that counts, but the popular vote typically goes to the winner as well.)
Roosevelt won the election, taking more than 60% of the popular vote and all but 8 electoral votes. Of the 48 states at the time, Landon won only Vermont and Maine, two small New England states. What went wrong? You can probably guess. But I'll defer to Huff:
There was a bias, of course, as college theses and other post mortems found: People who could afford telephones and magazine subscriptions in 1936 were not a cross section of voters. Economically they were a special kind of people, a sample biased because it was loaded with what turned out to be Republican voters.

In the 1930s the U.S. was still in the Great Depression; the unemployment rate had peaked above 20% earlier in the decade. Sampling only Literary Digest subscribers and telephone owners produced an obvious skew toward the better-off, who leaned Republican. Oddly enough, the latter observation about (land line) telephone owners still holds true today. Younger voters skew Democratic (63% voted for the Democratic candidate in 2012), and they are also more likely to use a cell phone rather than a land line. They'd be missed--and indeed were missed--by telephone polls that only go to land lines.
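To make the mechanism concrete, here is a minimal sketch in Python of how a poll that reaches only part of the electorate can miss badly, no matter how many ballots come back. Everything in it is an assumption for illustration: the two groups, their shares of the electorate, and their support levels are made-up numbers chosen to land near the 1936 figures, not historical data, and the names `Group`, `true_support`, and `polled_support` are my own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Group:
    name: str
    population_share: float   # fraction of all voters in this group
    roosevelt_support: float  # fraction of the group voting for Roosevelt
    reach: float              # chance that the poll reaches a member of the group


def true_support(groups):
    """Roosevelt's share among all voters."""
    return sum(g.population_share * g.roosevelt_support for g in groups)


def polled_support(groups):
    """Roosevelt's share among the voters the poll actually reaches."""
    weights = [g.population_share * g.reach for g in groups]
    reached = sum(weights)
    return sum(w * g.roosevelt_support for w, g in zip(weights, groups)) / reached


# Hypothetical electorate: illustrative values only, not historical data.
ELECTORATE = [
    Group("on the mailing list", population_share=0.4, roosevelt_support=0.43, reach=1.0),
    Group("not on the list", population_share=0.6, roosevelt_support=0.73, reach=0.0),
]

if __name__ == "__main__":
    print(f"Roosevelt among all voters:    {true_support(ELECTORATE):.1%}")
    print(f"Roosevelt among polled voters: {polled_support(ELECTORATE):.1%}")
```

Notice that sample size appears nowhere in the calculation. If the second group is never reached, ten million ballots describe the first group very precisely and the electorate not at all.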
Back to HCI. What can my students learn from knowing all this? Mainly this: the importance of representative users. I let students choose their own projects for the course; they submit brief descriptions of their plans to build a system (the front-end, at least) and what they expect it to do. Sometimes students will come up with creative and appealing ideas--an application aimed at helping young children learn the multiplication tables; a memory aid to assist elderly users in taking their medications; a universal design for a given task that accommodates users with vision, hearing, or motor impairments--but they don't always have a plan for working with the people who would actually use their systems in real life. (Sometimes they do, which is exciting.) I always emphasize the importance of usability testing, and even in a classroom environment it should be possible to build user interfaces that can be evaluated by real users.