Evaluating prediction market success in the 2008 election (…or, why wrong is right)
Monday, November 17th, 2008
There is a huge misconception in the media when it comes to evaluating the success of prediction markets in the recent US election. Simply put, you have to be wrong to be right. But different types of prediction markets require different methods of assessment. I'm going to go through each type of market, compare it to the best-in-class poll aggregator (fivethirtyeight.com), discuss what each came up with, and see whether a winner emerges from the 2008 forecasts.
Probabilistic predictions (and prediction markets)
The most popular prediction markets this election season were probabilistic markets, where each contract paid out either 1 or 0. (In the case of InTrade, $10 or $0.) The market price can be, and is, interpreted as the percent probability that the event the contract describes (here, a given candidate being elected) will take place. Unfortunately, many commentators believe that once a contract is above 50% the candidate will win, and that if a candidate is above 50% in a given market and doesn't win, the prediction market has failed.
This is completely wrong.
I’ve discussed in previous posts that you have to be wrong in order to be right. As a quick reminder:
- 98% chance = 1 in 50 will be wrong
- 90% chance = 1 in 10 will be wrong
- 80% chance = 1 in 5 will be wrong
- 75% chance = 1 in 4 will be wrong
- 67% chance = 1 in 3 will be wrong
- 50% chance = 1 in 2 will be wrong
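Every line of that list is the same arithmetic: a well-calibrated forecast made at probability p is wrong a fraction 1 - p of the time, or about 1 time in 1/(1 - p). A minimal Python sketch that regenerates the list (the function name is just illustrative):

```python
def one_in_n_wrong(p: float) -> float:
    """Return N such that a calibrated forecast at probability p is wrong about 1 time in N."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    # A calibrated forecast at probability p fails with probability 1 - p.
    return 1.0 / (1.0 - p)


for p in (0.98, 0.90, 0.80, 0.75, 0.67, 0.50):
    print(f"{p:.0%} chance = 1 in {round(one_in_n_wrong(p))} will be wrong")
```

Note that 67% really gives 1 in about 3.03; the list rounds it to 1 in 3.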
So how did the prediction markets do? And how did fivethirtyeight.com do in comparison? (They’re the only site I know of that takes poll results and turns them into probabilistic forecasts mathematically.) Here’s a quick snapshot:
InTrade results

We immediately run into the main problem with evaluating these markets: while there are 50+ markets, only a small number of them were not at the extremes of the scale. I've included only those with a market price below 90%.
This looks great for InTrade, but it's deceiving. Take, for example, all the contracts priced at approximately 80% (FL, OH, VA, NV, and CO). They have an average market price of 82.6%. If InTrade were perfectly calibrated, we would expect roughly one of these five to have been wrong (5 × 17.4% ≈ 0.87), yet none were.
It looks better at the other end of the scale. MT, GA, ND, and IN have an average market price of 26.75%, so we would expect about one of the four to go against the odds (4 × 26.75% = 1.07). In fact, exactly one did (Indiana).
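The check in the last two paragraphs can be written down directly. Because expectation is linear, the expected number of contracts in a slice that pay out is the sum of their prices, which equals the slice size times the average price. A small Python sketch, assuming (as the text implies) that these are the Obama-to-win contracts, so all five of the ~80% contracts paid out and one of the ~27% contracts did; the `Slice` class and function name are illustrative:

```python
from dataclasses import dataclass


@dataclass
class Slice:
    label: str
    n_contracts: int
    avg_price: float  # average market price, as a probability (0-1)
    n_occurred: int   # how many of the contracts actually paid out


def expected_occurrences(s: Slice) -> float:
    # Linearity of expectation: sum of prices == n_contracts * avg_price.
    return s.n_contracts * s.avg_price


slices = [
    Slice("FL, OH, VA, NV, CO", 5, 0.826, 5),
    Slice("MT, GA, ND, IN", 4, 0.2675, 1),
]

for s in slices:
    print(f"{s.label}: expected {expected_occurrences(s):.2f} to pay out, "
          f"{s.n_occurred} did")
```

With five or fewer contracts per slice, a gap of about one contract is well within what chance alone produces, which is why this kind of check stays crude.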
For the mathematically inclined among you, it should be clear that this type of assessment is quite crude. With so few data points, you have to pick and choose "bins" of data using your own best judgement. Ideally there would be enough markets to create strict "bins" of data and measure against those. (This is exactly how Inkling created their plots here… I would encourage you to read their post, too.) Unfortunately, there are only a small number of “battleground” markets, so this just isn't possible.
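With enough markets, the strict-bin approach is easy to mechanize: sort every forecast into a fixed-width probability bin, then compare each bin's average forecast with how often those events actually happened. A minimal Python sketch; the function name, the ten-bin default and the sample inputs are my own illustrative choices, not Inkling's actual method or data:

```python
from collections import defaultdict


def calibration_table(forecasts, outcomes, n_bins=10):
    """Sort (probability, outcome) pairs into n_bins equal-width bins on [0, 1].

    Returns one row per non-empty bin, in ascending order:
    (bin_low, bin_high, count, mean_forecast, observed_rate).
    """
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    bins = defaultdict(list)
    for p, happened in zip(forecasts, outcomes):
        # Multiply by the bin count rather than dividing by the bin width to
        # limit float edge effects; p == 1.0 is clamped into the top bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, 1 if happened else 0))
    rows = []
    for idx in sorted(bins):
        pairs = bins[idx]
        mean_p = sum(p for p, _ in pairs) / len(pairs)
        rate = sum(h for _, h in pairs) / len(pairs)
        rows.append((idx / n_bins, (idx + 1) / n_bins, len(pairs), mean_p, rate))
    return rows


# Made-up inputs for illustration only; these are not the 2008 InTrade prices.
forecasts = [0.95, 0.85, 0.82, 0.15, 0.12, 0.55]
outcomes = [True, True, False, False, True, True]
for low, high, count, mean_p, rate in calibration_table(forecasts, outcomes):
    print(f"[{low:.1f}, {high:.1f}): n={count}, forecast {mean_p:.0%}, observed {rate:.0%}")
```

A well-calibrated forecaster shows observed rates close to the mean forecast in every bin; with only a handful of entries per bin, as in 2008, the observed rates are too noisy to say much.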
Fivethirtyeight.com results

In my opinion, fivethirtyeight.com did marginally better. Take the bottom five forecasts shown, which have an average probability of 15%. One of these five actually occurred: a 20% success rate.
Another slice of data shows the same thing: MT, ND, IN, and MO have an average probability of 30% and an actual success rate of 25%. For MO, NC, and FL, the average probability was 61% and the actual success rate 67%.
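These slices can be summarized as a signed gap between observed frequency and average forecast probability; a gap near zero in every slice is what "reasonably calibrated" means here. A short Python sketch using only the averages and counts quoted above (the function name is illustrative). All three gaps come out within about six percentage points:

```python
# Slices of fivethirtyeight.com's final forecasts, as described above:
# (label, average forecast probability, events that occurred, events in slice)
slices = [
    ("bottom five forecasts", 0.15, 1, 5),
    ("MT, ND, IN, MO", 0.30, 1, 4),
    ("MO, NC, FL", 0.61, 2, 3),
]


def calibration_gap(avg_prob: float, occurred: int, total: int) -> float:
    """Observed frequency minus average forecast probability (0 = perfectly calibrated)."""
    return occurred / total - avg_prob


for label, p, k, n in slices:
    print(f"{label}: forecast {p:.0%}, observed {k / n:.0%}, "
          f"gap {calibration_gap(p, k, n):+.1%}")
```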
I say that fivethirtyeight.com did marginally better because I could take sensible slices of data anywhere in their predictions and get reasonably calibrated results, whereas with InTrade I had to pick and choose specific slices to find similar results. In other words, the fivethirtyeight.com forecasts were more internally consistent.
Non-probability predictions (and prediction markets)
Some of the less-cited and more lightly traded prediction markets of the 2008 election cycle were index markets on each candidate's share of the vote. Here is the final tally of the national vote:
- Democrat – Obama – 52.7%
- Republican – McCain – 46.0%
- Other – 1.3%
What did the Iowa Electronic Market forecast? (Data from midnight on the 3rd)
- Democrat – Obama – 53.5% (0.8% error)
- Republican – McCain – 46.4% (0.4% error)
What did fivethirtyeight.com forecast?
- Democrat – Obama – 52.3% (0.4% error)
- Republican – McCain – 46.2% (0.2% error)
- Other – 1.5% (0.2% error)
To be fair, fivethirtyeight.com was the best of the “poll aggregators”. Real Clear Politics and Pollster.com came up with the following:
- Democrat – Obama – 52.1% (Real Clear Politics), 52.0% (Pollster.com)
- Republican – McCain – 44.5% (Real Clear Politics), 44.4% (Pollster.com)
It’s clear from this data that while the Iowa Electronic Markets were quite accurate, fivethirtyeight.com forecasts were even more accurate.
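To put the comparison on one scale, here is the mean absolute error of each source over the two major candidates (the IEM had no "Other" contract, so "Other" is left out for everyone). A short Python sketch using only the numbers quoted above; the function name is illustrative. Note that the Real Clear Politics and Pollster.com figures sum to well under 100%, presumably because poll averages leave undecided voters unallocated, which inflates their error on this measure:

```python
# Final 2008 national popular vote, in percent (from the tally above).
actual = {"Obama": 52.7, "McCain": 46.0}

# Final forecasts quoted above, in percent.
forecasts = {
    "Iowa Electronic Markets": {"Obama": 53.5, "McCain": 46.4},
    "fivethirtyeight.com": {"Obama": 52.3, "McCain": 46.2},
    "Real Clear Politics": {"Obama": 52.1, "McCain": 44.5},
    "Pollster.com": {"Obama": 52.0, "McCain": 44.4},
}


def mean_absolute_error(forecast: dict, truth: dict) -> float:
    """Average absolute error, in percentage points, over the candidates in `truth`."""
    return sum(abs(forecast[c] - truth[c]) for c in truth) / len(truth)


# Print sources from most to least accurate.
for source, f in sorted(forecasts.items(), key=lambda kv: mean_absolute_error(kv[1], actual)):
    print(f"{source}: {mean_absolute_error(f, actual):.2f} points")
```

On this measure fivethirtyeight.com comes in at 0.30 points, the IEM at 0.60, and the two poll averages above one point.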
For reference, redbluerichpoor.com showed the following results for fivethirtyeight.com's final forecasts of state vote share, which were quite accurate:

The issue of time
Recently, George Neumann of the Iowa Electronic Markets sent out a document to the Prediction Markets Google group that claimed that the IEM “continue to dominate polls.” One specific line really struck me as ridiculous:
During this 886 day period the average absolute error was 1.2%, amazingly similar to the final polling results but for a much longer period.
So now we’re supposed to assess the accuracy of a prediction market over a 2+ year period?!? As the race moves and swings, and the markets with it, the IEM seems concerned only with the average. That approach implies the election is much more static than I believe it to be.
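To see why a long-run average says little on its own, consider three error paths that share the same 1.2-point average. The numbers below are invented purely for illustration and are not IEM or polling data: one forecaster is steadily off, one converges on the result, and one drifts away from it by Election Day.

```python
from statistics import fmean

# Hypothetical absolute-error paths (percentage points), one value per period.
# These numbers are invented for illustration; they are not IEM or poll data.
paths = {
    "steady": [1.2] * 10,
    "improving": [3.0, 2.5, 2.0, 1.5, 1.0, 0.8, 0.5, 0.5, 0.2, 0.0],
}
# The same errors in reverse order: accurate early, badly off on Election Day.
paths["deteriorating"] = list(reversed(paths["improving"]))

for name, errors in paths.items():
    print(f"{name}: average error {fmean(errors):.2f}, final error {errors[-1]:.2f}")
```

All three report the same average, yet their Election Day errors range from 0.0 to 3.0 points.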
Face it, the race changes. If the election had been held within a week of the Republican convention, John McCain would likely have won… and the markets reflected that. But events change, and the market changes along with them. This is why I disagree with Peter McCluskey’s analysis of price changes here.
Uncertainty is priced into prediction markets. In the fall, Nate Silver of fivethirtyeight.com was consistently predicting a much higher probability of an Obama win than the prediction markets were, because the markets priced in the uncertainty of events between the forecast date and the election. As the election drew closer, that uncertainty drained out of the price and Obama’s price went up. But this happens rather late in election futures; election after election, we’ve seen late-breaking news with the ability to shift the outcome of a race.
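One simple way to see why distance from Election Day pulls a price down: assume, purely as a simplification and not as how InTrade traders or fivethirtyeight.com actually model the race, that a candidate's lead drifts like a driftless random walk. The same lead then translates into a higher win probability the less time remains. The function name, the 5-point lead and the 0.5-point daily volatility below are hypothetical.

```python
import math


def win_probability(lead_pts: float, days_left: float, daily_sd_pts: float) -> float:
    """P(lead is still positive on Election Day), ASSUMING the lead follows a
    driftless random walk with the given daily standard deviation."""
    if days_left <= 0:
        return 1.0 if lead_pts > 0 else (0.5 if lead_pts == 0 else 0.0)
    # Uncertainty accumulated between now and Election Day grows with sqrt(time).
    sd = daily_sd_pts * math.sqrt(days_left)
    z = lead_pts / sd
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical: a constant 5-point lead with 0.5 points of daily volatility.
for days in (60, 30, 14, 7, 1):
    print(f"{days:>2} days out: {win_probability(5.0, days, 0.5):.1%}")
```

Under these assumptions the lead stays the same while the printed probability climbs toward 100% as the days run out, which is the "uncertainty draining out of the price" effect described above.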
Where does this leave prediction markets?
Were prediction markets consistently better than a well-performing site like fivethirtyeight.com? No.
Were prediction markets consistently worse than a site like fivethirtyeight.com? No.
They perform largely the same, though fivethirtyeight.com’s final forecasts were a tad more accurate than InTrade’s and the IEM’s. But the two types of sites serve different purposes.
Fivethirtyeight.com and similar sites take current data and process it to extrapolate trends. These sites lag real events; Nate Silver mentioned a number of times that he expected his model’s forecast to move, but that it hadn’t because the relevant polls hadn’t reached the model yet. The gap between a poll closing and its results being released and then incorporated into the model can be anywhere from a day or two to several days. So while it looks quite accurate, a poll aggregator is a lagging indicator. (Another couple of election cycles will tell us whether their accuracy holds up or was just a fluke in 2008.)
Prediction markets show what a group of traders believe will happen. This includes polling data, but markets also react to real-time information. If a candidate makes a huge gaffe, the market price will reflect it within minutes, whereas a poll aggregator could take days to show any effect.
This is the “social utility” of prediction markets, to answer a question that Chris Masse always poses. While they are on par with the best poll aggregators in accuracy, their forecasts are real-time and reflect the state of the race right now. No other mechanism does this. Markets are certainly fed by polls, but polls aren’t the whole puzzle.
Prediction markets worked quite well again this election cycle. Though their final forecasts were on par with the best poll aggregators, it is their real-time forecasts throughout the election season that make them worth examining and discussing more broadly.


