Probability and Prediction Markets - What matters and what doesn’t
August 14th, 2007
Probability is a dangerous topic in discussing prediction markets. Many people don’t really understand probability and thus see some prediction market “misses” as “failures”. Some of these, such as the InTrade market for the Senate in 2006, have been well-publicised and widely criticised. But there are two issues that we need to disentangle here:
- The only way to evaluate accuracy of predictions is with a sufficient group or series of predictions.
- Most prediction market “failures” are one-off events.
Chris Masse at MidasOracle is trying to say that the Karl Rove resignation market at NewsFutures wasn’t “predictive” because it was trading at ~20% when the announcement was made. This is horse-(manure). Based on this argument, at what point does a market become “predictive”? Only when it trades above 50%? Or perhaps when it trades above 80%, or some other figure?
There is no way to evaluate the accuracy of a single binary prediction, because the event either happens or it doesn’t. Evaluation requires lots and lots of predictions; only then can you start determining accuracy. If all the events that were judged to have a 20% chance of occurring actually occur 20% of the time, then the market is calibrated and accurate. (If five people were all judged to have a 20% chance of resigning and one of them actually did, the judgments would be calibrated.)
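To make the calibration check concrete, here is a minimal sketch in Python. The function name `calibration_table` and the sample data are made up for illustration (they are not real market data): events are grouped by the price they traded at, and the share that actually occurred is compared with that price.

```python
from collections import defaultdict


def calibration_table(predictions):
    """Group (price, happened) pairs by price, rounded to the nearest 10%.

    Returns {price: (number_of_events, observed_frequency)}.
    A calibrated market has observed_frequency close to price.
    """
    buckets = defaultdict(list)
    for price, happened in predictions:
        buckets[round(price, 1)].append(1 if happened else 0)
    return {
        price: (len(outcomes), sum(outcomes) / len(outcomes))
        for price, outcomes in sorted(buckets.items())
    }


# Hypothetical data: five events priced at 20%, one of which occurred,
# and five events priced at 80%, four of which occurred.
sample = (
    [(0.2, True)] + [(0.2, False)] * 4
    + [(0.8, True)] * 4 + [(0.8, False)]
)

table = calibration_table(sample)
for price, (count, freq) in table.items():
    print(f"priced at {price:.0%}: {count} events, {freq:.0%} occurred")
```

With only five events per bucket this says very little, which is exactly the point: the comparison only becomes meaningful once each bucket holds a large number of predictions.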
This runs into the second issue above: most “failures,” including the 2006 Senate market, the Pope prediction market, the Olympics choice market, and more, are one-off events. They are generally “misses” or “failures” because a) they occur so infrequently that determining accuracy as described above isn’t possible and b) each time they’re run, traders have to learn all over again which signals and information are important. I believe that all of these markets could be accurate if they were run frequently enough that traders could learn from their mistakes, which is a key feedback mechanism in a normal market. When a Pope is elected only every 10-30 years, it’s very difficult to re-learn papal politics well enough to trade effectively, and even harder to determine whether the trading was effective, because it won’t occur again for another 10-30 years. These events won’t start happening more often, but prediction markets are still a good way of aggregating opinions about any of them.
Even election markets fall prey to this phenomenon. The attention paid to elections means they are traditionally fairly accurate, though I don’t remember seeing any paper on the 2006 elections specifically. The issue here sometimes becomes one of timescale. In politics there is a big risk of a candidate doing or saying something stupid pretty much up to the last minute, so traders factor that into their prices, leaving a favourite at 80% when perhaps the price should be 95%. That gap closes only once the candidate really can’t say anything stupid anymore, such as on the day of the election itself.
In summary, recognise that a single prediction is just that: the traders’ aggregated opinion of the likelihood of that event occurring. Once enough of these judgements are put together, then the accuracy can be determined, and only then. Remember that an event with only a 1% chance of happening will still actually take place one time in a hundred!
——————-
I’d also like to point out David Pennock’s blog post here for more detail. Calibration, the test described above, is a good test, but not the only test of accuracy. There are more statistical tests that should be run (again, only with a sufficient number of predictions), but even these are only useful when comparing two prediction methods against each other.
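As a rough illustration of comparing two prediction methods, the sketch below uses the Brier score (the mean squared difference between the forecast probability and the 0/1 outcome; lower is better). The Brier score is my choice for the example and is not necessarily one of the tests Pennock discusses. The outcomes and the two forecasters are hypothetical.

```python
def brier_score(forecasts):
    """Mean squared error between forecast probability and outcome (1 or 0)."""
    return sum(
        (price - (1 if happened else 0)) ** 2 for price, happened in forecasts
    ) / len(forecasts)


# Hypothetical outcomes: one of five possible resignations actually happened.
outcomes = [True, False, False, False, False]

# Method A: a market that priced every event at 20%.
market = [(0.2, happened) for happened in outcomes]
# Method B: a coin-flip forecaster that says 50% for everything.
coin_flip = [(0.5, happened) for happened in outcomes]

market_score = brier_score(market)
coin_flip_score = brier_score(coin_flip)
print(f"market:    {market_score:.2f}")
print(f"coin flip: {coin_flip_score:.2f}")
```

The score for either method means little on its own; it only tells you something when set beside the score of another method on the same events, and even then only with far more than five predictions.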