Evaluating prediction market success in the 2008 election (…or, why wrong is right)

November 17th, 2008

There is a huge misconception in the media when it comes to evaluating the success of prediction markets in the recent US election. Simply put, you have to be wrong to be right. But depending on the type of prediction market you are operating, different methods of assessment are required. I’m going to go through each type of market, compare it to the best-of-class poll aggregator (fivethirtyeight.com), discuss what each came up with, and see whether a winner emerges from the 2008 forecasts.

Probabilistic predictions (and prediction markets)

The most popular prediction markets this election season were probabilistic markets, where the payoff was either 1 or 0. (In the case of InTrade, $10 or $0.) The market price can be, and is, interpreted as the percent probability that the event in the contract will occur; for example, that a given candidate will be elected. Unfortunately, many commentators believe that a contract price above 50% is a prediction that the candidate will win, and that if a candidate is above 50% in a given market and doesn’t win, the prediction market has failed.

This is completely wrong.

I’ve discussed in previous posts that you have to be wrong in order to be right. As a quick reminder:

  • 98% chance = 1 in 50 will be wrong
  • 90% chance = 1 in 10 will be wrong
  • 80% chance = 1 in 5 will be wrong
  • 75% chance = 1 in 4 will be wrong
  • 67% chance = 1 in 3 will be wrong
  • 50% chance = 1 in 2 will be wrong
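
The list above is just the arithmetic of calibration: a forecast made with probability p should miss a fraction 1 − p of the time. A minimal Python sketch of that arithmetic (the function name is mine, purely for illustration):

```python
def expected_misses(probability, n_forecasts):
    """Expected number of misses among n forecasts that each carry the
    given probability of being right, assuming perfect calibration."""
    return (1.0 - probability) * n_forecasts

# Same pairs as the list above: (stated chance, number of forecasts)
for p, n in [(0.98, 50), (0.90, 10), (0.80, 5), (0.75, 4), (0.67, 3), (0.50, 2)]:
    print(f"{p:.0%} chance: about {expected_misses(p, n):.1f} wrong in {n}")
```

Each row comes out at roughly one miss, which is the whole point: a well-calibrated forecaster has to be wrong at about that rate.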

So how did the prediction markets do? And how did fivethirtyeight.com do in comparison? (They’re the only site I know of that takes poll results and turns them into probabilistic forecasts mathematically.) Here’s a quick snapshot:

InTrade results

InTrade.png

We immediately run into the primary problem with evaluating these markets… while there are 50+ markets, only a small number of these were not at the extremes of the scales. I’ve just included those with a market price of <90%.

This looks great for InTrade, but it's deceiving. For example, let's take a look at all contracts at approximately 80% (FL, OH, VA, NV, and CO). They have an average market price of 82.6%. If InTrade were perfectly calibrated, about one of these five should have been wrong.

It looks better on the other side. Taken together, MT, GA, ND, and IN have an average market price of 26.75%. In fact, one of these four was wrong (Indiana).
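
The arithmetic behind those two bins can be sketched as follows. I only have the average price for each bin here, so the sketch treats every contract in a bin as if it traded at that average and as if the states were independent; both are simplifying assumptions of mine, and the function name is made up for illustration.

```python
def expected_payouts(avg_price, n_contracts):
    """Expected number of contracts that pay out if the average price
    is read as a calibrated probability."""
    return avg_price * n_contracts

# Bin 1: FL, OH, VA, NV, CO at an average price of 82.6%
n1, avg1 = 5, 0.826
expected_upsets_1 = n1 - expected_payouts(avg1, n1)  # favourites that lose

# Bin 2: MT, GA, ND, IN at an average price of 26.75%
n2, avg2 = 4, 0.2675
expected_upsets_2 = expected_payouts(avg2, n2)       # underdogs that win

# Chance that none of the five favourites loses, ASSUMING independent
# contracts that all trade at the bin average
chance_no_upset_1 = avg1 ** n1

print(f"~80% bin: expected {expected_upsets_1:.2f} upsets")
print(f"~27% bin: expected {expected_upsets_2:.2f} upsets (Indiana was the one)")
print(f"~80% bin: chance of no upset at all = {chance_no_upset_1:.0%}")
```

With only four or five contracts per bin, being off by one from the expected count is well within what chance alone would produce.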

For the mathematical amongst you, it should be clear that this type of assessment is quite crude. When you have so few data points, you need to pick and choose "bins" of data with your own best judgement. Ideally there would be enough markets that you could create strict "bins" of data and measure against those. (This is exactly how Inkling created their plots here… I would encourage you to read their post, too.) Unfortunately, we only have a small number of “battleground” markets and this just isn’t possible.
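
For anyone with enough markets to use strict bins, a calibration table might be sketched like this. The sample data is made up for illustration and is NOT the real 2008 prices; the function name and the 10-point bin width are also my own assumptions.

```python
from collections import defaultdict

def calibration_table(forecasts, bin_width=0.1):
    """Group (price, outcome) pairs into fixed-width price bins and compare
    the average price with the observed frequency in each bin.
    price is in [0, 1]; outcome is 1 if the event happened, else 0."""
    n_bins = int(round(1 / bin_width))
    bins = defaultdict(list)
    for price, outcome in forecasts:
        # The small epsilon guards against float rounding at bin edges;
        # a price of exactly 1.0 goes in the top bin.
        index = min(int(price * n_bins + 1e-9), n_bins - 1)
        bins[index].append((price, outcome))
    table = []
    for index in sorted(bins):
        rows = bins[index]
        avg_price = sum(p for p, _ in rows) / len(rows)
        hit_rate = sum(o for _, o in rows) / len(rows)
        table.append((index * bin_width, len(rows), avg_price, hit_rate))
    return table

# Made-up example data, not real market prices
sample = [(0.05, 0), (0.08, 0), (0.25, 0), (0.28, 1),
          (0.81, 1), (0.85, 1), (0.88, 0), (0.97, 1)]
for low, count, avg_price, hit_rate in calibration_table(sample):
    print(f"bin from {low:.1f}: n={count}, avg price={avg_price:.2f}, observed={hit_rate:.2f}")
```

A well-calibrated market would show the average price and the observed rate close together in every bin, once each bin holds enough contracts to mean anything.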

Fivethirtyeight.com results

538.png

In my opinion, fivethirtyeight.com appears to have done marginally better. Take the bottom five markets shown, which have an average probability of 15%. In fact, one of these five actually occurred: a 20% success rate.

Another slice of data shows the same thing: MT, ND, IN, and MO have an average probability of 30% and an actual success rate of 25%. Taking MO, NC, and FL, the average probability was 61% and the actual success rate was 67%.

I say that fivethirtyeight.com has done marginally better because I could take reasonable slices of data throughout their predictions and come up with reasonably calibrated results. I had to specifically pick and choose to find similar results with InTrade. In other words, the fivethirtyeight.com forecasts were more internally consistent.

Non-probability predictions (and prediction markets)

Some of the lesser-cited and more lightly traded prediction markets for the 2008 election cycle were index markets on the vote share for each candidate. Here is the final tally of the national vote:

  • Democrat – Obama – 52.7%
  • Republican – McCain – 46.0%
  • Other – 1.3%

What did the Iowa Electronic Market forecast? (Data from midnight on the 3rd)

  • Democrat – Obama – 53.5% (0.8% error)
  • Republican – McCain – 46.4% (0.4% error)

What did fivethirtyeight.com forecast?

  • Democrat – Obama – 52.3% (0.4% error)
  • Republican – McCain – 46.2% (0.2% error)
  • Other – 1.5% (0.2% error)

To be fair, fivethirtyeight.com was the best of the “poll aggregators”. Real Clear Politics and Pollster.com came up with the following:

  • Democrat – Obama – 52.1%, 52.0%
  • Republican – McCain – 44.5%, 44.4%

It’s clear from this data that while the Iowa Electronic Markets were quite accurate, fivethirtyeight.com forecasts were even more accurate.
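
The comparison can be reproduced with a few lines of Python, using only the numbers quoted above. I am assuming the two figures on each Real Clear Politics/Pollster.com line are listed in that order, and I am scoring only the two major candidates because not every source gave an “Other” figure.

```python
actual = {"Obama": 52.7, "McCain": 46.0}

forecasts = {
    "Iowa Electronic Markets": {"Obama": 53.5, "McCain": 46.4},
    "fivethirtyeight.com": {"Obama": 52.3, "McCain": 46.2},
    "Real Clear Politics": {"Obama": 52.1, "McCain": 44.5},  # assumed first figure
    "Pollster.com": {"Obama": 52.0, "McCain": 44.4},         # assumed second figure
}

def mean_absolute_error(forecast, result):
    """Average absolute miss, in percentage points, across candidates."""
    errors = [abs(forecast[name] - result[name]) for name in result]
    return sum(errors) / len(errors)

# Rank the sources from smallest to largest average miss
ranking = sorted(forecasts, key=lambda source: mean_absolute_error(forecasts[source], actual))
for source in ranking:
    print(f"{source}: {mean_absolute_error(forecasts[source], actual):.2f} points")
```

Note that these are errors in percentage points of vote share, which is the sense in which the “% error” figures above should be read.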

For reference, redbluerichpoor.com showed the following result from fivethirtyeight.com’s final forecasts (of state vote share) which were pretty accurate:

2008_2008-538.png

The issue of time

Recently, George Neumann of the Iowa Electronic Markets sent out a document to the Prediction Markets Google group that claimed that the IEM “continue to dominate polls.” One specific line really struck me as ridiculous:

During this 886 day period the average absolute error was 1.2%, amazingly similar to the final polling results but for a much longer period.

So now we’re supposed to assess the accuracy of a prediction market over a 2+ year period?!? So as the race moves and swings, and the markets with it, the IEM seems only to be concerned about the average. This implies that the election is much more static than I believe it to be.

Face it, the race changes. If the election had been held within a week of the Republican convention, John McCain would likely have won… the markets reflected that. But events change, and the market changes along with them. This is why I disagree with Peter McCluskey’s analysis of price changes here.

Uncertainty is priced into prediction market prices; in the fall Nate Silver of fivethirtyeight.com was consistently predicting a much higher probability of Obama winning the race than prediction markets, because the uncertainty of events between the forecast date and the election was priced into the market. As the election date got closer the uncertainty was removed from the price and Obama’s price went up. But this happens rather late in election futures; in election after election we’ve seen examples of late-breaking news that has had the ability to shift the outcome of a race.
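
One way to picture uncertainty being priced in is a toy random-walk model. This is purely an illustration on my part, not how InTrade traders or Nate Silver actually work: assume the final margin equals today’s lead plus daily shocks, so the chance of winning is the normal CDF of lead / (daily_sd × sqrt(days left)). The 6-point lead and the 0.5-point daily standard deviation are made-up inputs.

```python
import math

def win_probability(lead_points, days_left, daily_sd=0.5):
    """Toy model: final margin = current lead + a random walk whose daily
    standard deviation is daily_sd (an assumed value, not an estimate)."""
    if days_left <= 0:
        # No time left: the lead is the result
        if lead_points > 0:
            return 1.0
        if lead_points < 0:
            return 0.0
        return 0.5
    sd = daily_sd * math.sqrt(days_left)
    # Standard normal CDF written with the error function
    return 0.5 * (1.0 + math.erf(lead_points / (sd * math.sqrt(2.0))))

for days in (60, 30, 7, 1):
    print(f"{days:>2} days out, 6-point lead: {win_probability(6.0, days):.1%}")
```

The same lead is worth more and more as election day approaches, which is the pattern described above: the price rises even if the polls do not move.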

Where does this leave prediction markets?

Were prediction markets consistently better than a well-performing site like fivethirtyeight.com? No.

Were prediction markets consistently worse than a site like fivethirtyeight.com? No.

They perform largely the same, though the final accuracy of fivethirtyeight.com was a tad bit better than InTrade/IEM. But the purposes of the two types of sites are different.

Fivethirtyeight.com and similar sites take current data and process it to extrapolate trends. These sites lag real events; Nate Silver mentioned a number of times that he expected his model’s forecast to move, but that it hadn’t because the relevant polls hadn’t hit the model yet. The time between a poll closing, its result being released, and that result being incorporated into the model is anything from a day or two to several days. So while it looks quite accurate, a poll aggregator is a lagging indicator. (Another couple of election cycles will tell us if their accuracy continues or was just a fluke in 2008.)

Prediction markets show what a group of traders believes will happen. This includes polling data, but the market also reacts to real-time information. If a candidate makes a huge gaffe, the market price will reflect it in minutes, where a poll aggregator could take days to show any effect.

This is the “social utility” of prediction markets, to answer a question that Chris Masse always poses. While they are on par with the accuracy of the best poll aggregators, their forecasts are real-time and reflect the state of the race right now. No other mechanism does this. While markets are certainly fed by polls, that isn’t the whole puzzle in and of itself.

Prediction markets worked quite well again this election cycle. Though their final forecasts were on par with the best poll aggregators, their real-time forecasts throughout the election season are the reason why they should be examined and discussed more broadly.


Election Tuesday – What to expect from the prediction markets

November 3rd, 2008
Beachscape.jpg

Tomorrow is going to be a landmark day for prediction markets. The 2008 US election cycle has been the most-polled, most-predicted, and likely the most-analyzed election in history. It’s been going for nearly two years, and I for one am glad the election will soon be over and governing (by whoever wins) will soon begin.

But why will it be a landmark day for prediction markets? Simply put: the data.

Prediction Markets and Polls – The Data

There are prediction markets on a wide variety of sites, with both play-money and real-money incentives. Iowa Electronic Markets, InTrade, and Betfair for real-money; HubDub, Inkling, NewsFutures for play-money. There are more, but these are the sites I’ve seen cited most often. (It’s too bad ConsensusPoint didn’t push TheWSX.com this election cycle.)

More importantly, there are a few sites that offer incredibly deep (and also probabilistic) analysis into polls. Most notably fivethirtyeight.com, which I seem to be checking a couple of times a day, now. There are national polls, national tracking polls, state polls, and even some state tracking polls! Fivethirtyeight in particular does deep-level statistical analysis to determine from poll results and demographic data how likely each state is to vote for each candidate.

The number of data points, from different prediction markets, polls, and poll trendlines/analysis will be immense. The sum total of data that will be available after this election should be a treasure trove for researchers, and should finally prove the accuracy of prediction markets.

But there’s a hitch…

Yes, there’s always a hitch, and it’s something I’ve discussed before. In elections, polls and prediction markets are measuring two different things.

Polls are measuring the percentage support for a candidate. Generally around 40-60% or so, unless it’s a total blow-out.

Prediction markets measure the percentage chance that the candidate will win their election. When the election is tight, that is around 50%; when it’s a blowout, it can regularly be 95%+. (Few prediction markets exist for candidates’ vote share; really only on the presidential level.)
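
To make the difference concrete, here is a rough sketch of how a poll-style vote share might translate into a market-style win probability. It assumes a two-candidate race and a normally distributed polling error with a standard deviation of 2 points; both the function and that figure are my own illustrative assumptions, not anything a poll or market actually publishes.

```python
import math

def share_to_win_probability(two_party_share, error_sd=2.0):
    """Chance of finishing above 50% if the true share is normally
    distributed around the polled share (error_sd is an assumed value)."""
    z = (two_party_share - 50.0) / error_sd
    # Standard normal CDF written with the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

for share in (50.0, 51.0, 52.0, 55.0, 60.0):
    print(f"{share:.0f}% support -> {share_to_win_probability(share):.1%} chance of winning")
```

A few points of support either side of 50% swing the win probability from a coin flip to near certainty, which is why the two kinds of numbers cannot be compared directly.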

What should you expect on Election Day?

I expect that a number of news outlets will be quoting percentages from InTrade in the run-up to the polls closing in the evening. Final-results contests are already springing up, including one in the New York Times where you earn points based on current InTrade odds. You can also expect a LOT of volume on the markets tomorrow. But once the results start rolling in, the news is going to focus on the candidates alone. Wednesday will start the morning-after evaluation of the polls and markets, which will likely last for quite some time.

What does this mean in the end?

Comparing the results of polls and prediction markets is certainly like comparing apples and oranges. There are certainly some similarities, but they are fundamentally different.

What we need to do is evaluate how each forecasting method performed independently. For prediction markets, that means that a “failure” (where a prediction >50% didn’t happen) is quite likely a success. For polls, that means that a result just a few percentage points off (outside its MOE) is a failure.
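
For the poll side of that yardstick, a minimal sketch of the margin-of-error check looks like this. It uses the textbook 95% formula for a simple random sample, which real pollsters adjust in various ways, and the poll numbers in the example are hypothetical.

```python
import math

def margin_of_error(share, sample_size, z=1.96):
    """95% margin of error (as a fraction) for a simple random sample."""
    return z * math.sqrt(share * (1.0 - share) / sample_size)

def poll_missed(poll_share, actual_share, sample_size):
    """By this yardstick a poll fails if the result lands outside its MOE."""
    return abs(poll_share - actual_share) > margin_of_error(poll_share, sample_size)

# Hypothetical poll: 49% support among 1,000 respondents; actual result 53%
print(f"MOE: {margin_of_error(0.49, 1000):.1%}")
print("missed" if poll_missed(0.49, 0.53, 1000) else "within MOE")
```

A prediction market has no equivalent single-contract test; one contract above 50% that doesn’t pay out tells you nothing until it is grouped with others at similar prices.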

I believe that prediction markets will come out looking quite good in this election. They’ve already proven their worth to me; when poll results might indicate a close or tightening race in places, the prediction market magnifies the difference, and in many cases demonstrates the poll volatility is just noise.

In the end, are the results from the prediction markets useful? Based on the number of times I’ve seen them cited this year… the answer is an unqualified YES. Are they perfect? No, and neither is any other forecasting system or technique.

I’m really looking forward to tomorrow…


General round-up of prediction market topics

October 28th, 2008

The US election is just over a week away, and with that there are a few different topics I’d like to touch on. With the explosion in new prediction markets since the last presidential election, we should see some interesting (but hopefully consistent) results.

  • First, a great post from Koleman Strumpf on Midas Oracle points out that half of all trades on the Betfair exchange in 2004 occurred on Election Day! While I personally think that was quite likely due to the early exit poll news for Kerry and the subsequent swing back to Bush, it proves that there are still quite a few people that may be waiting until the very last day to trade.
  • Jason Ruspini just started a new thread on the Prediction Markets e-mail group regarding some divergences he’s seen between prices on InTrade and fivethirtyeight.com.

    While I think some of the things he’s observed are due to the way Nate Silver presents data on his site, Jason brings up a very good point. A thorough analysis of movement in the InTrade prediction markets should be compared to the daily calculated win percentage from fivethirtyeight (where all data comes strictly from polls). I think it could be very revealing, and give the public quite a bit more data on the accuracy of polling, aggregation of polls, and prediction markets.

  • A long time ago I started four different markets on Inkling Markets that will hopefully predict control over each house of Congress, and the number of seats each party will have after the election. Data is shown here:


    (as I write this, the Democratic percentage is 53.8%, which corresponds to 234 seats in the House of Representatives.)

  • I may need to update my post on prediction market software soon. Xpree (founded by Mat Fogarty, and recently joined by Leslie Fine, a well known prediction market researcher from HP) may be changing their name. The top three picks according to their contest on NameThis were:
    1. Metricast
    2. UREprojection
    3. Keymet

    Personally, I don’t like #2 or #3, but Metricast sounds interesting. It also sounds like a much better fit to what they do than “Xpree.”

    Good luck to them, if they choose to go down this route.

  • Do you speak Danish? Nosco is hiring!
    For English speakers, so are InTrade and Xpree.

A review of idea and innovation software

October 10th, 2008
Lighthouse.jpg

The most popular post I’ve written to date is a review of prediction market software. Today’s post is going to be the same, but for idea/innovation software (henceforth referred to as innovation software).

Trying to even find and identify all the different types of innovation software is difficult because of the different ways people and companies think about innovation. Prediction markets are straightforward; they’re futures markets, so the software is largely the interface between the user and the order book on the database. That is not at all so for innovation software. Different people think about innovation in different ways, which I referred to in a previous post.

The list below is likely not complete, but I believe it does pick up the major players.

Digg for Ideas (ranking systems)

Salesforce – Salesforce’s solution is really well known, having been used by Starbucks in the myStarbucksIdea contest and also in Dell’s IdeaStorm. It’s a simple popularity contest, but tied in nicely with Salesforce’s platform. I’ve heard, however, that it required a significant time investment on Starbucks’/Dell’s part in order to properly evaluate the highest-ranked ideas internally, even before they ever reached the stage of implementation.

BrightIdea – The information on BrightIdea’s software is relatively scarce; just a list of features. It looks like it’s trying to be a one-stop shop for everything, from research to idea ranking to analytics to rewards to financials and more. I understand that it’s a fairly mature product, and they have some solid clients. Overall, it’s a bit of a dark horse.

Hype Idea Management – This is a German product, and looks to be fairly basic; it’s just an idea capture and rating system. To me it looks both too basic (in general) and too complex (particularly when it comes to ranking/rating).

Idea Central from imaginatik – This is yet another piece of software that seems to exist only in a list of bullet points and large blocks of text. Based on their claims of paying clients it must exist and work, but I would certainly appreciate some screenshots and demos to understand what it actually focuses on.

Spigit – I’m really not sure what to think of Spigit. They look to have a fairly advanced product, which is actually three products: IdeaSpigit, InnovationSpigit, and ContestSpigit. IdeaSpigit seems to be a standard “Digg for ideas” model, where you get feedback from customers like the Salesforce IdeaExchange. InnovationSpigit is an application to use internally, with quite (and needlessly?) sophisticated algorithms to rate/rank ideas. It also bills itself as a prediction market, so I’m not sure how much of the system is a ranking/rating system and how much of it is a market-based system. Finally, ContestSpigit is the same kind of system but for a specific campaign.

Spigit seems to have a good client base and their software has won an award or two, but it’s tough to tell how useful it actually is for their clients. To me it appears to be needlessly complex, but then I believe these systems should be simple and usable above all else. Their marketing positions them as a significant competitor in this industry, but I’m not sure how much is hype and how much is truth.

Market-based (aka betting-based) solutions

Nosco IdeaExchange – Nosco is a great company from Denmark that I first met a couple of years ago. They first developed a portfolio of software applications that included prediction markets and what they call an Idea Exchange. Since then they’ve found much more demand for the Idea Exchanges and have since shifted their focus to that alone.

Their Idea Exchange is still modeled on a futures market, where you can buy and sell ideas. I would still suspect a model like this to be susceptible to gaming and, in general, to becoming a Keynesian beauty contest. (People don’t buy what is worthwhile, they buy what they think others will think is worthwhile.) That said, they have what looks to be a mature product that looks fantastic and has been used by a number of Danish companies.

Consensus Point – Some clients of ConsensusPoint use their standard ForesightServer software to run prediction markets on ideas. I’ve mentioned before how prediction markets aren’t suitable for this. To ConsensusPoint’s credit, they aren’t specifically marketing a one-solution-fits-all approach; it’s just what their clients are doing with the software.

NewsFutures Idea Pageant – I really like the quote from NewsFutures on their Idea Pageant page: “A large number of ideas makes a standard prediction market approach impractical.” While I don’t think that’s the only criticism, it’s a good chunk of a start.

The Idea Pageant is a fairly straightforward and easy to understand application. Each person gets a number of positive (green) votes/tokens and a smaller number of negative (red) votes/tokens. These can be refreshed periodically, and the consistently positive ideas float to the top.
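
I don’t know how NewsFutures actually scores the Idea Pageant, but the mechanism as described can be sketched in a few lines. Everything here is hypothetical: the token budgets, the net-score rule (green minus red), and the names are my own illustration, not their implementation.

```python
from collections import Counter

GREEN_TOKENS = 5  # assumed budget of positive votes per person
RED_TOKENS = 2    # assumed (smaller) budget of negative votes per person

def rank_ideas(ballots):
    """ballots maps a voter to (green_votes, red_votes), each a list of
    idea names. Votes beyond a voter's token budget are ignored."""
    scores = Counter()
    for green_votes, red_votes in ballots.values():
        for idea in green_votes[:GREEN_TOKENS]:
            scores[idea] += 1
        for idea in red_votes[:RED_TOKENS]:
            scores[idea] -= 1
    # Highest net score first; ties broken alphabetically
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

ballots = {
    "ann": (["idea A", "idea B"], ["idea C"]),
    "bob": (["idea A"], ["idea B"]),
    "cat": (["idea A", "idea C"], []),
}
print(rank_ideas(ballots))
```

Because each person has only a handful of tokens, nobody has to price every idea, which is what makes this workable when the number of ideas is large.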

Qmarkets – Qmarkets is another prediction markets startup that seems to have also moved into the innovation software arena. There’s a fairly extensive feature list; so much so I’m not sure how to gauge exactly how complex the software actually is. Without screenshots, it’s tough to tell how well developed the solution is, but it’s certainly a potential player in the market.

Xpree – The Xpree Open Innovation Markets seems to be a bit of a hybrid solution. It has a voting system for ideas, but then later management can transfer those ideas into a prediction market. This seems to be a well-thought-out way of approaching the problem. However, I personally believe that a client could shoot themselves in the foot with poor implementation of this software. It’s all too easy to start to turn every idea into a prediction market, and that (again) is a bad plan.

Marketplace solutions (specific innovations sought)

InnoCentive – InnoCentive is one of the most well-known idea marketplaces. They’ve had some good successes so far, and substantial press coverage. I would assume they are trying to quickly ensure they take advantage of network effects and become the primary marketplace for innovators and the companies looking for innovations.

NineSigma – NineSigma is another company in this arena. It’s less a true marketplace than a forum to receive and respond to RFPs. They do seem to have a decent client list, and have been in business since 2000.

PhilOptima – PhilOptima is a similar innovation marketplace, but is aimed at “grant makers” and thus has a different audience on the innovation seeker side.

Innovation Exchange – This site appears to be a marketplace for innovations in general, without much focus. While that’s great in principle, I think the lack of focus perhaps hurts their chances of getting significant penetration in any market, and thus any substantial market share. I mention this because there will likely be a race for market share amongst these sites, and only the winner will get the ideal network effects.

fellowforce – This is similar to Innovation Exchange, but geared more toward the Web2.0 crowd, with widgets and rankings prominently promoted.

Full-fledged Marketplace solutions (specific innovations sought and sold)

Yet2 – I’m really intrigued by this site. What’s unique is that it offers something both for companies with specific innovation needs (like the category above) but also for entrepreneurs, engineers and scientists with innovations they believe have commercial potential. While the design is a bit harsh visually, it’s a very intriguing concept.

Other

Rite-Solutions – Rite-Solutions became quite well known a couple of years ago based on a well-known article in the New York Times that discussed how they used their software to allow everyone in the company to discuss and promote their ideas internally. It’s become quite a successful product for them (though perhaps not as lucrative as some of their government/gaming industry work!).

BrainBank – BrainBank looks to be a very interesting software solution that promotes not only the ranking and refining of ideas, but also some management around implementation.

MindMatters – I classified MindMatters software in this category since I couldn’t quite tell what the main purpose of the software actually is. It mentions idea capture, challenges and workflow, but it wasn’t obvious how they all fit together in their particular software application.

Summary

There are a multitude of different approaches to innovation, and there are a multitude of different software applications to help companies and organisations innovate. Are there any true market leaders? Not as far as I can tell. Some, like Salesforce, are quite well known, but aren’t necessarily that useful for a wide cross-section of companies.

This post is meant simply to review and discuss different software applications available around innovation. I plan to write much more on how innovation does and can work in organisations. You can find all of my past innovation-related posts here, and future posts will go there, too.

I sincerely look forward to your feedback.



Prediction Markets – different value to different audiences (incl. big news from Hubdub)

September 21st, 2008
Wall.jpg

In my previous post on categorizing prediction markets, one of the key differences is whether a market is public or private. (The “P” in the ICROP criteria.) There is fundamentally a very different value to the operators of a public market compared to a private market. My train of thought is below, starting with some notes on recent prediction market news.

Recent news from Hubdub

Hubdub recently announced a partnership with Reuters, which I think is a great step for public prediction markets. They previously had announced a partnership with the Huffington Post blog. When that was announced two weeks ago I was a little torn about what to think. It was great they were able to work with a major internet brand, but the implementation was pretty weak. You could find Hubdub’s markets on Huffington Post tag pages, but even knowing they were there I really had to search to find them on the page. It seemed to be something that was being treated as just a minor experiment rather than something serious.

The Reuters announcement is much more important. It looks like it’s kicking off a wider partnership program at Hubdub, which is quite exciting. Reuters has a dedicated section on the Hubdub site which will apparently be regularly updated by Reuters staff. (They’ve only created 4 questions so far.) The only scheme that I think would be better than this is if Reuters had put a dedicated Hubdub section on their (Reuters) site, and that could certainly happen down the road.

By opening up Hubdub to a wider partnership program, they will help other companies and bloggers build a reputation by allowing those people and organizations to generate interesting questions and predictions. This is great news, and should spur even more growth for both the participating bloggers and Hubdub itself.

But this got me thinking about public versus private prediction markets…

Value in Public versus Private markets

The value to operators of public markets is significantly different from the value to operators of private markets. I think this is why we are seeing significantly different types of growth in the two types of markets.

The value to operators of public markets comes from generating an active, thriving community of users. These users may be targets for advertising, subjects for demographic research (HSX), an audience for technology demonstrations (Inkling, NewsFutures), or part of another unique model yet to be determined. This is where Hubdub looks to be pioneering.

Nigel Eccles, the CEO of Hubdub, has mentioned on many occasions some interesting statistics about an average subscriber’s interaction with a newspaper’s online presence. For most newspapers it is extremely limited; a person will check a story or two and leave. There is little real engagement, and so page views and advertising rates aren’t as high as they could be. (Alex Kirtland talked about this here.) However, prediction markets have shown themselves to be hugely engaging, and can also be made highly local and relevant to a small audience. This looks to be the way they’re going with their partnership program. Each partnership will add and build another sub-community of users, which adds value to both the partner and Hubdub.

The value is significantly different for private markets. When I think of private markets, I think of a corporate prediction market where the company is looking to get useful and accurate business intelligence from their employees. The business intelligence is the value many companies claim they are trying to capture.

The difficulty in private markets is that there is no obvious, traceable value chain. By that I mean that most companies cannot say that because the market told management that there was an X% chance of event A occurring, the company changed strategy and saved $Y. Many companies are in reality treating the markets as non-actionable market intelligence, examining only after the fact how accurate the predictions were.

Even if a company did trust their employees enough to take action directly based off of what their internal markets were telling them, it may still be difficult to calculate the value of that intelligence. Particularly since management wants to be (or at least appear) smarter than their employees, it is quite easy to claim after the fact that they would have taken the same actions based off of other intelligence.

Fundamentally, calculating the value of those prediction markets takes both a company that trusts its employees enough to take action on the market indicators and management that is honest enough about what would have happened without that intelligence. For example, Mat Fogarty has talked about how he used prediction markets at EA to quantify game quality scores, which is certainly useful. But where that can directly turn into additional profit is if EA (or any similar company) took prediction market intelligence to adjust how they filled their distribution channels. They could save money by not creating unnecessary copies of bad games, and could make more money by ensuring they had enough copies of hit games ready when they went on sale. As far as I am aware, few companies have taken that final step of acting on market results.

Prediction markets certainly add value even where the elements I mentioned above aren’t present; it’s just that the direct value cannot be easily calculated. And until an executive can directly point at how the cost of a prediction market is more than made up by the direct value added, the growth of prediction markets will be limited.

That said…

Prediction markets are clearly a growth industry, even in private markets. Inkling Markets, NewsFutures, and Xpree have all recently hired great new people into their businesses, so the market for private markets is clearly growing. But until the value calculation above can be directly made, that growth rate just isn’t as high as I wish it would be.

(Gratuitous photo is from a recent holiday, specifically a section of the Great Wall of China outside of Beijing.)