Aggregating the results of many World Cup prediction models

Who will win the 2014 World Cup?

There is no shortage of quants making their own predictions. These range from my own at The Power Rank to the financial types at Goldman Sachs.

As much as I’d like to think my model is the best, research has shown that combining the results of many models often gives a better prediction. If you want to sound smart about combining predictions, say the words “ensemble learning.”
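The simplest ensemble is an equal-weight average: for each team, take the mean of its win probability across all models. Here is a minimal sketch, assuming every model reports a probability for the same set of teams. The model names and numbers are hypothetical placeholders, not the actual inputs behind this post.

```python
from statistics import fmean

# Hypothetical per-model win probabilities (percent); illustrative only,
# not the real model outputs used in this post.
model_probs = {
    "model_a": {"Brazil": 30.0, "Spain": 12.0, "Germany": 10.0},
    "model_b": {"Brazil": 26.0, "Spain": 14.0, "Germany": 12.0},
    "model_c": {"Brazil": 29.0, "Spain": 11.0, "Germany": 11.0},
}

def ensemble_average(probs_by_model):
    """Average each team's win probability, giving every model equal weight.

    Assumes every model covers the same teams (a missing team raises KeyError).
    """
    teams = next(iter(probs_by_model.values())).keys()
    return {team: fmean(p[team] for p in probs_by_model.values())
            for team in teams}

averaged = ensemble_average(model_probs)
for team, prob in sorted(averaged.items(), key=lambda kv: -kv[1]):
    print(f"{team}, {prob:.2f}")
```

With complete 32-team inputs, each model's probabilities sum to 100, and so does their average. Weighting models by past accuracy is a possible refinement, but equal weights need no track record.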

The visual shows the average win probabilities for 10 different World Cup models, which are described at the bottom of this post. This list gives the full results, in percent.

1. Brazil, 28.64.
2. Spain, 12.62.
3. Argentina, 11.49.
4. Germany, 10.75.
5. Colombia, 3.73.
6. France, 3.72.
7. Portugal, 3.47.
8. Netherlands, 3.41.
9. Uruguay, 2.89.
10. England, 2.85.
11. Belgium, 2.34.
12. Chile, 1.87.
13. Italy, 1.84.
14. Ecuador, 1.43.
15. Russia, 1.13.
16. Ivory Coast, 1.05.
17. Mexico, 1.02.
18. United States, 0.82.
19. Switzerland, 0.78.
20. Bosnia-Herzegovina, 0.67.
21. Greece, 0.64.
22. Croatia, 0.58.
23. Japan, 0.45.
24. Nigeria, 0.44.
25. Ghana, 0.40.
26. South Korea, 0.22.
27. Iran, 0.17.
28. Algeria, 0.16.
29. Cameroon, 0.11.
30. Honduras, 0.11.
31. Costa Rica, 0.09.
32. Australia, 0.08.

The aggregated predictions neatly split the 32-team field into three classes.

Brazil

First, Brazil has the highest win probability at 28.6%. This reflects both their status as a traditional power and their home advantage as the host nation.

Research has shown that referee bias plays a large role in home advantage. Yesterday’s opening game between Brazil and Croatia was the perfect example.

Despite playing poorly overall, Brazil was awarded a penalty kick on a terrible call in the second half. Neymar converted, giving Brazil a 2-1 lead.

Then the referee missed a foul by a Brazilian player deep in Croatia’s end of the field. As a result, Oscar scored a beautiful goal to finish off a 3-1 win.

The other three elite teams

The second class of teams consists of Spain, Argentina and Germany, each with a win probability greater than 10%. Should Brazil stumble, one of these traditional powers should lift the trophy.

Spain won the last World Cup with their mesmerizing short passing game and stout defense. Despite the advanced age of their stars, they have a 12.6% chance of winning the World Cup.

Argentina is probably the weakest of these three teams. However, some of the models included a home continent advantage for Argentina. This puts them ahead of Germany, with an 11.5% chance to win.

Germany is a dynamic young team with a potent offense. The Power Rank thinks they’re the best offensive team in the world by a significant margin. However, their defense can let them down, as it did against Italy in Euro 2012.

Most models and pundits consider Brazil, Spain, Argentina and Germany the favorites to win the World Cup.

Randomness in soccer competitions

The remaining 28 teams make up the third class of teams. There’s a 36.5% chance that one of these teams wins the World Cup, the event that interests me most.
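The 36.5% figure is what remains after subtracting the four favorites from 100%. A quick check using the aggregated numbers from the list above; summing the other 28 entries directly gives 36.47% instead, presumably because each listed value is rounded to two decimals.

```python
# Aggregated win probabilities (percent) for the four favorites, from the list.
favorites = {"Brazil": 28.64, "Spain": 12.62, "Argentina": 11.49, "Germany": 10.75}

# The other 28 teams, in list order (Colombia through Australia).
rest = [3.73, 3.72, 3.47, 3.41, 2.89, 2.85, 2.34, 1.87, 1.84, 1.43, 1.13, 1.05,
        1.02, 0.82, 0.78, 0.67, 0.64, 0.58, 0.45, 0.44, 0.40, 0.22, 0.17, 0.16,
        0.11, 0.11, 0.09, 0.08]

favorites_total = sum(favorites.values())
field = 100 - favorites_total  # chance that someone outside the top four wins

print(f"Favorites: {favorites_total:.2f}%, rest of the field: {field:.2f}%")
print(f"Sum of the 28 listed values: {sum(rest):.2f}%")
```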

This type of “upset” has not happened recently at the World Cup. The last six World Cups were won by Spain, Italy, Brazil, France, Brazil and West Germany, all traditional football powers.

However, the World Cup offers a small sample size of matches.

The group stage has three games. Have you ever looked at the table of your favorite league after 3 games? The best teams do not necessarily rise to the top that early in the season.

After the group stage, the World Cup enters the knockout stage with 16 teams. The top teams do not always survive a single-elimination tourney.
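From the round of 16, a champion needs four straight wins: round of 16, quarterfinal, semifinal and final. A minimal sketch of how per-match odds compound, assuming a hypothetical 70% chance of winning each match (extra time and penalties included) and treating matches as independent, which is a simplification:

```python
import math

def knockout_survival(match_win_probs):
    """Probability of winning every match, assuming the matches are independent."""
    return math.prod(match_win_probs)

# Hypothetical: a strong team wins each knockout match 70% of the time.
# Four wins are needed: round of 16, quarterfinal, semifinal, final.
p_title = knockout_survival([0.7] * 4)
print(f"{p_title:.4f}")  # 0.2401
```

Even a team favored in every knockout match is more likely than not to go out before lifting the trophy.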

Manchester City, the Premier League champions this season, did not make the FA Cup final. Hull City, 16th in the Premier League table, did.

Americans know the randomness of single-elimination tourneys well from the NCAA men’s basketball tournament. This season, top teams like Arizona, Louisville and Florida failed to win the title. An unheralded Connecticut team, angry over the 1.5% win probability from The Power Rank, won the tourney.

While the World Cup hasn’t had an upset winner lately, the same is not true for the European Championships.

In 2004, Greece qualified for their first European Championship in 24 years. They won Euro 2004.

In 1992, Denmark only qualified when Yugoslavia, torn apart by war, was expelled from the tournament. The Danes won the competition.

Will 2014 be the year that randomness descends on the World Cup? Stay tuned.

Models used in the aggregate predictions

These 10 models were used.

  • The Power Rank, using a 12-year window of matches. A description of the simulation methodology is here, while an explanation of using such a long window of games is here.
  • The Power Rank, with a 4-year window of matches, weighted by importance of the match. These rankings were quite different from the 12-year rankings above. Germany was the top team instead of Brazil. A big reason was their 4-0 win over Argentina, a 2010 World Cup match that got 4 times the weight of a friendly in the rankings.
  • Betting markets at Bovada.
  • Infostrada, emailed to me from Simon Gleave.
  • Michael Caley at SB Nation used a least squares method that considered shots in addition to goals.
  • Numberfire.
  • Goldman Sachs. I would never have used their results had they continued to use the FIFA rankings as they did in 2010. However, they transitioned to Elo ratings, which are more accurate for international soccer.
  • David Dormagen. He aggregates a number of different rankings. Not a fan of how he calculates ties, but…
  • Roger Kaufman.
  • Bloomberg Sports, who do their own rankings.
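Elo ratings, which Goldman Sachs switched to, update after every match: a team gains points when it does better than its rating predicted. A minimal sketch of the standard Elo update follows. The base K of 20 and the importance multiplier of 4 for a World Cup match (echoing the weighting in the 4-year Power Rank model) are assumptions for illustration, not the parameters Goldman Sachs or any published international Elo system uses. The sketch also ignores goal margin and home advantage.

```python
def expected_score(rating_a, rating_b):
    """Elo expected score for team A against team B.

    This is expected points (win = 1, draw = 0.5, loss = 0), not a pure win
    probability, since soccer matches can end in a draw.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def elo_update(rating_a, rating_b, score_a, k=20.0, importance=1.0):
    """Return updated (rating_a, rating_b) after one match.

    k and importance are hypothetical tuning parameters: importance scales
    how far one result moves the ratings (e.g. 1.0 for a friendly).
    """
    delta = k * importance * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta

# Two equally rated teams; team A wins a match weighted 4x a friendly.
new_a, new_b = elo_update(2000.0, 2000.0, score_a=1.0, importance=4.0)
print(new_a, new_b)  # 2040.0 1960.0
```

Because the winner gains exactly what the loser gives up, the update conserves total rating points, and a bigger importance weight lets a major tournament result move the ratings further than a friendly.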
