Aggregating the results of many World Cup prediction models

Who will win the 2014 World Cup?

There is no shortage of quants making their own predictions. They range from my own at The Power Rank to those of the financial types at Goldman Sachs.

As much as I’d like to think my model is the best, research has shown that combining the results of many models often gives a better prediction. If you want to sound smart about combining predictions, you say the words ensemble learning.
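The simplest version of this idea is a plain average. Below is a minimal sketch that assumes each model reports a win probability for each team and that every model gets equal weight. The model names and numbers are hypothetical placeholders, not the inputs used in this post.

```python
def average_win_probabilities(models):
    """Average each team's win probability across several models.

    `models` maps a model name to {team: win probability}. A team missing
    from a model is treated as having probability 0 in that model.
    """
    teams = {team for probs in models.values() for team in probs}
    n_models = len(models)
    return {
        team: sum(probs.get(team, 0.0) for probs in models.values()) / n_models
        for team in teams
    }


# Hypothetical outputs from three models (each sums to 1 over its entries)
models = {
    "model_a": {"Brazil": 0.36, "Spain": 0.09, "Rest": 0.55},
    "model_b": {"Brazil": 0.22, "Spain": 0.16, "Rest": 0.62},
    "model_c": {"Brazil": 0.28, "Spain": 0.13, "Rest": 0.59},
}
ensemble = average_win_probabilities(models)
for team, p in sorted(ensemble.items(), key=lambda kv: -kv[1]):
    print(f"{team}: {100 * p:.1f}%")
```

If every model's probabilities sum to 1, so does the average, so no renormalization is needed in this equal-weight case.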

The visual shows the average win probabilities for 10 different World Cup models, which are described at the bottom of this post. This list gives the full results, in percent.

1. Brazil, 28.64.
2. Spain, 12.62.
3. Argentina, 11.49.
4. Germany, 10.75.
5. Colombia, 3.73.
6. France, 3.72.
7. Portugal, 3.47.
8. Netherlands, 3.41.
9. Uruguay, 2.89.
10. England, 2.85.
11. Belgium, 2.34.
12. Chile, 1.87.
13. Italy, 1.84.
14. Ecuador, 1.43.
15. Russia, 1.13.
16. Ivory Coast, 1.05.
17. Mexico, 1.02.
18. United States, 0.82.
19. Switzerland, 0.78.
20. Bosnia-Herzegovina, 0.67.
21. Greece, 0.64.
22. Croatia, 0.58.
23. Japan, 0.45.
24. Nigeria, 0.44.
25. Ghana, 0.40.
26. South Korea, 0.22.
27. Iran, 0.17.
28. Algeria, 0.16.
29. Cameroon, 0.11.
30. Honduras, 0.11.
31. Costa Rica, 0.09.
32. Australia, 0.08.

The aggregated predictions neatly split the 32-team field into three classes.

Brazil

First, Brazil has the highest win probability at 28.6%. This results from their status as a traditional power as well as home country advantage in this World Cup.

Research has shown that referee bias plays a large role in home advantage. Yesterday’s opening game between Brazil and Croatia was the perfect example.

Despite playing poorly overall, Brazil was awarded a penalty kick on a terrible call in the second half. Neymar converted, giving Brazil a 2-1 lead.

Then the referee missed a foul by Brazil deep in Croatia’s end of the field. As a result, Oscar scored a beautiful goal to finish off a 3-1 win.

The other three elite teams

The second class of teams consists of Spain, Argentina and Germany, each with a win probability greater than 10%. Should Brazil stumble, one of these traditional powers should lift the trophy.

Spain won the last World Cup with their mesmerizing short passing game and stout defense. Despite the advanced age of their stars, they have a 12.6% chance of winning the World Cup.

Argentina is probably the weakest of these three teams. However, some of the models included a home continent advantage for Argentina. This puts them ahead of Germany with an 11.5% chance to win.

Germany is a dynamic young team with a potent offense. The Power Rank thinks they’re the best offensive team in the world by a significant margin. However, their defense can let them down, as it did against Italy in Euro 2012.

Most models and pundits consider Brazil, Spain, Argentina and Germany the favorites to win the World Cup.

Randomness in soccer competitions

The remaining 28 teams make up the third class. There’s a 36.5% chance that one of these teams wins the World Cup, the event that interests me most.

This type of “upset” has not happened recently at the World Cup. The last six World Cups were won by Spain, Italy, Brazil (twice), France and West Germany, all traditional football powers.

However, the World Cup offers a small sample size of matches.

The group stage has three games. Have you ever looked at the table of your favorite league after 3 games? The best teams do not necessarily rise to the top that early in the season.

After the group stage, the World Cup enters the knockout stage with 16 teams. The top teams do not always survive a single elimination tourney.

Manchester City, the Premier League champions this season, did not make the FA Cup final. Hull City, 16th in the Premier League table, did.

Americans know the randomness of single elimination tourneys well from the NCAA men’s basketball tournament. This season, top teams like Arizona, Louisville and Florida failed to win the title. An unheralded Connecticut team, angry over the 1.5% win probability from The Power Rank, won the tourney.

While the World Cup hasn’t had an upset winner lately, the same is not true for the European Championships.

In 2004, Greece qualified for their first European Championship in 24 years. They won Euro 2004.

In 1992, Denmark only got in as a late replacement for Yugoslavia, which was excluded because of the war there. They won the competition.

Will 2014 be the year that randomness descends on the World Cup? Stay tuned.

Models used in the aggregate predictions

These 10 models were used.

  • The Power Rank, using a 12 year window of matches. A description of the simulation methodology is here, while an explanation of using such a long window of games is here.
  • The Power Rank, with a 4 year window of matches, weighted by importance of the match. These rankings were quite different from the 12 year rankings above. Germany was the top team instead of Brazil. A big reason was their 4-0 win over Argentina, a 2010 World Cup match that got 4 times the weight of a friendly in the rankings.
  • Betting markets at Bovada.
  • Infostrada, emailed to me by Simon Gleave.
  • Michael Caley at SB Nation used a least squares method that considered shots in addition to goals.
  • Numberfire.
  • Goldman Sachs. I would never have used their results had they continued to use the FIFA rankings like they did in 2010. However, they switched to Elo ratings, which are more accurate for international soccer.
  • David Dormagen. He aggregates a number of different rankings. Not a fan of how he calculates ties, but…
  • Roger Kaufman.
  • Bloomberg Sports, who do their own rankings.

Check out the interactive visual for the World Cup

The World Cup starts today, and this interactive visual shows you all the important numbers.

It starts with international soccer/football rankings from The Power Rank. This method takes the scores in games and adjusts for strength of schedule. The Power Rank can rank not only teams but also their offenses and defenses.

These rankings feed a simulator that plays the World Cup 10,000 times. By counting the number of times an event happens, like the United States finishing 2nd in Group G, we obtain the probability of that event.
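A minimal sketch of this counting idea for a single four-team group. It assumes independent Poisson scores and a hypothetical mapping from team strength to goal rate. The Group G strengths are made-up placeholders, not The Power Rank's ratings, and ties left after points, goal difference and goals scored are broken at random as a simplification.

```python
import itertools
import math
import random

BASE_RATE = 1.34  # goals per 90 minutes in international play (from the post)

# Hypothetical strengths on a log scale: placeholders, not The Power Rank's ratings
GROUP_G = {"Germany": 0.6, "Portugal": 0.3, "United States": 0.1, "Ghana": 0.0}


def poisson(lam, rng):
    """Draw a Poisson random variable with mean lam (Knuth's method)."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_group(strengths, rng):
    """Play the six group matches once; return teams in finishing order."""
    table = {t: [0, 0, 0] for t in strengths}  # points, goal difference, goals scored
    for a, b in itertools.combinations(strengths, 2):
        # Hypothetical mapping from a strength gap to each side's goal rate
        goals_a = poisson(BASE_RATE * math.exp((strengths[a] - strengths[b]) / 2), rng)
        goals_b = poisson(BASE_RATE * math.exp((strengths[b] - strengths[a]) / 2), rng)
        for team, scored, allowed in ((a, goals_a, goals_b), (b, goals_b, goals_a)):
            table[team][0] += 3 if scored > allowed else 1 if scored == allowed else 0
            table[team][1] += scored - allowed
            table[team][2] += scored
    # Points, then goal difference, then goals scored; leftover ties broken at random
    return sorted(strengths, key=lambda t: (*table[t], rng.random()), reverse=True)


def probability_of_place(strengths, team, place, n_sims=10_000, seed=2014):
    """Estimate P(team finishes in `place`) by counting over many simulations."""
    rng = random.Random(seed)
    hits = sum(simulate_group(strengths, rng)[place - 1] == team for _ in range(n_sims))
    return hits / n_sims


print(probability_of_place(GROUP_G, "United States", place=2))
```

Because every call with the same seed replays the same simulations, one team's probabilities for the four finishing places add up to exactly 1.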

For more details on my methods, click here.

Data visualization

Unlike most sites, we don’t want you to go blind reading these numbers in a table. In collaboration with Andrew Phillips at Chartball, we created this interactive visual with World Cup probabilities.

Hover over a team name to see the odds that they advance through the stages of the competition. The group of circles closest to the team name corresponds to the probability of winning the group. The next closest circles correspond to second place.

For the round of 16, a team can enter this stage of the competition via two routes: winning the group or placing second in the group. Hence, each team has two probabilities for the round of 16. The gray lines show which circles correspond to which scenario.

Each team also has two probabilities for the quarterfinals and semifinals for the same reason.

Hover over the circles at any stage of the competition to see the corresponding team.

The visual will be updated each night after the games.

The nature of these probabilities

The rankings used for the visual consider the last 12 years of international play. Research by others has shown that these rankings are as accurate as rankings that consider the last 4 years of play.

However, these rankings tend to underestimate up and coming teams like Belgium, a team that suddenly has stars on major European club teams. While my rankings have them at 41st, they’re more likely a top 20 team.

The visual has Russia as the favorite to win Group H at 42%. However, I’m probably underestimating Belgium’s chances, which the visual puts at 24%.

World Cup 2014 win probabilities from The Power Rank

Who will win the 2014 World Cup?

The visual shows the top contenders according to The Power Rank. This list gives the odds for all 32 teams.

1. Brazil, 35.9%.
2. Argentina, 10.0%.
3. Spain, 8.9%.
4. Germany, 7.4%.
5. Netherlands, 5.7%.
6. Portugal, 3.9%.
7. France, 3.4%.
8. England, 2.8%.
9. Uruguay, 2.5%.
10. Mexico, 2.5%.
11. Italy, 2.3%.
12. Ivory Coast, 2.0%.
13. Colombia, 1.5%.
14. Russia, 1.5%.
15. United States, 1.1%.
16. Chile, 1.0%.
17. Croatia, 0.9%.
18. Ecuador, 0.8%.
19. Nigeria, 0.8%.
20. Switzerland, 0.7%.
21. Greece, 0.6%.
22. Iran, 0.6%.
23. Japan, 0.6%.
24. Ghana, 0.6%.
25. Belgium, 0.4%.
26. Honduras, 0.3%.
27. South Korea, 0.3%.
28. Bosnia-Herzegovina, 0.3%.
29. Costa Rica, 0.3%.
30. Cameroon, 0.2%.
31. Australia, 0.2%.
32. Algeria, 0.1%.

For those interested in my methods, see the end of this post.

But first, some quick thoughts on a few teams.

Brazil

The host nation Brazil has the highest win probability at 36%.

Home advantage plays a big role in these large odds. On average, the home team scored about 0.56 goals more than the road team over the last 3 cycles of World Cup qualifying.

As discussed in the book Scorecasting, referee bias plays a big role in home advantage. In last year’s Confederations Cup final, Spain tried to execute their short passing game against the home nation Brazil. From my perspective, the referees let Brazil get away with fouls that stymied Spain’s attack. Brazil won 3-0.

But Brazil also plays some magnificent soccer as the top ranked team in The Power Rank. Their young star Neymar will dazzle you with his quick feet and skills.

Argentina

The other traditional soccer power from South America, ranked 3rd in The Power Rank, has the second highest win probability at 10%.

Argentina benefits from a weak group with Bosnia-Herzegovina, Iran and Nigeria. I like to call it the Group of Eternal Life. They have an 85% chance to advance to the knockout stage.

Argentina might also benefit from a home continent advantage. It’s much easier for Argentina fans to travel to Brazil for the World Cup than for fans from European nations. Enough fans in attendance could create a home advantage effect like the one Brazil will enjoy.

I did not include a home continent advantage in my model, so Argentina might have even better odds than 10%.

United States

Expectations are different for the United States. Surviving a tough group with Germany, Portugal and Ghana would be a huge achievement. My numbers give the Yanks a 38% chance to make the knockout stage.

Those are decent odds for the 20th ranked team in the world. I also looked at their ranking when including only games with Jurgen Klinsmann as coach. Despite all those goals they scored in last year’s Gold Cup, the United States only rises to 17th.

The road to winning the World Cup gets harder in the knockout stage. The United States has a 1.1% chance to win the World Cup, 15th best out of 32 nations.

However, Connecticut had a 1.5% chance to win the 2014 NCAA men’s basketball tourney by my numbers. They beat Kentucky to win an improbable title.

Better predictions

Here’s the truth: If you want the most accurate predictions about who will win the World Cup, you shouldn’t just look at my predictions.

One system is not enough. Research has shown that better predictions arise from aggregating many predictions. This was a key finding in a recent academic paper on using rankings to predict football matches.

Yeah, it’s a blow to my massive ego. :) But you deserve the best possible predictions for the 2014 World Cup.

I’m curating World Cup predictions from other sources. Next week, I’ll aggregate these predictions for my email list, since they’re my favorite people in the world.

If you want to see those results (and you really should if you’re in any kind of World Cup pool), sign up for my free email newsletter. It’s the best way to get updates on The Power Rank’s content.

Methodology

Still reading? Thanks, you’re the best.

The World Cup win probabilities start with The Power Rank’s algorithm for ranking teams. It takes margin of victory in matches and adjusts for strength of schedule. With the wide disparity between countries in international soccer, this adjustment is critical for predicting the World Cup.

This algorithm can rank not only teams but also each team’s offense and defense. This allows me to estimate the goal rate for an offense against an opposing defense.

To predict the outcome of a match, I draw each team’s goals from a Poisson random variable with these goal rates per 90 minutes. This model says a team has the same rate of scoring a goal at any point in the match.

For example, teams score 1.34 goals per 90 minutes in international play. This implies that a team has about a 1.5% chance (1.34/90) to score a goal during any minute. For each minute, you could flip a biased coin that comes up heads about 15 out of every 1000 flips. Repeating this flipping 90 times and counting the heads gives nearly the same distribution as drawing goals from a Poisson random variable.
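The coin-flip picture can be checked numerically. This sketch compares the probability of k goals from 90 biased coin flips (a binomial count with p = 1.34/90) against a Poisson count with mean 1.34. For small k the two agree to within a few thousandths.

```python
import math

LAMBDA = 1.34  # goals per 90 minutes (from the post)
MINUTES = 90
p = LAMBDA / MINUTES  # chance of a goal in any one minute, about 1.5%


def binomial_pmf(k, n, p):
    """P(k heads in n flips of a coin whose heads probability is p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(k goals) for a Poisson random variable with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


for k in range(6):
    print(f"{k} goals: coin flips {binomial_pmf(k, MINUTES, p):.4f}, "
          f"Poisson {poisson_pmf(k, LAMBDA):.4f}")
```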

To simulate the World Cup, I use this Poisson model for each match in the group stage. To see the predictions for all 48 matches, check out the predictions page.
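A minimal sketch of one simulated group-stage match. The post does not spell out how offense and defense ratings become goal rates, so `goal_rate` below uses a hypothetical additive form (1.34 plus the offense rating minus the opposing defense rating, with a small floor). Only the Poisson draw follows the text, and the ratings for the two teams are placeholders.

```python
import math
import random

LEAGUE_AVG = 1.34  # goals per 90 minutes in international play (from the post)


def goal_rate(offense, opp_defense):
    """Hypothetical: expected goals per 90 minutes for an offense vs a defense.

    Ratings are in goals per 90 relative to average; a positive defense
    rating means the defense allows fewer goals than average.
    """
    return max(0.05, LEAGUE_AVG + offense - opp_defense)


def poisson(lam, rng):
    """Draw from a Poisson distribution with mean lam (Knuth's method)."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_match(team_a, team_b, rng):
    """Return a simulated score (goals_a, goals_b) for one 90-minute match."""
    lam_a = goal_rate(team_a["offense"], team_b["defense"])
    lam_b = goal_rate(team_b["offense"], team_a["defense"])
    return poisson(lam_a, rng), poisson(lam_b, rng)


rng = random.Random(7)
brazil = {"offense": 0.8, "defense": 0.6}   # hypothetical ratings
croatia = {"offense": 0.2, "defense": 0.1}  # hypothetical ratings
print(simulate_match(brazil, croatia, rng))
```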

This model gives not only the winner or loser of each match but also a score. The scores allow for the calculation of tiebreakers, which consider goal differential and goals scored.
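A sketch of the tiebreaker step, given one set of simulated scores. It ranks teams by points, then goal differential, then goals scored. FIFA's full rules have further tiebreakers that this sketch skips: teams still level keep their original order, which is an assumption. The scores are hypothetical.

```python
def standings(results):
    """Rank a group by points, then goal difference, then goals scored.

    `results` is a list of (team_a, goals_a, team_b, goals_b). Teams still
    tied after these three criteria keep their first-seen order.
    """
    table = {}
    for team_a, goals_a, team_b, goals_b in results:
        for team, scored, allowed in ((team_a, goals_a, goals_b), (team_b, goals_b, goals_a)):
            row = table.setdefault(team, {"pts": 0, "gd": 0, "gf": 0})
            row["pts"] += 3 if scored > allowed else 1 if scored == allowed else 0
            row["gd"] += scored - allowed
            row["gf"] += scored
    # sorted() is stable (also with reverse=True), so full ties keep insertion order
    return sorted(table, key=lambda t: (table[t]["pts"], table[t]["gd"], table[t]["gf"]), reverse=True)


results = [  # hypothetical scores: (team_a, goals_a, team_b, goals_b)
    ("A", 2, "B", 0),
    ("C", 1, "D", 1),
    ("A", 0, "C", 1),
    ("B", 3, "D", 0),
    ("A", 1, "D", 1),
    ("B", 1, "C", 1),
]
print(standings(results))  # ['C', 'B', 'A', 'D']
```

Here A and B both finish on 4 points with a +1 goal difference, so goals scored (4 for B, 3 for A) decides second place.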

The same Poisson model applies in the knockout stage. If two teams are tied on goals after regulation, the model is applied again for extra time. I assume each team has a 50-50 chance to win a penalty shootout.
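A minimal sketch of a single knockout match under these rules. Here goals come from a Poisson process built by summing exponential waiting times with `random.expovariate`, which yields Poisson-distributed counts. Two choices are assumptions: extra time lasts 30 minutes at the same per-minute rates, and the goal rates in the example are placeholders.

```python
import random


def poisson_goals(rate_per_90, minutes, rng):
    """Count goals scored at a constant rate over `minutes` (a Poisson count)."""
    if rate_per_90 <= 0:
        return 0
    per_minute = rate_per_90 / 90
    goals, t = 0, rng.expovariate(per_minute)  # minute of the first goal
    while t < minutes:
        goals += 1
        t += rng.expovariate(per_minute)  # waiting time until the next goal
    return goals


def knockout_winner(rate_a, rate_b, rng):
    """Return 'A' or 'B' for one knockout match.

    Assumes extra time is 30 minutes at the same per-minute goal rates.
    """
    a, b = poisson_goals(rate_a, 90, rng), poisson_goals(rate_b, 90, rng)
    if a == b:  # level after regulation: play extra time
        a += poisson_goals(rate_a, 30, rng)
        b += poisson_goals(rate_b, 30, rng)
    if a == b:  # still level: the penalty shootout is a coin flip
        return "A" if rng.random() < 0.5 else "B"
    return "A" if a > b else "B"


rng = random.Random(2014)
wins = sum(knockout_winner(1.8, 1.0, rng) == "A" for _ in range(10_000))
print(f"Team A advances {wins / 10_000:.1%} of the time")
```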

The win probabilities arise from counting the number of times each team wins over 10,000 simulations.
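Counting over a finite number of simulations leaves some sampling noise. As a rough guide (this calculation is an addition, not part of the method above), the binomial standard error sqrt(p(1 - p)/N) gives the size of that noise for N = 10,000:

```python
import math


def monte_carlo_standard_error(p, n_sims):
    """Standard error of a probability estimated by counting over n_sims runs."""
    return math.sqrt(p * (1 - p) / n_sims)


# Brazil's 35.9% from 10,000 simulations
se = monte_carlo_standard_error(0.359, 10_000)
print(f"about +/- {100 * se:.2f} percentage points")
```

For Brazil's 35.9%, that is roughly half a percentage point.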

New MLB rankings for 2014 – which teams are misvalued?

You would like to know how to properly rate a Major League Baseball team.

At this point in the season (mid-May), win-loss record is a poor indicator of team strength. With the help of lady luck, a team can win more than its fair share of one-run games.

Run differential (runs scored minus runs allowed) is a better metric of team strength. If a team has scored as many runs as it has allowed, you expect that team to have a .500 record after 162 games. On average, teams with a 0 run or point differential have a .500 record in all sports I’ve looked at (football, baseball, basketball, soccer).

However, early in the season, a team’s record can get out of line with its run differential due to short-term variance. A team with zero run differential might have gotten blown out in some games but won more than its fair share of one-run games. We’ll see how that applies to Milwaukee below.

Rankings after adjusting for schedule strength

At The Power Rank, I take run differential and adjust for schedule strength using my team ranking algorithm. These rankings consider the luck in winning close games as well as the competition a team has faced early in the season.
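The Power Rank's algorithm is not spelled out in this post. As a stand-in, here is a minimal sketch of a different, well-known technique: least-squares ratings. Each team gets a rating r chosen so that r_a - r_b best fits the observed run margins, with ratings averaging to zero so that a rating reads as an expected margin against an average team. Home-field advantage is ignored, which is an assumption, and the games are hypothetical.

```python
def least_squares_ratings(games, n_iter=2000):
    """Fit ratings so that rating[a] - rating[b] approximates each game's margin.

    `games` is a list of (team_a, team_b, margin), where margin is team_a's
    runs minus team_b's runs. Uses damped Jacobi iteration on the
    least-squares normal equations: each rating equals the team's average
    margin plus the average rating of its opponents.
    """
    teams = sorted({t for a, b, _ in games for t in (a, b)})
    schedule = {t: [] for t in teams}  # team -> list of (opponent, margin)
    for a, b, margin in games:
        schedule[a].append((b, margin))
        schedule[b].append((a, -margin))
    ratings = {t: 0.0 for t in teams}
    for _ in range(n_iter):
        target = {
            t: sum(m + ratings[opp] for opp, m in schedule[t]) / len(schedule[t])
            for t in teams
        }
        # Damping (averaging old and new values) avoids oscillation on some schedules
        ratings = {t: 0.5 * (ratings[t] + target[t]) for t in teams}
        mean = sum(ratings.values()) / len(teams)
        ratings = {t: r - mean for t, r in ratings.items()}  # average team = 0
    return ratings


games = [("A", "B", 2), ("B", "C", 2), ("A", "C", 4)]  # hypothetical margins
ratings = least_squares_ratings(games)
print({t: round(r, 2) for t, r in ratings.items()})
```

Under this additive model, the expected margin when two teams meet is the difference of their ratings. With the ratings in the table that follows, Oakland (1.82) against Arizona (-1.23) would come to about 3.05 runs. The Power Rank's own algorithm may differ in its details.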

Here are the results through May 23, 2014.

1. Oakland, (30-17), 1.82
2. Los Angeles Angels, (26-20), 0.86
3. Seattle, (23-23), 0.82
4. Detroit, (27-16), 0.80
5. Colorado, (26-21), 0.71
6. Miami, (25-23), 0.66
7. San Francisco, (29-18), 0.59
8. Kansas City, (23-23), 0.09
9. Toronto, (26-22), 0.08
10. Atlanta, (26-20), 0.07
11. St. Louis, (26-21), 0.07
12. Washington, (24-23), 0.03
13. Los Angeles Dodgers, (25-23), -0.04
14. Minnesota, (23-21), -0.07
15. Cleveland, (23-25), -0.12
16. Chicago White Sox, (24-25), -0.13
17. Texas, (23-24), -0.22
18. Baltimore, (23-22), -0.25
19. Chicago Cubs, (17-28), -0.28
20. San Diego, (21-27), -0.30
21. Tampa Bay, (20-28), -0.31
22. Boston, (20-26), -0.35
23. Cincinnati, (21-24), -0.35
24. New York Yankees, (24-22), -0.36
25. New York Mets, (21-25), -0.37
26. Milwaukee, (28-20), -0.38
27. Philadelphia, (20-24), -0.57
28. Pittsburgh, (20-26), -0.62
29. Houston, (17-31), -0.66
30. Arizona, (18-31), -1.23

The number after a team’s record is a rating. This rating gives an expected run margin in a game against an average team.

The puzzling AL East

When I discuss strength of schedule in baseball, I usually start with the AL East.

Over the past decade, Boston and the New York Yankees have made this division the most dominant in the game. More recently, upstart Tampa Bay has used analytics to join the elite in this division. In past years, these teams usually appeared in the top 10 of my MLB rankings.

This season, only one team from the AL East (Toronto!!) cracks the top 10.

In fact, among AL East teams, only Toronto has scored more runs than it has allowed. If anyone had predicted this before the season, he or she would have been crazier than Charlie Sheen.

The remaining AL East teams are ranked 18th and lower. The Yankees bring up the rear at 24th out of 30 MLB teams.

It’s still early in the season. Based on preseason expectations, I expect Boston and Tampa Bay to bounce back. However, an AL East takeover of the top 10 seems unlikely.

Seattle will win the AL West

The Mariners have a measly 23-23 record, good for 3rd in the AL West.

However, Seattle has scored 12 more runs than they have allowed. When you adjust this for schedule strength, they rise to 3rd in The Power Rank.

A big part of this performance is their play against Oakland, the top ranked team. Seattle has played 10 games against Oakland and posted a +2 run differential in these games. The Power Rank sees this and makes a drastic adjustment.

However, the adjustment is too drastic. Oakland has scored about 2 more runs per game than their opponents. If they continued at this pace, they would end the season with a run differential of +331. There’s a better chance that a Kardashian has a happy marriage the rest of her life than that Oakland continues at this pace.

I doubt that the AL West really features the 3 best teams in the majors. However, I do think Seattle gives Oakland and the Los Angeles Angels a run for the division title, with newly signed Robinson Cano playing the role of hero. Joe Peta, author of Trading Bases, predicted a division title for Seattle before the season.

The Cubs are better than the Brewers

Milwaukee has stormed out of the gate. Behind the bats of Carlos Gomez and Ryan Braun, they have a 28-20 record and lead the NL Central.

However, Milwaukee has a +3 run differential for the season. That’s smaller than the +4 for the Chicago Cubs, their division rival with an unlucky 17-28 record.

Moreover, adjustments for schedule strength drop Milwaukee to 26th in The Power Rank.

To explain this adjustment, consider a mid-April series against Pittsburgh, 28th of 30 MLB teams in my rankings. Milwaukee scored 5 fewer runs than Pittsburgh but won 3 of 4 games. As you might have guessed, they won 2 of those games by one run.

The Chicago Cubs are 19th in my rankings. All the other NL Central teams are looking up at St. Louis, ranked 11th. Expect the Cardinals to take the division.

Check out the MLB rankings, updated nightly

The MLB season is young, and lots will change over the coming months.

To stay up to date with my calculations, check out the MLB rankings, which are updated each morning.

5 insights from academic research on predicting world soccer/football matches

Image from Flickr account of Antony Pranata

You want to know which country will win an international soccer match.

In particular, which rankings make the best predictions? Should you stick to the ubiquitous FIFA rankings or switch to the calculations of an up-and-coming number cruncher?

Recent academic research from the Netherlands sheds light on this question. Jan Lasek and coworkers looked at a variety of world rankings in soccer and asked how well they predicted the results of 979 test matches, a huge sample set.

To test the rankings, they developed a method so that each ranking gave a “win probability” for a match. Then they looked at how far this probability deviated from the actual result of the match.

For example, suppose the United States is predicted to have a 54% chance to beat Mexico. If the match ends as a draw, the deviation of the prediction (0.54) from the result (0.5 for a draw) is 0.04. Taking the square of 0.04 gives a measure of the error. A win for the United States gives an error of (1.0 – 0.54) squared, while a loss results in an error of 0.54 squared.
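The error measure in this example is simple to compute. Below is a minimal sketch that codes the result as 1.0 for a win, 0.5 for a draw and 0.0 for a loss, as in the text. Averaging over all test matches gives the mean squared error.

```python
def squared_error(win_prob, result):
    """Squared error of a win probability against a match result.

    result is 1.0 for a win, 0.5 for a draw and 0.0 for a loss.
    """
    return (result - win_prob) ** 2


def mean_squared_error(predictions):
    """Average squared error over (win_prob, result) pairs."""
    return sum(squared_error(p, r) for p, r in predictions) / len(predictions)


# The United States vs Mexico example: 54% win probability
for label, result in (("win", 1.0), ("draw", 0.5), ("loss", 0.0)):
    print(label, round(squared_error(0.54, result), 4))
# prints: win 0.2116 / draw 0.0016 / loss 0.2916
```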

Jan asked me to take part in the study with The Power Rank. I directly provided him with the win probability for the 979 matches in the test set.

The visual shows the results for the mean squared error. A smaller error implies a better predictor.


The horizontal bar gives a measure of the uncertainty in the error estimate. There is a 2 in 3 chance the true error is within the range of the bar.
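A sketch of how such a bar can be computed, assuming it shows one standard error of the mean. Under a normal approximation that is about a 68% interval, close to the 2 in 3 in the text. The per-match errors in the example are hypothetical.

```python
import math
import statistics


def mean_with_standard_error(errors):
    """Return the mean of per-match errors and the standard error of that mean."""
    mean = statistics.mean(errors)
    se = statistics.stdev(errors) / math.sqrt(len(errors))
    return mean, se


# Hypothetical squared errors from a handful of matches
errors = [0.2116, 0.0016, 0.2916, 0.09, 0.36, 0.04]
mean, se = mean_with_standard_error(errors)
print(f"mean squared error {mean:.3f}, bar from {mean - se:.3f} to {mean + se:.3f}")
```

With 979 test matches the bar is much narrower than with six, since the standard error shrinks like one over the square root of the number of matches.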

The authors also looked at a different error measure called the binomial deviance. However, the results are similar to the mean squared error.
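For comparison, here is a sketch of the binomial deviance (log loss) for one match, using the same 1.0, 0.5, 0.0 coding of results. The natural log and the clipping of probabilities away from 0 and 1 are assumptions, and the paper's exact convention may differ.

```python
import math


def binomial_deviance(win_prob, result, eps=1e-15):
    """Binomial deviance (log loss) of a win probability for one match.

    result is 1.0, 0.5 or 0.0. Natural log and clipping by eps are assumptions.
    """
    p = min(max(win_prob, eps), 1 - eps)
    return -(result * math.log(p) + (1 - result) * math.log(1 - p))


for label, result in (("win", 1.0), ("draw", 0.5), ("loss", 0.0)):
    print(label, round(binomial_deviance(0.54, result), 4))
```

Unlike squared error, this measure punishes a confident wrong prediction very heavily, since the log of a probability near 0 is a large negative number.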

For the curious soccer fan, the paper draws the following conclusions.

The FIFA rankings

FIFA, the international governing body for soccer, publishes the most popular international rankings. However, it’s just a table (3 points for a win, 1 for a draw, 0 for a loss) that attempts to account for strength of opponent and importance of the match.

The FIFA rankings do poorly at predicting the outcome of matches.

What did you expect from such a simple method? They account for strength of schedule by taking the rank of an opponent and subtracting it from 200. That might have been novel in 1863.
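A minimal sketch of the per-match calculation as described here: result points times a match-importance multiplier times (200 minus the opponent's rank). The importance values are placeholders, and the sketch omits other details of FIFA's published formula, so treat it as an illustration rather than a reimplementation.

```python
RESULT_POINTS = {"win": 3, "draw": 1, "loss": 0}

# Placeholder importance multipliers, not taken from this post
IMPORTANCE = {"friendly": 1.0, "qualifier": 2.5, "world_cup": 4.0}


def fifa_style_points(result, opponent_rank, match_type):
    """Points for one match: result points x importance x (200 - opponent rank).

    Assumes opponent_rank is below 200; FIFA's real formula has further details.
    """
    opponent_strength = 200 - opponent_rank
    return RESULT_POINTS[result] * IMPORTANCE[match_type] * opponent_strength


print(fifa_style_points("win", 10, "world_cup"))   # beating the 10th-ranked team
print(fifa_style_points("draw", 150, "friendly"))  # drawing the 150th-ranked team
```

Nothing here looks at the score: a 1-0 win and a 5-0 win earn the same points.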

While FIFA fails in ranking nations in men’s soccer, they do a better job for the women. The FIFA Women’s ranking uses an Elo type rating system that accounts for margin of victory. This information is critical in predicting match outcomes.

Margin of victory

The top 5 rankings for predicting matches use margin of victory in their calculations. Only one of the remaining rankings in the study (not shown in the visual) uses this information.

Two of the top rankings, the FIFA women’s rankings and EloRatings.net, do not use margin of victory in any kind of sophisticated way.

For example, a typical Elo ranking uses a 1, 0.5, or 0 for a win, draw or loss in a match, respectively. Instead, the FIFA women’s rankings use a number between 0 and 1 for a match outcome based on the score. These numbers, which Lasek and coworkers show in Table 2 of their paper, appear to have no mathematical justification. However, the rankings perform well in prediction.
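A minimal sketch of a standard Elo update in which the match outcome can be either the usual 1, 0.5, 0 or a margin-based number between 0 and 1. The K factor of 20 and the `margin_to_score` mapping are hypothetical placeholders. They are not the values from Table 2 of the paper or from any particular ranking.

```python
def expected_score(rating_a, rating_b):
    """Elo expected score for team A (win = 1, draw = 0.5, loss = 0)."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))


def elo_update(rating_a, rating_b, score_a, k=20):
    """Return updated (rating_a, rating_b) after one match.

    score_a is team A's outcome: 1, 0.5 or 0 in a typical Elo system, or any
    number between 0 and 1 if a margin-based mapping is used instead.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


def margin_to_score(goal_diff):
    """Hypothetical mapping from goal difference to an outcome in [0, 1].

    Placeholder values only; not the table from Lasek and coworkers' paper.
    """
    if goal_diff == 0:
        return 0.5
    share = min(0.5 + 0.1 * abs(goal_diff), 1.0)
    return share if goal_diff > 0 else 1 - share


print(elo_update(1500, 1500, 1))                   # plain win
print(elo_update(1500, 1500, margin_to_score(3)))  # 3-goal win, margin-based
```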

The Least Squares rankings and The Power Rank, two methods that naturally use margin of victory, were two of the other top systems.

The Elo++ rankings show the critical importance of margin of victory. This system won a Kaggle competition for ranking chess players. It has advanced features like giving less importance to matches in the distant past and uses a sophisticated regression method in its calculation.

However, it does not account for margin of victory. While its performance in predicting matches isn’t as bad as that of the FIFA rankings, it does not perform as well as the top 4 rankings.

The wisdom of crowds

The best method for predicting football matches was the Ensemble, which combined the predictions of the FIFA women’s rankings, EloRatings.net, The Power Rank and Least Squares.

The improvement from aggregation was significant. The ensemble of 4 rankings had an error 4.3% lower than the average error of the 4 systems.
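Here is a sketch of this comparison. It assumes the ensemble is a plain average of the systems' win probabilities (the paper's exact combination may differ) and scores with the mean squared error described above. The four systems' probabilities are hypothetical.

```python
def mean_squared_error(probs, results):
    """Average of (result - probability)^2 over matches."""
    return sum((r - p) ** 2 for p, r in zip(probs, results)) / len(results)


def ensemble_comparison(system_probs, results):
    """Return (ensemble error, average individual error, relative reduction)."""
    n_systems = len(system_probs)
    # Average the systems' probabilities match by match
    ensemble = [sum(ps) / n_systems for ps in zip(*system_probs.values())]
    individual = [mean_squared_error(ps, results) for ps in system_probs.values()]
    avg_individual = sum(individual) / n_systems
    ens_error = mean_squared_error(ensemble, results)
    return ens_error, avg_individual, 1 - ens_error / avg_individual


# Hypothetical win probabilities from four systems for three matches
system_probs = {
    "system_1": [0.70, 0.40, 0.55],
    "system_2": [0.60, 0.30, 0.65],
    "system_3": [0.80, 0.45, 0.50],
    "system_4": [0.65, 0.35, 0.60],
}
results = [1.0, 0.5, 0.0]  # win, draw, loss
ens, avg, reduction = ensemble_comparison(system_probs, results)
print(f"ensemble {ens:.4f} vs average {avg:.4f}: {reduction:.1%} lower")
```

Squared error is convex, so the averaged forecast's error on each match can never exceed the average of the individual errors. Averaging cannot hurt on this measure, and how much it helps depends on how much the systems disagree.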

Others have aggregated the wisdom of many computers, a type of ensemble learning, to make predictions. Nate Silver uses 4 different college basketball rankings in his NCAA tourney predictions. I aggregated 7 preseason baseball predictions to forecast the 2014 season.

You’ll see a lot more of this from The Power Rank heading into football season.

More games or only recent games?

The FIFA rankings use a four year window to calculate rankings. With the turnover in players and coaches on national teams, this seems like a reasonable time span over which to evaluate a team.

But maybe a team just gets lucky over that time span. Four years means fewer than 80 games for most countries. Maybe an underachieving country like Argentina has had bad luck in world competition recently.

When Jan Lasek asked me to be a part of his study, I did two separate calculations. For each test match, I predicted the outcome using one of these two sets of games.

  • Every match from July 15, 2006 until the day before the match
  • Every match from January 4, 2002 through March 29, 2011 (a few days before matches in the test set)

Even though the first set contains fewer and more recent games than the second set, the two calculations had about the same predictive accuracy. The first appeared in the paper, but the second had a slightly smaller mean squared error.

Soccer teams don’t change much over time. Simon Kuper and Stefan Szymanski found the same result for England in the book Soccernomics. From 1980 through 2001, they found that the sequence of wins for the national team was indistinguishable from the random flipping of a coin.

Network research in rankings

Lasek and coworkers also studied the rankings from a paper by Park and Newman, who developed a ranking method based on their research in networks. The nodes in the network represent teams, and edges that connect nodes are games between the teams. The Power Rank uses the same concept.
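A minimal sketch of the win-score idea behind the Park and Newman method. A team earns credit for each win, plus discounted credit for the wins of teams it beat (indirect wins), and loses credit the same way for losses. The discount `alpha`, the truncation of the series and the omission of draws are assumptions of this sketch. The input is only who won each game, with no score.

```python
def park_newman_scores(games, alpha=0.2, n_terms=30):
    """Win score minus loss score in the spirit of Park and Newman's method.

    games is a list of (winner, loser) pairs; game scores never enter.
    alpha discounts indirect wins (A beat B, who beat C) and must be small
    enough (below 1 / largest eigenvalue of the win matrix) to converge.
    """
    teams = sorted({t for game in games for t in game})
    beat = {t: [] for t in teams}     # team -> teams it beat (one entry per win)
    lost_to = {t: [] for t in teams}  # team -> teams it lost to
    for winner, loser in games:
        beat[winner].append(loser)
        lost_to[loser].append(winner)

    def series(edges):
        # total[t] = direct + alpha * two-step + alpha^2 * three-step + ...
        term = {t: float(len(edges[t])) for t in teams}
        total = dict(term)
        for _ in range(n_terms - 1):
            term = {t: alpha * sum(term[o] for o in edges[t]) for t in teams}
            total = {t: total[t] + term[t] for t in teams}
        return total

    wins, losses = series(beat), series(lost_to)
    return {t: wins[t] - losses[t] for t in teams}


print(park_newman_scores([("A", "B"), ("B", "C"), ("A", "C")], alpha=0.5))
```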

I’m not sure why, but the Park Newman method has a cult following. Maybe it’s because the paper is available for free on an archive, or because Mark Newman has a prestigious professorship in physics at the University of Michigan. But these rankings pop up everywhere. I even get random emails asking me about it.

However, the method does not use margin of victory, and it’s terrible at predicting football matches. It performs much worse than the FIFA rankings.

Check out the best international rankings

Lasek and coworkers highlight important aspects in ranking world soccer teams. However, it’s not the last word on predicting matches.

The biggest problem with their method is using one win probability for a match. While this works for testing the predictive power of rankings, it does not get to the heart of football prediction: the probability for a win, loss and draw.
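With Poisson goal rates for both teams, all three probabilities come out directly. This sketch assumes the two teams' goal counts are independent Poisson variables and truncates at 15 goals. It also shows one natural way to collapse the three numbers into a single expected result, win + 0.5 × draw, which matches the 1, 0.5, 0 coding of results above (whether the paper did exactly this is not stated here). The goal rates are hypothetical.

```python
import math


def poisson_pmf(k, lam):
    """P(k goals) for a Poisson random variable with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def win_draw_loss(rate_a, rate_b, max_goals=15):
    """Return (P(A wins), P(draw), P(A loses)) for independent Poisson scores."""
    win = draw = loss = 0.0
    for a in range(max_goals + 1):
        for b in range(max_goals + 1):
            p = poisson_pmf(a, rate_a) * poisson_pmf(b, rate_b)
            if a > b:
                win += p
            elif a == b:
                draw += p
            else:
                loss += p
    return win, draw, loss


win, draw, loss = win_draw_loss(1.6, 1.1)  # hypothetical goal rates
expected_result = win + 0.5 * draw  # single number on the 1 / 0.5 / 0 scale
print(f"win {win:.3f}, draw {draw:.3f}, loss {loss:.3f}, single {expected_result:.3f}")
```

Two matches can share the same single number while having very different draw probabilities, which is exactly the information a single win probability throws away.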

But the paper does give some simple advice for following world football. Check out EloRatings.net and The Power Rank.