To play with our interactive bracket, click here.
Who will win Euro 2012?
The talking heads on sports media will certainly share their opinion based on gut instinct. But wouldn’t it be cool to have a card counter’s edge in knowing who will win? An unbiased estimate of the win probability for all 16 teams?
At The Power Rank, we crunch numbers to predict the outcome of tournaments. It begins with our algorithm for ranking teams, which has been more accurate than Vegas at predicting winners of college football bowl games. Then, we use these rankings to simulate Euro 2012 half a million times. Just like the mathematicians who used simulations to beat blackjack, we analyze these results to provide a fresh perspective. Perhaps it’s a new view to share with your soccer-fanatic friends.
If Poisson random variables mean something to you, there’s a description of our methods at the bottom of this post. For everyone else, click here to see all of our predictions. Here, we’ll supplement the bracket with analysis of some important questions.
Who will win Euro 2012?
Spain, the Netherlands and Germany are the three best teams in this tourney. Not many will argue with that. As one can see from hovering over the rightmost circle in the interactive bracket, these teams have the highest likelihoods of winning the tourney. However, Spain has a much easier route. The Netherlands and Germany occupy the Group of Death with Portugal and Denmark, two other top-12 teams in our world rankings. Spain has a 23.1% chance of winning, a significantly higher likelihood than the Netherlands and Germany at 15.2% and 13.6% respectively.
Still, a 23% win probability is not that large. If each team had an equal likelihood of winning Euro 2012, the win probability would be 6.25%. Spain is 3.7 times more likely to win the tourney than that baseline. In this year’s NCAA men’s basketball tournament, the University of Kentucky had a 16% chance of winning. This is 10.9 times greater than the 1.5% probability assuming all 68 teams are equal. By this measure, the favorite in the European Football Championships stands out far less from the field. The tournament is insanely competitive.
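The arithmetic behind these ratios can be sketched in a few lines of Python. The function name `favorite_edge` is made up for illustration; the win probabilities and field sizes are the ones quoted above.

```python
def favorite_edge(win_prob, n_teams):
    """Ratio of a team's win probability to the equal-chance baseline 1 / n_teams."""
    baseline = 1.0 / n_teams
    return win_prob / baseline


# Spain at Euro 2012: 23.1% against a 16-team field (baseline 6.25%)
print(round(favorite_edge(0.231, 16), 1))  # 3.7

# Kentucky in the NCAA tournament: 16% against a 68-team field (baseline about 1.5%)
print(round(favorite_edge(0.16, 68), 1))  # 10.9
```

A ratio close to 1 would mean the favorite is barely ahead of a coin-flip field, so the smaller ratio for Spain is what supports the claim that Euro 2012 is the more competitive event.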
Who will survive the Group of Death?
The Netherlands (3), Germany (4), Portugal (7) and Denmark (12) are all in the same group. Just brutal. In comparison, Group A has Russia (20), Czech Republic (24), Greece (42) and Poland (44). This makes it the Group of Eternal Life. While the Netherlands and Germany have about a 60% chance of making it to the knockout stage, Denmark still has a 38% chance.
The strength of this group is even more apparent from the tournament win probabilities. Denmark and Portugal both have about a 5% chance to win Euro 2012, a higher likelihood than any team from the Group of Eternal Life.
How will the host countries Poland and Ukraine do?
These two host countries present a problem for our rankings since they did not have to qualify for Euro 2012. We only use meaningful games such as tournaments and qualifying games to rank teams. This leaves us with only 10 and 12 games to evaluate Poland and Ukraine respectively since 2009. These games were from a failed World Cup qualifying campaign for both countries.
To see whether these countries had performed better lately, we considered friendlies involving Poland and Ukraine since the last World Cup. We added these matches to the set of matches used in our rankings. While the team ranking for Ukraine barely changed from 31st to 32nd, Poland shot up from 96th to 44th. They’re still essentially the worst team in the tourney, but they’re no longer the 2012 Charlotte Bobcats of the NBA. Poland has clearly performed much better since their failed World Cup qualifying campaign, earning ties against full-strength squads from Germany and Mexico.
There are many problems with using friendlies in our rankings. As readers of this site have pointed out, the rules are different (6 substitutions instead of 3), and teams treat these games as exhibitions to try new strategies. However, Poland and Ukraine probably treated these games more seriously since they didn’t have to qualify for Euro 2012. Moreover, the jump in Poland’s ranking is too large to ignore. We decided to use these additional games in our Euro 2012 simulations. With a 0.41 goal advantage as a host country, Poland and Ukraine have a 56% and 53% probability, respectively, of advancing past the group stage.
Note: We have not used the friendlies for Poland and Ukraine in the primary world rankings. Sorry for the confusion, as the rankings on the interactive bracket do not match these rankings.
How did we perform these simulations?
The win probabilities in the interactive bracket are based on The Power Rank algorithm, a method that accounts for strength of schedule and margin of victory in ranking teams. While our team rankings show team strength for 119 countries, we take a different approach to generate win probabilities for Euro 2012. Instead of applying the algorithm to the entire team, we apply it to the offense and defense separately. This leads to a goal rate for one team’s offense against another’s defense.
From here, we simulate the score of each game in the group round. The score comes from two uncorrelated Poisson random variables based on the goal rates. Moreover, the goals scored and allowed for each team let us account for tie breakers in each simulation. The Poisson model also applies in the knockout stage. In the unfortunate case a game ends tied after 120 minutes, we pick randomly to determine a winner. Unfortunately, we don’t have the data or analytics to make a better guess about the outcome of penalty kicks.
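For readers who want to see the idea in code, here is a minimal Python sketch of a single simulated match. It is not The Power Rank's actual implementation. The goal rates (1.6 and 1.1) are made-up placeholders rather than output from the ranking algorithm, all function names are hypothetical, and the sketch treats a knockout game as one Poisson draw per team followed by a coin flip if the score is level, without modeling extra time separately.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw one Poisson-distributed count using Knuth's method (fine for small rates)."""
    threshold = math.exp(-rate)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_match(rate_a, rate_b, rng, knockout=False):
    """Simulate one match; return (goals_a, goals_b, winner), winner in {'A', 'B', None}."""
    goals_a = sample_poisson(rate_a, rng)  # the two draws are independent,
    goals_b = sample_poisson(rate_b, rng)  # so the scores are uncorrelated
    if goals_a > goals_b:
        winner = "A"
    elif goals_b > goals_a:
        winner = "B"
    elif knockout:
        winner = rng.choice(["A", "B"])  # level score: pick a winner at random
    else:
        winner = None  # a draw in the group stage
    return goals_a, goals_b, winner


def estimate_outcomes(rate_a, rate_b, trials=100_000, knockout=False, seed=2012):
    """Repeat the match many times and return the share of each outcome."""
    rng = random.Random(seed)
    tally = {"A": 0, "B": 0, None: 0}
    for _ in range(trials):
        _, _, winner = simulate_match(rate_a, rate_b, rng, knockout)
        tally[winner] += 1
    return {outcome: n / trials for outcome, n in tally.items()}


if __name__ == "__main__":
    # Placeholder goal rates: team A scores 1.6 per game on average, team B 1.1.
    print(estimate_outcomes(1.6, 1.1))                 # group stage, draws allowed
    print(estimate_outcomes(1.6, 1.1, knockout=True))  # knockout, no draws
```

A full tournament simulation would repeat this for every fixture, keep each team's goals scored and allowed for the group tie breakers, and feed the group winners into the knockout bracket.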
What do you think?
Is Poland really that good? Will you completely ignore these predictions because it includes friendlies for Poland and Ukraine? Please leave us a comment.
Thanks for reading.
Ed,
No comments about your Euro Soccer power rankings, but how would you normalize the following? Suppose I had historical Indianapolis 500 driver data for the past 25 years or so, and wanted to conform strength of competition in an average Indy 500 race to strength in a theoretical one with the top drivers of a few decades. How would “percentage finishing out of the top 13”, for example, in a driver’s career be adjusted to probability of “finishing out of the top 13” in a theoretical field of 33 top drivers, mostly winners? What happens to the probability when expanding to 40 top drivers? Assume all historical finish results were available for all drivers.
Nice work on the power rankings, which looks like conditional probability to me.
Yes, rankings certainly use conditional probability, but in the context of Markov chains.
Not sure I can answer your question, but I do find it very interesting. All my methods are built on margin of victory, but the time margin of victory in Indy doesn’t mean much, right? Only the places matter. But you have all this robust data from every race, and each race gives a head to head between each set of drivers. So a lot of ranking methods that use “games” should also apply there. Would be very interesting to see.
Thanks Dr. Ed.
A recent academic paper on the web finds that average finish position of recent races is about 0.5 correlated with winning a NASCAR race. Other data, such as laps led, are correlated around 0.4. It would seem, then, that average finish and historical wins would be important contributors to a “power probability” of winning for an historical Indianapolis driver. Finish results could very well be a bi-modal distribution, where, if the driver does not crash, he finishes often in the top 10. If he does crash, he will surely be in the bottom 12. AJ Foyt era drivers had DNF 50% of the time, while Helio era drivers finish 80% of the time.
So, it seems that some Monte Carlo simulation with random number generation based on Average Finish and a standard deviation, but then scaled down to produce accurate percent Top 10 finishes, might be most accurate. The scale factor might address Top 10 percentage, while a standard deviation might replicate a normalized winning percentage. Wouldn’t data like “average finish 7.5, standard deviation 7.5” represent a Poisson distribution?
The probability of winning an Indy 500 for a Danica Patrick, who has high average finish and low standard deviation, but no historical Indy wins, is another dilemma.
This is an interesting games problem, probably requiring new mathematical approaches.