The Top 10 Things to Know About The Power Rank’s Methods

[Image: a screenshot of a data file used by The Power Rank]

We crunch numbers to better understand sports.

And there’s a mountain of sports numbers. In 2011, college football teams in Division I played 1430 games against each other. With each game comes a final score, a box score with statistics such as fumbles lost, and possibly a play log that details each play of the game. In 2011, college basketball teams in Division I played over 5400 games. There have been over 1100 meaningful international soccer matches over the last 4 years.

Our sports analytics uses mathematics and computers to make sense out of this data. It takes this unstructured mass of numbers and outputs tidy lists that provide sports insight. Our methods have been developed specifically for sports, and they can answer questions such as whether your team will win its next game or who has the best passing offense in the country. Let’s look at 10 things you should know about how this works.

1. Adjusting for strength of schedule

Our algorithm takes sports numbers and adjusts them for strength of schedule. As a simple example of sports numbers, consider a team’s margin of victory, or points scored minus points allowed per game. It’s difficult to rank teams by this raw statistic. A good team that plays a weak schedule could have a higher margin of victory than a top team that plays a difficult schedule. Our algorithm adjusts this raw margin of victory for schedule strength. The result is a rating that gives a predicted margin of victory against an average team in the league. These ratings determine our rankings.
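To make the idea concrete, here is a minimal sketch of one common way to adjust margin of victory for schedule strength: a least-squares fit in which the difference between two teams’ ratings should match the margin of each game they played. This is a stand-in technique chosen for illustration, not The Power Rank’s own algorithm, and the teams, results, and function names are all made up.

```python
# Hypothetical results: (team_a, team_b, points by which team_a beat team_b).
GAMES = [
    ("Top", "Strong", 4),
    ("Top", "Good", 8),
    ("Good", "Weak", 14),
    ("Good", "Weaker", 18),
    ("Strong", "Weak", 18),
]


def raw_margins(games):
    """Average margin of victory per team, with no schedule adjustment."""
    total, count = {}, {}
    for a, b, margin in games:
        for team, signed in ((a, margin), (b, -margin)):
            total[team] = total.get(team, 0) + signed
            count[team] = count.get(team, 0) + 1
    return {team: total[team] / count[team] for team in total}


def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - known) / aug[i][i]
    return x


def adjusted_ratings(games):
    """Least-squares ratings: rating differences should match game margins."""
    teams = sorted({team for a, b, _ in games for team in (a, b)})
    index = {team: i for i, team in enumerate(teams)}
    n = len(teams)
    matrix = [[0.0] * n for _ in range(n)]
    rhs = [0.0] * n
    for a, b, margin in games:
        i, j = index[a], index[b]
        matrix[i][i] += 1
        matrix[j][j] += 1
        matrix[i][j] -= 1
        matrix[j][i] -= 1
        rhs[i] += margin
        rhs[j] -= margin
    # Ratings are only defined up to a constant, so replace the last equation
    # with "all ratings sum to zero" (an average team rates 0).
    matrix[-1] = [1.0] * n
    rhs[-1] = 0.0
    return dict(zip(teams, solve_linear(matrix, rhs)))


raw = raw_margins(GAMES)
adjusted = adjusted_ratings(GAMES)
for team in sorted(adjusted, key=adjusted.get, reverse=True):
    print(f"{team:7s} raw {raw[team]:6.1f}  adjusted {adjusted[team]:6.1f}")
```

In this made-up schedule, "Good" has the better raw margin (8 points per game against 6 for "Top") because it fattened up on weak opponents. After the adjustment, "Top" rates 12 points better than an average team and "Good" only 4.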

As an example of why strength of schedule matters, consider the Boise State college football team. They burst onto the national scene with a win over Oklahoma in the 2007 Fiesta Bowl. Boise State remained on the national radar despite playing in a lesser-known conference with weaker opponents. Accounting for strength of schedule is the only way to compare Boise State with better-known teams like Alabama and Texas.

And our numbers like Boise State. From 2009 through 2011, the Broncos finished the season in the top 5 of our rankings.

2. Predicting a margin of victory for each game

The difference in the ratings of two teams gives a predicted margin of victory at a neutral site. For example, our ratings predict that Michigan will beat Virginia Tech by 6.7 points in the Sugar Bowl.

Clearly, this prediction is wrong, since Michigan can’t win by a fraction of a point. The prediction really means that there’s a 50% chance Michigan beats Virginia Tech by more than 6 points, while there’s a 50% chance Michigan wins by less than 7 points or loses.

This predicted point spread gives one team a higher likelihood of winning the game. The fraction of times this team actually wins gives an accuracy for the algorithm. College football bowl games are a convenient test set since these games are played at neutral locations. (Well, the Rose Bowl might be a home game for USC. At least the visiting Big Ten team has a week to acclimate to the time zone.) Over the last ten years, our method has predicted the winner in 62.4% of bowl games (196 of 314 games). This accuracy is better than the Vegas line (61.7%).
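The accuracy calculation itself is simple enough to sketch. The function name and the four sample games below are hypothetical; only the 196-of-314 bowl record comes from the text above.

```python
def pick_accuracy(predicted_margins, actual_margins):
    """Fraction of games in which the predicted favorite actually won."""
    pairs = list(zip(predicted_margins, actual_margins))
    correct = sum(1 for predicted, actual in pairs if predicted * actual > 0)
    return correct / len(pairs)


# Hypothetical predictions and results for four neutral-site games
# (positive means the first-listed team is favored, or won).
predicted = [6.7, -3.0, 10.5, 1.2]
actual = [3, 7, 14, -10]
print(pick_accuracy(predicted, actual))  # 2 of the 4 picks are correct

# The bowl record quoted above: 196 correct picks in 314 games.
print(round(100 * 196 / 314, 1))
```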

Moreover, we reserve the right to update this accuracy. There are still a few tricks up our sleeve that we’re itching to test out.

3. Methods are based in statistical physics

What in the world does statistical physics have to do with sports?

Well, statistical physics studies how the interactions of molecules on the nanometer scale produce bulk behavior on the human scale. For example, the attractive forces between molecules in a liquid result in the spherical shape of a water drop. Statistical physics considers all of these interactions in describing the properties of the drop surface, such as its energy.

In sports, teams are the molecules. These teams or molecules interact by playing games. The statistical physics of our algorithm considers all interactions or games to produce team rankings, which are like the bulk properties of the water drop.

The Power Rank algorithm is based on a decade of studying statistical physics. The key connection between physics and sports came from the original paper on Google’s PageRank algorithm. It turns out that ranking websites based on the link structure of the web has everything to do with statistical physics, which we’ll explain below. More importantly, PageRank inspired a new algorithm for ranking sports teams.

Think of The Power Rank as a research institute devoted to sports. Just like academic groups that study statistical physics, we spend our days working out mathematics on paper and then writing computer code to compute answers. Well, the publishing model is a bit different…

4. Rankings for passing offense, rushing defense…

While we originally started by ranking teams, we soon realized the algorithm was applicable to more than just margin of victory. Other types of raw statistics, such as points scored by the offense, lead to rankings of scoring offense. Of course, an offense interacts with the defense from the opposing teams. Since we must consider these units in ranking scoring offense, we also get rankings for scoring defense.

It doesn’t end there. Raw statistics such as yards per pass attempt lead to rankings for pass offense and defense. The algorithm adjusts these raw quantities for strength of schedule, which opens up a rich set of insights into football. For more on the recent passing and rushing numbers we crunched from the 2011 college football season, click here. We are still exploring all the different types of statistics that our algorithm can turn into a set of rankings. Pass rush versus pass protection based on sack rate might be an interesting one.

5. Win probabilities for games, tourneys and seasons

The predicted margin of victory from our algorithm can be turned into a win probability. For sports like basketball in which each game produces a winner, this method produces a probability for each team. For sports like soccer in which teams can tie, we’re still working on our methods for win, loss and tie probabilities.
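One standard way to make this conversion is to assume that actual margins scatter around the predicted margin like a bell curve. The sketch below does that. The normal model and the 14-point spread (`margin_std`) are assumptions picked for illustration, not figures from The Power Rank.

```python
import math


def win_probability(predicted_margin, margin_std=14.0):
    """Chance the favored side wins, assuming the actual margin is normally
    distributed around the predicted margin with spread margin_std (a guess)."""
    z = predicted_margin / margin_std
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# A toss-up game gives 50%; a 6.7 point favorite gets something above that.
print(win_probability(0.0))
print(win_probability(6.7))
```

A larger `margin_std` pulls every probability toward 50%, so in practice that value would have to be estimated from past games.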

These win probabilities also allow us to forecast the outcomes of tournaments and seasons. We program a computer to play the tourney or season many times, flipping a coin for every game according to our win probabilities. For example, Kentucky had a 16% chance of winning the 2012 NCAA men’s basketball tournament, while Spain had a 22% likelihood of winning Euro 2012.
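Here is a rough sketch of that coin-flipping simulation for a four-team, single-elimination bracket. The ratings, the normal model behind `win_probability`, and its 11-point spread are all hypothetical stand-ins.

```python
import math
import random

# Hypothetical ratings (points better than an average team).
RATINGS = {"A": 12.0, "B": 6.0, "C": 2.0, "D": -4.0}


def win_probability(margin, margin_std=11.0):
    """Assumed normal model for turning a predicted margin into a win chance."""
    return 0.5 * (1.0 + math.erf(margin / (margin_std * math.sqrt(2.0))))


def play(team_1, team_2, rng):
    """Flip a weighted coin for one neutral-site game and return the winner."""
    p = win_probability(RATINGS[team_1] - RATINGS[team_2])
    return team_1 if rng.random() < p else team_2


def simulate_bracket(bracket, trials=20000, seed=0):
    """Estimate title odds for a single-elimination bracket.

    `bracket` lists teams in bracket order; its length must be a power of two.
    """
    rng = random.Random(seed)
    titles = {team: 0 for team in bracket}
    for _ in range(trials):
        alive = list(bracket)
        while len(alive) > 1:
            alive = [play(alive[i], alive[i + 1], rng)
                     for i in range(0, len(alive), 2)]
        titles[alive[0]] += 1
    return {team: wins / trials for team, wins in titles.items()}


odds = simulate_bracket(["A", "D", "B", "C"])
for team, chance in sorted(odds.items(), key=lambda item: -item[1]):
    print(f"{team}: {chance:.1%}")
```

The same loop extends to a full season: replace the bracket with a schedule and count wins instead of titles.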

6. Diminishing returns for blowouts

College football once upon a time had something called the Bowl Championship Series. This system attempted to determine the two best teams in the country and match them up for a National Championship game. Computer rating systems played a role in this determination. However, the powers that be didn’t want to encourage running up the score in games. So they didn’t allow the computers to use margin of victory in their calculations.

The silliness of this restriction is mind-boggling. Any algorithm that throws out this information has a lower class of predictive ability than one that uses margin of victory. It’s like setting up a shopping comparison website that only tells you which store has the lower price.

There’s a better way: give teams less credit as their margin of victory increases. Our algorithm does this. We first noticed this feature in 2010. Wisconsin was just destroying weak Big Ten competition, beating Indiana and Northwestern by 63 and 47 points respectively. They barely moved in the rankings, ending the season at 10th. This damping of outliers also helps in ranking passing and rushing. Oregon doesn’t get too much benefit from rushing for almost 10 yards per carry against Missouri State in 2011.
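To show what "less credit as the margin grows" can look like, here is a sketch that compresses margins with a logarithm. This explicit transform is purely illustrative: the function and its `scale` value are hypothetical, and it is not how The Power Rank’s algorithm produces the effect.

```python
import math


def damped_margin(margin, scale=14.0):
    """Compress large margins: close to linear for small margins, but each
    extra point adds less and less once the margin is well past `scale`."""
    return math.copysign(scale * math.log1p(abs(margin) / scale), margin)


# Going from a 47 to a 63 point win earns far less extra credit than going
# from a 1 to a 17 point win, even though both gaps are 16 points.
print(damped_margin(63) - damped_margin(47))
print(damped_margin(17) - damped_margin(1))
```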

We got a bit lucky with this feature, as it wasn’t something designed into the algorithm. It’s simply the result of following some guidelines from statistical physics in setting up the equations. We’ll take it.

7. Home field advantage

It exists. In Scorecasting, Tobias Moskowitz and Jon Wertheim showed that referee bias played a role in home advantage. Fatigue also matters, a point we would like to put some numbers behind in the near future.

No matter why it comes about, home advantage is too large a factor to ignore. Home teams in college football won by 6.2 points on average in 2011. However, that number is skewed since good teams schedule poor teams at home in out-of-conference games. It’s the cupcake phenomenon. When one only accounts for conference games, home advantage in college football is closer to 3.0 points. When making predictions, it’s important to add this factor to the home team.
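Putting the pieces together, a prediction is the rating difference plus the home edge. In the sketch below, the 3.0 points is the conference-game estimate quoted above; the function name and the two ratings are hypothetical.

```python
HOME_ADVANTAGE = 3.0  # points; the conference-game estimate quoted above


def predicted_margin(home_rating, away_rating, neutral_site=False):
    """Predicted points by which the home (or first-listed) team wins."""
    edge = 0.0 if neutral_site else HOME_ADVANTAGE
    return home_rating - away_rating + edge


# Hypothetical ratings: a team rated 10.0 hosts a team rated 4.0.
print(predicted_margin(10.0, 4.0))                     # at home
print(predicted_margin(10.0, 4.0, neutral_site=True))  # bowl game
```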

8. How the fickle sports fan explains our rankings

To get an intuitive feel for how the algorithm works, consider Fickle Freddy. He grew up in Philadelphia following the Phillies but has become unpredictable in his old age. One moment, he cheers wildly for the Phillies, screaming at the television as if Joe Carter and the 1993 World Series against the Blue Jays had never happened. All of a sudden, he sees the Phils get stomped by the Mets in a three-game series by a collective 16-0 score and switches allegiance to the Mets. This is quite shocking, as Freddy grew up hating the Mets. But then the Diamondbacks sweep the Mets and all of a sudden he’s an Arizona fan.

Over and over, stretching into infinity, Freddy jumps from team to team. The bigger the margin by which the Mets beat the Phillies, the more likely he makes the transition from Phillies to Mets. However, any jump is possible, so he even spends time rooting for the Pittsburgh Pirates.

The amount of time Fickle Freddy spends with each team determines its rank and value. As the quintessential fair weather fan, he spends more time with the better teams. Fickle Freddy acts much like the random web surfer that Larry Page and Sergey Brin used to describe their PageRank algorithm for ranking websites.
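Freddy’s wandering can be sketched as a PageRank-style random walk. In the toy version below, he moves from a losing team to a team that beat it with probability proportional to the margin, and a fixed fraction of the time he jumps to any team at random. The jump fraction, the function name, and every result except the 16-run Mets series are made up, and this is only an analogy to The Power Rank’s method, not the method itself.

```python
# (loser, winner, margin). The 16-run Mets series comes from the story above;
# the other margins and matchups are made up for illustration.
RESULTS = [
    ("Phillies", "Mets", 16),
    ("Mets", "Diamondbacks", 9),
    ("Phillies", "Diamondbacks", 2),
    ("Pirates", "Phillies", 5),
]


def freddy_time_shares(results, random_jump=0.15, steps=200):
    """Long-run fraction of time Freddy spends rooting for each team."""
    teams = sorted({t for loser, winner, _ in results for t in (loser, winner)})
    n = len(teams)
    # weights[loser][winner] = total margin by which winner beat loser
    weights = {team: {} for team in teams}
    for loser, winner, margin in results:
        weights[loser][winner] = weights[loser].get(winner, 0) + margin

    share = {team: 1.0 / n for team in teams}
    for _ in range(steps):
        nxt = {team: 0.0 for team in teams}
        for team, mass in share.items():
            total = sum(weights[team].values())
            # Fans of unbeaten teams, plus a fixed fraction of everyone else,
            # jump to a team chosen uniformly at random.
            follow = (1.0 - random_jump) * mass if total else 0.0
            for winner, weight in weights[team].items():
                nxt[winner] += follow * weight / total
            for other in teams:
                nxt[other] += (mass - follow) / n
        share = nxt
    return share


shares = freddy_time_shares(RESULTS)
for team in sorted(shares, key=shares.get, reverse=True):
    print(f"{team:13s} {shares[team]:.3f}")
```

With these results Freddy spends the most time with the Diamondbacks and the least with the Pirates, though the random jumps guarantee the Pirates still get some of his attention.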

9. No unknown parameters

There are only two parameters in the algorithm: home advantage and a second that governs the diminishing returns discussed earlier. Both parameters are determined from the data. In no way do we change weights or fudge parameters based on the results. This isn’t climate modeling. I won’t name names, but those researchers get defensive when you ask about the number of unknown parameters in their model.

10. Solving a system of equations

Rice and Houston score a total of 84 points in a college football game. Rice outscores Houston by 16. What was the final score of the game?

Let’s transform this problem into math by letting X denote Rice’s score and Y Houston’s score. Then we want to solve

  • X + Y = 84
  • X – Y = 16

This is a system of equations in the 2 variables X and Y. If we want to solve it, we first solve for X in the second equation, giving X = 16 + Y. Plugging this into the first equation, we get 2Y + 16 = 84. By solving this one equation in one variable, we get Y = 34. Then X = Y + 16 = 50. Rice won the game 50-34.
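The same two equations can be handed to a computer. This sketch uses Cramer’s rule, which is fine for 2 variables; the function name is hypothetical, and a system with hundreds of variables would call for a general linear solver instead.

```python
def solve_2x2(a11, a12, b1, a21, a22, b2):
    """Solve  a11*x + a12*y = b1  and  a21*x + a22*y = b2  by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("the two equations do not pin down a unique answer")
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - b1 * a21) / det
    return x, y


# X + Y = 84 and X - Y = 16
rice, houston = solve_2x2(1, 1, 84, 1, -1, 16)
print(rice, houston)  # 50.0 34.0
```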

This simple example shows how one solves a system of equations in 2 variables. In calculating our rankings for college football, we solve a linear system of equations with 246 variables.

The most important aspect of this solution is that we solve for all the variables simultaneously. In our example, 2 variables satisfy the 2 equations. All good ranking systems have this property. When you’re hunting around for other sports analytics sites, beware those that don’t solve a system of equations. Oftentimes, they’ll talk of iteration: make a guess at the answer, see if it satisfies the equations, then change the answer to see if it gets closer to satisfying the equations. Nate Silver talks about iteration in his Soccer Power Index, and he’s most likely solving a system of equations similar to the ones above.

However, sometimes people talk about adjusting for strength of schedule to some order. Aaron Schatz spilled a bit of the beans on Football Outsiders’ DVOA metric in this post, saying he only makes second order adjustments when accounting for schedule strength. That’s like iterating twice. It’s not solving a set of equations.
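A toy example shows the gap between stopping after two rounds and iterating until the equations are satisfied. The update rule here (raw margin plus the average rating of a team’s opponents) and the three-team round robin are hypothetical. This is a sketch of the general point, not a model of DVOA or the Soccer Power Index.

```python
# Hypothetical round robin: (winner, loser, margin).
GAMES = [("A", "B", 6), ("B", "C", 3), ("A", "C", 12)]


def schedule_adjusted(games, order):
    """Apply `order` rounds of 'raw margin + average opponent rating'."""
    margins, opponents = {}, {}
    for winner, loser, margin in games:
        margins.setdefault(winner, []).append(margin)
        margins.setdefault(loser, []).append(-margin)
        opponents.setdefault(winner, []).append(loser)
        opponents.setdefault(loser, []).append(winner)
    raw = {team: sum(m) / len(m) for team, m in margins.items()}
    rating = dict(raw)  # order 0: no schedule adjustment at all
    for _ in range(order):
        rating = {
            team: raw[team]
            + sum(rating[opp] for opp in opponents[team]) / len(opponents[team])
            for team in raw
        }
    return rating


second_order = schedule_adjusted(GAMES, order=2)
converged = schedule_adjusted(GAMES, order=200)
print(second_order["A"], converged["A"])
```

Team A rates 6.75 after two rounds but settles at 6.0 once the ratings stop changing, and 6.0 is the value that satisfies all the equations at once.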

Are you still reading?

Wow, thanks. My name is Ed Feng, and if we ever meet in a bar, the first beer is on me.