What can analytics tell us about college football? Can numbers help us determine the four best teams for the new playoff?
College football seems behind other sports in the analytics revolution. Bill James started using baseball data to answer the sport’s important questions in the late 1970s. His methods hit the mainstream in 2003 when Michael Lewis published Moneyball, a book on how the Oakland A’s used analytics to compete with higher-payroll teams.
Now baseball will soon track the motion of every player on every play of every game. The NBA has also evolved beyond box score statistics to measures like points per possession and has installed player tracking cameras in every arena.
However, we’re not completely in the dark with college football analytics. Here, we discuss the most important concepts you should take away from college football analytics. Keep this primer handy as you argue which teams should make the playoff.
Strength of Schedule and Margin of Victory
With the upcoming playoff, strength of schedule has become a buzzword in college football because it’s a key criterion for the selection committee that will pick the best four teams. It’s an immense improvement over the BCS era, in which a team’s record played a much greater role in making the championship game.
However, a less discussed but also important concept is margin of victory. The data shows that teams with a larger average margin of victory tend to win more. This holds in every sport from college football to international soccer.
The BCS made margin of victory a political issue. Since they didn’t want teams to run up the score, they banned margin of victory from the computer polls in their formulas. This resulted in the insanity of using Jeff Sagarin’s Elo rankings, a model with less predictive power than his predictor rankings. (With the end of the BCS, Sagarin now uses margin of victory in his Elo rankings.)
Let’s use data to determine the importance of strength of schedule and margin of victory in predicting football games. We will calculate a set of rankings based on games prior to bowl season and ask how often the higher ranked team won a bowl game. The test set includes 339 bowl games from the 2005 through 2014 seasons. The visual contains results for a number of rankings.
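If you want to run this kind of test yourself, here’s a minimal sketch in Python. The inputs are hypothetical stand-ins: ranking maps each team to its rank before bowl season (1 is best), and bowl_games is a list of (winner, loser) pairs.

```python
def prediction_accuracy(ranking, bowl_games):
    """Fraction of bowl games won by the higher ranked team."""
    correct, total = 0, 0
    for winner, loser in bowl_games:
        if winner in ranking and loser in ranking:
            total += 1
            if ranking[winner] < ranking[loser]:
                correct += 1
    return correct / total
```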
The ranking by win percentage considers neither strength of schedule nor margin of victory. The team with the better record is predicted to win a bowl game. It doesn’t matter that Rice’s 10 wins came in Conference USA while Mississippi State won 6 games in the SEC.
The Colley matrix, one of the computer polls of the deceased BCS, adjusts wins and losses for strength of schedule. This method has a weird quirk but is otherwise based on the sound mathematics of linear algebra.
Raw margin of victory is calculated by taking points scored minus points allowed and dividing by the number of games, an extraordinarily simple metric.
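In code, it’s nothing more than a one-liner (the function name is just for illustration):

```python
def raw_margin_of_victory(points_for, points_against, games):
    # Average scoring margin per game over the season.
    return (points_for - points_against) / games
```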
The last two rankings depend on algorithms that take raw margin of victory and adjust for strength of schedule.
The Simple Rating System (SRS) appears on all the Sports Reference sites. If you dig into the math, you’ll find that SRS is a least squares rating system. Just compare the matrix equation on page 38 of Ken Massey’s thesis with the equations in this article on Sports Reference. We’ll encounter this method again later in discussing efficiency metrics.
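Here’s a rough sketch of a least squares rating system in that spirit, not the exact SRS code. Each game supplies one equation, rating A minus rating B roughly equals the margin, and the ratings are pinned to sum to zero so the solution is unique. The games input is a hypothetical list of (team_a, team_b, margin) tuples.

```python
import numpy as np

def least_squares_ratings(games):
    """Solve for ratings so that rating differences best fit game margins."""
    teams = sorted({t for a, b, _ in games for t in (a, b)})
    idx = {t: i for i, t in enumerate(teams)}
    X = np.zeros((len(games) + 1, len(teams)))
    y = np.zeros(len(games) + 1)
    for row, (a, b, margin) in enumerate(games):
        X[row, idx[a]], X[row, idx[b]] = 1.0, -1.0
        y[row] = margin  # margin = points_a - points_b
    X[-1, :] = 1.0  # constrain ratings to sum to zero for a unique solution
    ratings, *_ = np.linalg.lstsq(X, y, rcond=None)
    return dict(zip(teams, ratings))
```

Each team’s rating works out to its average margin of victory adjusted by the average rating of its opponents, which is the idea behind SRS.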
I developed The Power Rank from my research in statistical physics. While the details are not public, the algorithm is based on the mathematics of Markov chains and tends to strongly diminish the effect of blowouts.
The visual shows the importance of margin of victory. While Colley’s method improves on the predictive performance of a team’s record alone, it can’t match the predictive power of raw margin of victory. The BCS brought knives to a gun fight. With a 339 game sample, the uncertainty in these win percentages is about 3%, which makes the improvement of raw margin of victory over Colley significant.
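That 3% is just the standard error of a proportion near one half over a 339 game sample:

```python
from math import sqrt

# Standard error of a win percentage near 0.5 over 339 games.
print(sqrt(0.5 * 0.5 / 339))  # roughly 0.027, i.e. about 3%
```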
The strength of schedule adjustments from the Simple Rating System and The Power Rank improve the prediction accuracy of raw margin of victory.
Randomness of turnovers
Turnovers have an enormous impact in football. A lost fumble at the goal line negates a scoring opportunity, or a tipped pass falls into the hands of a defender who scores a touchdown. It’s enough to make any fan go looking for Prozac.
It’s easy to assign blame or credit for these plays. The running back didn’t hold on to the ball, or the linebacker smacked the crown of his helmet right onto the ball, jarring it loose. Newton’s second law supports these arguments.
However, data tells a different story about turnovers. First, consider fumbles. Can a defense force fumbles, meaning both knocking the ball loose and recovering it? If they can, then teams that force more fumbles in the first six games of the season should also force more fumbles the next six games.
In 2013, nine teams forced nine or more fumbles in their first six games. These defenses (and possibly special teams) averaged 1.77 fumbles per game in this early part of the season. In their next six games, these teams averaged 0.89 forced fumbles per game, close to the 0.78 average. This doesn’t support the idea of a big play defense.
How about holding onto the ball, either by not fumbling or recovering any such fumbles on offense? That must be a skill, right? In 2013, eight teams had one or fewer fumbles lost in their first six games. In their next six games, these same teams lost 0.63 fumbles per game, much closer to the 0.78 average.
Fumble rates on offense and defense strongly regress to the mean from early to late season. For the 246 FBS and FCS teams that played at least 9 games in 2013, the forced fumble rate for a defense in the first 6 games of the season had no correlation with forced fumble rate later in the season (explains less than 1% of the variance). Bill Barnwell found the same result in the NFL. This lack of correlation also holds for lost fumbles on offense.
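The “less than 1% of the variance” comes from squaring the correlation between early and late season rates. A quick sketch, assuming two hypothetical arrays of per-team rates:

```python
import numpy as np

def variance_explained(early_rate, late_rate):
    """Fraction of late-season variance explained by the early season (r squared)."""
    r = np.corrcoef(early_rate, late_rate)[0, 1]
    return r ** 2
```

The same calculation on interception rates gives the 1.3% figure quoted below.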
Fumble rates do depend on where the fumble occurs on the field. Brett Thiessen, who writes as the Mathalete on MGoBlog, found that a defense forces fumbles on almost 6% of sacks, a much higher rate than on any other kind of play. While the defense doesn’t recover as many of these sack fumbles as those on positive plays, sacks still have the highest net rate of recovered fumbles by the defense.
How about interceptions? For a college football defense, interception rate (interceptions divided by pass attempts) in the first 6 games explains only 1.3% of the variance in interception rate the rest of the season. As the visual shows, you’re essentially looking at randomness when your defense picks off the other team.
On offense, interception rate shows the same randomness from early to late season. However, offenses that complete a higher percentage of passes tend to throw fewer interceptions. In 2013, a team’s completion percentage explained 28.5% of the variance in interception rate. This correlation gets stronger when looking at the statistics for individual quarterbacks in the NFL.
I understand the difficulty in accepting the randomness of turnovers. Your eyes see the linebacker put his helmet on the ball while making a hit, which jars the ball loose from the running back. However, the data implies that the linebacker cannot consistently hit the back with enough force to cause the fumble. Don’t let your eyes deceive you.
Turnovers can have a large impact on margin of victory. A tipped pass near the goal line that the defense returns for a touchdown could be a 14 point swing. Since turnovers introduce randomness into the margin of victory that most computer rankings use, we need other metrics to evaluate a team. The next section discusses these efficiency metrics.
Efficiency metrics
To define an efficiency metric in football, you take a meaningful number, like yards gained by an offense, and divide it by an appropriate quantity. Unfortunately, most talking heads pick the number of games as the denominator and use yards per game to rank offenses.
Yards per game statistics have serious problems in evaluating offense and defense. College football teams play at a variety of paces, from the 87.3 plays per game that Texas Tech ran in 2013 to the 59.3 of South Florida. Moreover, yards per game is a particularly bad metric for the defense of an uptempo offense like Oregon or Oklahoma State. These defenses face a large number of plays because their offenses play so fast.
Tempo matters more in the college game than in the NFL. In 2013, the standard deviation in plays per game was 5.7 in college, so 2 out of 3 teams averaged within 5.7 plays of the 70 play average. The NFL had a standard deviation of 3.2 plays per game.
More problems with yards per game appear when evaluating passing and rushing. Teams tend to run the ball when they’re ahead since these plays keep the clock running. Hence, good teams tend to have better rush yards per game due to play selection.
To get better insight, the football analytics community has developed a number of efficiency metrics for offense and defense.
Yards per play
The simplest metric is yards per play: take total yards and divide by the number of plays. It does not account for the situation of a play like some of the metrics discussed later. However, yards per play is mostly immune to the randomness of turnovers, one reason it has great predictive power.
Bob Stoll, a college football handicapper who has a 54.9% win rate against the spread from 2001 through 2013, uses yards per play as his primary metric in evaluating offense and defense. You can read more about his methods in this essay and in his free analysis of games every week.
Yards per play gets tricky in college football when breaking an offense or defense into passing and rushing. In college football, negative yards from sacks count against rushing totals. However, a sack starts as an attempt to throw the ball and should count against passing totals.
No matter: every major media site counts sacks as rushes in both yards per game and yards per play statistics. To see yards per play with sacks counted as pass plays, check out my numbers at The Power Rank.
The NFL does not count sacks as rush plays. However, sacks do not count as pass attempts either. The total number of plays therefore includes passes, rushes, and sacks. It’s important to include sacks in total plays to calculate the correct yards per play.
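To make the adjustment yourself from official college totals, where passing numbers exclude sack yardage and rushing numbers include it, a sketch looks like this (variable names are illustrative, and sack_yards_lost is assumed to be a positive number):

```python
def pass_yards_per_play(pass_yards, pass_attempts, sacks, sack_yards_lost):
    # Move sacks, and the yardage lost on them, from rushing to passing.
    return (pass_yards - sack_yards_lost) / (pass_attempts + sacks)

def rush_yards_per_carry(rush_yards, rush_attempts, sacks, sack_yards_lost):
    # Official college rushing totals already include sacks; take them out.
    return (rush_yards + sack_yards_lost) / (rush_attempts - sacks)
```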
At The Power Rank, I take yards per play and adjust for strength of schedule with the ranking algorithm I developed from my Ph.D. research in statistical physics. These efficiency rankings for both college football and the NFL are available to members. Bob Stoll also adjusts for schedule strength in his yards per play numbers.
Expected points added
Suppose a team has a 1st and 10 at their own 20 yard line. They could drive the length of the field for a touchdown, gaining +7 points for the offense. Their drive could also stall at the opponent’s 17 yard line, which results in a field goal for +3 points. In the worst case, a tipped pass could fall into the hands of a cornerback who scores a touchdown, netting -7 points for the offense on the next score of the game.
Given a down, distance, and field position, the offense’s expected points is an average of the net points of the next score, a calculation that requires historical play-by-play data. Brian Burke of Advanced Football Analytics has performed this calculation for the NFL and found that a 1st and 10 from a team’s own 20 yard line gives +0.3 expected points.
With this baseline knowledge, the expected points added (EPA) is the points gained or lost from a play. For example, suppose the offense gains 20 yards from that 1st and 10 from their own 20 yard line. Burke calculates 1.3 expected points for a 1st and 10 from their own 40. Since the offense started in a situation with +0.3 expected points, they had +1.0 EPA for this play.
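Here’s a toy version of that calculation, using only the two expected point values from this example. A real system builds a full lookup table from historical play-by-play data.

```python
# Expected points by (down, distance, yard line), limited to the two states
# from the example above for illustration.
EXPECTED_POINTS = {
    (1, 10, 20): 0.3,  # 1st and 10 from a team's own 20
    (1, 10, 40): 1.3,  # 1st and 10 from a team's own 40
}

def expected_points_added(before, after):
    """Points gained or lost on a single play."""
    return EXPECTED_POINTS[after] - EXPECTED_POINTS[before]

print(round(expected_points_added((1, 10, 20), (1, 10, 40)), 1))  # 1.0 EPA
```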
This metric accounts for the situation of a play. There’s more value in gaining 2 yards on 3rd and 1 than gaining 2 yards on 1st and 10.
EPA forms the basis of ESPN’s Football Power Index (FPI) for college football. They use the Simple Rating System, the least squares ranking system discussed previously, to adjust this statistic for strength of schedule. Bill Connelly of SB Nation also uses this concept in his Equivalent Points Per Play, a component of his S&P ratings for college football. For the NFL, Burke uses EPA to evaluate players.
Success rate
Success rate is the number of successful plays divided by the total number of plays. In college football, Bill Connelly defines success as gaining 50% of the necessary yards on 1st down and 70% on 2nd down. Success requires all the necessary yards on 3rd and 4th down.
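A sketch of that definition, assuming each play is a hypothetical (down, distance, yards_gained) tuple:

```python
def successful(down, distance, yards_gained):
    """Connelly-style success: 50% of needed yards on 1st down, 70% on 2nd,
    all of them on 3rd and 4th down."""
    if down == 1:
        return yards_gained >= 0.5 * distance
    if down == 2:
        return yards_gained >= 0.7 * distance
    return yards_gained >= distance

def success_rate(plays):
    return sum(successful(*play) for play in plays) / len(plays)
```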
Connelly’s S&P metric combines his success rate with the Equivalent Points per Play mentioned earlier. He then adjusts for strength of schedule by looking at both a team’s opponents and its opponents’ opponents. This results in S&P+, which appears on Football Outsiders.
This idea of success rate also forms the basis of Football Outsiders’ DVOA (Defense-adjusted Value Over Average) for the NFL. Aaron Schatz and his coworkers expanded success beyond assigning a 0 or 1 to each play. They give each play a real value such as 1.3 or -4.0 based on down and distance. This leads to their team rankings as well as player metrics.
Football Outsiders also assigns large negative values to turnovers in DVOA. Hence, this statistic is affected by the randomness of turnovers just like margin of victory. Turnovers impact Connelly’s success rate much less since a turnover only counts as a failed play.
Points per drive
Instead of evaluating football play by play, it also makes sense to evaluate offense and defense drive by drive. Brian Fremeau does this with his Fremeau Efficiency Index (FEI). It compares the points earned on a drive with the expected number of points based on starting field position.
Accounting for starting field position is important. For example, if the offense gets the ball only a yard from the end zone, they should not get full credit for scoring the touchdown. Instead, the offense gets 7 minus the 6.4 points teams usually score from the opponent’s one yard line.
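In code, the bookkeeping for a single drive is simply points scored minus the expectation for that starting field position. The 6.4 comes from this example; a full model like FEI estimates an expected value for every starting situation.

```python
def drive_points_over_expected(points_scored, expected_points_at_start):
    # Credit only the points beyond what an average team scores from that spot.
    return points_scored - expected_points_at_start

print(round(drive_points_over_expected(7, 6.4), 1))  # a TD from the 1 is worth 0.6
```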
Fremeau publishes his drive-based numbers both on his own site and at Football Outsiders. The latter site also combines FEI with S&P+ to obtain the F/+ rankings, an aggregate picture of team, offense, and defense in college football.
Football Outsiders also publishes drive stats for the NFL. Unlike FEI, these stats neither consider starting field position nor adjust for schedule strength.
Will the selection committee use analytics?
No one knows whether the 13 people responsible for picking the 4 teams for the college football playoff will care about numbers. However, it’s encouraging that the selection committee has publicly talked about strength of schedule. That’s a huge step.
However, they need to apply it correctly. The committee could consider strength of schedule only through wins and losses rather than margin of victory. In 2012, Notre Dame went 12-0 with 3 wins over top 25 teams. A ranking system like the Colley Matrix, which adjusts wins and losses for strength of schedule, ranked the Fighting Irish first heading into bowl season.
But this article shows the importance of margin of victory in rating teams. Notre Dame had an average margin of victory of 16.4 points compared to the 27.8 of Alabama, the team they faced in the BCS title game. Alabama stomped Notre Dame 41-14 in that game. The Colley matrix had Notre Dame ranked over Alabama after the title game, showing the fallacy in systems that don’t use margin of victory.
There’s even less hope the committee grasps the randomness of turnovers. They will be watching games, but turnovers are an area in which your eyes will fool you. Forcing a fumble by a big hit looks impressive and can change a game. However, high forced turnover rates are not sustainable.
Just ask Oregon about the randomness of turnovers. In 2013, they forced 9 fumbles in their first 6 games as they raced towards a spot in the BCS title game. Oregon only forced 2 fumbles in their next 6 games. Losses to Stanford and Arizona killed their championship dreams.
And there’s no chance the committee gets into efficiency metrics. The college football establishment is just not sophisticated enough for that. Maybe in a few years.
But we can all hope they grasp the basics of using margin of victory in their deliberations. It’s the only way to determine the 4 best teams, the goal of the selection committee.