Ensemble predictions for college football, week 6, 2014

Over the last year, I’ve started aggregating many predictions into a single prediction. Research in diverse areas shows that this kind of ensemble tends to predict better than its individual members.

As the college football season continues, I’ve been working on ensemble game predictions for members of The Power Rank. These predictions aggregate not only my calculations but also other trusted sources. Some of these predictions use margin of victory while others use statistics such as yards per play and yards per pass attempt.

The predictions below also include totals (total points scored in the game). These calculations are a collaboration with Mike Craig, my partner in the college football prediction service.

Here are 3 interesting predictions on a terrific slate of games this Saturday.

Stanford at Notre Dame

Stanford will win by 4.1 points. Stanford and Notre Dame will score 43 points.

Stanford has struggled with mistakes in their two biggest games. They couldn’t punch the ball into the end zone against USC, losing by 3 on a 53 yard field goal. Washington returned a Stanford fumble for a touchdown last weekend, even though Stanford survived for the win in Seattle.

These mistakes have little effect on yards per play statistics. Hence, Stanford looks better by these numbers. For example, yards per play predicts an 8 point road win over Notre Dame. This visual shows how Stanford matches up with Notre Dame by yards per play adjusted for schedule.

(In the visual, better defenses appear further to the right. This facilitates comparisons, as the unit further to the right is predicted to have an advantage.)

Screen shot 2014-10-03 at 2.35.32 PM

The ensemble likes Stanford to beat Notre Dame. No, there’s no personal bias in these numbers from this Stanford alum.

Nebraska at Michigan State

Michigan State will win by 4.1 points. Nebraska and Michigan State will score 57.5 points.

This Big Ten showdown features Nebraska’s 5th ranked offense against Michigan State’s usually strong defense.

Screen shot 2014-10-03 at 2.31.01 PM

The yards per play numbers in the visual use data from last season since rankings with only data from this season are volatile. For example, Michigan State had the 28th ranked defense yesterday, a low but potentially believable ranking for a defense that was elite last season.

Then Oregon’s offense had a terrible game against Arizona’s defense last night. Since Oregon played Michigan State earlier this season, Michigan State’s defense drops to 57th in rankings that only use this year’s data. With some input from last year’s games, a ranking of 26th seems more reasonable.

Ohio State at Maryland

Ohio State will win by 0.2 points (a 50-50 game). Ohio State and Maryland will score 59 points.

The predictions are all over the map for this game. Maryland looks the equal of Ohio State by yards per play. Both offenses have a slight edge, as shown in this visual.

Screen shot 2014-10-03 at 2.33.38 PM

Yards per play favors Maryland by 3 points. My model gives the home team 3 points, so this prediction says Ohio State and Maryland are equal teams.

However, the markets favor Ohio State by 8.5 points. Some of this advantage probably comes from the history and tradition of Ohio State. However, this spread also considers injuries. Maryland QB C.J. Brown is listed as questionable, and Maryland has suffered a rash of other injuries on both sides of the ball.
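To make the idea of aggregation concrete, here is a minimal Python sketch that averages predicted margins. It uses only the two inputs named in this section (yards per play favoring Maryland by 3 and the markets favoring Ohio State by 8.5). The ensemble above draws on more sources than these two, so this sketch does not reproduce the 0.2 point prediction. The function name and the equal weighting are assumptions for illustration.

```python
def ensemble_margin(margins, weights=None):
    """Weighted average of predicted margins (equal weights by default)."""
    margins = list(margins)
    if weights is None:
        weights = [1.0] * len(margins)  # assumption: every source counts equally
    total_weight = sum(weights)
    return sum(m * w for m, w in zip(margins, weights)) / total_weight


# Margins from Ohio State's point of view (positive means Ohio State wins).
predictions = {
    "yards_per_play": -3.0,  # favors Maryland by 3
    "markets": 8.5,          # favors Ohio State by 8.5
}
print(ensemble_margin(predictions.values()))  # 2.75
```

Unequal weights, such as trusting the markets more than a single statistic, can be passed through the weights argument.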

Become a member of The Power Rank

Members have access to all the ensemble predictions as well as interactive versions of these match up visuals. To learn more about my methods, sign up for my free email newsletter. Enter your email and click on “Sign up here.”

Division series win probabilities for the 2014 MLB playoffs

My team rankings and adjustments for starting pitcher give these numbers.

  • Washington 61.1% over San Francisco
  • Los Angeles Dodgers 73.3% over St. Louis
  • Los Angeles Angels 66.2% over Kansas City
  • Detroit 59.7% over Baltimore

These win probabilities start with my MLB team rankings, which take run differential and adjust for strength of schedule. Also, for the first time, I adjust for cluster luck based on the regular season.

In addition, the projections consider starting pitching through xFIP, an ERA type statistic that captures the skill of a pitcher through strike outs, walks and fly ball rate.

Daily predictions for each game appear on the predictions page.

Assumptions behind the calculations

Michael Wacha is not projected to start for St. Louis, which leaves John Lackey and Shelby Miller to start games 3 and 4.

I’m assuming Gio Gonzalez and Yusmeiro Petit start for Washington and San Francisco respectively in game 4.

Note that Baltimore is the only team with home advantage that doesn’t have the higher odds to win the 5 game series.
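Game by game win probabilities can be combined into a series win probability by tracking every possible series score. The Python sketch below is one way to do that, not necessarily the calculation behind the numbers above. The per game probabilities are hypothetical, chosen to be higher in games 1, 2 and 5 on the assumption of a 2-2-1 home schedule, and they change from game to game the way a starting pitcher adjustment would.

```python
def series_win_probability(game_probs, wins_needed=3):
    """Chance of winning a series given the win probability in each game.

    game_probs[i] is the team's chance of winning game i + 1, if it is played.
    """
    # Maps (wins, losses) to the probability the series reaches that score.
    in_progress = {(0, 0): 1.0}
    series_win = 0.0
    for p in game_probs:
        next_states = {}
        for (wins, losses), prob in in_progress.items():
            # The team wins this game and may clinch the series.
            if wins + 1 == wins_needed:
                series_win += prob * p
            else:
                key = (wins + 1, losses)
                next_states[key] = next_states.get(key, 0.0) + prob * p
            # The team loses; the series continues unless the opponent clinches.
            if losses + 1 < wins_needed:
                key = (wins, losses + 1)
                next_states[key] = next_states.get(key, 0.0) + prob * (1 - p)
        in_progress = next_states
    return series_win


# Hypothetical probabilities for games 1 through 5 (not from the model above).
hypothetical_probs = [0.58, 0.55, 0.50, 0.47, 0.57]
print(round(series_win_probability(hypothetical_probs), 3))
```

With a constant 60% chance in every game, the function gives about 68% for a 5 game series, which shows how a short series amplifies a per game edge only modestly.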

Cluster luck numbers for the 2014 MLB regular season

To explain cluster luck, my Grantland colleague Jonah Keri wrote the following.

Joe Peta, a former Wall Street trader, presented cluster luck in his book, Trading Bases. Essentially, the concept boils down to this: When a team’s batters cluster hits together to score more runs and a team’s pitchers spread hits apart to allow fewer runs, that’s cluster luck. Say a team tallies nine singles in one game. If all of those singles occur in the same inning, the team would likely score seven runs; if each single occurs in a different inning, however, it’d likely mean a shutout.

Here are the numbers for cluster luck for the 2014 regular season. For each team, it shows total (offense, defense) for cluster luck. In all cases, a positive number implies good luck, or scoring more runs than expected on offense or allowing fewer runs on defense.

1. New York Mets, 57.29. (9.00, 48.29).
2. Seattle, 48.63. (27.67, 20.97).
3. Cincinnati, 41.94. (12.04, 29.90).
4. Baltimore, 40.00. (-13.53, 53.54).
5. Oakland, 39.71. (44.23, -4.51).
6. Kansas City, 31.46. (16.84, 14.61).
7. Texas, 14.93. (7.64, 7.30).
8. San Diego, 13.98. (-4.81, 18.78).
9. Los Angeles Angels, 11.01. (44.40, -33.39).
10. Minnesota, 9.83. (14.90, -5.08).
11. Toronto, 8.99. (-12.04, 21.04).
12. Washington, 5.12. (-9.92, 15.04).
13. Philadelphia, 4.87. (7.44, -2.58).
14. Atlanta, 4.47. (-29.24, 33.71).
15. Miami, 0.88. (-17.45, 18.33).
16. Boston, -0.39. (-13.36, 12.97).
17. St. Louis, -0.59. (-9.22, 8.63).
18. San Francisco, -1.06. (7.16, -8.22).
19. Milwaukee, -1.29. (-9.09, 7.80).
20. Cleveland, -6.12. (-17.96, 11.84).
21. Detroit, -7.78. (-14.46, 6.68).
22. New York Yankees, -10.36. (-5.51, -4.86).
23. Los Angeles Dodgers, -14.87. (-19.74, 4.87).
24. Arizona, -15.48. (-5.73, -9.75).
25. Colorado, -28.07. (-29.56, 1.49).
26. Chicago White Sox, -28.85. (-10.81, -18.04).
27. Pittsburgh, -40.53. (-42.80, 2.26).
28. Houston, -44.07. (-19.16, -24.90).
29. Tampa Bay, -48.93. (-30.20, -18.73).
30. Chicago Cubs, -67.81. (-20.64, -47.17).

Cluster luck is the deviation of actual runs from Base Runs, the runs created formula of Dave Smyth. The difference in runs scored and runs allowed by Base Runs provides a way to rank teams. The results below give run differential (runs scored, runs allowed). The record denotes a Pythagorean expectation with an exponent of 1.83.

1. Los Angeles Angels, 131.99. (728.60, 596.61). Record: 95-67.
2. Washington, 125.88. (695.92, 570.04). Record: 95-67.
3. Oakland, 117.29. (684.77, 567.49). Record: 94-68.
4. Los Angeles Dodgers, 115.87. (737.74, 621.87). Record: 93-69.
5. Pittsburgh, 91.53. (724.80, 633.26). Record: 90-71.
6. Baltimore, 72.00. (718.53, 646.54). Record: 88-74.
7. Detroit, 59.78. (771.46, 711.68). Record: 86-76.
8. San Francisco, 52.06. (657.84, 605.78). Record: 86-75.
9. Tampa Bay, 35.93. (642.20, 606.27). Record: 85-77.
10. Seattle, 31.37. (606.33, 574.97). Record: 84-78.
11. Toronto, 28.01. (735.04, 707.04). Record: 83-79.
12. Cleveland, 22.12. (686.96, 664.84). Record: 82-79.
13. St. Louis, 16.59. (628.22, 611.63). Record: 82-80.
14. Kansas City, -4.46. (634.16, 638.61). Record: 79-82.
15. Milwaukee, -5.71. (659.09, 664.80). Record: 80-82.
16. New York Yankees, -20.64. (638.51, 659.14). Record: 78-84.
17. Chicago Cubs, -25.19. (634.64, 659.83). Record: 77-84.
18. Atlanta, -28.47. (602.24, 630.71). Record: 77-85.
19. Miami, -29.88. (662.45, 692.33). Record: 77-85.
20. Colorado, -34.93. (784.56, 819.49). Record: 77-84.
21. New York Mets, -46.29. (620.00, 666.29). Record: 75-87.
22. Houston, -49.93. (648.16, 698.10). Record: 75-87.
23. San Diego, -55.98. (539.81, 595.78). Record: 73-89.
24. Cincinnati, -58.94. (582.96, 641.90). Record: 73-89.
25. Chicago White Sox, -69.15. (670.81, 739.96). Record: 73-89.
26. Minnesota, -71.83. (700.10, 771.92). Record: 73-89.
27. Philadelphia, -72.87. (611.56, 684.42). Record: 72-90.
28. Boston, -80.61. (647.36, 727.97). Record: 72-90.
29. Arizona, -111.52. (620.73, 732.25). Record: 68-94.
30. Texas, -150.93. (629.36, 780.30). Record: 65-97.
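The two lists above fit together through simple arithmetic, sketched here in Python for the Los Angeles Angels. Adding cluster luck back to the Base Runs totals implies 773 runs scored (728.60 + 44.40) and 630 runs allowed (596.61 + 33.39). The function names are assumptions, and the rounding convention behind the listed records is not stated, so the sketch only gets close to the 95-67 record rather than reproducing it exactly.

```python
def cluster_luck(actual_scored, actual_allowed, bsr_scored, bsr_allowed):
    """Return (total, offense, defense) cluster luck; positive means good luck."""
    offense = actual_scored - bsr_scored    # scored more than Base Runs expects
    defense = bsr_allowed - actual_allowed  # allowed fewer than Base Runs expects
    return offense + defense, offense, defense


def pythagorean_win_pct(runs_scored, runs_allowed, exponent=1.83):
    """Pythagorean expectation: RS^x / (RS^x + RA^x)."""
    scored = runs_scored ** exponent
    allowed = runs_allowed ** exponent
    return scored / (scored + allowed)


# Los Angeles Angels, using the Base Runs totals listed above.
total, offense, defense = cluster_luck(773, 630, 728.60, 596.61)
print(round(total, 2), round(offense, 2), round(defense, 2))  # 11.01 44.4 -33.39

win_pct = pythagorean_win_pct(728.60, 596.61)
print(round(win_pct, 3), round(162 * win_pct, 1))  # roughly 0.59 and 95.6 wins
```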

For my analysis of how cluster luck will affect certain teams in the playoffs, check out my article on bettingexpert.com.

Check out this must read football analytics article

What is the most efficient play in football? What is the analogue of basketball’s corner 3 point shot?

This question bothered Robert Mays of Grantland, so he enlisted the help of the quants at ESPN. They found ample evidence that the play action pass is the most efficient play.

To show this, they looked at expected points. Given a down, distance to a first down and field position, expected points is the average net points of the next score.

From 1st and 10 from their own 20, the offense might score a touchdown for +7 points. The offense might also punt, which leads to an opponent field goal and -3 points. Expected points averages these outcomes to assign each situation a point value.

Expected points added (EPA) is the change in expected points on a given play. This statistic acknowledges that 2 yards on 3rd and 1 is worth more than 2 yards on 1st and 10.
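A short Python sketch makes these two definitions concrete. The expected points values in the lookup table are hypothetical placeholders, not numbers from ESPN, and the function names are assumptions. Only the +7 and -3 outcomes come from the example above.

```python
from statistics import mean


def expected_points(next_score_values):
    """Average net points of the next score over observed drives."""
    return mean(next_score_values)


def expected_points_added(ep_before, ep_after):
    """EPA: the change in expected points caused by one play."""
    return ep_after - ep_before


# The two outcomes from the text, weighted equally for illustration.
print(expected_points([7.0, -3.0]))  # 2.0

# HYPOTHETICAL expected points by (down, yards to go, yard line).
hypothetical_ep = {
    (1, 10, 20): 0.3,
    (2, 8, 22): 0.2,   # after a 2 yard gain on 1st and 10
    (3, 1, 20): -0.4,
    (1, 10, 22): 0.4,  # after a 2 yard gain converts 3rd and 1
}

first_down_gain = expected_points_added(
    hypothetical_ep[(1, 10, 20)], hypothetical_ep[(2, 8, 22)]
)
third_down_gain = expected_points_added(
    hypothetical_ep[(3, 1, 20)], hypothetical_ep[(1, 10, 22)]
)
print(third_down_gain > first_down_gain)  # True
```

Under these made up values, the same 2 yard gain loses a little value on 1st and 10 but adds a lot on 3rd and 1, which is the distinction EPA is built to capture.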

Mays and the ESPN quants found that the play action pass earned the highest EPA of all plays. And it wasn’t even close. Running plays lost expected points on average (-0.04 EPA), while passes averaged +0.04 EPA. The play action pass gained +0.17 EPA on average, about 4 times as much as the typical pass.

Deception matters in football. On a play action pass, the offense fakes a run, which freezes the linebackers. This frees up space down the field for a big pass play.

In college football, I’ve also found that offenses that run the ball well like Auburn in 2013 can throw effectively on 1st and 10. The defense presumably expects a run, which makes it easier to throw for a big gain.

Moreover, the data suggests that an NFL team doesn’t need a good run game to be effective with play action. For example, Minnesota had a strong rush attack with Adrian Peterson. However, the Vikings were only 21st in play action EPA over the last 4 years.

Play action passes are pass plays, and a team needs a good quarterback to make those throws. The top teams in play action efficiency have quarterbacks like Aaron Rodgers and Peyton Manning.

To check out the article by Mays on football’s corner 3, click here.

The top 5 killer articles on football analytics

Do you want to get up to speed on football analytics? I’ve compiled 5 of my favorite articles in this free report.

To download this pdf, just sign up for my free email newsletter. (I promise, no spam. Just good content from yours truly.) Enter your email and click on “Sign up now.”

How to use baseball analytics for a profitable sports investment

Do you bet on baseball? Are you looking for an extra edge based on data and analytics?

Onside Sports has a new solution. While they launched as a social sports app last year, they have now developed True Odds, a data driven prediction system for baseball. True Odds, an in-app purchase, has a 298-271-10 record this season through September 9th.

I had the opportunity to talk with Kai Yu, the brains behind True Odds. While he obviously could not tell me everything about his methods, he did share quite a bit, which I’ll share in this post.

If you’re eager to get a free trial of their picks, click here and use the code THEPOWERRANK.

Baseball from its fundamental interaction

True Odds starts with the matchup between pitcher and batter. Based on historical data, it seeks to estimate probability of an event such as Miguel Cabrera’s hitting a home run off James Shields.

As part of this analysis, Kai had to carefully sort out which variables predict the future and which variables tend towards randomness. He noted contact rate as an important skill for a hitter. It’s tough to strike out Victor Martinez no matter who pitches to him.

This bottom up approach has advantages over the top down approach that looks at overall team performance. Often, the top down approach looks at a team’s runs scored and allowed. However, these numbers can be greatly affected by the sequencing of hits, or cluster luck. Combining pitcher batter matchups with the simulation method below does not have these problems.

Random simulations

Based on the probabilities from every pitcher batter matchup, True Odds uses a random simulation to play the game many times. Each simulation is different, and a set of simulations gives the probability that certain events happen, such as a Detroit win over San Francisco or a total of more than 7 runs for Oakland and Seattle.

To accurately simulate a game, True Odds must know both the pitcher and the opposing line up. This method naturally accounts for injuries.
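The general idea of simulating a game from plate appearance probabilities can be sketched in a few dozen lines of Python. This is not the True Odds method, whose details were not shared. The outcome probabilities are hypothetical, each batter faces the same probabilities for the whole game, and the base running is simplified so that every runner advances exactly as many bases as the batter.

```python
import random

EVENTS = ("out", "walk", "single", "double", "home_run")
BASES_GAINED = {"single": 1, "double": 2, "home_run": 4}


def play_half_inning(lineup, next_batter, rng):
    """Simulate one half inning; return runs scored and the next batter index."""
    outs, runs = 0, 0
    bases = [False, False, False]  # first, second, third
    while outs < 3:
        probs = lineup[next_batter % len(lineup)]
        next_batter += 1
        event = rng.choices(EVENTS, weights=[probs[e] for e in EVENTS])[0]
        if event == "out":
            outs += 1
        elif event == "walk":
            # Only forced runners advance on a walk.
            if bases[0]:
                if bases[1]:
                    if bases[2]:
                        runs += 1
                    bases[2] = True
                bases[1] = True
            bases[0] = True
        else:
            gained = BASES_GAINED[event]
            new_bases = [False, False, False]
            for base, occupied in enumerate(bases):
                if occupied and base + gained >= 3:
                    runs += 1
                elif occupied:
                    new_bases[base + gained] = True
            if gained == 4:
                runs += 1  # the batter scores on a home run
            else:
                new_bases[gained - 1] = True
            bases = new_bases
    return runs, next_batter


def home_team_wins(home_lineup, away_lineup, rng):
    """Play 9 innings, plus extra innings while tied."""
    score = {"home": 0, "away": 0}
    batter = {"home": 0, "away": 0}
    inning = 0
    while inning < 9 or score["home"] == score["away"]:
        for side, lineup in (("away", away_lineup), ("home", home_lineup)):
            runs, batter[side] = play_half_inning(lineup, batter[side], rng)
            score[side] += runs
        inning += 1
    return score["home"] > score["away"]


def home_win_probability(home_lineup, away_lineup, n_games=2000, seed=0):
    """Fraction of simulated games won by the home team."""
    rng = random.Random(seed)
    wins = sum(home_team_wins(home_lineup, away_lineup, rng) for _ in range(n_games))
    return wins / n_games


# HYPOTHETICAL batter profiles against the opposing starter.
strong = {"out": 0.60, "walk": 0.10, "single": 0.18, "double": 0.07, "home_run": 0.05}
weak = {"out": 0.72, "walk": 0.07, "single": 0.15, "double": 0.04, "home_run": 0.02}
print(home_win_probability([strong] * 9, [weak] * 9))
```

Swapping one batter's profile for a replacement's is all it takes to model an injury in this setup. Counting simulated games that exceed a given number of runs would give the probability for a total in the same way.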

Other quants have also used pitcher batter matchups and random simulations to profit on baseball. For example, check out this excellent Q&A with David Frohardt-Lane on Regressing, Deadspin’s sports data blog.

A multitude of other factors

Kai also stressed the importance of other factors, such as park, weather and umpires. True Odds incorporates these factors in predicting the outcomes of games.

Let’s discuss umpires, who can impact home field advantage. As Jon Wertheim and Tobias Moskowitz discussed in their book Scorecasting, umpires tend to call more strikes on road than home batters. This tendency increases in high leverage situations, such as two outs with the bases loaded in a close game in the bottom of the ninth.

However, umpires might not play as big a role in home advantage anymore. Through September 8th, home teams have scored a mere 49 more runs than road teams. This 0.02 runs per game is much lower than the historical average.

Major League Baseball might be keeping a more watchful eye on umpires with cameras. I bet True Odds has a grasp on this.

Does FIP apply to every pitcher?

The most interesting part of my conversation with Kai concerned whether fielding independent pitching applied to every pitcher.

To recap, fielding independent pitching comes from the research of Voros McCracken, who discovered that pitchers do not affect batting average on balls in play (BABIP). Pitchers have control over their strike outs, walks and home runs allowed. However, 30% of balls hit in play become hits, and deviations from this average for a pitcher strongly regress to the mean.

This research led to the development of FIP, a runs allowed statistic that only considers strike outs, walks and home runs. It should replace ERA in any discussion of pitching performance.
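For reference, here is the commonly published form of FIP as a small Python function. The formula also counts hit batters alongside walks and adds a league constant that puts FIP on the ERA scale. That constant changes each season, so the default of 3.10 is an assumption, and the pitcher's line in the example is hypothetical.

```python
def fip(home_runs, walks, strikeouts, innings, hit_batters=0, constant=3.10):
    """Fielding independent pitching on an ERA-like scale.

    constant is a league and season specific value (assumed 3.10 here).
    """
    numerator = 13 * home_runs + 3 * (walks + hit_batters) - 2 * strikeouts
    return numerator / innings + constant


# Hypothetical season: 20 HR, 50 BB, 200 K over 200 innings.
print(round(fip(home_runs=20, walks=50, strikeouts=200, innings=200.0), 2))  # 3.15
```

Nothing in the formula depends on what happens to balls in play, which is why a pitcher whose ERA sits consistently below his FIP stands out.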

However, Kai suggested FIP doesn’t apply to all pitchers. He cited Seattle’s Chris Young as a pitcher who consistently has a lower ERA than FIP. This reminds me of an excellent analysis of Mark Buehrle and how his defense makes him a better pitcher than FIP suggests.

Try out True Odds for free

Onside Sports has done a remarkable job using data to find value in the baseball market. Their predictions have registered 298 wins, 271 losses and 10 pushes for a return on investment of 11% through September 9.

As a reader of The Power Rank, you can try out True Odds for free. Follow the steps under this video and use the code THEPOWERRANK. With only 4 weeks left before the baseball postseason, check it out today.