How the hot start Tigers have actually been unlucky

In my latest article for the Detroit News, I ask whether the Tigers have gotten lucky early this season.

There are many ways luck can affect a baseball team, but I look at cluster luck here. To read why the Tigers have gotten unlucky in this department, click here.

This doesn’t mean the Tigers will continue to score runs at their current torrid pace. As a commenter pointed out, they have a high batting average on balls in play, which will regress as the season progresses.

However, we can rule out cluster luck as the reason for their hot start. The article has numbers for all MLB teams through Sunday. The list below gives cluster luck through Tuesday’s games.

The first number gives total cluster luck, while the numbers in parentheses give the breakdown between offense and defense. In all cases, a positive number implies good luck.

1. Texas, 19.32. (11.62, 7.70).
2. New York Mets, 16.55. (9.48, 7.08).
3. Arizona, 14.95. (5.89, 9.06).
4. Toronto, 14.45. (14.83, -0.38).
5. Los Angeles Angels, 10.53. (9.16, 1.36).
6. Pittsburgh, 8.72. (11.18, -2.46).
7. San Diego, 7.27. (3.44, 3.82).
8. Boston, 4.79. (11.96, -7.17).
9. St. Louis, 4.78. (-3.91, 8.69).
10. Atlanta, 2.88. (0.86, 2.02).
11. Philadelphia, 2.09. (-3.17, 5.25).
12. Colorado, 1.46. (-1.37, 2.83).
13. New York Yankees, 1.16. (3.35, -2.19).
14. Minnesota, 1.16. (4.00, -2.85).
15. Kansas City, 0.88. (2.89, -2.01).
16. Cleveland, 0.08. (0.23, -0.16).
17. Cincinnati, -1.29. (3.19, -4.48).
18. Chicago Cubs, -2.25. (1.66, -3.91).
19. Chicago White Sox, -3.76. (-2.57, -1.19).
20. Houston, -5.11. (-5.89, 0.78).
21. Washington, -5.47. (-0.72, -4.75).
22. Oakland, -6.50. (1.67, -8.17).
23. Miami, -6.76. (3.40, -10.16).
24. Detroit, -8.86. (-8.23, -0.63).
25. Milwaukee, -9.15. (2.44, -11.59).
26. San Francisco, -10.72. (-13.43, 2.71).
27. Seattle, -11.66. (-3.98, -7.68).
28. Baltimore, -12.61. (-0.99, -11.62).
29. Los Angeles Dodgers, -15.28. (-13.80, -1.48).
30. Tampa Bay, -15.53. (-3.77, -11.76).

Detroit’s cluster luck has gotten worse on offense but gone back to neutral on defense.

Cluster luck numbers for the 2014 MLB regular season

To explain cluster luck, my Grantland colleague Jonah Keri wrote the following.

Joe Peta, a former Wall Street trader, presented cluster luck in his book, Trading Bases. Essentially, the concept boils down to this: When a team’s batters cluster hits together to score more runs and a team’s pitchers spread hits apart to allow fewer runs, that’s cluster luck. Say a team tallies nine singles in one game. If all of those singles occur in the same inning, the team would likely score seven runs; if each single occurs in a different inning, however, it’d likely mean a shutout.
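The nine singles example can be checked with a few lines of Python. This is only a sketch under an assumed baserunning rule that is not in the quote: on every single, a runner on second scores and a runner on first moves to second. Outs are ignored, and the function name is made up for illustration.

```python
def runs_from_singles(singles_per_inning):
    """Count runs when every hit is a single, under a simple assumed rule."""
    total_runs = 0
    for singles in singles_per_inning:
        on_first = on_second = False  # bases are empty to start each inning
        for _ in range(singles):
            # Assumed rule: a runner on second scores on any single,
            # and a runner on first advances to second.
            if on_second:
                total_runs += 1
            on_second = on_first
            on_first = True
    return total_runs


clustered = runs_from_singles([9, 0, 0, 0, 0, 0, 0, 0, 0])  # all nine in one inning
scattered = runs_from_singles([1] * 9)                      # one single per inning
print(clustered, scattered)  # 7 0
```

Under that rule, the first two singles put runners on first and second, and each of the remaining seven singles drives in a run. Spread one per inning, the same nine singles strand every runner.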

Here are the numbers for cluster luck for the 2014 regular season. Each entry shows a team's total cluster luck, followed by the offense and defense components in parentheses. In all cases, a positive number implies good luck, or scoring more runs than expected on offense or allowing fewer runs on defense.

1. New York Mets, 57.29. (9.00, 48.29).
2. Seattle, 48.63. (27.67, 20.97).
3. Cincinnati, 41.94. (12.04, 29.90).
4. Baltimore, 40.00. (-13.53, 53.54).
5. Oakland, 39.71. (44.23, -4.51).
6. Kansas City, 31.46. (16.84, 14.61).
7. Texas, 14.93. (7.64, 7.30).
8. San Diego, 13.98. (-4.81, 18.78).
9. Los Angeles Angels, 11.01. (44.40, -33.39).
10. Minnesota, 9.83. (14.90, -5.08).
11. Toronto, 8.99. (-12.04, 21.04).
12. Washington, 5.12. (-9.92, 15.04).
13. Philadelphia, 4.87. (7.44, -2.58).
14. Atlanta, 4.47. (-29.24, 33.71).
15. Miami, 0.88. (-17.45, 18.33).
16. Boston, -0.39. (-13.36, 12.97).
17. St. Louis, -0.59. (-9.22, 8.63).
18. San Francisco, -1.06. (7.16, -8.22).
19. Milwaukee, -1.29. (-9.09, 7.80).
20. Cleveland, -6.12. (-17.96, 11.84).
21. Detroit, -7.78. (-14.46, 6.68).
22. New York Yankees, -10.36. (-5.51, -4.86).
23. Los Angeles Dodgers, -14.87. (-19.74, 4.87).
24. Arizona, -15.48. (-5.73, -9.75).
25. Colorado, -28.07. (-29.56, 1.49).
26. Chicago White Sox, -28.85. (-10.81, -18.04).
27. Pittsburgh, -40.53. (-42.80, 2.26).
28. Houston, -44.07. (-19.16, -24.90).
29. Tampa Bay, -48.93. (-30.20, -18.73).
30. Chicago Cubs, -67.81. (-20.64, -47.17).
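The sign convention in this list can be written down as a small Python sketch. The function name and the team totals below are made up for illustration. The expected runs would come from a run estimator, which this sketch simply takes as an input.

```python
def cluster_luck(actual_scored, expected_scored, actual_allowed, expected_allowed):
    """Return (total, offense, defense) cluster luck; positive means good luck."""
    offense = actual_scored - expected_scored    # scored more than expected
    defense = expected_allowed - actual_allowed  # allowed fewer than expected
    return offense + defense, offense, defense


# Hypothetical team totals, made up for illustration only.
total, offense, defense = cluster_luck(700, 680.0, 650, 665.0)
print(f"{total:.2f}. ({offense:.2f}, {defense:.2f}).")  # 35.00. (20.00, 15.00).
```

The total is just the sum of the two parts. For the Mets above, 9.00 on offense plus 48.29 on defense gives 57.29.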

Cluster luck is the deviation of actual runs from Base Runs, the runs created formula of Dave Smyth. The difference in runs scored and runs allowed by Base Runs provides a way to rank teams. The results below give run differential (runs scored, runs allowed). The record denotes a Pythagorean expectation with an exponent of 1.83.

1. Los Angeles Angels, 131.99. (728.60, 596.61). Record: 95-67.
2. Washington, 125.88. (695.92, 570.04). Record: 95-67.
3. Oakland, 117.29. (684.77, 567.49). Record: 94-68.
4. Los Angeles Dodgers, 115.87. (737.74, 621.87). Record: 93-69.
5. Pittsburgh, 91.53. (724.80, 633.26). Record: 90-71.
6. Baltimore, 72.00. (718.53, 646.54). Record: 88-74.
7. Detroit, 59.78. (771.46, 711.68). Record: 86-76.
8. San Francisco, 52.06. (657.84, 605.78). Record: 86-75.
9. Tampa Bay, 35.93. (642.20, 606.27). Record: 85-77.
10. Seattle, 31.37. (606.33, 574.97). Record: 84-78.
11. Toronto, 28.01. (735.04, 707.04). Record: 83-79.
12. Cleveland, 22.12. (686.96, 664.84). Record: 82-79.
13. St. Louis, 16.59. (628.22, 611.63). Record: 82-80.
14. Kansas City, -4.46. (634.16, 638.61). Record: 79-82.
15. Milwaukee, -5.71. (659.09, 664.80). Record: 80-82.
16. New York Yankees, -20.64. (638.51, 659.14). Record: 78-84.
17. Chicago Cubs, -25.19. (634.64, 659.83). Record: 77-84.
18. Atlanta, -28.47. (602.24, 630.71). Record: 77-85.
19. Miami, -29.88. (662.45, 692.33). Record: 77-85.
20. Colorado, -34.93. (784.56, 819.49). Record: 77-84.
21. New York Mets, -46.29. (620.00, 666.29). Record: 75-87.
22. Houston, -49.93. (648.16, 698.10). Record: 75-87.
23. San Diego, -55.98. (539.81, 595.78). Record: 73-89.
24. Cincinnati, -58.94. (582.96, 641.90). Record: 73-89.
25. Chicago White Sox, -69.15. (670.81, 739.96). Record: 73-89.
26. Minnesota, -71.83. (700.10, 771.92). Record: 73-89.
27. Philadelphia, -72.87. (611.56, 684.42). Record: 72-90.
28. Boston, -80.61. (647.36, 727.97). Record: 72-90.
29. Arizona, -111.52. (620.73, 732.25). Record: 68-94.
30. Texas, -150.93. (629.36, 780.30). Record: 65-97.
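For readers who want to try these calculations, here is a rough Python sketch. The Base Runs function uses one simple, commonly cited version of Smyth's formula. That is an assumption, since the exact variant behind the numbers above is not stated. The batting line fed into it is hypothetical. The Pythagorean function uses the 1.83 exponent mentioned above, applied to the Angels' Base Runs totals from the list.

```python
def base_runs(ab, h, tb, hr, bb):
    """A simple version of Base Runs (assumed variant, not necessarily the one used above)."""
    a = h + bb - hr                                       # baserunners
    b = (1.4 * tb - 0.6 * h - 3 * hr + 0.1 * bb) * 1.02   # advancement factor
    c = ab - h                                            # outs
    d = hr                                                # runs that score automatically
    return a * b / (b + c) + d


def pythagorean_win_pct(runs_scored, runs_allowed, exponent=1.83):
    """Expected winning percentage from runs scored and allowed."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


# Hypothetical season batting line, made up for illustration only.
expected_runs = base_runs(ab=5500, h=1400, tb=2200, hr=150, bb=500)

# Base Runs scored and allowed for the Angels, taken from the list above.
angels_pct = pythagorean_win_pct(728.60, 596.61)

print(f"expected runs: {expected_runs:.1f}")
print(f"Angels win pct: {angels_pct:.3f}, wins over 162 games: {162 * angels_pct:.1f}")
```

The Angels come out at a winning percentage of about 0.590, or between 95 and 96 wins over 162 games, which is in line with the 95-67 record listed. Cluster luck on offense is then a team's actual runs scored minus its Base Runs estimate.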

For my analysis of how cluster luck will affect certain teams in the playoffs, check out my article on bettingexpert.com.

How to use baseball analytics for a profitable sports investment

Do you bet on baseball? Are you looking for an extra edge based on data and analytics?

Onside Sports has a new solution. While they launched as a social sports app last year, they have now developed True Odds, a data driven prediction system for baseball. True Odds, an in-app purchase, has a 298-271-10 record this season through September 9th.

I had the opportunity to talk with Kai Yu, the brains behind True Odds. While he obviously could not tell me everything about his methods, he did reveal quite a bit, which I’ll share in this post.

If you’re eager to get a free trial of their picks, click here and use the code THEPOWERRANK.

Baseball from its fundamental interaction

True Odds starts with the matchup between pitcher and batter. Based on historical data, it seeks to estimate the probability of an event such as Miguel Cabrera hitting a home run off James Shields.

As part of this analysis, Kai had to carefully sort out which variables predict the future and which variables tend towards randomness. He noted contact rate as an important skill for a hitter. It’s tough to strike out Victor Martinez no matter who pitches to him.

This bottom up approach has advantages over the top down approach that looks at overall team performance. Often, the top down approach looks at a team’s runs scored and allowed. However, these numbers can be greatly affected by the sequencing of hits, or cluster luck. Combining pitcher batter matchups with the simulation method below avoids this problem.

Random simulations

Based on the probabilities from every pitcher batter matchup, True Odds uses a random simulation to play the game many times. Each simulation is different, and a set of simulations gives the probability that certain events happen, such as a Detroit win over San Francisco or a total of more than 7 runs for Oakland and Seattle.

To accurately simulate a game, True Odds must know both the pitcher and the opposing lineup. This method naturally accounts for injuries, since an injured player simply isn’t in the lineup.
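Here is a rough Python sketch of the general idea of simulating a game many times. It is not True Odds' actual model, which Kai did not share in detail. The outcome probabilities are made up, there is one set per team instead of one per pitcher batter matchup, and the baserunning rule is a deliberate simplification. All names are hypothetical.

```python
import random

# Made-up probabilities of each plate appearance outcome for each lineup.
# A real system would estimate a separate set for every pitcher batter matchup.
HOME = {"out": 0.68, "walk": 0.08, "single": 0.15, "double": 0.05, "triple": 0.005, "homer": 0.035}
AWAY = {"out": 0.70, "walk": 0.08, "single": 0.14, "double": 0.045, "triple": 0.005, "homer": 0.03}
BASES_GAINED = {"walk": 1, "single": 1, "double": 2, "triple": 3, "homer": 4}


def half_inning(probs, rng):
    """Simulate one half inning and return the runs scored."""
    outs, runs, runners = 0, 0, []  # runners holds the base (1 to 3) of each runner
    events, weights = list(probs), list(probs.values())
    while outs < 3:
        event = rng.choices(events, weights=weights)[0]
        if event == "out":
            outs += 1
            continue
        gain = BASES_GAINED[event]
        # Simplification: every runner advances as many bases as the batter,
        # so a walk is treated like a single.
        runners = [base + gain for base in runners] + [gain]
        runs += sum(1 for base in runners if base >= 4)
        runners = [base for base in runners if base < 4]
    return runs


def simulate_game(home, away, rng):
    """Play nine innings, plus extra innings while tied. Returns (home, away) runs."""
    home_runs = away_runs = inning = 0
    while inning < 9 or home_runs == away_runs:
        away_runs += half_inning(away, rng)
        home_runs += half_inning(home, rng)  # simplification: home always bats
        inning += 1
    return home_runs, away_runs


def estimate(home, away, n_games=5000, total_line=7, seed=0):
    """Estimate P(home win) and P(total runs over the line) by simulation."""
    rng = random.Random(seed)
    home_wins = overs = 0
    for _ in range(n_games):
        h, a = simulate_game(home, away, rng)
        home_wins += h > a
        overs += (h + a) > total_line
    return home_wins / n_games, overs / n_games


p_home, p_over = estimate(HOME, AWAY)
print(f"home win: {p_home:.3f}, over 7 runs: {p_over:.3f}")
```

The fraction of simulated games in which an event happens is the estimated probability of that event. More simulated games give a more stable estimate, and the fixed seed makes the run repeatable.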

Other quants have also used pitcher batter matchups and random simulations to profit on baseball. For example, check out this excellent Q&A with David Frohardt-Lane on Regressing, Deadspin’s sports data blog.

A multitude of other factors

Kai also stressed the importance of other factors, such as park, weather and umpires. True Odds incorporates these factors in predicting the outcomes of games.

Let’s discuss umpires, who can impact home field advantage. As Jon Wertheim and Tobias Moskowitz discussed in their book Scorecasting, umpires tend to call more strikes on road than home batters. This tendency increases in high leverage situations, such as two outs with the bases loaded in a close game in the bottom of the ninth.

However, umpires might not play as big a role in home advantage anymore. Through September 8th, home teams have scored a mere 49 more runs than road teams. This 0.02 runs per game is much lower than the historical average.

Major League Baseball might be keeping a more watchful eye on umpires with cameras. I bet True Odds has a grasp on this.

Does FIP apply to every pitcher?

The most interesting part of my conversation with Kai concerned whether fielding independent pitching applied to every pitcher.

To recap, fielding independent pitching comes from the research of Voros McCracken, who discovered that pitchers do not affect batting average on balls in play (BABIP). Pitchers have control over their strike outs, walks and home runs allowed. However, 30% of balls hit in play become hits, and deviations from this average for a pitcher strongly regress to the mean.

This research led to the development of FIP, a runs allowed statistic that only considers strike outs, walks and home runs. It should replace ERA in any discussion of pitching performance.
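For reference, both statistics are easy to compute. This Python sketch uses the standard textbook forms of BABIP and FIP. The FIP constant varies by league and season, so the 3.10 default here is an assumed typical value. The pitcher's season line is made up for illustration.

```python
def babip(h, hr, ab, k, sf=0):
    """Batting average on balls in play: hits that stayed in the park per ball in play."""
    return (h - hr) / (ab - k - hr + sf)


def fip(hr, bb, hbp, k, ip, constant=3.10):
    """Fielding independent pitching; constant is an assumed typical value."""
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


# Hypothetical pitcher season, made up for illustration only.
season_fip = fip(hr=20, bb=50, hbp=5, k=180, ip=200.0)
season_babip = babip(h=180, hr=20, ab=750, k=180)
print(f"FIP: {season_fip:.2f}, BABIP: {season_babip:.3f}")
```

Note that FIP uses only home runs, walks, hit batters and strikeouts, so nothing that happens on a ball in play can move it.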

However, Kai suggested FIP doesn’t apply to all pitchers. He cited Seattle’s Chris Young as a pitcher who consistently has a lower ERA than FIP. This reminds me of an excellent analysis of Mark Buehrle and how his defense makes him a better pitcher than FIP suggests.

Try out True Odds for free

Onside Sports has done a remarkable job using data to find value in the baseball market. Their predictions have registered 298 wins, 271 losses and 10 pushes for a return on investment of 11% through September 9.

As a reader of The Power Rank, you can try out True Odds for free. Follow the steps under this video and use the code THEPOWERRANK. With only 4 weeks left before the baseball postseason, check it out today.

Baseball cluster luck article on FiveThirtyEight

Over the past few years, I’ve been calculating cluster luck in baseball. This is based on the idea that teams can score more runs when they cluster their hits together (or allow fewer runs when pitchers scatter hits).

However, teams can’t consistently cluster hits together. Cluster luck calculations show us which teams will not keep up their torrid early season pace.

Last week, Jonah Keri used my cluster luck numbers on FiveThirtyEight to show how this has happened to San Francisco and closer Sergio Romo. Then he discussed how cluster luck continues to help Seattle but regression could hit soon.

Getting him the updated cluster luck numbers was simple, as I just use widely available season totals. However, creating the above graph was a lot of work, since it required box score numbers on a daily basis.

Still, the work was worth it, as I’m cooking up a way to incorporate cluster luck into my MLB rankings. It should give us a better grasp on Oakland, a team that can’t possibly be more than 1.4 runs better than MLB average.

More on this soon.

To read Jonah Keri’s article on cluster luck based on my numbers, click here.

Is St. Louis safe from an upset against Pittsburgh? A 2013 MLB playoff preview

St. Louis won more games than any other NL team this season, winning the Central by 3 games over Pittsburgh. Moreover, they’re ranked 2nd in The Power Rank while Pittsburgh is 9th. It should be easy to call a series win for St. Louis.

Not so fast.

To dig deeper into these two teams, consider the idea of cluster luck. Some teams tend to cluster their hits together and score more runs. Other teams spread their hits out over the innings and score fewer runs.

One can quantify this luck using run creation formulas. These equations take box score statistics such as at bats and hits to estimate the number of runs a team should have scored. Deviations from this expectation are random.

You can read more about this in my cluster luck article on bettingexpert.

Pittsburgh at St. Louis

Cluster luck has some dramatic consequences on this series.

This season, the Cardinals have scored 60 more runs than expected, almost 3 standard deviations away from the mean. They scored more runs than any team besides Boston and Detroit despite having average power numbers.

While the Cardinals have gotten lucky, the Pirates have not been so fortunate. They have scored 34 fewer runs than expected. Moreover, their league leading 577 runs allowed has not been the result of luck. This remarkable total is only one run fewer than expected.

The visual above accounts for cluster luck by ranking teams by expected runs scored minus allowed. St. Louis only leads Pittsburgh by 10 runs over the course of the season.

To see the rankings of all 30 teams, click here.

The gambler’s perspective

While I originally looked into run creation a few years ago, Joe Peta, author of Trading Bases, inspired me to get back into the analysis. He has his own way of quantifying cluster luck that seems consistent with my work.

Joe thinks the Cardinals still have an edge. His preview shows that the Pirates’ pitching and defense were spectacular during the first half of the season but only average in the second. In a 5 game series, he’s predicting Cardinals in 4.

Why should you trust his prediction? Joe took his methods to Las Vegas and turned a 41% profit. To read more, buy his book Trading Bases.

Outlook

This series has a special meaning to me since two of my best friends root for the opposing teams.

I texted the Pirates fan a few weeks back about this cluster luck analysis. When the Pirates started to lose the division in the season’s last week, I got a death threat.

Cluster luck had an even more drastic effect on the Cardinals fan. Despite his Ph.D. in Mechanical Engineering from Stanford, he texted this back.

You have just made me doubt all numbers ever.

Nothing like analytics to make baseball fans crazy.