Ensemble prediction for MLB 2017 win totals

I’m working on baseball predictions, and these predictions look pretty stupid in April unless you use preseason expectations.

These ensemble win total predictions combine the markets with 4 different computer models (Joe Peta, Clay Davenport, PECOTA and Fangraphs).

1. Los Angeles Dodgers, 94.9
2. Chicago Cubs, 94.6
3. Cleveland, 92.6
4. Houston, 92.1
5. Washington, 90.5
6. Boston, 90.2
7. San Francisco, 88.1
8. New York Mets, 86.8
9. Seattle, 84.8
10. Texas, 84.5
11. Toronto, 84.4
12. St. Louis, 83.1
13. Pittsburgh, 82.6
14. New York Yankees, 81.5
15. Detroit, 81.2
16. Tampa Bay, 80.5
17. Los Angeles Angels, 80.4
18. Baltimore, 78.8
19. Colorado, 78.2
20. Miami, 77.8
21. Arizona, 77.5
22. Atlanta, 75.4
23. Minnesota, 75.1
24. Oakland, 74.9
25. Kansas City, 74.8
26. Philadelphia, 71.6
27. Milwaukee, 71.1
28. Chicago White Sox, 70.5
29. Cincinnati, 70.2
30. San Diego, 66.8
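
The post does not say how the five sources are weighted. As a minimal sketch, assuming a simple weighted average with equal weights by default, the combination could look like the following. The function name and the example team's numbers are invented for illustration; they are not the inputs behind the list above.

```python
# Hypothetical win projections for one example team from the five sources.
# These values are made up for illustration only.
projections = {
    "market": 80.0,
    "peta": 82.0,
    "davenport": 81.0,
    "pecota": 79.0,
    "fangraphs": 83.0,
}

def ensemble_wins(sources, weights=None):
    """Weighted average of win projections; equal weights unless told otherwise."""
    if weights is None:
        weights = {name: 1.0 for name in sources}
    total_weight = sum(weights[name] for name in sources)
    weighted_sum = sum(sources[name] * weights[name] for name in sources)
    return weighted_sum / total_weight

print(round(ensemble_wins(projections), 1))
```

Passing a weights dictionary (for example, giving the market twice the weight of each model) changes the blend without touching the rest of the code.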

MLB free agents salary projections through analytics

This is a guest post from Nick Ceraso and Julian Frenkel, students at the University of Michigan.

How much should your team pay a free agent in baseball? Will your team strike the jackpot on a young, undervalued player or overpay for an aging star?

MLB free agency is particularly interesting, as baseball is the only one of the four major sports without a salary cap. Baseball’s offseason is an open market, with only a relatively small luxury tax for teams with the biggest payrolls.

Here, we take a data-driven approach to predicting free agent salaries based on WAR, or Wins Above Replacement. This article discusses the method and looks at how these predictions performed for the 2015-2016 offseason.

Finally, we look at the most interesting predictions for the 2016-2017 offseason. You can check out all the results on this Google spreadsheet.

Regression model based on WAR

Using regression, we developed both a linear and a quadratic model for free agent salaries based on a player’s WAR for his three seasons prior to free agency. Regression provides the optimal coefficients for weighting each of these seasons.

For both models, the most precise blend of the three WARs weights the most recent season very heavily and the season three years back almost not at all. While this makes sense conceptually, it can cause our model to miss on some players.
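
A minimal sketch of how such season weights could be fitted, assuming ordinary least squares on a linear model with an intercept. The WAR histories and the "assumed" weights below are invented to generate synthetic salaries, and the helper names are hypothetical. This is not the authors' code or data, and it covers only the linear case.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's lists are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_least_squares(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Invented WAR histories: (most recent season, two seasons ago, three seasons ago).
war_history = [
    (4.0, 2.0, 1.0),
    (1.0, 3.0, 2.0),
    (2.5, 2.5, 4.0),
    (0.5, 1.0, 3.0),
    (3.0, 0.5, 0.5),
    (5.0, 4.0, 2.0),
]

# Invented weights (intercept, then one per season) used only to build
# synthetic salaries in millions of dollars. Real data would include noise.
assumed_weights = [1.0, 3.0, 1.0, 0.2]

design = [[1.0, *wars] for wars in war_history]
salaries = [sum(w * x for w, x in zip(assumed_weights, row)) for row in design]

fitted = fit_least_squares(design, salaries)
print([round(w, 3) for w in fitted])
```

Because the synthetic salaries contain no noise, the fit recovers the assumed weights up to rounding error, including the pattern described above of a large weight on the most recent season and a small one on the oldest.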

We also attempted to measure the impact of player availability at each position on the market. By dividing an individual player’s WAR by the total WAR available on the market at his position, we were able to gauge his relative strength on the market.

We then multiplied our weighted WAR average by 1 + ((Player WAR) / (Total Position WAR)), a term that gives an extra boost to high-WAR players. This adjustment significantly reduced the sum of squared errors, improving the accuracy of both models.
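
The multiplier can be sketched as a small function. The function and parameter names are hypothetical, and the example numbers are invented; the text does not specify which WAR figure serves as "Player WAR", so it is kept as a separate argument here.

```python
def scarcity_adjusted_war(weighted_war, player_war, position_total_war):
    """Scale the weighted WAR average by 1 + (player WAR / total position WAR)."""
    if position_total_war <= 0:
        raise ValueError("total position WAR must be positive")
    return weighted_war * (1.0 + player_war / position_total_war)

# Invented example: a 3.5 weighted WAR, a 4.0 player WAR, and 20.0 total WAR
# available at the position give a multiplier of 1.2.
adjusted = scarcity_adjusted_war(3.5, 4.0, 20.0)
print(f"{adjusted:.2f}")
```

The same player in a deeper market (a larger total position WAR) gets a multiplier closer to 1, which is how the term rewards scarcity.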

The figure shows the results for the quadratic (nonlinear) and linear model.


The model is simple, and it doesn’t consider important factors that will affect a free agent contract, such as:

  • a slow-developing market for a position
  • a glaring need by a large market team
  • an impatient owner who wants to win now
  • age, as older players are often unwilling to take short-term deals, and teams are unwilling to sign them to long-term ones

Even so, the next section shows how accurately the model predicted free agent contracts.

The simplicity of our model also sets it apart from the “value metric” of Fangraphs (pitchers and hitters). That method seems to place a lot of value on “market intangibles,” the various factors that explain why two players with equal productivity are paid differently.

Success and failure from 2015 free agency

After the 2015 season, we had great success predicting the contracts of some starting pitchers. Let’s take a look at a few examples.

  • John Lackey signed a two-year, $32,000,000 deal with the Cubs. Model prediction: $16,000,000 per year
  • Hisashi Iwakuma signed a one-year, $12,000,000 deal with the Mariners. Model prediction: $11,925,600 per year
  • Rich Hill signed a one-year, $6,000,000 deal with Oakland. Model prediction: $6,043,300 per year

Not only were these all starting pitchers, but none was among the best in his free agent class (David Price, Zack Greinke), so they were not subject to as many market intangibles. All three had an above-average season in 2015, but none is a franchise building block.

On the other hand, one of the largest misses last year was 2B Daniel Murphy, who signed a three-year, $37,500,000 contract with the Nationals. Our model predicted him to earn $4,510,000 based on his performance.

Beyond the usual market intangibles, however, Murphy changed his swing during the 2015 playoffs. This change helped him win the NLCS MVP and carry the Mets to the World Series.

Without accounting for his new swing (and therefore increased performance), our model vastly undershot his predicted salary on the open market. These cases seem few and far between, and we do not expect many cases like this in the future.
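
To put the hits and the miss above on a common scale, the sketch below converts each contract to an average annual salary and computes the model's percent error. The contract and prediction figures come from this article; treating Murphy's prediction as a per-year figure is an assumption, since only the pitchers' predictions are explicitly stated per year.

```python
contracts = [
    # (player, total contract value in dollars, years, model prediction)
    ("John Lackey", 32_000_000, 2, 16_000_000),
    ("Hisashi Iwakuma", 12_000_000, 1, 11_925_600),
    ("Rich Hill", 6_000_000, 1, 6_043_300),
    # Assumption: this prediction is per year, like the others.
    ("Daniel Murphy", 37_500_000, 3, 4_510_000),
]

def percent_error(predicted, actual):
    """Signed error of the prediction as a percentage of the actual value."""
    return 100.0 * (predicted - actual) / actual

errors = {}
for player, total, years, predicted in contracts:
    per_year = total / years
    errors[player] = percent_error(predicted, per_year)
    print(f"{player}: actual ${per_year:,.0f} per year, "
          f"predicted ${predicted:,}, error {errors[player]:+.1f}%")
```

Under that assumption, the three pitchers land within about 1% of their actual annual salary, while the Murphy prediction falls roughly 64% short.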

Predictions for free agency in 2016

Our model does well with two types of players: the late bloomers and the models of consistency. This section looks at an example of each, as well as a player we don’t expect the model to predict accurately.

You can find all the predictions on this Google spreadsheet.

Rich Hill

Rich Hill is the ultimate late bloomer. After bouncing around the majors, Hill found himself in the Red Sox organization as a reclamation project. His WAR from 2014-2016 suggests the project worked: 0.2, 1.6, and then 4.1 over the past three seasons.

In a unique case like this, it appears that his salary will be driven by his performance this year more than past years. We believe our model prediction of $16,540,000 is right about what he’ll end up taking home.

Justin Turner

Another example of an ideal player for our model is third baseman Justin Turner, a model of consistency. Turner has been consistently good-to-great for the Dodgers, averaging a WAR of 4.33 since 2014. This past year, he fell right in line with that, being worth 4.9 wins.

With his 2016 performance being indicative of the type of player that he is, we believe his predicted salary of $20,000,000 will prove accurate.

Yoenis Cespedes

After defecting from Cuba and signing with the Oakland Athletics, Cespedes has enjoyed success during his time in the Majors. Looking at his WAR from the past three seasons, he was worth 4.1 wins in 2014, 6.3 in 2015, and 2.9 this past season.

As our model places a heavy emphasis on the most recent year’s performance, his 2.9 WAR is the driving force behind his predicted salary. However, his talent level exceeds his 2016 WAR figure, and he will most likely be paid a higher salary than our model projects.

After two great years of contributing 4+ wins to his team, he will not be valued as heavily on his 2016 performance as the model suggests. With that in mind, we believe the model prediction of $12,390,000 is on the low side for Cespedes.

All stats used in this article and the regression model are from Baseball Reference and contract details are from MLB Trade Rumors.

The curse of small sample size in Tigers bullpen stats

The Tigers bullpen has been a mess all season. However, there are always some small sample size flukes with pitchers who throw so few innings.

In my latest Detroit News article, I look at Bruce Rondon, a hard-throwing right-hander who has shown Tigers fans his best and worst this season.

The crux of the article is how humans want to tell stories about small sample size flukes. The same obsession with patterns that gives us technological marvels like smartphones also gets us in big trouble when watching sports, which are inherently random.

This might be the first sports column to quote a neuroscience book.

To read my Detroit News article on bullpen statistics, click here.

Contenders’ flaws still give Tigers a chance

In my most recent Detroit News column, I look at the American League landscape and find only one true contender.

In effect, the column argued that the Tigers could hold on to their players and make a run at the playoffs. However, the Tigers did sell, a healthy move for the franchise.

The one AL team that I did like was the New York Yankees. In the nine days since the column appeared, the Yankees have surged. They now have the best record in the AL and are 2nd in my MLB team rankings.

To read the Detroit News article, click here.

Is the Home Run Derby slump real?


Over at the Detroit News, I look into whether hitters who participate in the Home Run Derby go into a slump afterwards.

The idea seemed ridiculous to me. However, I was surprised by the truth.

The Home Run Derby slump is a textbook example of regression to the mean. In writing this article, I think I developed a pretty good way to think about this crucial statistical concept.

In addition, this Home Run Derby curse has an impact on players who won’t participate.
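
Regression to the mean is easy to see in a toy simulation. The sketch below is not the analysis from the article: it assumes each player's half-season home run total is a fixed talent level plus independent luck, and every number in it (league size, averages, spreads) is invented for illustration.

```python
import random

def simulate_hot_first_half(n_players=500, top_n=10, seed=0):
    """Compare the hottest first-half hitters with their own second halves."""
    rng = random.Random(seed)
    players = []
    for _ in range(n_players):
        talent = rng.gauss(15.0, 4.0)          # true home runs per half season
        first = talent + rng.gauss(0.0, 4.0)   # first half: talent plus luck
        second = talent + rng.gauss(0.0, 4.0)  # second half: same talent, fresh luck
        players.append((first, second))

    league_second = sum(s for _, s in players) / n_players
    # Pick the players with the best first halves, as a Derby invitation would.
    hottest = sorted(players, key=lambda p: p[0], reverse=True)[:top_n]
    hot_first = sum(f for f, _ in hottest) / top_n
    hot_second = sum(s for _, s in hottest) / top_n
    return hot_first, hot_second, league_second

hot_first, hot_second, league_second = simulate_hot_first_half()
print(f"hottest hitters, first half:  {hot_first:.1f}")
print(f"hottest hitters, second half: {hot_second:.1f}")
print(f"league average, second half:  {league_second:.1f}")
```

No Derby takes place in this simulation, yet the hottest first-half hitters still decline in the second half while remaining above the league average. Selecting players on an extreme first half picks up good luck along with talent, and only the talent carries over.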

To read the article, click here.