We did a little baseball analytics in 2011. Back then, the Green Bay Packers were Super Bowl champions, President Obama had just announced the death of Osama bin Laden, and the Pittsburgh Pirates were threatening to end 18 years of losing records. Could they do it?
The Pittsburgh Pirates have been bad for 18 years.
Since they lost to Atlanta in the 1992 NL Championship Series, the Pirates have not won more than half their games in a season. In the previous six full seasons, they haven’t come within 13 games of winning the 81 games required for 0.500 baseball. It’s a remarkable streak, even for a small-market Major League Baseball team.
In contrast, the Pittsburgh Steelers have won two Super Bowls and played in two others since 1992. The Pittsburgh Penguins won the Stanley Cup in 2009. The people of Pittsburgh wear shirts that say “Pittsburgh, City of Champions… and the Pirates”.
However, this season has been different. The Pirates have a 52-47 record through July 24, 2011. Not only might they end 18 years of losing, but Pittsburgh actually leads the NL Central. The baseball gods are teasing the Pirates with thoughts of the postseason when just playing 0.500 baseball would be a monumental achievement for this franchise.
But before Pittsburgh starts preparing its ballpark for the World Series, let’s ask how the Pirates are having so much success. Remember, they lost 105 of 162 games last year, approaching the 0.333 winning percentage that MLB teams rarely dip below.
Are the Pirates getting lucky?
How runs are created in baseball
To answer this question, consider how a team scores runs: batters get on base, and then subsequent batters drive them home. Hence, teams score more runs if they cluster their hits into one inning rather than scatter them throughout the game.
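A toy model makes the clustering effect concrete. The sketch below assumes, purely for illustration, that every hit is a single, that each single moves every runner up exactly one base, and that outs never advance or erase runners. Under those assumptions the first three singles of an inning load the bases and each additional single scores a run. The function name and both hit distributions are hypothetical.

```python
def runs_from_singles(singles_per_inning):
    """Runs scored in a game under a station-to-station toy model.

    Assumption (illustrative only): every hit is a single, each single
    moves every runner up one base, and outs change nothing on the bases.
    The first three singles in an inning load the bases; every single
    after that pushes one run home.
    """
    return sum(max(0, singles - 3) for singles in singles_per_inning)


# Nine singles bunched into the first inning.
clustered = [9, 0, 0, 0, 0, 0, 0, 0, 0]
# The same nine singles spread one per inning.
scattered = [1] * 9

print(runs_from_singles(clustered))  # 6
print(runs_from_singles(scattered))  # 0
```

The same nine hits produce six runs or none, depending only on how they are bunched.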
Now, it’s highly unlikely that teams can control how their hits are distributed in a game. Do hitters focus more intently with a base runner on third? Do they care less with the bases empty? No one has ever found statistical evidence of clutch hitting in baseball.
While this research won’t convince every baseball fan, we’ll assume a team can’t control when it hits a homer or gives up an RBI double. Among teams that hit 283 doubles and 154 homers, some will score 745 runs while others will score 698. Here, we’ll examine a few formulas for the average runs scored given a team’s statistics. The difference between actual runs scored and this average will measure luck.
What Bill James says about creating runs
Not surprisingly, Bill James, the father of baseball analytics, developed one of the first formulas for run creation. The basic idea is simple: runs are scored by getting runners on base and then driving them home. His Runs Created formula says runs equal the number of runners who get on base times the rate at which they’re driven home. The number of base runners is the sum of hits, walks, and other minor events. James took the rate to be proportional to slugging percentage, the total number of bases (1 for a single, 4 for a homer) per at-bat.
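As a rough sketch, the widely quoted basic version of Runs Created is (H + BB) × TB / (AB + BB). It drops the "other minor events" and uses total bases per plate appearance as the rate, so it is only an approximation of the fuller formula. The function name and the season line below are made up for illustration.

```python
def runs_created(hits, walks, total_bases, at_bats):
    """Basic Runs Created: base runners times the rate they are driven home.

    Assumption: this is the simple (H + BB) * TB / (AB + BB) form, which
    ignores steals, hit batters, sacrifices, and other minor events.
    """
    on_base = hits + walks            # runners who reach base
    opportunities = at_bats + walks   # plate appearances counted by this form
    if opportunities == 0:
        return 0.0
    return on_base * total_bases / opportunities


# Hypothetical season line, not a real team.
season = dict(hits=1400, walks=500, total_bases=2200, at_bats=5500)
print(round(runs_created(**season), 1))  # 696.7
```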
Over the last ten seasons, Runs Created overestimates the actual runs a team scores in a season by 18.8 runs. Since teams averaged 758 runs a season over this period, Runs Created makes an error of only 2.5%. Since a team allows runs in the same way in which it creates them, Runs Created can also be used to evaluate how a team’s pitching and defense prevent runs. Over the same time period, Runs Created overestimates runs allowed by 18.5 runs, or 2.4% error.
Is there a more accurate run creation formula?
However, the Runs Created formula isn’t perfect. Take the simple example in which a batter hits a solo home run in his only major league at-bat. Bill James’s formula credits him with 4 runs created (one base runner times four total bases per at-bat), which clearly overestimates the one run from the homer.
Dave Smyth developed a different run creation formula, Base Runs, in which home runs explicitly count as one run. The remaining runs come from a second contribution inspired by the simple logic of Runs Created, except that home run hitters no longer count as base runners.
The basic version of this formula is reminiscent of good physics research in that one attempts to explain complex phenomena with simple expressions. In physics, the merits of these simple expressions are judged by their correctness in particular limits. In baseball, this limit is the hypothetical team that hits only home runs, which must score exactly one run per homer. One can show that Base Runs passes this test.
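The home-run-only limit can be checked numerically. The article does not spell out the formula's coefficients, so the sketch below assumes one commonly quoted basic form of Base Runs, A × B / (B + C) + D, with A as base runners excluding home run hitters, B as an advancement factor, C as outs, and D as home runs. Treat the specific weights as an assumption rather than the exact version used for the numbers in this article.

```python
def base_runs(at_bats, hits, total_bases, home_runs, walks):
    """One commonly quoted basic Base Runs form: A * B / (B + C) + D.

    Assumption: the weights in B are taken from a frequently cited simple
    version of the formula; other published versions use different ones.
    """
    a = hits + walks - home_runs   # base runners, excluding home run hitters
    b = 1.02 * (1.4 * total_bases - 0.6 * hits - 3 * home_runs + 0.1 * walks)
    c = at_bats - hits             # outs made at the plate
    d = home_runs                  # each homer counts as exactly one run
    if b + c == 0:
        return float(d)
    return a * b / (b + c) + d


# Limit check: a team whose every at-bat is a home run.
n = 50
print(base_runs(at_bats=n, hits=n, total_bases=4 * n, home_runs=n, walks=0))  # 50.0

# The solo homer in a player's only at-bat is worth one run.
print(base_runs(at_bats=1, hits=1, total_bases=4, home_runs=1, walks=0))  # 1.0
```

With no runners other than the home run hitters, A is zero, so the first term vanishes and the total is just the home run count.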
Moreover, Base Runs overestimates runs scored and allowed over a season by 7.5 and 7.2 runs respectively, giving less than 1% error. We’ll use this formula in assessing luck in Major League Baseball.
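The percentage errors quoted for the two formulas follow from dividing each miss by the 758-run average. The short check below uses only numbers from the text. It assumes the same 758-run baseline applies to runs allowed, which holds league-wide because every run scored by one team is allowed by another.

```python
AVG_RUNS = 758  # average runs per team per season, as quoted in the text

# Average miss in runs for each formula, as quoted in the text.
misses = {
    "Runs Created": {"scored": 18.8, "allowed": 18.5},
    "Base Runs": {"scored": 7.5, "allowed": 7.2},
}


def percent_error(miss, baseline=AVG_RUNS):
    """Express a miss in runs as a percentage of the baseline."""
    return 100 * miss / baseline


for formula, by_side in misses.items():
    for side, miss in by_side.items():
        print(f"{formula:<12} {side:<8} {percent_error(miss):.2f}%")
```

This prints roughly 2.48% and 2.44% for Runs Created, and 0.99% and 0.95% for Base Runs.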
Will the Pirates finish with a winning record?
This season, the Pirates have given up 36 fewer runs than expected. To put the magnitude of this luck in perspective, the standard deviation of Base Runs from the actual runs is about 14 runs at this point in the season. In other words, one expects the runs allowed by two-thirds of MLB teams to deviate from the Base Runs prediction by less than 14 runs. Pittsburgh is more than two and a half standard deviations away from this average on the lucky side.
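The arithmetic behind that statement is a simple ratio. The sketch below uses the two numbers from the text. It also assumes, as an approximation that the article does not state, that the deviations are normally distributed, which is where the "two-thirds within one standard deviation" rule of thumb comes from. The function names are hypothetical.

```python
import math


def standard_deviations(run_gap, std_dev):
    """How many standard deviations a gap in runs represents."""
    return run_gap / std_dev


def share_within(z):
    """Share of a normal distribution within z standard deviations of the mean.

    Assumption: deviations from Base Runs are roughly normal.
    """
    return math.erf(z / math.sqrt(2))


pirates_gap = 36  # runs allowed below the Base Runs expectation (from the text)
std_dev = 14      # spread of actual minus expected runs at this point in the season

print(round(standard_deviations(pirates_gap, std_dev), 2))  # 2.57
print(round(share_within(1.0), 2))                          # 0.68
```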
Here, we list the luck factor on offense and defense for all teams this season. Pittsburgh’s NL Central foe Milwaukee is also an outlier, giving up 16 more runs than expected. The Brewers have been unlucky in run prevention, yet they are still deadlocked with Pittsburgh for the division lead. Don’t be surprised to see the Brewers surge and take the division.
The 2011 Pittsburgh Pirates faded down the stretch, winning only 72 of 162 games. It was their 19th straight losing season. The Milwaukee Brewers won the NL Central and made it to the National League Championship Series. They lost to the St. Louis Cardinals in 6 games.
Thanks for reading.