This article was written by Larry Behrendt of It’s About the Money Stupid, a blog about the New York Yankees.  

Introduction: Guessing Games

You’ve probably read blog pieces where the blogger presents statistics for two different players, and the statistics are nearly identical. Then the blogger tells you that player A is someone like Ted Williams and player B is someone like Joe Shlabotnik, and you think to yourself that Joe Shlabotnik must have been a better player than you’d previously imagined.

Let’s try something similar, only this time let’s lay out statistics from Boston’s ill-fated 2011 season. Below are Red Sox starting pitching statistics taken from FanGraphs; each line represents statistics from a different month of the 2011 season.

Month      W-L     IP      ERA    FIP
Month 1    10-10   155     3.83   4.40
Month 2    13-7    159.1   3.95   4.34
Month 3    13-5    148.2   4.54   4.21
Month 4    4-13    128.1   7.08   5.19

If you were even half-awake during the 2011 season, you’ll know that Month 4 above has to be September: only four wins, a miserable 7.08 earned run average, and barely more innings pitched by the starters (128.1) than by the bullpen (115.1). (In case you were wondering, Month 1 above is March and April, Month 2 above is June, and Month 3 above is July.) Even FanGraphs’ FIP, an advanced pitching statistic that attempts to measure a pitcher’s effectiveness independent of the quality of the defense behind him, shows that the Red Sox starters had a terrible final month of the season. (I’ll discuss this advanced statistic and others before too long.)

Now let’s look at four months of the season again, this time using three advanced pitching statistics from FanGraphs: xFIP, SIERA and tERA. To make the exercise more difficult, I’ll mix the rows up, so the months shown below are not necessarily in chronological order! Can you guess which row shows how the Red Sox starters performed in September?

Month      xFIP   SIERA   tERA
Month 1    4.13   3.91    5.09
Month 2    4.55   4.27    5.37
Month 3    4.62   4.32    4.20
Month 4    4.38   4.15    4.43

Suddenly the picture is not so clear. During Month 3 above, the Sox starters had their worst xFIP and SIERA. Month 2 contains the worst tERA figure, but it is not that much worse than Month 1’s. Which month above represents September? If you look at the numbers closely enough, you can probably figure it out, because Month 2 above is consistently lousy: worst tERA, second worst SIERA, second worst xFIP. Sure enough, Month 2 is September.

But isn’t it odd that according to SIERA and xFIP, the worst month of 2011 for the Red Sox starters was May and not September? In May, the Red Sox starters won 13 games and lost just 6, and posted a 4.14 ERA (slightly better than league average for that month). In contrast, the Red Sox starting rotation posted a May SIERA a full half point higher than the American League average for 2011, and the rotation’s xFIP for May was 14% worse than the American League average for that month. The conventional statistics paint the conventional picture for September, but the advanced statistics point in another direction. What on earth is going on?

A Detour: Advanced Pitching Statistics, Briefly Explained

No more guessing games. Here are complete 2011 statistics for the Red Sox starters:

Month        ERA    ERA-   FIP    FIP-   xFIP   xFIP-   SIERA   tERA
AL Average   4.08   -      4.05   -      4.03   -       3.80    4.35
Mar/Apr      3.83   90     4.40   104    4.38   108     4.15    4.43
May          4.14   97     4.21   99     4.62   114     4.32    4.20
June         3.95   92     4.34   102    4.50   111     4.05    4.06
July         4.54   106    4.21   99     4.13   102     3.91    5.09
Aug          3.97   93     4.04   95     3.76   93      3.68    4.61
Sept         7.08   166    5.19   122    4.55   112     4.27    5.37
Sox SEASON   4.49   105    4.36   103    4.31   106     4.06    4.60

It’s also time to explain these statistics to the uninitiated. ERA, or Earned Run Average, measures the number of earned runs allowed by a pitcher per nine innings. ERA is a conventional statistic, one that’s been around forever, and it’s a good statistic for confirming what we already know: Red Sox opponents scored a lot of runs against the Red Sox during the month of September.

FIP is a defense-independent pitching statistic, one designed to measure a pitcher’s ability to prevent runs for which he is “responsible”.  I put “responsible” in quotes because the theory behind FIP is that a pitcher is only truly responsible on the positive side for striking people out, and on the negative side for walks, hit by pitches and home runs allowed.  In other words, FIP does not blame a pitcher for allowing a base hit, so long as the base hit does not leave the ballpark on a fly. In similar fashion, FIP gives a pitcher no credit for outs produced by his team’s defense. Instead, FIP is calculated as if a pitcher allowed a league-average batting average on balls in play (or BABiP).
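For readers who want to see the arithmetic, here is a minimal Python sketch of the commonly published FIP formula, which weights home runs, walks plus hit batters, and strikeouts, then adds a constant that puts FIP on the same scale as ERA. The function names, the placeholder constant of 3.10 and the sample stat line are all assumptions made for illustration; none of them is FanGraphs data for the Red Sox.

```python
def innings(ip_notation: str) -> float:
    """Convert baseball innings notation ("159.1" means 159 1/3) to a decimal."""
    whole, _, thirds = ip_notation.partition(".")
    return int(whole) + (int(thirds) if thirds else 0) / 3


def fip(hr: int, bb: int, hbp: int, k: int, ip: float, constant: float = 3.10) -> float:
    """FIP from the commonly published formula.

    `constant` rescales the result to the league ERA. The default of 3.10 is
    only a placeholder; the real value changes from season to season.
    """
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


# Hypothetical stat line, invented for illustration (not Red Sox data).
sample = fip(hr=18, bb=50, hbp=5, k=120, ip=innings("159.1"))
print(round(sample, 2))
```

Notice that hits on balls in play appear nowhere in the calculation, which is the whole point of the statistic.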

It may strike you as odd that FIP does not penalize a pitcher for, say, allowing five straight hits that bounce off the top of the center field wall. But FIP is based on solid research (initiated by Voros McCracken and confirmed by others) that pitchers have limited ability to cause balls hit into play to turn into outs. Instead, most pitchers end up with similar career BABiP figures, regardless of how effective those pitchers might be.

If you think that the very best pitchers in baseball can do a superior job of inducing batters to hit into outs, think again: Roy Halladay has a career BABiP of .292. Cliff Lee has a career BABiP of .295. CC Sabathia: .291. Felix Hernandez: .297. Chris Carpenter: .298. Tim Lincecum: .293. Josh Beckett: .290. Contrast a list of not-so-successful pitchers (at least of late): A.J. Burnett: .290. Javy Vazquez: .295. Kyle Lohse, once named baseball’s most mediocre pitcher: .302.

There ARE outliers. Andy Pettitte had a career BABiP of .309 (as does John Lackey). Jered Weaver and Johan Santana have produced career BABiPs of .276 and .275, respectively. The difference between Weaver’s and Lackey’s BABiP figures amounts to something like an extra hit allowed by Lackey every two games, which is hardly sufficient to explain the difference in these pitchers’ career paths! And remember, Weaver and Lackey are at the extreme ends of BABiP – the BABiP differences between two randomly selected pitchers will be much less than this.
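The size of that Weaver-Lackey gap is easy to check with a rough calculation. The sketch below multiplies the BABiP difference by the number of balls put in play per start; the three balls-in-play figures are assumptions chosen to bracket a typical outing, not numbers taken from FanGraphs.

```python
def extra_hits_per_start(babip_high: float, babip_low: float,
                         balls_in_play: float) -> float:
    """Extra hits per start implied by a BABiP gap at a given workload."""
    return (babip_high - babip_low) * balls_in_play


# Career BABiP figures quoted above: Lackey .309, Weaver .276.
# The balls-in-play counts are hypothetical workloads, not measured values.
for bip in (15, 18, 20):
    extra = extra_hits_per_start(0.309, 0.276, bip)
    print(f"{bip} balls in play per start: {extra:.2f} extra hits")
```

Under any of these assumptions the gap works out to well under one extra hit per start.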

Even with these outliers, the consistency of career BABiP pitching numbers is remarkable. More remarkable is that when we look at a pitcher’s BABiP numbers for a single season, we see that these numbers can vary wildly from year to year. Justin Verlander’s 2011 BABiP was .236. In 2010, his BABiP was .286, and in 2009 it was .319. We cannot predict with any certainty what kind of BABiP Verlander will produce in 2012. The 2010 BABiP leader was Trevor Cahill, at .236. Cahill’s BABiP jumped to .302 in 2011. Tim Hudson’s BABiP fell from .338 in 2009 to .249 in 2010. James Shields’ BABiP fell from .341 in 2010 to .258 in 2011.

Obviously, it was important that James Shields’ BABiP dropped 83 points in 2011 – the drop helped Shields to win three more games and cut his 2010 ERA nearly in half. But Shields’ future BABiP should prove to be closer to his .299 career average than to the extremes he posted in 2010 and 2011. For the most part, BABiP is outside of a pitcher’s ability to control, and thus removing BABiP from our advanced statistics makes sense: it allows us to focus on a pitcher’s true abilities. This conclusion is confirmed by the fact that statistics like FIP do a better job of predicting a pitcher’s future performance than do conventional statistics like ERA.

All advanced pitching statistics have this in common: they either modify the effect of BABiP on a pitcher’s numbers, or else they eliminate this effect altogether. As I noted above, FIP is a statistic that ignores BABiP: FIP looks only at a pitcher’s strikeouts, walks, hit by pitches and home runs allowed. xFIP is even more radical than FIP: xFIP not only ignores BABiP, it also ignores the number of home runs allowed by a pitcher. xFIP regards home runs the same way it regards BABiP, as outcomes largely outside of a pitcher’s control. So xFIP is calculated as if a pitcher allowed a league average number of home runs for every fly ball hit against that pitcher.
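To make the FIP versus xFIP distinction concrete, here is a small Python sketch. It uses the commonly published FIP formula and then swaps actual home runs for fly balls multiplied by a league HR/FB rate. The 10% league rate, the 3.10 constant and both stat lines are assumptions invented for illustration.

```python
def fip(hr: float, bb: int, hbp: int, k: int, ip: float, constant: float = 3.10) -> float:
    """FIP from the commonly published formula (placeholder constant)."""
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


def xfip(fly_balls: int, bb: int, hbp: int, k: int, ip: float,
         lg_hr_per_fb: float = 0.10, constant: float = 3.10) -> float:
    """xFIP: the same formula, with expected home runs replacing actual ones."""
    expected_hr = fly_balls * lg_hr_per_fb
    return fip(expected_hr, bb, hbp, k, ip, constant)


# Two hypothetical pitchers, identical except for home runs allowed.
peripherals = dict(bb=50, hbp=5, k=120, ip=160.0)
lucky = fip(hr=14, **peripherals)
unlucky = fip(hr=26, **peripherals)
shared = xfip(fly_balls=200, **peripherals)
print(round(lucky, 2), round(unlucky, 2), round(shared, 2))
```

Both pitchers allowed 200 fly balls, so xFIP treats them identically even though their FIP figures sit nearly a full run apart.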

Our two remaining advanced statistics consider balls hit into play, but do so in a way that is, well, advanced.

SIERA, or Skill-Interactive Earned Run Average, is something like xFIP, only SIERA gives great credit to high-strikeout pitchers. According to SIERA, high-strikeout pitchers get hitters to make weaker contact when they do hit the ball. SIERA is thus calculated as if high-strikeout pitchers will have lower BABiPs, lower home run to fly ball rates, and so forth.

tERA, or True Earned Run Average, looks at the actual trajectory and speed of each ball hit into play. So if pitcher A and pitcher B have identical FIP numbers, but pitcher A allows fewer line drives than pitcher B, pitcher A should have a better tERA than pitcher B.

There’s one last set of statistics to discuss: ERA-, FIP- and xFIP-. These “minus statistics” allow us to easily compare a pitcher’s or team’s results to the league average. So when the Red Sox starters produced an ERA- of 90 in March and April, that means that the team’s ERA was 90% of (or 10% better than) the league average ERA for those months. Similarly, the starters’ 104 FIP- for March and April indicates that their FIP for these months was 4% worse than the league average.
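The basic idea behind a minus statistic fits in a few lines of Python. This is a simplified sketch: the published FanGraphs figures also adjust for ballpark, which is left out here, and the ERA values in the example are hypothetical round numbers rather than figures from the chart.

```python
def minus_stat(value: float, league_value: float) -> float:
    """Scale a pitching stat so that 100 is league average and lower is better.

    Simplified: no park adjustment is applied.
    """
    return 100 * value / league_value


def describe(minus: float) -> str:
    """Turn a minus statistic into a plain-English comparison."""
    diff = round(abs(minus - 100))
    if diff == 0:
        return "league average"
    return f"{diff}% {'better' if minus < 100 else 'worse'} than league average"


# Hypothetical example: a 3.60 ERA in a league averaging 4.00.
era_minus = minus_stat(3.60, 4.00)
print(round(era_minus), describe(era_minus))
```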

An Advanced View

To save you from scrolling, I’ll repeat the advanced statistics chart from earlier:

Month        ERA    ERA-   FIP    FIP-   xFIP   xFIP-   SIERA   tERA
AL Average   4.08   -      4.05   -      4.03   -       3.80    4.35
Mar/Apr      3.83   90     4.40   104    4.38   108     4.15    4.43
May          4.14   97     4.21   99     4.62   114     4.32    4.20
June         3.95   92     4.34   102    4.50   111     4.05    4.06
July         4.54   106    4.21   99     4.13   102     3.91    5.09
Aug          3.97   93     4.04   95     3.76   93      3.68    4.61
Sept         7.08   166    5.19   122    4.55   112     4.27    5.37
Sox SEASON   4.49   105    4.36   103    4.31   106     4.06    4.60

These statistics challenge the conventional wisdom that the Red Sox collapsed because their starters fell apart during the last month of the season. True, the Red Sox starting pitchers did fall apart in September – the starters’ September ERA was 66% worse than the AL average. But if you look carefully at the advanced statistics, you’ll see that the Red Sox starters were a mediocre group all year long. Per FIP, the Sox starters had one decent month, in August, when they produced a FIP 5% better than league average. But for the rest of the year, the FIP of the Sox starting staff was at league average, or a bit worse, or (in September) far worse. The story according to xFIP, SIERA and tERA is worse than this: the starters managed to beat the league average xFIP and SIERA only in August, and the starters’ combined tERA was worse than league average in July, August and (of course) September.

This is take-away lesson number one from the advanced statistics: as a staff, the Red Sox starters were awful in September, but they were worse than league average before that.

You may wonder, if the Sox starting pitching was all that bad prior to September, then how did the starting staff manage to win 50 games and lose only 27 between May and August? I’m glad you asked that question …

The ERA-FIP Gap

One remarkable thing about the Sox season is that the Sox starting staff had a better than league average ERA during every month of the season except July and (of course) September. The Sox starters’ team ERA was third best in the AL in March/April, and second best in August.  How did they manage this, when the team’s FIP prior to September was no better than mediocre?

From our analysis above, we know that the difference between ERA and FIP is BABiP. For the Sox starters to have produced ERA numbers better than their FIP numbers, we’d expect that the Sox starters enjoyed a lower than average BABiP. And we’d be right:

Month        BABiP
AL Average   0.291
Mar/Apr      0.262
May          0.286
June         0.235
July         0.326
Aug          0.291

(Note that we’re excluding September from the above chart and from the next few charts that follow. September merits a separate analysis, which you’ll find towards the end of this piece.)

The Red Sox’s 16-9 record in June was fueled in considerable part by the starting rotation’s remarkable .235 BABiP that month, the best BABiP in the American League that month by a whopping 21 points. The starters’ March/April BABiP was second-best in the AL (trailing the Angels by a single point). Even the rotation’s .291 BABiP in August, a number that might look pedestrian in relation to the other numbers above, was the third best in the American League that month.

Historically, the Red Sox starting pitching has not been a terrific BABiP bunch: .298 in 2010, .315 in 2009, .288 (a good year) in 2008, and .292 during their 2007 Championship year. But as we’ve seen, BABiP can fluctuate quite a bit from year to year, and prior to September, 2011 was a good year for Red Sox BABiP – good enough to explain how the starting rotation’s mediocre FIP numbers translated into good team ERA numbers.

But that is not the end of this story.

The FIP-xFIP Gap

The advanced statistics show that the Red Sox starters had roughly league average FIP numbers prior to September, but that their xFIP numbers were below average (generally, well below average) every month except August. In fact, from an xFIP standpoint, September was not a particularly bad month for the Red Sox starters – May was a worse xFIP month, and June was nearly as bad as September.

We know the difference between FIP and xFIP is in the rate that pitchers “allow” fly balls to turn into home runs. We might expect that the Red Sox starters enjoyed relatively low home run to fly ball rates in 2011, and again we would be right:

Month        HR/FB
AL Average   10.0%
Mar/Apr      9.8%
May          7.1%
June         8.7%
July         10.2%
Aug          11.7%

These numbers may be somewhat misleading, as fly balls tend to turn into home runs more often as the weather gets warmer. So for example, the Sox starters’ 9.8% HR/FB rate in March and April was actually fifth worst in the AL, while the seemingly tiny May rate of 7.1% was only fifth best in the league, and their higher 11.7% August rate (with batted balls evidently travelling further in the August heat) was sixth best in the AL.

To be honest, the HR/FB rate achieved by the Sox starters in 2011 does not seem to be that far from ordinary, but it was good enough to improve the starters’ FIP numbers (compared to xFIP) by about 6% between March and August. The advanced statistics tell us that the Red Sox starters benefitted from a below-league average number of fly balls that left the yard.
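The piece does not spell out how the figure of about 6% was reached. One reading that lands close to it, offered here as an assumption rather than as the original method, is to compare the unweighted averages of the monthly FIP- and xFIP- values from the chart above for March through August.

```python
from statistics import mean

# Monthly minus statistics for the Sox starters, copied from the chart above.
fip_minus = {"Mar/Apr": 104, "May": 99, "June": 102, "July": 99, "Aug": 95}
xfip_minus = {"Mar/Apr": 108, "May": 114, "June": 111, "July": 102, "Aug": 93}

# Unweighted averages: each month counts equally, regardless of innings pitched.
gap = mean(list(xfip_minus.values())) - mean(list(fip_minus.values()))
print(round(gap, 1))  # 5.8
```

Weighting each month by innings pitched would shift the result slightly, so treat this as a sanity check rather than a reconstruction.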

What We Can Learn From SIERA and tERA

While our analysis of FIP and xFIP shows that the Sox starters got a boost from below-league average BABiP and HR/FB between March and August, it’s more difficult to draw a similar lesson from the SIERA and tERA numbers.

As noted above, SIERA focuses more than the other advanced statistics on strikeout rate. Let’s take a look at the Sox starters’ monthly SIERA compared to their collective strikeout-to-walk ratio (K/BB):

Month        SIERA   K/BB
AL Average   3.80    2.27
Mar/Apr      4.15    1.92
May          4.32    1.74
June         4.05    2.16
July         3.91    2.61
Aug          3.68    2.51

The SIERA – K/BB comparison confirms what we might have expected: the Sox starters’ SIERA numbers were best in July and August, when the rotation struck out more and walked fewer opponents.  But in no case can we say that the Sox starters were particularly good at punching out batters or avoiding free passes.  Consider that the White Sox starters produced an AL-best 2.87 K/BB in 2011, and that even the Yankees’ patchwork starting rotation managed a 2.47 K/BB over the 2011 season.  The Red Sox starters produced pedestrian K/BB numbers, explaining in some part their pedestrian SIERA numbers.

Finally, let’s look at the tERA numbers for the Sox starters. Since tERA focuses on how hard opposing batters are able to hit opposing pitches, the logical thing to do is to compare tERA to the batted ball statistics produced by FanGraphs:

Month        tERA   LD%     GB%     FB%
AL Average   4.35   19.4%   43.8%   36.8%
Mar/Apr      4.43   15.6%   42.3%   42.1%
May          4.20   15.7%   43.8%   40.5%
June         4.06   13.7%   42.0%   44.3%
July         5.09   22.9%   37.7%   39.4%
Aug          4.61   19.9%   46.6%   33.5%

There’s not much we can learn from this comparison. True, we can see that the spike in line drives allowed by the Sox starters in July links up to the spike in the starters’ tERA that month. But a mystery remains: the tERA numbers shown above are generally below league average, but for the most part the batted ball numbers above look reasonably good. In particular, the Red Sox starters allowed fewer line drives than league average in March through June, but this does not seem to have produced correspondingly good tERA numbers.

But the 2011 tERA numbers for the Red Sox starters do tell us something: like the FIP numbers, and particularly the xFIP and SIERA numbers, the tERA numbers confirm that even prior to September, the Sox starters performed in 2011 at below league average.

The Collapse

So far, we’ve seen that the advanced statistics tell a story that contradicts the popular narrative: even when things were going well for the Red Sox, their starting rotation was mediocre, or worse than that. But fortunately for the Red Sox, other factors (such as the starters’ low BABiP and HR/FB rate) ran in the team’s favor, and the starters were able to produce above-average ERA numbers, good enough to support the team’s 81-41 run from mid-April to late August.

Then came The Collapse.

Month        ERA    FIP    BABiP   xFIP   HR/FB   SIERA   K/BB   tERA   LD%     GB%     FB%
AL Average   4.08   4.05   0.291   4.03   9.8%    3.80    2.27   4.35   19.4%   43.8%   36.8%
Mar/Apr      3.83   4.40   0.262   4.38   9.8%    4.15    1.92   4.43   15.6%   42.3%   42.1%
May          4.14   4.21   0.286   4.62   7.1%    4.32    1.74   4.20   15.7%   43.8%   40.5%
June         3.95   4.34   0.235   4.50   8.7%    4.05    2.16   4.06   13.7%   42.0%   44.3%
July         4.54   4.21   0.326   4.13   10.2%   3.91    2.61   5.09   22.9%   37.7%   39.4%
Aug          3.97   4.04   0.291   3.76   11.7%   3.68    2.51   4.61   19.9%   46.6%   33.5%
Sept         7.08   5.19   0.340   4.55   13.8%   4.27    1.66   5.37   18.0%   44.9%   37.1%
Sox SEASON   4.49   4.36   0.289   4.31   10.0%   4.06    2.05   4.60   17.7%   42.9%   39.4%

We’ve already noted that the Sox starters produced their worst FIP and tERA numbers during the month of September, and that their September xFIP and SIERA numbers were not much better. Clearly, September was a bad month for the starting rotation. But note that according to the advanced statistics, the Red Sox starting pitchers were not that much worse in September than they’d been earlier in the year. Per FIP, the starting rotation was about .85 runs worse in September than they’d been in June; tERA sees a wider 1.3 run gap between June and September, while xFIP and SIERA see a much smaller drop in performance.

The critical number, as we’ve already noted, is that the Sox starters’ ERA was over 7 in September. This is an appallingly bad number, the worst monthly ERA in Red Sox team history (minimum 20 games). You cannot explain a historically horrible 7.08 starting pitching ERA solely with advanced statistics like a 5.19 FIP or a 4.27 SIERA.

Instead, take a look at the Red Sox starters’ BABiP in September: .340, more than 100 points higher than their June figure. A .340 monthly BABiP is about as bad as it gets for a starting rotation (the Rockies’ starters had a .343 BABiP against in August; this is the highest BABiP I’ve been able to find against a starting rotation over the past two years). Plus consider the starting rotation’s HR/FB rate in September: 13.8%. That 13.8% is not a historically high HR/FB rate – the Blue Jays’ starting rotation saw September fly balls leave the park at an 18.6% rate – but this was the worst HR/FB rate experienced by the Red Sox starters since August of 2009.

The numbers here tell a story that I have not seen widely reported: until September, the Red Sox starters enjoyed low BABiP and HR/FB numbers that helped turn their subpar advanced statistical numbers into a better than average team ERA. But in September, these BABiP and HR/FB numbers turned savagely on the Red Sox, turning what would have been a bad pitching month into an historically bad pitching month. The result was The Worst Collapse In Baseball History.

Measuring Luck

I’ve argued here that the Red Sox starting pitching was pretty bad for all of 2011, and that the September “collapse” of the rotation was caused in part by sharp rises in BABiP and HR/FB. But what caused the inflation of these BABiP and HR/FB numbers? Poor mechanics? Lack of desire? Rotten physical conditioning? Chicken and beer?

How about good old fashioned bad luck?

Advanced statistics like FIP, xFIP, SIERA and tERA were designed in part to eliminate the luck factor from pitching analysis. The factors excluded from these advanced statistics, like BABiP and HR/FB rates, are looked at in some circles as luck statistics. (See also here, here, here and here.) Baseball analyst Jeff Zimmerman at RotoGraphs has even combined these statistics to create what he calls simply “Luck”.

The favorable BABiP and HR/FB numbers suggest that the rotation was relatively lucky in 2011 (prior to September, that is). But how lucky? Some statistics gurus like to measure luck by looking at the difference between a pitcher’s ERA and his FIP or xFIP numbers. Where ERA is better than FIP or xFIP, a pitcher may be said to have “outperformed his peripherals”, and where the numbers are reversed, the pitcher may be regarded as having “controllable skills” better than his ERA would indicate. In other words, it’s lucky when ERA is lower than FIP or xFIP, and unlucky when the situation is reversed.

Measured either by ERA minus FIP or ERA minus xFIP, the Red Sox starters’ numbers for September 2011 represent the unluckiest month for any AL team’s starters in 2011. The Sox starters’ ERA – FIP gap was 1.89 in September, a full .45 points greater than the next unluckiest monthly result in 2011 (the White Sox’s 1.44 gap in September). Overall, it was unusual in 2011 for any team to have a monthly ERA minus FIP gap greater than 1 – it happened only seven times all year. The Red Sox starters experienced an even greater ERA – xFIP gap in September: 2.53, again the highest monthly gap for any AL starting rotation in 2011. Only one other team, the Orioles in July of 2011, experienced a monthly ERA – xFIP gap greater than 2.
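These gaps are simple subtractions, and the September figures can be reproduced from the monthly numbers in the chart above. The Python sketch below does so for the Red Sox starters only; it says nothing about how September compares with other AL teams, since their figures are not in this piece.

```python
# (ERA, FIP, xFIP) for the Sox starters by month, copied from the chart above.
monthly = {
    "Mar/Apr": (3.83, 4.40, 4.38),
    "May": (4.14, 4.21, 4.62),
    "June": (3.95, 4.34, 4.50),
    "July": (4.54, 4.21, 4.13),
    "Aug": (3.97, 4.04, 3.76),
    "Sept": (7.08, 5.19, 4.55),
}

# Positive gap: ERA worse than the peripherals suggest ("unlucky").
# Negative gap: ERA better than the peripherals suggest ("lucky").
gaps = {
    month: (round(era - fip, 2), round(era - xfip, 2))
    for month, (era, fip, xfip) in monthly.items()
}

unluckiest = max(gaps, key=lambda month: gaps[month][0])
for month, (fip_gap, xfip_gap) in gaps.items():
    print(f"{month:8} ERA-FIP {fip_gap:+.2f}   ERA-xFIP {xfip_gap:+.2f}")
print("Unluckiest month by ERA minus FIP:", unluckiest)
```

By this measure the ERA minus FIP gap was negative, meaning "lucky", in every month except July and September.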

I’ve said that prior to September, the Sox starters were relatively lucky in 2011. How lucky? Lucky enough to make up for their drastically unlucky September. For all of 2011, the Red Sox starters had an ERA – FIP gap of only 0.15, and an ERA – xFIP gap of only 0.07. That put the Red Sox starters, luck-wise, at about the middle of the pack in the American League: ninth best team starting luck as measured by ERA – FIP, and eighth best starting pitching luck as measured by ERA –  xFIP.

Conclusion

The conclusions I’ve reached in this piece are (1) the Red Sox starters were a below average bunch, not just in September, but all year long, (2) the success of the Red Sox starting staff prior to September was fueled by lucky low BABiP and HR/FB numbers, (3) The Collapse was caused in considerable part by a dramatically bad turn in these luck numbers and (4) measured by a season’s worth of statistics, the Sox starters were neither lucky nor unlucky – they performed at roughly their true ability.

Are you buying all this? Maybe not. There’s much in the above analysis that can be debated. For example, there’s controversy over whether BABiP is a pure luck statistic, or whether pitchers have some control over whether balls hit into play turn into hits. Ditto for HR/FB rates.

But I’ll throw out one more statistic for you to consider, one that’s typically used along with BABiP and HR/FB to measure a pitcher’s luck or lack of luck. The statistic is LOB%, the percentage of runners that a pitcher “strands” on base. Remember our discussion of BABiP, where I argued that pitchers have little control over when a batted ball turns into a hit or an out? By the same analysis, a pitcher has little control over whether he allows hits in bunches (where runners are more likely to score) or is able to spread out the hits he allows (in which case runners are more likely to be stranded). A typical LOB% is around 70% to 72%; anything higher is considered in some circles to be lucky, and anything lower might be thought of as unlucky.
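For the curious, LOB% is usually computed from box-score totals rather than by tracking individual runners. The Python sketch below uses the formula as commonly published by FanGraphs; the stat line is hypothetical and was chosen only to land near the typical range.

```python
def lob_pct(h: int, bb: int, hbp: int, r: int, hr: int) -> float:
    """Left-on-base percentage, per the commonly published formula:

    (H + BB + HBP - R) / (H + BB + HBP - 1.4 * HR)
    """
    baserunners = h + bb + hbp
    return (baserunners - r) / (baserunners - 1.4 * hr)


# Hypothetical stat line, invented for illustration (not Red Sox data).
print(f"{lob_pct(h=150, bb=50, hbp=5, r=70, hr=18):.1%}")
```

Holding everything else constant, every additional run allowed pushes the percentage down, which is why a month of clustered hits shows up so starkly in this statistic.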

Here are the 2011 monthly LOB percentages for the starting rotation of the Boston Red Sox:

Month        LOB%
AL Average   71.8%
Mar/Apr      76.2%
May          72.7%
June         70.8%
July         71.1%
Aug          70.5%
Sept         58.7%
Sox SEASON   69.9%

The Sox starters’ September, 2011 LOB% of 58.7% is the lowest (read: unluckiest) such percentage I’ve found on record for any American League team since the Tigers suffered a 56.7% LOB rate in September of 2008.

You can find more of Larry’s rational musings over at It’s About the Money Stupid, a blog about the New York Yankees.  You can also follow him on Twitter @larrybehrendt