Using xBABIP and IF/F to predict some Red Sox players’ 2009 numbers

A month ago, Chris Dutton and Peter Bendix collaborated to come up with a new statistic: xBABIP.

BABIP, or Batting Average on Balls In Play, is generally used to identify whether a pitcher was lucky or unlucky. With .300 as the baseline, if a pitcher’s BABIP was higher, they could be expected to have a better year the following year (or even the following month) with the reverse holding true.
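For reference, BABIP is commonly computed as (H - HR) / (AB - K - HR + SF). The sketch below assumes that standard formula; the function name and the stat line are hypothetical and do not belong to any real player.

```python
def babip(hits, home_runs, at_bats, strikeouts, sac_flies):
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF)."""
    balls_in_play = at_bats - strikeouts - home_runs + sac_flies
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    # Home runs are removed from both sides because they never reach a fielder.
    return (hits - home_runs) / balls_in_play


# Hypothetical stat line, made up purely for illustration.
example = babip(hits=150, home_runs=20, at_bats=500, strikeouts=100, sac_flies=5)
print(f"{example:.3f}")  # 130 / 385, prints 0.338
```

The same calculation works for a pitcher if the inputs are the hits, home runs, and strikeouts he allowed.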

For batters, .300 is not a reliable baseline, nor is there one standard baseline across all batters. Baselines have to be personalized, which Dutton and Bendix did. The evidence is compelling, and they provided a downloadable spreadsheet of selected players across a number of years. For example, I wrote this on Josh Bard for The Hardball Times:

Taking a peek at xBABIP (2007 numbers only available) shows that his
BABIP was .321 compared to xBABIP of .316. In other words, his
.285/.364/.404 line with San Diego in 2007 may be more indicative of
his true abilities. If true, the Sox have a steal on their hands.

If you want some more information as to how exactly xBABIP works and is calculated, simply follow the link. Otherwise, let’s take a look at the Red Sox players provided in the spreadsheet that will play in 2009 and see how their BABIP correlates with their xBABIP. 

Oh, and by the way, xBABIP is meant to be used as a predictor: something one can look at to form a reasonable expectation of what will happen in the future. It by no means implies that the future is set in stone. Players could still under- or outperform their xBABIP. There are also other variables at play: a player could pack muscle onto his frame and start hitting line drives with greater frequency, things like that. All statistics are meant to be used as predictors of future success; that doesn’t necessarily mean the prediction will come true.

David Ortiz looks to be in line for a monster regression to the mean… except in this case, regression is a good thing. After he hit .264 in 2008 (career average of .287, .332 average in 2007), xBABIP sees a drastic improvement. In 2008, Ortiz’s BABIP was .269 and his xBABIP was .302. In 2007, his BABIP was .352 and his xBABIP was .306.

In 2006, with a .301 xBABIP, his actual BABIP was .268 in a season where he hit .287. And in 2005, when he hit .300, his BABIP was .302 with his xBABIP at .289. What does all this mean? It means you can expect a .300 average as a reasonable forecast next year for Big Papi.

Dustin Pedroia’s MVP season checked in at a .326 batting average, which was lucky according to xBABIP (.323 BABIP to a .305 xBABIP, whereas the year before his BABIP was .324 with a .321 xBABIP… so for the most part he is pretty consistent).

J.D. Drew: .280 seems to be what we can expect from him next year, given his BABIP of .302 and xBABIP of .300.

Jacoby Ellsbury… ah, we’re getting into some good stuff. People considered his season disappointing, hitting .280. His BABIP was .305 but his xBABIP is .326, so his batting average should rise to around .300 next year.

We can expect a regression from Jason Bay, as his BABIP was .316 and xBABIP checked in at .285. Bay’s 2007 checked in at a BABIP of .285 and xBABIP of .282. Despite a career batting average of .282, xBABIP seems to suggest Bay is actually a .247 batting average hitter. (Or is he? Read on.)

Some good news regarding Jason Varitek… yep, he should get better: .270 BABIP, .295 xBABIP. Jed Lowrie is around the norm: .323 BABIP, .314 xBABIP.

Avert your eyes… you thought Julio Lugo was bad in 2008? He was lucky! He had a BABIP of .312 and xBABIP of .284, so expect his average to drop. How about Kevin Youkilis? 2008 looks like his career year according to xBABIP, which has him at .306 when he hit for a .329 BABIP. (Yesterday, Tim looked at how Youk is becoming “selectively aggressive” and swinging at more pitches than usual, but in a way that benefits him.) The final name in the spreadsheet is Mike Lowell, who had a .278 BABIP and .282 xBABIP, so his 2008 looks like it should repeat itself. Now…
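Before getting to the catch, here is a quick way to line up all of those gaps. This is only a sketch: the 2008 BABIP and xBABIP pairs are the ones quoted above, but the `outlook` function and the .015 cutoff for "close enough" are my own assumptions, not part of Dutton and Bendix's work.

```python
# 2008 (BABIP, xBABIP) pairs as quoted in the post.
# Pedroia's second figure (.305) is read as his xBABIP.
players_2008 = {
    "David Ortiz": (0.269, 0.302),
    "Dustin Pedroia": (0.323, 0.305),
    "J.D. Drew": (0.302, 0.300),
    "Jacoby Ellsbury": (0.305, 0.326),
    "Jason Bay": (0.316, 0.285),
    "Jason Varitek": (0.270, 0.295),
    "Jed Lowrie": (0.323, 0.314),
    "Julio Lugo": (0.312, 0.284),
    "Kevin Youkilis": (0.329, 0.306),
    "Mike Lowell": (0.278, 0.282),
}

# Hypothetical cutoff: gaps smaller than this are treated as noise.
THRESHOLD = 0.015


def outlook(babip, xbabip, threshold=THRESHOLD):
    """Label a hitter by the gap between expected and actual BABIP."""
    gap = xbabip - babip
    if gap > threshold:
        return "likely to improve"
    if gap < -threshold:
        return "likely to decline"
    return "about right"


# Sort from the biggest expected rebound to the biggest expected drop.
ranked = sorted(players_2008.items(),
                key=lambda item: item[1][1] - item[1][0],
                reverse=True)
for name, (b, x) in ranked:
    print(f"{name:16s} {x - b:+.3f}  {outlook(b, x)}")
```

With that cutoff, Ortiz, Varitek, and Ellsbury land in the "improve" group and Bay, Lugo, Youkilis, and Pedroia in the "decline" group, which matches the rundown above. A different cutoff would move the borderline cases.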

There is a flaw in xBABIP, in that it doesn’t take into account IF/F
(infield flies per fly) which was brought to my attention when I mused
that Chris Snyder could be capable of a .280 batting average next year
due to his impossibly low BABIP as related to xBABIP. An analyst in an MLB front office was kind enough to point this out to me:

Take a look at BOTH Snyder’s IF/F rates and LD [line drive] rates. There is a
specific reason his BABIP was low:  Not only is his LD rate below
league average, his IF/F rate is much higher than league avg.  This has
been the case for the past two years. Hitting few line drives and
hitting a lot of infield popups is going to lead to a low BABIP. (That
and being slow).

Hitting a higher than league average percentage of infield fly balls is a
sure sign of poor contact in general. I am certain that if you run
correlation studies on this you will find this to be true, despite the
fact that there are not nearly as many [IF/F] as there are LD.

So, let’s check this for the players above and see who hits infield fly balls at a rate higher than the league average. I’m not sure of the exact league-average numbers, but roughly, the average line drive percentage is around 18 percent, while the average IF/F percentage is around 10 percent. So:

Player            2008 LD%   2008 IF/F%   Career LD%   Career IF/F%   Notes
David Ortiz       18.6       8.4          20.5         8
Dustin Pedroia    21.2       10           20.2         11.3
J.D. Drew         18.4       3.4          18.5         10
Jacoby Ellsbury   20.3       17.8         20.1         16.5           Skewed to 2008 statistics
Jason Bay         16.5       7.8          18.4         6.8            9.8 in 2007
Jason Varitek     13.6       12.7         20.1         11
Jed Lowrie        25.1       3.5          25.1         3.5            First year
Julio Lugo        17.6       4.3          19.2         9.7
Kevin Youkilis    21.9       2.1          22.5         5              IF/F lower than 3.0 in ’07 and ’08
Mike Lowell       20.6       11.7         21.1         12.1
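The check against those rough league averages can be sketched in a few lines. Treat it as an illustration only: the 18 and 10 percent baselines are the approximate figures mentioned above, the `contact_profile` function is hypothetical, and the "poor contact" flag is my reading of the analyst's point about Snyder (few line drives combined with many infield flies). Pedroia and Lugo are named in the table; the Bay and Youkilis rows are identified by their notes, and the Varitek row is assumed from the order in which the players are discussed.

```python
# Rough league averages quoted in the post (percentages), not exact figures.
LEAGUE_LD = 18.0
LEAGUE_IFF = 10.0


def contact_profile(ld_pct, iff_pct):
    """Compare line-drive and infield-fly rates to the rough league averages."""
    return {
        "ld_diff": round(ld_pct - LEAGUE_LD, 1),
        "iff_diff": round(iff_pct - LEAGUE_IFF, 1),
        # The Snyder pattern: below-average line drives AND above-average popups.
        "poor_contact": ld_pct < LEAGUE_LD and iff_pct > LEAGUE_IFF,
    }


# 2008 (LD%, IF/F%) pairs from the table.
rates_2008 = {
    "Dustin Pedroia": (21.2, 10.0),
    "Julio Lugo": (17.6, 4.3),
    "Jason Bay": (16.5, 7.8),        # row identified by its "9.8 in 2007" note
    "Kevin Youkilis": (21.9, 2.1),   # row identified by its IF/F note
    "Jason Varitek": (13.6, 12.7),   # attribution assumed from discussion order
}

for name, (ld, iff) in rates_2008.items():
    p = contact_profile(ld, iff)
    print(f"{name:15s} LD {p['ld_diff']:+.1f}  IF/F {p['iff_diff']:+.1f}  "
          f"poor contact: {p['poor_contact']}")
```

Only the 13.6 / 12.7 row trips the flag in 2008. Youkilis sits almost eight points under the IF/F baseline, which is the pattern discussed next.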

Okay, so what did we learn here that we can apply to their BABIP/xBABIP numbers?

For one: Jason Bay. In 2007, the year he hit .247, his IF/F was a career-high 9.8 percent, and it’s 6.8 percent for his career. That suggests the fear instilled in us by his 2008 xBABIP can be mitigated by his propensity to avoid infield flies, which naturally increases batting average.

Kevin Youkilis is a monster. IF/F lower than 3.0 for two years running? You have got to be kidding me. This means that he should consistently outperform his xBABIP (at least, until adjusted for IF/F).

xBABIP does, however, correctly predict that Julio Lugo will sink like a rock. It might be even worse, since his IF/F was 4.3 percent in 2008 against a career rate of 9.7 percent. Great.

David Ortiz? His IF/F is fairly standard, but he hit fewer line drives, which makes sense given the power sapped by his wrists. I would expect a better season from him in 2009 than he gave in 2008, all told.

All of this was a really complicated way to say three things:

David Ortiz will return to the Big Papi we knew,
Kevin Youkilis is a monster, and,
Julio Lugo really does suck.