The Cold WAR

An attempt to revisit the true meaning of "wins above replacement."

There is a palpable divide amongst baseball fans these days about the value of advanced statistics. Within this conflict, fans find themselves in one of two camps: either they fully embrace the use of sabermetrics, or they are unequivocally against it.

But there are others, notably writers whose job it is to be up-to-date on all the latest information about baseball, who find themselves somewhere in the middle. They want to use any kind of statistic or nuance that enhances their understanding of the game they love, but they are also hesitant to simply abandon century-old norms for relatively new ideas. That’s just human nature.

A couple of days ago, my former colleague and all-around good guy Mike Hurley, a sportswriter for CBS Boston, wrote a piece called “Time For Baseball’s WAR Supporters To Tone Down The Arrogance.” The column was a response to an article by Sam Miller entitled “WAR is the answer.”

WAR, or “wins above replacement,” is arguably the poster child for sabermetrics, and has been at the forefront of the old- vs. new-school stats debate. It has been a fairly mainstream stat for the past few years, but it rose to prominence during the 2012 American League Most Valuable Player race. WAR supporters backed rookie sensation Mike Trout, who led all of baseball with 10.7 rWAR (Baseball Reference's version of the stat), while traditionalists sided with Miguel Cabrera, who won the AL Triple Crown despite finishing just fourth in the league with 6.9 rWAR. Cabrera ended up winning the MVP, and pretty easily at that.

The overall discourse among fans, bloggers, columnists and the general baseball community could charitably be described as “heated” during the months surrounding the MVP vote. At times, it was downright childish.

And that’s where I agree with Hurley, first and foremost. The arrogance – on both sides of the argument – could definitely use some “toning down,” as Hurley says. I know that I’m guilty of it, as someone who supports advanced statistics. I often get riled up when I find a weak spot in an argument by someone on the other side, and have too often taken the low road of mockery, rather than calmly advancing my point and hoping for the best.

But I think that’s a separate point for another day, probably something a therapist could write about more eloquently than me or any other sportswriter. Emotions are a part of arguments; that’s nothing new.

What I really want to address in this piece is a point that I feel is too often lacking from the overall argument about not just the value of WAR, but more specifically, the rationale behind it.

One of my favorite blog posts in the history of the Internet – and one that I always send to people who debate the merits of WAR – is called “Everyone has their own WAR,” by Tom Tango. The post was written in response to an interview with Mark Simon.

Tango’s article is short, but it provides an eye-opening rationale for WAR. The idea is that different people can rate baseball players using any kind of metrics they choose, including intangibles such as “heart” or “leadership.” But at the end of the day, everyone ends up at one final value for each player, and the players are ranked according to that value.

We all come up with our “single number”, even though we kick and scream that we shouldn’t come up with a single number. If one guy argues that Felix is better than Lincecum, and the other argues the opposite, then guess what: they’ve each “smushed” a bunch of parameters, considerations and gut feelings to get to their final opinion.

No one is telling you not to overweight or underweight strikeouts or HR. But a system requires you to spell out the rules for weighting, and apply that consistently to everyone.

The one good thing about the case-by-case basis is that it forces you to think about parameters. You’d like to ding Manny Ramirez a little, you’d like to up Jeter a little. So, you have to create a “heart” parameter. And that’s perfectly fine! Just spell it out that that’s what you are doing. And tell us how much you are giving to each player for heart. I have no problem with giving out wins for heart, over-and-above whatever his actual performance tells us. Just spell it out and be consistent.

The point is that in order to give credence to qualities such as “heart,” there needs to be a baseline for what they are worth, and there needs to be comparative information about every player in the entire league in order for it to be properly tested.
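To make Tango's "spell it out and be consistent" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the component names, the weights (including the 0.5 wins per unit of "heart"), and the two made-up players. It is not how any published version of WAR is calculated; it only shows what it looks like to write the rules down once and apply them to everyone.

```python
from dataclasses import dataclass

# Hypothetical weights (wins per unit of each component), invented for
# illustration. The point is that they are written down once, in the open.
WEIGHTS = {"batting": 1.0, "fielding": 1.0, "baserunning": 1.0, "heart": 0.5}


@dataclass
class Player:
    name: str
    components: dict  # component name -> value; missing components count as 0


def rate(player, weights=WEIGHTS):
    """Collapse a player's components into one number using the same rule for all."""
    return sum(w * player.components.get(name, 0.0) for name, w in weights.items())


def rank(players, weights=WEIGHTS):
    """Order players from highest to lowest rating under the stated weights."""
    return sorted(players, key=lambda p: rate(p, weights), reverse=True)


# Made-up players with made-up numbers.
players = [
    Player("Player A", {"batting": 5.0, "fielding": -1.0, "baserunning": 0.0, "heart": 0.0}),
    Player("Player B", {"batting": 3.0, "fielding": 0.5, "baserunning": 0.5, "heart": 1.0}),
]

for p in rank(players):
    print(p.name, rate(p))
```

If you think "heart" deserves more or less credit, you change one number in the weights, and the change applies to every player at once. That is the difference between a system and a case-by-case argument.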

For instance, David Ortiz has been labeled by many as one of the most “clutch” hitters in all of baseball, thanks in large part to his heroic performance in the 2004 American League Championship Series. There’s no doubt that Ortiz came up huge in that series, and is one of the main reasons why the Red Sox ended up as World Series champs that year.

In fact, there are several other instances in which Ortiz has been clutch, including his string of walk-off home runs in 2006. But does this mean that Ortiz is innately clutch – that he has an internal quality that allows him to come up big in important situations?

I don’t claim to know the answer to that question, but I won’t totally dismiss the idea that someone can be born with or develop an ability to thrive in tough spots; I also won’t embrace it as truth, because there’s no documented evidence to prove it exists.

More importantly, though, I don’t think that I am in a position to say that Ortiz is a more clutch hitter than anyone else. Think about it: Ortiz did come up with several big hits in several key situations, but was it not by incredible fortune that he even had the opportunity to do so? Is Ortiz more innately clutch than, say, Adrian Gonzalez, who has been a wonderful hitter throughout his career but hasn’t had many chances to bat in the playoffs?

Again, I don’t know the answer to that question. All we know about things like “clutch” or “heart” is that they are completely subjective qualities – even someone who recognizes their value will admit that.

Until we have the ability to accurately measure and document these intangibles for every single player in the league, I don’t feel comfortable using them in my overall analysis. That is why I prefer WAR.

Of course, WAR is not perfect. The defensive metrics used in its calculations are far from reliable, though they are coming along. And WAR also does not incorporate “win probability added,” which I tend to feel is somewhat relevant in analyzing true value (WPA alone is a useful stat, though).

But that does not mean that WAR isn’t the best we’ve got so far. In fact, according to Miller’s article, “In 2012 the correlation between Baseball Prospectus’ WAR and team victories was 0.86 (where 1.0 would have meant a perfect correlation).”

I only took one year of AP Statistics in high school, but even I remember that that’s a pretty damn strong correlation, especially for a statistic that many call “unreliable.” It’s not enough to embrace the stat as gospel, but it is enough to raise an eyebrow when there is a vast difference between two players’ WARs (e.g., Trout and Cabrera in 2012).
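For anyone who never took that stats class, here is a rough sketch of how a correlation like that is computed, using the standard Pearson formula. The team totals below are made-up numbers for illustration only, not the 2012 data from Miller's article, and the function name is my own.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length, at least 2 long")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the mean, and each sum of squares.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / math.sqrt(ss_x * ss_y)


# Made-up team WAR totals and win totals, for illustration only.
team_war = [20.0, 30.0, 40.0, 50.0]
team_wins = [68, 80, 85, 98]

r = pearson_r(team_war, team_wins)
print(round(r, 2), round(r ** 2, 2))
```

Squaring the correlation gives the share of variation in one number that moves in step with the other. For the 0.86 that Miller cites, that works out to about 0.74, or roughly three-quarters of the variation in team wins.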

I guess the point I’m really trying to drive home is that while WAR as we know it is based on tested calculations of numbers – ones that I and many others trust – the true spirit of the stat is to assess each player’s overall value. It’s something fans have argued about for decades.

And there’s still plenty of room for debate. But if you are to question WAR’s merits, I only ask you this: What system do you have that has been proven to be better?

Categories: 2004 ALCS, 2004 World Series, David Ortiz, Miguel Cabrera, Mike Trout

As the resident non-Red Sox fan of this blog -- my allegiance lies 300 miles south in Philadelphia -- I aim to provide completely objective analysis without letting my heart or any of my other organs get in the way. "Objects in the mirror are closer than they appear." Think about it.

3 Responses to “The Cold WAR”

  1. Mr Punch February 22, 2013 at 4:59 PM #

    The "anti-clutch" argument, as articulated for example by Bill James, has always struck me as basically discreditable to sabermetrics. Don't we all know from personal experience that some people perform better under pressure than others? Granted that successful pro athletes will tend to be above average in this regard, there still must be variation (and if there isn't, then they all have superior character in this sense at least). So if analysis can't find "clutch," that must reflect a flaw in the methodology. My own guesses are that (1) the relatively new use of specialist relievers (the LOOGY/ROOGY) in clutch situations has greatly confused the issue, and (2) a great deal of "clutch" is whether or not a hitter "can be pitched to" — so that, for instance, a very good but very selective hitter is likely to seem "less clutch" than a guy who will hit anything.

  2. Gerry February 23, 2013 at 5:26 AM #

    I have found that stats lie 22.4% of the time. I have witnessed marketers, CEOs and especially CFOs use stats to vigorously state or defend their positions, knowing what facts they have omitted or emphasized or fabricated. Numbers are notoriously UNreliable based on who uses them and to what ends.
    I read an argument recently about a pitcher whose final #'s, including WAR, were poor. Yet, the numbers were skewed by a few awful performances while hurt (not Lackey, but his numbers while pitching with a shredded elbow are another example). One side said this pitcher is valuable considering the whole story despite the #'s. The other side insisted the whole story is the #'s.

    Despite my firm belief that metrics are critical to understanding baseball, its players and staffs, I know full well from decades of experience with people and institutions who massage #'s to their own ends. Numbers really don't tell the whole story, nor are those numbers presented in baseball discussions necessarily accurate.

    You yourself said these metrics are still in development. They truly are. My favorite example of this is the metric mud that was slung at Ellsbury regarding his UZR, which called him out as one of the game's worst in CF, despite the true fact that UZR was still in its early developmental phases, and that the developers of UZR at the time said SSS gives inaccurate readings, which it did. Yet the Red Sox sabermetric community was so ready and willing to paint that negative picture of Ells. Result? Not only bad vibes which affected his relationship with fans and the team, but enough mud still sticks to damage his negotiating position. THAT is arrogant, irresponsible and unprofessional. And that (arrogant) certitude is part of the reason many have resisted these metrics. You can tell me with that certitude that W-L, RBI, BA, ERA are irrelevant, and I understand fully why you say it, but bottom line they actually are relevant and important in the scheme of things.

    Baseball writers and fans aren't universally tone deaf. OBP, OPS, SLG, BABIP, WAR, UZR, etc. have gradually infiltrated articles and discussions. But unless and until sabermetricians stop demanding that familiar and useful stats like ERA and RBI be replaced with the new stats, despite still being in development, homeostasis will win out. NO one listens to rabid crusaders … never have, never will, because they believe they alone have the truth. Even sabermetricians disagree with the Shredder. A little PR, honey, understanding and even honesty will go a long way towards a more general acceptance of advanced metrics. Until then it will be us against them, not much different emotionally from the rigidly uncompromising and often wrong tea party vs the world. Time to hire a marketing consultant and sign some sexy #'s????

