Before I begin, I have to let you know that I have no ill will toward the Boston Globe. In fact, I enjoy pretty much everything I read by Peter Abraham, Nick Cafardo, and Chad Finn. Unfortunately, there are certain writers (cough* CHB *cough) on their staff that really get under my skin. Today, my irritation is directed at columnist Christopher Gasper. On Thursday afternoon, he posted an article lamenting the infiltration of advanced statistics into sports. While he acknowledges some upside to their existence and availability, the mood of his article is overtly negative and crotchety. In order to keep Mr. Gasper honest, I’m going to give him the same treatment I gave his colleague Mr. Shaughnessy last Friday: I’m going to break him down, Fire Joe Morgan style.
Sports discourse has become strictly a numbers game. The only acceptable way to make a point in a sports discussion these days it seems is with a decimal point.
Whatever happened to the good, old-fashioned eye test or context? Formulating an opinion has been replaced by formulas when it comes to dissecting and discussing the games we love. Statistics have overrun sports the same way weeds spread through a deserted parking lot.
Maybe it’s just me, but I swear these two paragraphs were written by Andy Rooney of 60 Minutes. Seriously. I’m not kidding. After reading them the first time, I half expected him to wax nostalgic about life in the 1950s, when economic times were better, crime was lower, and families were as wholesome as the Cleavers on Leave It to Beaver.
All joking aside, he has a point. Whatever happened to the good, old-fashioned eye test? It’s simple, really. The eye test is pretty useless for accuracy and comparative purposes. While the human mind is incredibly complex, it’s also unbelievably inefficient. It has immense capabilities for absorbing and storing visual data, but it’s incapable of properly organizing it for easy recall. When memories are recalled, they’re often skewed by emotional sentiment. For example, we are far more likely to remember a diving catch by Jacoby Ellsbury or an errant throw by Jarrod Saltalamacchia in a tight game than we are to recall Marco Scutaro routinely fielding a ground ball hit to the right of straightaway short. Furthermore, if we watch ten games all season and see Ellsbury make a diving catch in each game, we’re likely to think Ellsbury is an excellent center fielder. He very well might be, but not having watched the other 152 games, it’s possible we’re wrong in our evaluation. He may have been terrible in every other game, but our evaluation via the eye test would tell us otherwise. Additionally, how do we know how he stacks up against other CFs around the league? How do we compare what we’ve seen to what other players are doing? Is it possible our loyalty to the Red Sox is skewing our opinion?
I could go on all day asking these questions. And that’s the problem with the eye test. There are far too many variables at play to create an accurate method of evaluating players. Even if you were to watch every play of every game by every team, your mind isn’t capable of putting together an unbiased, consistent, repeatable method for evaluating everything you’ve seen. Furthermore, everyone’s eye test is calibrated differently. A great defensive player to you might be an average one to me. At least when it comes to advanced statistics, most of those variables have been removed from the equation, creating a more accurate measure for evaluation.
If the 1950s-’60s-era debate about who was a better player Mickey Mantle or Willie Mays was happening today, fans would simply look at who had the higher OPS-plus (On base percentage-plus-slugging percentage adjusted for ballpark conditions). It was Mantle. Or they would calculate who possessed the higher average WAR (wins above replacement value). It was Mays.
Where’s the romance in that?
Oh, so he did wax nostalgic about the 1950s. Imagine that. I’m totally shocked.
While I understand Gasper’s desire for romance in a game that’s chock full of tremendous stories, those stories have no place in player evaluation. Anecdotal evidence is great for filling in the gaps and presenting the human side of the game. True player evaluation should be about cold, hard facts and nothing else.
Sports have become a dictatorship of digits, the province of percentages, averages, probabilities and esoteric statistics. Fans and media (yours truly included) worship at the altar of the integer. If an observation or thesis doesn’t have a numerical value then it lacks value.
It’s reached a point where no one trusts their instincts anymore. What should be self-evident is disputed by numbers.
The problem with going with your gut is that it’s frequently wrong. Anyone who’s read The Book by Tom Tango, Mitchel Lichtman, and Andrew Dolphin could tell you that much, as it’s chock full of studies disproving “conventional baseball wisdom.” Intuition, by its very nature, is based on feelings, perception, and irrational thought processes, not facts. You wouldn’t invest $5,000 in a stock because you had a “good feeling” about it, right? Of course not! Only a moron would do such a thing! So why would a manager make an intuition-based baseball decision when the game is on the line?
Given the amount of data that’s readily available, it’s silly to ignore it in favor of gut instinct. While the data can’t be extrapolated to every situation, it should be used as a reference when making managerial decisions.
Stats are great at dispelling myths, but they’re also great at creating them.
For example, when the Red Sox signed Mike Cameron before the 2010 season and shifted Jacoby Ellsbury to left field, an oft-cited reason was Ellsbury’s low UZR (Ultimate Zone Rating) in center field.
Baseball has long been a stat-obsessed sport and the final frontier of sabermetrics, the mathematical and statistical study of baseball, is reliable metrics that measure defensive performance.
In 2009, Ellsbury’s UZR – the number of runs he saved versus an average center fielder — was minus-9.7, one of the worst ratings among regular center fielders. Cameron’s was 11.4.
This was used as justification by some fans and media for the then-37-year-old Cameron supplanting the younger, faster Ellsbury in center.
For what it’s worth, UZR wasn’t the only defensive metric that rated his performance in center field poorly. Defensive Runs Saved (-9) and Total Zone (-6.8) were also less than impressed with his defensive work. Using Gasper’s vaunted eye test, I also evaluated him pretty poorly in CF that season. Cameron, on the other hand, had graded out very well by both advanced metrics and the eye test in the seasons preceding 2010. Given his track record, there wasn’t any logical reason to assume he’d be a disaster in CF.
It should be pointed out that the Red Sox do not use UZR. They have their own advanced statistical measure of defense that has some shared principles, but is not the same.
So what’s your point about UZR then? If the Red Sox don’t use UZR, why are you tearing it down?
However, it’s worth noting that in the MVP-caliber season Ellsbury is having, restored in center field, he is suddenly among the game’s best defensive center fielders, according to the same statistical measure that once condemned him.
Ellsbury has an 11.2 UZR this season, second-best among regular center fielders.
Odds are that speaks less to Ellsbury spending his offseason addressing some sort of defensive deficiency and more to the capricious nature of the stat.
UZR has some well-documented deficiencies. For starters, it tends to be unreliable for single-season evaluation. In order to get a clear picture of how well a player is performing defensively, it’s best to look at his performance over 400-450 games. Then again, what stat can’t you say that about?
Secondly, UZR carries a high level of variability, which many people cite as a reason to discount the metric entirely. While I understand their sentiment, it’s somewhat hypocritical. Traditional statistics like batting average and ERA are notorious for high levels of statistical variability, yet both are widely accepted by the average baseball fan. Why? Familiarity. Everyone knows ERA and batting average. Everyone can calculate them. The same can’t be said for UZR.
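To drive that point home, here’s a quick Python sketch of just how simple the traditional stats are to compute. The stat lines are made up purely for illustration.

```python
# Traditional stats are back-of-the-envelope math; anyone can do this.
def batting_average(hits, at_bats):
    """Batting average: hits divided by at-bats."""
    return hits / at_bats

def era(earned_runs, innings_pitched):
    """ERA: earned runs allowed per nine innings."""
    return 9 * earned_runs / innings_pitched

# Made-up stat lines, purely for illustration.
print(f"{batting_average(150, 500):.3f}")  # 0.300
print(f"{era(70, 200.0):.2f}")             # 3.15

# UZR, by contrast, requires batted-ball data (location, type, speed) and a
# multi-step run-value calculation -- not something a fan can scribble on
# the back of a scorecard.
```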
UZR’s calculation is lengthy, in-depth, and very complex. While I understand the methodology behind the concept, I couldn’t calculate it on my own. Its esoteric nature breeds uncertainty and distrust. Combine those concerns with results that occasionally don’t match up with the eye test, and you have a lightning rod for criticism.
Let’s take a step back for a moment and revisit the eye test. When we’re watching a baseball game, whether in person or on TV, we tend to focus on the pitcher/hitter interaction. Most of us aren’t paying attention to the positioning of the fielders or to their reactions once a ball is put into play. When a ball is put into play, it typically takes us a second or two to shift our focus from the pitcher/hitter interaction to the defensive action. During that time, we miss major pieces of the defensive play that greatly affect the outcome, most notably the “jump” and the initial route.
For example, let’s assume a scenario where Carl Crawford makes a diving catch. Great play, right? Maybe. The end result was certainly positive, but did the preceding events justify the accolades? That’s an entirely different story. If he got a poor “jump” and ran a “banana” route to the fly ball, the answer is no. Sure, he made the diving catch, but if he’d played the fly ball correctly to begin with, he probably wouldn’t have needed to dive. Instead, he would’ve made what looked like an unmemorable, routine catch. These are the kinds of tricks the eye test plays on you. We see the end result of a diving catch and assume everything about the play was great. That’s often not the case.
Another problem with numbers is that they’re only as good as their application. There is a baseball statistic called batting average on balls in play that applies to pitchers. Sabermetricians regard this stat as a measure of luck, positing that once a ball is put in play the outcome of the play is largely out of the pitcher’s control.
Based on that theory it’s simply good old-fashioned good fortune that Tigers ace and Cy Young-in-waiting Justin Verlander has a .234 BABIP and Red Sox righthander John Lackey, who earlier this season was getting hit harder than a piñata, sports a .335 BABIP.
Is there a stat to account for putting common sense in play?
I’m not going to hammer Gasper too much on this point because BABIP is a commonly misunderstood stat. While BABIP can be used as a measure of luck, it’s most useful when compared with a pitcher’s expected BABIP (xBABIP). Even then, it’s not perfect. Pitchers making half of their starts in ballparks with vast foul territory will benefit from a higher number of foul balls being converted into outs. Additionally, certain pitchers with high O-swing rates have shown a tendency to produce lower-than-expected BABIPs because of their ability to induce pop-ups and so-called weak ground balls.
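For reference, here’s the standard pitcher BABIP formula in a quick Python sketch. The stat line at the bottom is a placeholder I made up for illustration, not anyone’s actual numbers.

```python
def babip(hits, home_runs, at_bats, strikeouts, sac_flies):
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF).
    Strikeouts and home runs are stripped out because the defense never
    gets a chance to field those balls."""
    return (hits - home_runs) / (at_bats - strikeouts - home_runs + sac_flies)

# Placeholder stat line, purely for illustration.
print(round(babip(hits=170, home_runs=20, at_bats=800, strikeouts=220, sac_flies=5), 3))  # 0.265
```

Once the strikeouts and homers are removed, what’s left is heavily influenced by defense, ballpark, and plain luck, which is why the number bounces around so much from season to season.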
Does this mean we should discount BABIP or xBABIP entirely? No. Years of data show that these numbers matter. Eventually, Justin Verlander and John Lackey will see their respective BABIPs regress toward the mean. It may not happen this season, but it will happen at some point. Neither number is sustainable.
Lastly, Josh Tomlin has a .253 BABIP and Roy Halladay’s is .311. Do you still want to make that argument, Gasper? BABIP is not an indication of talent.
This is not to pick on sabermetricians, who are far more intelligent than me. Bill James, a Red Sox senior adviser and the father of sabermetrics, is a pioneer who has changed the game forever and for the better. Statistics like WAR, OPS and runs created are enriching and enlightening, taking us beyond baseball card stats to explain the game in greater detail.
I don’t understand. Stats are good and not good? If advanced stats are “enriching and enlightening” and explain the game “in greater detail,” how is that a bad thing?
Statistics will always have a place in the hierarchy of sports. It’s just cold, hard numbers can’t rule with an intractable iron fist. They’re part of the solution, not the solution itself.
A guy who knew a little bit about math, Albert Einstein, once said: “Unthinking respect for authority is the greatest enemy of truth.”
The problem with relying solely on numbers as the only distiller for sports discussion is that numbers can’t provide context.
Actually, numbers can provide context. The problem is that most people don’t understand how to add context to statistics. Gasper’s opinion on BABIP is a prime example. Batted-ball rates, park factors, swing/contact rates, etc. all add context to BABIP. Had he taken those factors into account, he might’ve found greater meaning in the results. Instead, he sees a concept that lacks common sense.
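To make that concrete, here’s one crude way that context gets added: weight a pitcher’s batted-ball mix by a typical league BABIP for each batted-ball type. This is only a back-of-the-envelope sketch; the league rates and the batted-ball profile below are rough figures I’m assuming for illustration, not values from any published xBABIP model.

```python
# Rough, assumed league-average BABIP by batted-ball type (illustrative only).
ASSUMED_LEAGUE_BABIP = {
    "ground_ball": 0.235,
    "line_drive": 0.700,
    "fly_ball": 0.140,
    "pop_up": 0.020,
}

def naive_xbabip(batted_ball_mix):
    """batted_ball_mix: fraction of balls in play by type (should sum to ~1.0)."""
    return sum(ASSUMED_LEAGUE_BABIP[bb_type] * share
               for bb_type, share in batted_ball_mix.items())

# Hypothetical batted-ball profile, purely for illustration.
profile = {"ground_ball": 0.45, "line_drive": 0.18, "fly_ball": 0.30, "pop_up": 0.07}
print(round(naive_xbabip(profile), 3))  # 0.275
```

A pitcher whose actual BABIP sits well below an estimate like this is probably getting some help from his defense or from luck; one sitting well above it is probably getting squeezed. That’s exactly the kind of context Gasper claims the numbers can’t provide.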
They can’t tell you that Patriots wide receiver Chad Ochocinco is struggling adjusting to a different system of route calling with the Patriots, so he might not put up “Madden ‘12’” numbers right away.
Or that Brady was a bit off his game in 2009 because he had to spend time maintaining his surgically-repaired knee and was also dealing with cracked ribs and a broken finger among other maladies.
Or that Rajon Rondo’s shooting percentage declined because he was embarrassed by an off-hand remark President Barack Obama made.
I agree with Gasper on the first two points. Context is always important. In fact, you’d be hard-pressed to find a stats-oriented fan who disagrees with either one. That said, no one can prove (not even Rondo) that his shooting percentage declined as a result of Barack Obama’s remark. It’s certainly possible, but it’s also possible he’s using it as justification for his poor play.
Integers will never be able to measure the intangibles – ambiguous factors that make some athletes able to perform better than others and some teams able to execute better than others.
What makes sports compelling is that it often defies order, predictability and symmetry. That puts it directly at odds with statistics, a discipline based on those tenets.
The problem with intangibles is that they’re entirely subjective. We think we know the degree to which they benefit players, but we don’t actually know. Jason Varitek’s intangibles behind the plate are a perfect example. Do you know how much his leadership helps the pitching staff when he’s catching? I’m sure it has some effect, but how much? Does it save three runs a season? 15? 30? We don’t know because we can’t measure it. This doesn’t mean we should discount the intangible factor entirely, but we need to be wary of overvaluing it.
So, if the only way to have a sports argument in the 21st century is to rattle off a bunch of numbers as evidence, then count me out.
Would you wade into a political debate armed with nothing but anecdotal evidence that can’t be proven? Of course not. So why would you get into a debate about sports (or anything else, for that matter) without hard evidence? Feel free to bring your stories along to help shape your argument and make the discussion more interesting. If you want to win, though, you’d better bring the facts; otherwise, you’re probably going to get embarrassed.