Measuring a Catcher’s Influence | Fire Brand of the American League

There’s been a lot of talk throughout Red Sox Nation both in the media and the blogosphere about who should catch for the Red Sox: Jason Varitek or Jarrod Saltalamacchia. On Wednesday, Peter Abraham at the Boston Globe weighed in with some interesting thoughts on the subject:

“Jarrod Saltalamacchia started seven of the first eight games this season. Today will mark the fifth time in the last nine games that Jason Varitek will start.

The Sox aren’t giving up on him. But clearly they’re edging away from him. His throwing has been erratic, he is hitting .194/.256/.222 and the pitchers have a 7.14 ERA when he is behind the plate. That is more their fault than his and we’re looking at a small sample size. But when Varitek has a 2.40 ERA, it makes you wonder.”

Before I go any further, I want to point out that Abraham and I agree on two things. One, the pitching staff has pitched better with Varitek catching than with Saltalamacchia. That is a fact I can’t dispute. Two, the pitchers (and to an extent the defense) are far more at fault for their 7.14 ERA than Salty. This is especially true when you consider the incredibly small sample size involved with both catchers so far (which Abraham admits). Other than that, it’s unclear where he stands on the subject of catcher influence. He doesn’t really delve into the topic too deeply, which is fine. It’s not really necessary for him to do so.

My opinion on the subject has varied greatly over the past couple of years. There was a time when I blindly drank the Varitek Kool-Aid. Honestly, it was difficult not to. The man had caught four no-hitters*; led a staff that won two World Series; and was widely regarded by pitchers, coaches, fans, and journalists as a master of game preparation and game calling. His resume was nearly flawless. While I won’t go as far as saying that Varitek was something of a god, he certainly was considered to be a superhero of sorts, at least in some circles. He was a leader, a captain, and, to some, an indispensable cog in the pitching staff’s success.

* To be fair, the defense deserves just as much credit as Varitek for those four no-hitters.

At some point in the last year or two, I grew tired of drinking the Varitek Kool-Aid. The anecdotal arguments I’d previously devoured soon became hollow and unfounded. I still respected Varitek for his desire, leadership, and game-calling abilities, but I became skeptical of their meaning. For years, we’ve been told Varitek was a great catcher, but how do we really know for sure? How were his skills measured? Was the metric used consistent and repeatable? Is the metric comparable across seasons, leagues, and eras? How does Varitek stack up against other catchers? If we’re going to truly know Varitek’s value as a catcher, we need to be able to answer each of these questions. If we can’t (and to this point, we really can’t), why are we blindly accepting someone else’s opinion as fact? Yes, I understand. The opinions on Varitek are widely held. Still, that doesn’t mean those opinions are right. Who’s to say they’re not incredibly wrong?

Take Derek Jeter, for instance. For years, we were all under the impression that Jeter was a tremendous defensive shortstop. He not only fielded the position in a way that was pleasing to the eye, but we were also inundated with information confirming what we saw. We never questioned anything, and took everything at face value. It wasn’t until defensive metrics came into vogue a few years ago that we learned Jeter was not only a poor defensive shortstop, but one of the worst in the game, and by a lot. We found that our eyes had lied to us. I don’t want to say he didn’t make spectacular-looking plays (he certainly did), but a lot of his acrobatic plays were the result of his poor range. His “unbelievable” plays were actually routine plays for average to above-average shortstops; and the plays that average and above-average shortstops made look spectacular were the ones Jeter didn’t get within two steps of fielding. The point I’m trying to make is that just because we think we know something, it doesn’t mean that we actually know it.

Some people like to use a metric called Catcher ERA (or CERA) to measure a catcher’s influence/performance, but even that’s incredibly flawed, for a couple of reasons. For starters, like its big brother ERA, CERA doesn’t account for defensive influence. With this being the case, how can we use CERA as an accurate measure of catcher performance when we’re really measuring the performance of others? (The catcher is approximately 1/8 of the defense; by catcher defense, I mean fielding and throwing.) We can’t. I suppose we could use Catcher FIP, which would at least strip out the defensive influence. Even then we wouldn’t be measuring the catcher’s performance. Instead, we’d be measuring how a pitcher performs when pitching to a particular catcher. While that might sound like the same thing, the two concepts are very different.
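For anyone who wants to see the arithmetic, here is a minimal sketch of how CERA and a catcher-split FIP could be computed. The pitching lines are hypothetical (made-up component totals, chosen only so the ERA split mirrors the 7.14 and 2.40 figures quoted above), and the FIP constant of 3.10 is an assumed round value rather than an official league figure.

```python
def era(earned_runs, innings):
    """Earned runs allowed per nine innings."""
    return 9.0 * earned_runs / innings


def fip(hr, bb, hbp, k, innings, constant=3.10):
    """Fielding Independent Pitching.

    Uses only home runs, walks, hit batters, and strikeouts, so balls in
    play (and the fielders behind them) drop out. The constant is an
    assumed value for illustration.
    """
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / innings + constant


# Hypothetical staff totals split by who was catching (not real data).
splits = {
    "Catcher A": dict(er=50, ip=63.0, hr=9, bb=30, hbp=3, k=45),
    "Catcher B": dict(er=16, ip=60.0, hr=4, bb=18, hbp=1, k=52),
}

for name, s in splits.items():
    cera = era(s["er"], s["ip"])
    cfip = fip(s["hr"], s["bb"], s["hbp"], s["k"], s["ip"])
    print(f"{name}: CERA {cera:.2f}, catcher FIP {cfip:.2f}")
```

Note that neither function has an input describing anything the catcher did. Both numbers are pitcher outcomes that happen to be grouped by catcher, which is exactly the limitation described above.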

As with hitting, pitching, and defense, the best way to evaluate a catcher is by analyzing his skill set, not his outcome-based performance. According to Keith Woolner in Baseball Prospectus’s spectacular book Baseball Between the Numbers, a catcher can influence a pitcher in at least ten ways (quote taken directly from the book):

1. He can study the opposing batters and call for the right pitches in the right sequence.
2. He can use his glove and body to frame incoming pitches to subtly influence the umpire to call more strikes.
3. He can be attuned to what a pitcher wants to throw, or what pitches he is throwing well, and keep his pitcher comfortable.
4. He can control the tempo of the game, calling pitches quickly when a pitcher is in a groove, or slowing things down by heading out to the mound for a quick meeting.
5. He can monitor a pitcher’s emotional state and use leadership and psychological skills to help a pitcher maintain his focus.
6. He can be skilled at blocking balls in the dirt so that the pitcher is not afraid to throw a pitch low with runners on base.
7. He can watch for signs of fatigue and work with the manager to decide to make a pitching change before the game gets out of hand.
8. He can engage in conversation or actions to distract the batter while staying within the rules of the game. A distracted batter is less likely to get a hit.
9. He can remain aware of the game situation and call for an unexpected pitch for the situation, gaining the element of surprise.
10. He can prevent opposing baserunners from stealing, either by throwing them out or keeping them from trying to steal at all.

While one could find a way to measure skills like game calling, pitch framing, ball blocking, controlling baserunners, and controlling tempo (although we can’t measure the intent of the tempo), the other skills mentioned are very difficult, if not impossible, to measure because of their psychological nature. Take skill number five, for example. How can one objectively evaluate the manner in which a catcher monitors “a pitcher’s emotional state and uses leadership and psychological skills to help a pitcher maintain his focus”? I suppose one could compare a pitcher’s performance before a mound visit against his performance after it. Even then, we’re making a lot of “out of context” assumptions about the pitcher’s performance that may or may not be true. It’s possible that the pitcher was suffering from rotten luck on balls in play prior to the mound visit; the improved performance after the mound visit could be merely coincidental (i.e. regression to the mean). It’s also possible that, through no fault of the catcher, the pitcher pitches even worse after the visit.
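The regression-to-the-mean problem is easy to demonstrate with a toy simulation. In the sketch below, every inning is drawn from the same fixed run distribution, so the simulated catcher has no effect at all. The run distribution and the rule that a mound visit follows any inning with two or more runs are both assumptions made up for illustration.

```python
import random


def mound_visit_illusion(n_innings=100_000, seed=0):
    """Compare runs allowed just before and just after simulated mound visits.

    The pitcher's true talent never changes: every inning is an independent
    draw from the same assumed distribution of runs allowed.
    """
    rng = random.Random(seed)
    runs = rng.choices([0, 1, 2, 3], weights=[70, 18, 8, 4], k=n_innings)

    before, after = [], []
    for i in range(n_innings - 1):
        if runs[i] >= 2:                # hypothetical trigger for a visit
            before.append(runs[i])      # the bad inning that prompted it
            after.append(runs[i + 1])   # the next inning, catcher does nothing
    return sum(before) / len(before), sum(after) / len(after)


before_avg, after_avg = mound_visit_illusion()
print(f"runs per inning before visit: {before_avg:.2f}")
print(f"runs per inning after visit:  {after_avg:.2f}")
```

Because visits are only triggered by bad innings, the "before" average is guaranteed to be at least two runs, while the "after" average simply drifts back toward the pitcher's normal level. A naive before-and-after comparison would credit the catcher with an improvement he had nothing to do with.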

Look, I’m not saying that a catcher can’t have a psychologically calming effect on a pitcher. I believe he can. My problem with the psychological aspect is twofold: (1) it’s impossible to measure; and (2) even if there were indisputable evidence that Catcher A had an unparalleled ability to calm his pitchers, it doesn’t necessarily mean said pitchers’ superficial performance would improve. There are too many variables at play that can cancel out the catcher’s effect, the biggest being defensive performance and a hitter’s ability to adjust.

This brings me to my second issue with CERA. As a metric, CERA doesn’t directly measure any of the skills Woolner described above. It doesn’t tell us about a catcher’s ability to call a game, frame pitches, block balls thrown in the dirt, etc. As a result, using CERA (in its current form) to either evaluate a catcher’s performance or justify his playing time is not only inappropriate and misleading, but also somewhat reckless in the analytical sense. Outcome stats like ERA and CERA correlate very poorly from year to year, which means a catcher’s number in one season tells us little about his number in the next. By focusing on the skills instead, we can more accurately project a catcher’s value and future performance, rather than take his current performance at face value. Using this line of thinking would go a long way towards determining whether Varitek or Saltalamacchia deserves more playing time behind the dish going forward. As it stands right now, we’re forced to rely on evidence that’s almost entirely subjective and potentially flawed.
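To illustrate what a poor year-to-year correlation looks like, here is a rough simulation. It assumes catchers differ only slightly in true CERA (a standard deviation of 0.15 runs) while single-season noise is much larger (0.60 runs). Both figures, the 4.20 baseline, and the 300 catcher-season pairs are invented for the example and are not estimates from real data.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def simulate_cera_pairs(n=300, skill_sd=0.15, noise_sd=0.60, seed=42):
    """Generate consecutive-season CERA values for n hypothetical catchers."""
    rng = random.Random(seed)
    year1, year2 = [], []
    for _ in range(n):
        true_cera = 4.20 + rng.gauss(0, skill_sd)   # small real differences
        year1.append(true_cera + rng.gauss(0, noise_sd))  # season 1 noise
        year2.append(true_cera + rng.gauss(0, noise_sd))  # season 2 noise
    return year1, year2


y1, y2 = simulate_cera_pairs()
print(f"year-to-year correlation: {pearson(y1, y2):.2f}")
```

Under these assumptions the expected correlation is only about 0.06, because the noise swamps the small real differences between catchers. A skill-based measure with less noise would repeat better from one season to the next, which is the argument for evaluating skills rather than outcomes.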

While there have been a few exciting and promising studies as of late (including one by Mike Fast of Baseball Prospectus and another that was published in the 2011 Hardball Times Annual) suggesting that a catcher has a measurable impact on his pitching staff, those studies are still in their infancy. Until further studies have been completed, not only confirming the initial results but also providing an evaluative metric, people will continue to talk out of their asses, pretending to know something they can’t possibly know. (Actually, even after a well-tested and proven metric is released, people will still probably try to force their subjective, unfounded opinions onto the general public.)

I do hope that something good comes of these studies, because I’d like to finally lay this argument to rest once and for all. While I’m of the opinion that a catcher can influence a pitcher or pitching staff in some manner, I think most people grossly overvalue and overestimate the criticality of that influence. I don’t know how much influence a catcher has on a pitching staff, and I don’t pretend to know. For all I know, my opinion could be wrong—and that’s the point. Just because we think we know something, it doesn’t mean we actually know it.