Aaron Judge sees more borderline pitches called as strikes than any batter in the major leagues. So far in 2018, Judge has seen 1,449 pitches in the lowest part of the zone or below the zone. According to Statcast, 7.9% of these pitches, 114 total, were called as strikes. The average MLB batter sees 2.9% of these pitches called as strikes. Judge leads the league in both called strikes and called strike percentage on low pitches. If these pitches were called strikes at league average rates, Judge would have seen 72 fewer called strikes in 2018 alone.(1)
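The 72-strike figure follows directly from the numbers quoted above. Here is a minimal sketch of that arithmetic; the variable names are mine, and the inputs are the Statcast figures cited in this paragraph.

```python
# Inputs quoted in the paragraph above (Statcast, low pitches only).
low_pitches = 1449        # pitches Judge saw low in or below the zone
called_strikes = 114      # how many of those were called strikes
league_rate = 0.029       # league-average called-strike rate on these pitches

# Judge's own called-strike rate on low pitches (about 7.9%).
judge_rate = called_strikes / low_pitches

# Called strikes Judge would have seen at the league-average rate.
expected_strikes = low_pitches * league_rate

# Extra called strikes relative to a league-average batter (about 72).
extra_strikes = called_strikes - expected_strikes

print(f"Judge rate: {judge_rate:.1%}")
print(f"Expected at league rate: {expected_strikes:.0f}")
print(f"Extra called strikes: {extra_strikes:.0f}")
```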
This umpire bias is unfair and must be corrected. Umpires are human, and will often call pitches wrong. As long as those errors are randomly distributed, luck evens out. When errors are correlated by batter, MLB umpires are systematically punishing that batter.
How much damage is the strike zone doing to Aaron Judge’s performance? After all, he’s got a spiffy .274/.393/.547 batting line (.398 wOBA). I’ve run the numbers, and the answer is clearly: way more than you think.
Like all MLB hitters, Aaron Judge is better when he is ahead in the count. Here are his wOBA splits by count:
Aaron Judge Career Splits by Count

|Count|wOBA|
|---|---|
|Through 3 – 0|0.592|
|Through 3 – 1|0.556|
|Through 3 – 2|0.440|
|Through 2 – 0|0.576|
|Through 1 – 0|0.516|
|Through 2 – 1|0.463|
|Through 1 – 1|0.380|
|Through 0 – 1|0.293|
|Through 2 – 2|0.317|
|Through 1 – 2|0.270|
|Through 0 – 2|0.223|
It’s hard to believe, but the difference between a called ball and a called strike on the first pitch is the difference between Aaron Judge hitting like Barry Bonds and hitting like Ronald Torreyes. The count has a huge impact on the game. We can calculate the impact of a single call by taking Judge’s expected wOBA in the counterfactual count (the one he would reach if the pitch were called a ball) and subtracting his expected wOBA in the called count (the one he reaches when it is called a strike). So, if the count is 1-1, and a borderline pitch is called a ball, Aaron Judge enjoys a 2-1 count and can be expected to have a 0.463 wOBA. If the pitch is called a strike, Aaron Judge is stuck with a 1-2 count and an expected wOBA of 0.270. The impact of the called strike is thus 0.193 wOBA.
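The same subtraction can be repeated for any count. The sketch below uses the career splits from the table above, plus the terminal values from footnote 5 (a strikeout is worth 0 wOBA, a walk 0.69). The function names `woba_after` and `called_strike_cost` are illustrative, not taken from the spreadsheet.

```python
# Judge's career wOBA through each (balls, strikes) count, from the table above.
WOBA_THROUGH = {
    (3, 0): 0.592, (3, 1): 0.556, (3, 2): 0.440,
    (2, 0): 0.576, (1, 0): 0.516, (2, 1): 0.463,
    (1, 1): 0.380, (0, 1): 0.293, (2, 2): 0.317,
    (1, 2): 0.270, (0, 2): 0.223,
}
WALK_WOBA = 0.69       # footnote 5: a called ball four is a walk
STRIKEOUT_WOBA = 0.0   # footnote 5: a called strike three is a strikeout


def woba_after(balls, strikes):
    """Expected wOBA once the count reaches (balls, strikes)."""
    if strikes >= 3:
        return STRIKEOUT_WOBA
    if balls >= 4:
        return WALK_WOBA
    return WOBA_THROUGH[(balls, strikes)]


def called_strike_cost(balls, strikes):
    """wOBA lost when a taken pitch at this count is called a strike, not a ball."""
    counterfactual = woba_after(balls + 1, strikes)   # pitch called a ball
    called = woba_after(balls, strikes + 1)           # pitch called a strike
    return counterfactual - called


print(round(called_strike_cost(1, 1), 3))  # 0.193, the 1-1 example above
print(round(called_strike_cost(0, 0), 3))  # 0.223, the first-pitch swing
```

Note how the cost grows in two-strike counts, where a called strike ends the plate appearance outright.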
What is the total impact of 72 extra called strikes in 2018 on Aaron Judge? To find out, we need to repeat the same difference exercise for every count and weight each result by how often Judge has seen that count. For example, Judge has come to the plate 338 times, so he has had 338 opportunities to move to either a 1-0 or 0-1 count on a called first pitch. However, he has seen only 56 0-2 counts, so he has had only 56 opportunities to move to either a 1-2 count or a strikeout on a called pitch.
|Counterfactual Result||Called Result||Impact|
Aaron Judge has a 0.403 wOBA so far this season. If Aaron Judge has seen 72 more called strikes than the average batter would have, and each called strike costs on average 0.233 points of wOBA, Aaron Judge’s total wOBA should be about 0.05 points higher, or 0.453. If you think that Aaron Judge has seen fewer than 72
These results estimate a massive impact on Aaron Judge and the Yankees. Judge is having a strong All-Star-caliber season with his 0.403 wOBA. However, he should be having an all-time great offensive season on the level of 2018 Mike Trout or Mookie Betts.
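For readers who want to try different assumptions, here is a rough sketch of the final step. It assumes the total wOBA lost to extra called strikes is spread evenly over Judge's 338 plate appearances. That is my reading, chosen because it reproduces the roughly 0.05 figure; the spreadsheet may organize the calculation differently. The function name `adjusted_woba` is hypothetical.

```python
def adjusted_woba(observed_woba, extra_strikes, cost_per_strike, plate_appearances):
    """Estimate wOBA had the extra called strikes been called balls.

    Assumption: each extra called strike costs `cost_per_strike` wOBA in one
    plate appearance, and the total is averaged over all plate appearances.
    """
    uplift = extra_strikes * cost_per_strike / plate_appearances
    return observed_woba + uplift


# Figures quoted in the article: 0.403 wOBA, 72 extra strikes,
# 0.233 wOBA per called strike, 338 plate appearances.
baseline = adjusted_woba(0.403, 72, 0.233, 338)
print(round(baseline, 3))  # 0.453

# A more conservative assumption about the number of extra called strikes.
conservative = adjusted_woba(0.403, 36, 0.233, 338)
print(round(conservative, 3))  # 0.428
```

Halving the assumed number of extra strikes halves the uplift, so the conclusion scales linearly with that one input.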
In fact, the results may underestimate the impact of umpire bias. Pitchers strategically pitch Aaron Judge low. Part of this behavior may be because Aaron Judge is deadly on high pitches. However, part of it is likely because pitchers know that they can shade Judge a little bit low and still get called strikes. Furthermore, the estimate doesn’t account for pitches where Judge swings at a low pitch knowing that it has a higher probability of being called a strike than it would for other batters, or for overall swing changes that Judge makes to adjust to the distribution of pitches he sees.
Either way, these results are outrageous. MLB’s umpires are kneecapping one of the game’s best hitters. While I’ve seen no evidence that they are doing so intentionally (nor reason to believe they consciously would), I’ve also seen no evidence that they are working to correct their errors. In fact, Judge’s strike zone expanded from 2017 to 2018.
The Yankees and Aaron Judge haven’t made a big public stink about this problem. How could they? Both would risk retaliation from umpires defending their territory. Only MLB can step in and fix this problem. Fans and the media should make a stink for them: baseball is being denied the dominance of one of its best hitters because umpires aren’t adjusting to an abnormally tall player.
(1) All Statcast data as of 6/25/2018, all other data as of 6/26/2018. The differences should be negligible.
(2) I use wOBA here for ease of comparison to other data, rather than BP’s True Average. The results should be similar.
(3) I assume 72 extra called strikes for Aaron Judge due to umpire bias, but it is possible that Aaron Judge’s height creates a slightly smaller borderline pitch area than other players (instead of 2.9%, some larger proportion of pitches should be called a strike for Judge versus the average player).
(4) Called strike data are from Statcast. wOBA splits are from Fangraphs. PA% is from Baseball Reference splits.
(5) Called strikes that result in a strikeout are assigned a wOBA of zero. Called balls that result in a walk are assigned 0.69 wOBA, per the Fangraphs glossary.
(6) It’s worth noting that additional called strikes against Judge are driven overwhelmingly by pitches thrown in the strike zone according to Statcast. These are frequently called balls incorrectly by umpires for most batters. Judge gets fewer inaccurate calls in this zone. Thus, the zone is larger for Judge than other players. However, this is because umpires are more accurate for Judge than other players. The result is still unfair since the strike zone should be the same for all players.
(7) Here, I assume that over-called strikes against Aaron Judge are randomly distributed during his PAs. It is plausible to assume that the bias might be asymmetric (when Judge is ahead, more strikes get called, when Judge is behind, fewer) or that low pitches come during different counts, but for mathematical simplicity, I do not make this adjustment.
(8) These results aren’t peer-reviewed, so I’ve created a Google Sheet with all of my calculations here. I encourage people to check my math. If you prefer different assumptions about the number of extra called strikes, you can change that assumption in the spreadsheet and report different results.
(9) BR’s splits do not allow me to estimate repeated occurrences of the same count within one plate appearance (two-strike counts after foul balls), so the PA% column underestimates the proportion of times Judge has seen those counts. Given the huge differences between a called strike and ball in a 2-strike count, this likely results in an underestimation of the impact of called strikes on Judge’s performance.
Photo credit: Kim Klement / USA TODAY Sports