*As we know, there is an endless amount of information to be uncovered from PITCHf/x and Statcast data. In fact, every day it seems some new and exciting piece of research comes out that challenges the way we think about the game. In this series of posts I will go over a way to use PITCHf/x data that is almost certainly already being used by quantitative departments across Major League Baseball: predicting pitches based on the previous pitch thrown.*

*A Markov chain is a model that describes a sequence of possible events in which the probability of each event depends only on the state attained in the previous event. It is also the model we will use to predict the type of pitch a pitcher will throw.*
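Written out with $X_n$ denoting the type of the $n$-th pitch, the Markov property and the usual count-based (maximum-likelihood) estimate of each transition probability look like this. The notation is introduced here for illustration. It is an assumption that the probability matrices later in this post were estimated in this count-based way, since the post does not spell out the estimation step.

```latex
P(X_{n+1} = j \mid X_n = i,\, X_{n-1},\, \ldots,\, X_1) = P(X_{n+1} = j \mid X_n = i) = p_{ij}

\hat{p}_{ij} = \frac{n_{ij}}{\sum_{k} n_{ik}}
```

Here $n_{ij}$ is the number of times a pitch of type $i$ was immediately followed by a pitch of type $j$, so each row of estimates sums to 1.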

*Considering this site covers the Yankees, some might say the following analysis would be best applied to scouting opposing teams. For instance, if Yankee hitters knew what pitch was coming next, or at least knew the probability of each pitch being thrown by opposing pitchers, they could better prepare for the daily pitching matchups. Since the Yankees face a new team every four to five days, it would be tedious to constantly post these pitch type probabilities here at BP Bronx, so I will leave that particular analysis to the Yankees front office. Instead, I will look at the predicted pitch types of Yankee pitchers. This may actually be more useful to Yankee fans, since it offers a little more insight into each pitcher’s strategy.*

*The goal is to predict the pitches of Yankee starters, and, as I mentioned before, applying the Markov chain model to PITCHf/x data holds the key to accomplishing it. The current rotation includes Masahiro Tanaka, Michael Pineda, Nathan Eovaldi, Luis Severino and CC Sabathia. We will go through the rotation one pitcher at a time to see how effectively each of them mixes his pitch types.*

*Note: PITCHf/x data covers the 2015 season and the 2016 season through April 11, 2016.*

## Masahiro Tanaka

#### Overall Pitch Type Usage

Tanaka has a deep arsenal, and he spreads his usage across it fairly well. The chart above shows how often he throws each pitch. While this is helpful, it doesn’t let us predict what pitch Tanaka will throw next; as you will see later, the chart is mainly useful as a baseline against which to compare other pitch sequences. To predict Tanaka’s pitches, I modeled his PITCHf/x data as a Markov chain. The chart below shows the result: a probability matrix in which each cell contains the probability of a given pitch being thrown based on the previous pitch. For example, the first cell tells us that when Tanaka throws his four-seam fastball, there is a 23 percent chance he throws it again on the next pitch. The cell directly below it shows that if Tanaka throws a splitter, there is a 15.9 percent chance his next pitch will be a four-seam fastball.

#### Predicted Pitch Type Probabilities

According to his overall pitch type usage, Tanaka splits the majority of his pitches among his four-seam fastball, splitter and slider. However, his pitch type prediction chart reveals an interesting fact: if Tanaka throws any pitch other than a fastball, the chance that his next pitch is a splitter increases considerably. For example, after a slider the model gives a 43.2 percent chance that the next pitch is a splitter, while his overall usage shows he throws the splitter only 29.8 percent of the time. That could be a problem for Tanaka if opposing hitters are aware of it.
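Both the prediction matrix and the overall usage baseline can be estimated from simple counts. Below is a minimal Python sketch, assuming the PITCHf/x pitches have already been grouped into plate appearances and reduced to lists of pitch-type codes (FF four-seam, FS splitter, SL slider, CU curveball, SI sinker). Two choices here are assumptions, not a description of how the charts in this post were produced: transitions are counted only within a plate appearance, and each row is normalized by how often that pitch type was followed by another pitch. The sample sequences are made up.

```python
from collections import Counter, defaultdict


def transition_matrix(plate_appearances):
    """Estimate P(next pitch type | previous pitch type) from pitch sequences.

    Assumption: transitions are counted only within a plate appearance, so the
    last pitch of one plate appearance is never linked to the first of the next.
    """
    counts = defaultdict(Counter)
    for pitches in plate_appearances:
        for prev, nxt in zip(pitches, pitches[1:]):
            counts[prev][nxt] += 1
    matrix = {}
    for prev, row in counts.items():
        total = sum(row.values())
        # Normalize so the probabilities for a given previous pitch sum to 1.
        matrix[prev] = {nxt: n / total for nxt, n in row.items()}
    return matrix


def overall_usage(plate_appearances):
    """Share of all pitches that are of each type (the overall usage baseline)."""
    counts = Counter(p for pitches in plate_appearances for p in pitches)
    total = sum(counts.values())
    return {pitch: n / total for pitch, n in counts.items()}


# Made-up sequences, one list per plate appearance (not real Tanaka data).
pas = [["FF", "SL", "FS"], ["CU", "FS", "FS"], ["FF", "FF", "SL", "FS"]]
matrix = transition_matrix(pas)
usage = overall_usage(pas)
# Compare P(splitter | previous pitch was a slider) with the overall splitter share.
print(matrix["SL"]["FS"], usage["FS"])
```

With these toy sequences the comparison prints `1.0 0.4`: every slider is followed by a splitter, while splitters make up only 40 percent of all pitches. That is the same kind of gap as the 43.2 versus 29.8 percent comparison above, on a far smaller sample.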

An encouraging observation is that Tanaka’s four-seam fastball usage does not seem very predictable. Most hitters look to hit the fastball, so it is important to hide this pitch within the sequence. The only slight tell that Tanaka might throw a four-seamer is that he threw one on the previous pitch, and even then the probabilities look much like those in the overall usage chart, where three pitches make up the majority of his usage.

The pitch prediction matrix tells us how likely a pitcher is to throw a specific pitch if we know the previous pitch type. It does not help predict the first pitch of a plate appearance, since there is no previous pitch to go on. For some aggressive hitters, that first pitch is the one they most want to attack, especially if the opposing pitcher tends to throw a fastball on pitch number one.

The table below shows the pitch proportions for Tanaka’s first pitch. The ratios are similar to those in his overall pitch usage chart in the sense that his three most used pitches are still the four-seamer, slider and splitter. In this case, however, the pitch type Tanaka is most likely to throw on the first pitch is the four-seam fastball, not the splitter.

#### Pitch Type Probability of the First Pitch

What I found more interesting than Tanaka’s first-pitch fastball is his use of the curveball in that same count. Since the beginning of the 2015 season, Tanaka has thrown his curveball 7.3 percent of the time overall. On the first pitch he uses it far more often; in fact, he more than doubles that rate, to 16.1 percent, nearly as often as he throws his splitter and slider. The strategy behind this is probably to fool a batter who is expecting a first-pitch fastball. On average, the velocity difference between the fastball and the curveball is 15 to 17 mph. You can see how this would throw off a hitter expecting a much faster pitch, and how it could be useful against certain hitters the second or third time through the order.
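The first-pitch table only requires looking at the opening pitch of each plate appearance, and the “more than doubles” comparison for the curveball (16.1 versus 7.3 percent, a ratio of about 2.2) is the first-pitch share divided by the overall share. A minimal sketch under the same assumptions as before: pitches already grouped into plate appearances as lists of pitch-type codes, with made-up sample data.

```python
from collections import Counter


def first_pitch_usage(plate_appearances):
    """Share of plate appearances that start with each pitch type."""
    firsts = Counter(pitches[0] for pitches in plate_appearances if pitches)
    total = sum(firsts.values())
    return {pitch: n / total for pitch, n in firsts.items()}


def overall_usage(plate_appearances):
    """Share of all pitches that are of each type."""
    counts = Counter(p for pitches in plate_appearances for p in pitches)
    total = sum(counts.values())
    return {pitch: n / total for pitch, n in counts.items()}


# Made-up plate appearances for illustration only (not real Tanaka data).
pas = [["CU", "FS", "FS"], ["FF", "SL"], ["CU", "SI", "FS", "FF"], ["FF", "FS"]]
first = first_pitch_usage(pas)
overall = overall_usage(pas)
# A ratio above 1 means the pitch is favored on the first pitch relative to overall.
ratio = first["CU"] / overall["CU"]
print(f"CU first-pitch share / overall share: {ratio:.2f}")
```

In the toy data the curveball opens half of the plate appearances but is only 2 of 11 pitches overall, giving a ratio of 2.75.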

If we carry this “first-pitch curveball” strategy one step further, to the second pitch, we see that it leaves Tanaka open to some predictability. Going back to the pitch type prediction matrix, when Tanaka throws a curveball there is a 60 percent chance the batter will see either a splitter or a sinker on the next pitch. While these are two different pitch types, they share an important trait: both dive downward. A batter frozen by Tanaka’s first-pitch curveball therefore most likely knows (if his analytics department has provided him with the information) to expect a pitch that will drop. Just as important, if Tanaka throws one of these two pitches on pitch number two, it is almost certain he will throw one of them again on pitch number three, and more often than not it will be the splitter. Below is the zone profile for when Tanaka throws a splitter.

He throws his splitter out of the zone quite often, so you might ask why batters don’t just take the pitch. The answer is that splitters are difficult to pick up, which is exactly what makes them so hard to hit: the pitch looks like a fastball and then drops at the last second, leaving the batter flailing. If the hitter knew it was coming, however, he would most likely lay off. As demonstrated above, the model makes it possible to anticipate the splitter, and while the method is not 100 percent accurate, it is better than guessing.
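A zone profile like the splitter chart mentioned above is built from the PITCHf/x location fields. As a rough sketch, the share of splitters finishing outside the strike zone can be computed from `px` (horizontal position in feet from the center of the plate), `pz` (height in feet) and the batter-specific `sz_top`/`sz_bot` bounds. The zone definition used here, the 17-inch plate and the vertical bounds each widened by roughly a ball’s radius, is an approximation assumed for illustration, not necessarily the one behind the chart, and the sample pitches are made up.

```python
PLATE_HALF_WIDTH_FT = 17 / 12 / 2  # home plate is 17 inches wide
BALL_RADIUS_FT = 1.45 / 12         # approximate radius of a baseball (assumption)


def in_zone(px, pz, sz_bot, sz_top):
    """True if any part of the ball crosses the approximate strike zone."""
    return (abs(px) <= PLATE_HALF_WIDTH_FT + BALL_RADIUS_FT
            and sz_bot - BALL_RADIUS_FT <= pz <= sz_top + BALL_RADIUS_FT)


def out_of_zone_rate(pitches, pitch_type):
    """Share of pitches of `pitch_type` located outside the zone, or None if there are none."""
    chosen = [p for p in pitches if p["pitch_type"] == pitch_type]
    if not chosen:
        return None
    outside = sum(not in_zone(p["px"], p["pz"], p["sz_bot"], p["sz_top"]) for p in chosen)
    return outside / len(chosen)


# Made-up pitch records using PITCHf/x-style field names.
pitches = [
    {"pitch_type": "FS", "px": 0.1, "pz": 2.4, "sz_bot": 1.6, "sz_top": 3.4},
    {"pitch_type": "FS", "px": -0.2, "pz": 0.9, "sz_bot": 1.6, "sz_top": 3.4},
    {"pitch_type": "FS", "px": 1.3, "pz": 1.8, "sz_bot": 1.6, "sz_top": 3.4},
    {"pitch_type": "FF", "px": 0.0, "pz": 3.0, "sz_bot": 1.6, "sz_top": 3.4},
]
print(out_of_zone_rate(pitches, "FS"))
```

Two of the three made-up splitters (one below the zone, one off the plate) count as out of the zone here; the four-seamer is ignored because only splitters are selected.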

By no means am I saying Tanaka is predictable, or that his first-pitch curveball sequence (when he uses it) is predictable. Each plate appearance is different, and he or his batterymate might not stick to the pitch sequence laid out in the probability matrix. However, at the very least a batter should be able to anticipate whether he will get a fastball or off-speed stuff from Tanaka after any given pitch.
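The groupings used above (a splitter or sinker after a curveball, a fastball versus off-speed stuff after any pitch) amount to summing cells within one row of the matrix. Looking two pitches ahead, as in the curveball, then splitter or sinker, then splitter sequence, corresponds to one row of the matrix multiplied by itself, which is only valid under the Markov assumption that the next pitch depends on the current pitch alone. A minimal Python sketch; the matrix is a small hypothetical one with made-up probabilities, not Tanaka’s, and treating FF and SI as the “fastball” group is an illustrative assumption.

```python
# Hypothetical transition matrix with made-up probabilities.
# Rows: previous pitch; columns: next pitch. Each row sums to 1.
toy = {
    "CU": {"FS": 0.5, "SI": 0.25, "FF": 0.25},
    "FS": {"FS": 0.5, "SI": 0.25, "FF": 0.25},
    "SI": {"FS": 0.5, "SI": 0.5},
    "FF": {"FF": 0.5, "FS": 0.5},
}

FASTBALLS = {"FF", "SI"}  # illustrative grouping (assumption)


def prob_of_group(matrix, prev, group):
    """P(next pitch type is in `group` | previous pitch was `prev`)."""
    row = matrix.get(prev, {})
    return sum(row.get(p, 0.0) for p in group)


def two_steps_ahead(matrix, start):
    """Distribution of the pitch two pitches after `start` (row `start` of matrix squared).

    Assumes the same matrix applies at every step and that the plate appearance
    lasts at least two more pitches.
    """
    out = {}
    for mid, p1 in matrix.get(start, {}).items():
        for nxt, p2 in matrix.get(mid, {}).items():
            out[nxt] = out.get(nxt, 0.0) + p1 * p2
    return out


print(prob_of_group(toy, "CU", {"FS", "SI"}))  # splitter or sinker after a curveball: 0.75
print(prob_of_group(toy, "FS", FASTBALLS))     # fastball after a splitter: 0.5
print(two_steps_ahead(toy, "CU"))              # {'FS': 0.5, 'SI': 0.25, 'FF': 0.25}
```

Summing over the intermediate pitch is what makes the two-step version work: each path (curveball to splitter to X, curveball to sinker to X, and so on) contributes the product of its two transition probabilities.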

I would bet some hitters use this information, but as I said, that doesn’t mean they will always have the advantage, since the pitcher and catcher can mix things up at any time. Batters also still have to hit the ball, which is easier said than done. A player might know a fastball is coming but fail to make solid contact or, even worse, hit the ball hard right at somebody. In short, knowing the probability of a certain pitch being thrown does not guarantee hitters a significant advantage over pitchers. It does, however, make things a little easier on hitters, who are already at a disadvantage because, as most of us know, it is extremely hard to hit a baseball.

Without overpowering stuff, Tanaka must rely in part on mixing his pitch types. I will admit this might fall on Brian McCann’s shoulders just as much as Tanaka’s: Tanaka is responsible for executing the pitch, but McCann, or in some cases even Girardi, may call it. Based on what the model has told us, I believe Tanaka is doing a pretty good job of sequencing his pitches. He doesn’t let hitters look for his fastball in certain counts, which, in my opinion, is priority one. Aside from the moderate predictability of his slider-to-splitter sequence (43.2 percent), no pitch type transition probability exceeds 37 percent. This means hitters typically don’t know what Tanaka will throw next.
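Checking a claim such as “no other transition probability exceeds 37 percent” means scanning every cell of the matrix. The short sketch below lists the transitions above a threshold, highest first. The matrix is hypothetical with made-up values, and the 0.37 cutoff simply mirrors the figure quoted above.

```python
def transitions_above(matrix, threshold):
    """Return (previous, next, probability) triples above `threshold`, highest first."""
    cells = [
        (prev, nxt, p)
        for prev, row in matrix.items()
        for nxt, p in row.items()
        if p > threshold
    ]
    return sorted(cells, key=lambda cell: cell[2], reverse=True)


# Hypothetical matrix with made-up probabilities (not Tanaka's actual values).
toy = {
    "FF": {"FF": 0.30, "FS": 0.35, "SL": 0.35},
    "SL": {"FF": 0.25, "FS": 0.45, "SL": 0.30},
    "FS": {"FF": 0.30, "FS": 0.40, "SL": 0.30},
}

for prev, nxt, p in transitions_above(toy, 0.37):
    print(f"{prev} -> {nxt}: {p:.1%}")
```

With the made-up matrix, only the slider-to-splitter (45.0%) and splitter-to-splitter (40.0%) cells clear the cutoff. An empty result would mean no sequence stands out above the threshold.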

*For a breakdown of the approach used to predict pitches with Markov chains, see Danny Malter’s tutorial.*

*Lead photo: Adam Hunger / USA Today Sports*