Walker Gentz applies his pitch result prediction model to analyze the role that velocity and the count state play in a hitter’s decision to swing or not.

In our previous post, we built three machine learning models using three different algorithms (random forest, Bayes, and SVM) to predict pitch results from 13 attributes of each pitch. In that post, we learned a few important things:

  1. Pitch location was the most important predictor of pitch result
  2. Using the three models combined as an ensemble (voting) method, we were able to get an accuracy of ~75% (with the example of Austin Love’s pitch data, we got ~80% accuracy)
  3. We were able to model Austin Love’s xO-swings and xZ-swings during the 2021 season and compare them to his actual O-swings and Z-swings
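For readers who want to see what the voting step looks like, here is a minimal Python sketch of a majority vote across three models. It assumes hard voting (each model casts one vote per pitch); the function name, the label names, and the tie-breaking rule are illustrative assumptions, not a description of the original implementation.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the most models.

    Ties go to the earliest model in the list, which is an arbitrary
    choice made for this sketch.
    """
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


# Hypothetical outputs from three models for a single pitch.
votes = ["Swing", "StrikeCalled", "Swing"]
print(majority_vote(votes))  # Swing
```

With three models and three or more possible results, a three-way split is possible, so any real ensemble needs some tie-breaking rule; the one used here is only a placeholder.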

In this latest piece, we’ll use the same model to analyze the role velocity plays in swing decisions in D1 baseball. For this experiment, we start from the same 41,000 pitches we used to train and test the original model, then change specific attributes and look at how the xResult changes. The variables we’re changing are fastball velocity and the count: we compare the xResult for a fastball thrown at the median fastball velocity of those 41,000 pitches with one thrown above it. The goal of this post is to compare the four xResults and see how a change in velocity and a change in count affect the expected result of the pitch while the other variables are held constant. The four different data frames will be:

  • 90 mph / 0-0
  • 95 mph / 0-0
  • 90 mph / 2-2
  • 95 mph / 2-2

Setting Up the Experiment

To do this, we create four different data frames in R containing the variables the model needs to predict xResult. In each data frame, the eight numeric variables the model needs are set to their median values across all fastballs:

  • Spin Rate
  • Induced Vertical Break
  • Horizontal Break (absolute value)
  • Vertical Approach Angle
  • Release Height
  • Release Side (absolute value)
  • Vertical Release Angle
  • Extension

We use the absolute value of horizontal break and release side to account for the fact that both variables tend to have opposite signs for left-handed and right-handed pitchers. If we didn’t do this, the median would be pulled toward 0, which would skew the results. We kept the TaggedPitchType variable from the 643’s Trackman database constant and did not differentiate between fastball types. The only four variables that change are:

  • Plate Location Side
  • Plate Location Height
  • Release Speed
  • Count
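The reason for taking absolute values of horizontal break and release side before computing the median is easy to see with a toy example. The Python sketch below uses made-up horizontal break values (not numbers from our dataset) in which the two pitcher hands have opposite signs.

```python
from statistics import median

# Made-up horizontal break values in inches: negative for one pitcher
# handedness, positive for the other. Not values from the real dataset.
horizontal_break = [-12.0, -9.0, -7.0, 8.0, 10.0, 13.0]

signed_median = median(horizontal_break)
absolute_median = median(abs(v) for v in horizontal_break)

print(signed_median)    # 0.5
print(absolute_median)  # 9.5
```

The signed median describes a pitch with almost no horizontal movement, which none of the six pitches resembles, while the absolute median lands on a typical amount of movement.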

For this experiment, we’re not subsetting the sample to the individual pitches that meet these parameters. Instead, we created four independent data frames and set the release speed and count in each one to show the predicted xResult at each pitch location.
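As a rough illustration of that setup, the Python sketch below builds one table of rows per velocity/count scenario, with every plate location on a grid and the eight median values copied into each row. The original work was done in R; the column names, the grid spacing, and the placeholder median values here are all assumptions made for the example.

```python
from itertools import product


def build_scenario(medians, release_speed, count, sides, heights):
    """Return one row (dict) per plate location for a single scenario."""
    rows = []
    for side, height in product(sides, heights):
        row = dict(medians)  # the eight numeric medians stay fixed
        row.update({
            "release_speed": release_speed,
            "count": count,
            "plate_loc_side": side,
            "plate_loc_height": height,
        })
        rows.append(row)
    return rows


# Placeholder values only; real medians would come from the fastballs
# in the dataset. Column names are hypothetical.
medians = {
    "spin_rate": 0.0,
    "induced_vertical_break": 0.0,
    "horizontal_break_abs": 0.0,
    "vertical_approach_angle": 0.0,
    "release_height": 0.0,
    "release_side_abs": 0.0,
    "vertical_release_angle": 0.0,
    "extension": 0.0,
}

# Assumed grid in feet: 9 horizontal by 11 vertical locations.
sides = [i * 0.5 for i in range(-4, 5)]
heights = [i * 0.5 for i in range(0, 11)]

scenarios = {
    (speed, count): build_scenario(medians, speed, count, sides, heights)
    for speed, count in product([90, 95], ["0-0", "2-2"])
}

for key, rows in scenarios.items():
    print(key, len(rows))
```

Each of the four tables can then be passed to the model so that the only things varying within a table are the two plate location columns.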


Let’s compare the 0-0 counts against each other…

Looking at both plots for the start of an at-bat, it is interesting to see how much the two xResults differ because of velocity alone. At ~90 mph (the median of the dataset used to create the model), almost everything in the strike zone is an expected swing, with the exception of the bottom part of the zone. We also see a lot of expected called strikes about a foot off the zone in both directions. This is worth noting because it happens with the 95-mph fastball as well: in 0-0 counts, the model effectively widens the called strike zone when a fastball is thrown. Comparing the two xResults, you can also see the xSwing zone move up ~6 inches for the 95-mph fastball. The faster pitch produces a higher xSwing% up and out of the zone, while leaving a lot more room for an xStrikeCalled down in the zone.

Now, let’s look at the xResults in a 2-2 count…


Looking at both visuals together, the differences are minimal. The model projects that both pitches would get O-swings outside to a right-handed hitter, as well as O-swings up above the zone. This should be expected given the batter’s mentality at the plate with two strikes. Hitters want to protect the plate and avoid a backwards K, so they’re more aggressive in their approach. While the two xResults for the 2-2 count show little difference from each other, comparing them with the xResults for the 0-0 count shows how big a role the count state plays in swing decisions.
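To put a number on how many expected swings fall outside the zone, predicted results can be split by whether the location is inside the strike zone. The Python sketch below does that for a handful of made-up predictions. The zone boundaries (0.83 ft either side of center, 1.5 to 3.5 ft high) and the label names are assumptions for illustration, not the definitions used by the model.

```python
def in_zone(side, height, half_width=0.83, bottom=1.5, top=3.5):
    """Assumed rectangular strike zone, in feet."""
    return abs(side) <= half_width and bottom <= height <= top


def swing_rates(pitches):
    """pitches: iterable of (side, height, predicted_result).

    Returns (o_swing_rate, z_swing_rate); a rate is None when there are
    no pitches in that region.
    """
    o_total = o_swings = z_total = z_swings = 0
    for side, height, result in pitches:
        swung = result == "Swing"
        if in_zone(side, height):
            z_total += 1
            if swung:
                z_swings += 1
        else:
            o_total += 1
            if swung:
                o_swings += 1
    o_rate = o_swings / o_total if o_total else None
    z_rate = z_swings / z_total if z_total else None
    return o_rate, z_rate


# Made-up predictions, not model output.
example = [
    (0.0, 2.5, "Swing"),
    (0.5, 3.0, "StrikeCalled"),
    (1.5, 2.5, "Swing"),
    (0.0, 4.5, "Ball"),
    (-1.2, 2.0, "Ball"),
    (0.0, 4.0, "Swing"),
]
o_rate, z_rate = swing_rates(example)
print(o_rate, z_rate)  # 0.5 0.5
```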


Comparing the two counts at the median fastball velocity, we can see that the xStrikeCalled area at the bottom of the zone disappears in the 2-2 count. Early in at-bats, the model predicts that batters will let a low fastball in the strike zone go by in order to look for pitches they can drive. In the two-strike plots, by contrast, hitters are more protective of the bottom of the zone, with the xSwing zone covering the whole strike zone. It’s also interesting to see the xSwing zone rise between the two velocities in the 0-0 count. The model predicts that harder fastballs up in the zone (as much as ~6 inches above the top of the zone) will get more swings at the start of an at-bat.
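Comparisons like these can also be made cell by cell rather than by eye. The Python sketch below takes two grids of predicted results keyed by plate location and returns only the locations where the prediction changes. The grids shown are tiny and made up, and the function name is hypothetical.

```python
def changed_cells(grid_a, grid_b):
    """Return {location: (label_a, label_b)} where the two grids disagree.

    Locations missing from either grid are ignored.
    """
    return {
        loc: (grid_a[loc], grid_b[loc])
        for loc in grid_a
        if loc in grid_b and grid_a[loc] != grid_b[loc]
    }


# Made-up predictions keyed by (side, height) in feet; not model output.
count_0_0 = {
    (0.0, 1.5): "StrikeCalled",
    (0.0, 2.5): "Swing",
    (0.0, 4.5): "Ball",
}
count_2_2 = {
    (0.0, 1.5): "Swing",
    (0.0, 2.5): "Swing",
    (0.0, 4.5): "Ball",
}

diff = changed_cells(count_0_0, count_2_2)
print(diff)  # {(0.0, 1.5): ('StrikeCalled', 'Swing')}
```

Counting the changed cells, or plotting only those cells, gives a quick view of where the count or velocity actually moves the prediction.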

It’s also interesting to look at the small differences in the xSwing zones of the 2-2 counts. Again, we know that batters are going to be more aggressive here to protect the plate against a looking strikeout, but there is another possible conclusion: in two-strike counts, velocity’s effect on the batter’s decision making is minimal. In these counts, the batter is just looking to make contact and avoid striking out, regardless of how fast the pitch is coming. Even if velocity has little effect on two-strike swing decisions, though, we would still expect swings against average velocity to produce better batted-ball outcomes than swings against an above-average fastball, which is part of why above-average velocity is so effective.

In our next few posts, we’ll be looking at two more uses for the model. In one, we’ll compare the expected stats against the actual stats of pitchers from the 2021 season. In the other, we’ll compare the xResults of two different curveball movement profiles to see which the model predicts would be better to use: the 12-6 curveball or the slurve.