Written By: James Kim

Introduction

Over the past few years, batted-ball metrics like launch angle and exit velocity have flooded professional baseball with the advent of Statcast, Hawk-Eye, TrackMan, and other ball-tracking technologies. Colleges have been catching up to these analytical advances, and programs that have invested in the technology have been able to better understand their players' pitching and hitting profiles. This raises a question: how can we transfer current knowledge from the major leagues to this rapidly growing set of data?

From a hitting standpoint, existing research in professional baseball (including work by FanGraphs writer Ben Clemens) has examined how exit velocities correlate with overall offensive performance. For instance, Clemens concluded that a player's 95th percentile exit velocity is the "stickiest" metric year to year, making it one of the most reliable statistics for hitters, and he has also identified potential breakout candidates based on large gaps between their average and upper-quartile exit velocities. However, this type of research has not been conducted at the college level. Given these findings with Major League data, I was curious whether the same conclusions hold in an environment with typically higher offensive output (in exit velocities and other offensive metrics, as a result of metal bats, greater variation in pitching talent across the division, ballpark dimensions, etc.). More generally, I wanted to know which batted-ball metrics correlate most strongly with overall offensive performance at the Division 1 level.

Data

Using TrackMan batted-ball data, I examined how different distributions of launch angles and exit velocities correlate with offensive performance, which I measured with wOBA (weighted on-base average). The dataset is extensive, covering a wide range of batted balls tracked with TrackMan technology from the 2019 through 2022 seasons. For this research, the main variables I used from the original dataset were player and team names, launch angle, exit velocity, and play outcome. Note that the number of tracked balls in play varies drastically between teams and players, since only certain programs have adequate access to the technology at games.

First, I assigned each batted ball in the sample a wOBA run value based on its outcome: out, single, double, triple, or home run. These linear weights were provided by 643 and give a concrete run value for each type of hit based on the Division 1 run-scoring environment in 2022. Assigning values to individual batted balls allowed me to examine quantitative distributions for different batted-ball attributes.
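A minimal Python sketch of this step, assuming each batted ball is a (player, play_result) pair. The weights below are hypothetical placeholders for illustration only, not the 643-provided 2022 values, and the outcome labels and helper names are assumptions. Because outs carry a weight of zero, a player's wOBAcon under this per-ball approach is simply the mean weight of their tracked batted balls.

```python
from collections import defaultdict

# HYPOTHETICAL linear weights for illustration only; substitute the
# Division 1 run values for the season being analyzed.
OUTCOME_WEIGHTS = {
    "Out": 0.0,
    "Single": 0.9,
    "Double": 1.25,
    "Triple": 1.6,
    "HomeRun": 2.0,
}


def ball_weight(play_result: str) -> float:
    """Return the wOBA weight for one batted ball's outcome."""
    return OUTCOME_WEIGHTS[play_result]


def woba_con(batted_balls):
    """Average outcome weight per player over (player, play_result) pairs."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for player, result in batted_balls:
        totals[player] += ball_weight(result)
        counts[player] += 1
    return {p: totals[p] / counts[p] for p in totals}


balls = [("A", "Single"), ("A", "Out"), ("A", "HomeRun"), ("B", "Out")]
print(woba_con(balls))
```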

Next, for each player, I calculated batted-ball rates from the launch angle and exit velocity variables in the dataset, similar to the metrics found on MLB hitters' Baseball Savant profiles. First, I defined five exit velocity distribution variables: average, median, maximum, 75th percentile, and 95th percentile exit velocity. As Clemens found, there is a distinct difference between average exit velocities (or "game performance" EVs) and upper-end exit velocities (or raw power). I included the 75th percentile based on an initial hypothesis that it would identify players who combine both: they consistently hit the ball hard, and they also produce a high maximum velocity. I also included a "hard-hit rate" variable that measures how often a player hits a ball 95 mph or harder. Rather than considering a player's entire distribution of batted balls, this rate only counts balls hit above a fixed threshold.
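The per-player exit velocity summaries can be sketched with NumPy as below. The percentile cutoffs and the 95 mph threshold come from the text; the function name, the input format (a list of one player's exit velocities in mph), and counting exactly 95 mph as hard-hit ("95 mph or harder") are assumptions. NumPy's default linear interpolation is used for percentiles, which may differ slightly from other tools.

```python
import numpy as np


def ev_profile(exit_velos, hard_hit_mph=95.0):
    """Summarize one player's exit-velocity distribution (mph)."""
    ev = np.asarray(exit_velos, dtype=float)
    return {
        "avg_ev": ev.mean(),
        "median_ev": np.median(ev),
        "max_ev": ev.max(),
        "ev75": np.percentile(ev, 75),
        "ev95": np.percentile(ev, 95),
        # Share of batted balls hit at or above the hard-hit threshold.
        "hard_hit_rate": np.mean(ev >= hard_hit_mph),
    }


profile = ev_profile([70, 85, 90, 96, 101])
print(profile)
```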

While hitting the ball hard is important for earning hits, it is also important to make contact at a suitable launch angle to lift the ball. As a result, I included other metrics that take launch angle into account. First, barrel rate, as defined in MLB.com's glossary, measures the rate of balls hit with an exit velocity of at least 98 mph and a launch angle between 26° and 30°, with the range of angles expanding for every additional mph (for instance, a ball hit 1 mph harder, at 99 mph, between 25° and 31° is also a "barrel"). Second, I calculated sweet-spot rate, which captures the rate of balls hit at launch angles between 8° and 32°.
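A rough sketch of both rates for one player's (exit velocity, launch angle) pairs. The text only gives the 98 mph and 99 mph windows, so the barrel window here is an assumption: it linearly interpolates from 26° to 30° at 98 mph out to 8° to 50° at 116 mph and above (the 116 mph endpoint is taken from MLB's glossary description as I understand it). This approximately reproduces the 25° to 31° example at 99 mph but is not MLB's exact published rule. The sweet-spot bounds are taken directly from the text.

```python
def barrel_window(ev_mph):
    """Approximate barrel launch-angle window (degrees) for an exit velocity.

    ASSUMPTION: linear interpolation from 26-30 degrees at 98 mph to
    8-50 degrees at 116+ mph; not MLB's exact definition.
    """
    if ev_mph < 98:
        return None
    frac = min(ev_mph - 98, 18) / 18  # 0 at 98 mph, 1 at 116+ mph
    return 26 - 18 * frac, 30 + 20 * frac


def is_barrel(ev_mph, launch_angle):
    window = barrel_window(ev_mph)
    return window is not None and window[0] <= launch_angle <= window[1]


def is_sweet_spot(launch_angle):
    # Sweet spot as defined in the text: 8 to 32 degrees.
    return 8 <= launch_angle <= 32


def contact_rates(balls):
    """balls: list of (exit_velocity_mph, launch_angle_deg) for one player."""
    n = len(balls)
    barrels = sum(is_barrel(ev, la) for ev, la in balls)
    sweet = sum(is_sweet_spot(la) for _, la in balls)
    return barrels / n, sweet / n


print(contact_rates([(99, 28), (99, 10), (85, 20), (110, 15)]))
```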

Methodology

Measuring offensive output involved calculating wOBA (weighted on-base average), which is arguably the best "all-encompassing" batting metric. It weights every outcome in a player's hitting stats by its run value, giving each hitter an average scaled to the run environment. In addition, to better relate offensive output to the batted-ball data, I used wOBAcon, which only considers wOBA on balls hit in play.
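For reference, the standard FanGraphs-style form of wOBA and one common form of wOBAcon are shown below. The weights w are season- and league-specific (here they would be the Division 1 values), outs contribute zero to each numerator, and the exact denominator conventions (for example, how sacrifice bunts are handled) are an assumption rather than something stated in the source.

```latex
\mathrm{wOBA} = \frac{w_{BB}\,\mathrm{uBB} + w_{HBP}\,\mathrm{HBP} + w_{1B}\,\mathrm{1B} + w_{2B}\,\mathrm{2B} + w_{3B}\,\mathrm{3B} + w_{HR}\,\mathrm{HR}}{\mathrm{AB} + \mathrm{BB} - \mathrm{IBB} + \mathrm{SF} + \mathrm{HBP}}

\mathrm{wOBAcon} = \frac{w_{1B}\,\mathrm{1B} + w_{2B}\,\mathrm{2B} + w_{3B}\,\mathrm{3B} + w_{HR}\,\mathrm{HR}}{\mathrm{AB} - \mathrm{SO} + \mathrm{SF}}
```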

To measure relationships between these hitting rates and overall offensive performance, I ran linear regressions and created scatterplots and line graphs to visualize them. Finally, to capture nonlinear relationships between certain metrics, as discussed later in this piece, I also used a generalized additive model (GAM), which can model relationships more complex than a linear regression while still producing results that are relatively easy to interpret.
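The adjusted R² values reported below come from regressing a performance measure on one hitting metric at a time. A minimal NumPy sketch of that statistic for a single predictor follows; the function name is hypothetical, and x and y stand for one metric and one outcome (such as wOBA) across players.

```python
import numpy as np


def adjusted_r2(x, y):
    """Fit y = a + b*x by ordinary least squares and return adjusted R^2."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])  # intercept + one predictor
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
    n, p = len(y), 1  # p = number of predictors, excluding the intercept
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


print(adjusted_r2([1, 2, 3, 4, 5], [1, 3, 2, 5, 4]))
```

With only one predictor, the adjustment mainly penalizes small samples, which matters here because some players have few tracked balls in play.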

Analysis: Regression Analysis for Batting Metrics

First, I regressed each player's 2022 wOBA on each hitting metric separately and found relatively low levels of correlation.

Hitting Metric        Adjusted R²
Average EV            .26
Median EV             .23
Max EV                .11
75th Percentile EV    .25
95th Percentile EV    .22
Hard-Hit Rate         .24
Barrel Rate           .23
Sweet-Spot Rate       .05

Figure 1: Correlations between hitting metrics and weighted on-base average (wOBA)

All adjusted R² values were in the .20s, except for sweet-spot rate at .05 and max EV at .11. This suggests that launch angle alone and pure raw power alone are not strong predictors of offensive performance and, as one might expect, that a combination of exit velocity and launch angle would better measure success. In addition, although the differences are very small, my 75th percentile hypothesis may have a glimmer of hope: 75th percentile EV (R² = .25) fared very slightly better than median EV (R² = .23) and 95th percentile EV (R² = .22), though it was barely lower than average EV (R² = .26). Again, the differences between these metrics are minuscule, but I began to wonder whether they hinted at anything meaningful.

To better capture the relationship between these metrics and performance, I then made wOBAcon (weighted on-base average on contact) the dependent variable instead of wOBA. While wOBA does an excellent job of measuring overall offensive production, it includes plate appearances that do not end in a batted ball, such as strikeouts, walks, and hit-by-pitches. Its correlations with batted-ball metrics may therefore understate the link with quality of contact; wOBAcon addresses this by only taking into account balls hit in play. Here are the same regressions, with weighted on-base average on contact as the dependent variable.

Hitting Metric        Adjusted R²
Average EV            .40
Median EV             .37
Max EV                .18
75th Percentile EV    .42
95th Percentile EV    .40
Hard-Hit Rate         .42
Barrel Rate           .42
Sweet-Spot Rate       .07

Figure 2: Correlations between hitting metrics and weighted on-base average on contact (wOBAcon)

The coefficient of determination increased across the board, and 75th percentile EV, hard-hit rate, and barrel rate now share the highest values (.42). From this, I concluded that consistently hitting the ball hard is the strongest indicator of offensive performance, since these metrics measure not only how hard a player can hit but also how often those hard-hit balls occur. However, this raised two questions: is there another point in the exit velocity distribution between the 50th and 95th percentile (other than the theoretically chosen 75th percentile) that better predicts offensive output? And how much does the launch angle distribution, as considered in barrel rate, really matter?

Exit Velocity Analysis

Starting with exit velocity, I think it's pretty clear: the harder, the better. For optimal success, hitters need to hit the ball hard (whether that means 90, 95, or 100+ mph off the bat) and to do so often. That's why I chose the 75th percentile EV as an initial metric: sitting at the midpoint of the upper half of a player's distribution, it captures both consistency and magnitude.

Figure 3: Adjusted R2 values between different percentile exit velocities and wOBAcon

For every percentile from the 50th to the 95th, I again ran a linear regression against wOBAcon to find the strongest correlation. While the differences are very small, the largest adjusted R² belonged to the 85th percentile EV at .425, very close to the 75th percentile EV's .422. Because this best-fitting point sits between the median and the 95th percentile but closer to the latter, I concluded that the upper range of a hitter's exit velocity profile matters most when relating exit velocity to wOBAcon. Simply put, for this particular dataset, the 85th percentile of exit velocity best explained the variation in wOBAcon. Since exit velocity alone shows only a moderate correlation, I then wondered how launch angle should be incorporated to maximize the correlation with wOBAcon.
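The percentile scan can be sketched as a loop over candidate percentiles, fitting one single-predictor regression each time and keeping the best adjusted R². The demo uses synthetic data generated so that wOBAcon depends exactly on each player's 85th percentile, purely to show the mechanics; it is not the article's data, and the function and variable names are hypothetical.

```python
import numpy as np


def adjusted_r2(x, y):
    """Adjusted R^2 for an OLS fit of y on an intercept and one predictor."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
    n = len(y)
    return 1 - (1 - r2) * (n - 1) / (n - 2)


def scan_percentiles(ev_by_player, wobacon_by_player, percentiles=range(50, 96)):
    """Regress wOBAcon on each exit-velocity percentile; return scores and best."""
    players = sorted(ev_by_player)
    y = np.array([wobacon_by_player[p] for p in players], dtype=float)
    scores = {}
    for q in percentiles:
        x = np.array([np.percentile(ev_by_player[p], q) for p in players])
        scores[q] = adjusted_r2(x, y)
    best = max(scores, key=scores.get)
    return scores, best


# Synthetic demo: 40 players with 60 batted balls each.
rng = np.random.default_rng(0)
ev = {f"P{i}": rng.normal(85, 8, size=60) for i in range(40)}
# Target tied exactly to each player's 85th percentile, for illustration only.
woba = {p: 0.01 * np.percentile(v, 85) - 0.4 for p, v in ev.items()}
scores, best = scan_percentiles(ev, woba)
print(best, round(scores[best], 3))
```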

Launch Angle Analysis

First, I wanted to look at the distribution of batted balls, plotting their launch angles and exit velocities colored by play result.

Figure 4: Launch Angles and Exit Velocities for tracked batted balls from 2019 to 2022

In the scatterplot, the sliver of blue points marking singles starts to build up slightly past the 0° mark, while extra-base hits accumulate between roughly 10° and 30°. These two categories also differ in how their launch angles relate to exit velocity: as exit velocity increases, the red and gold marks (doubles and home runs) fan out rightward, indicating that the launch angle window for extra-base hits widens. The opposite holds for singles, marked in blue: as exit velocity decreases, their angles spread rightward, completing the bottom of the "C" shape that these base hits form in the scatterplot. This suggests that no single fixed launch angle range explains success, consistent with the low R² of sweet-spot rate in the previous section.

Next, I wanted to further examine the makeup of these batted balls by grouping launch angles into 5-degree bins:

Figure 5: Play result proportions for launch angle bins
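A sketch of how play-result proportions per 5-degree launch angle bin could be computed, assuming each batted ball is a (launch_angle_deg, play_result) pair. Bins are labeled by their lower edge (so 12.3° falls in the 10° to 15° bin); the labeling convention and helper names are assumptions.

```python
import math
from collections import Counter, defaultdict


def angle_bin(launch_angle, width=5):
    """Lower edge of the launch-angle bin, e.g. 12.3 -> 10 for 5-degree bins."""
    return int(math.floor(launch_angle / width) * width)


def outcome_shares(balls, width=5):
    """balls: (launch_angle_deg, play_result). Returns {bin: {result: share}}."""
    counts = defaultdict(Counter)
    for angle, result in balls:
        counts[angle_bin(angle, width)][result] += 1
    shares = {}
    for b, c in sorted(counts.items()):
        total = sum(c.values())
        shares[b] = {r: n / total for r, n in c.items()}
    return shares


demo = [(12.3, "Single"), (14.0, "Out"), (-3.0, "Out"), (22.0, "Double")]
print(outcome_shares(demo))
```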

In addition to confirming the earlier observations, the figure shows that the proportion of outs is lowest in the 10° to 15° range and that the greatest proportion of extra-base hits falls between 15° and 35°. This raises another question: where exactly should we draw the line for a successful launch angle? Launch angle alone does not determine the outcome; exit velocity contributes tremendously to how well a ball is hit, so the "success" of a launch angle should depend on the velocity at which the ball was hit. That's why barrel rate exists: it considers an expanding range of angles as exit velocity increases. I wanted to dig deeper into production at different angles. Below is a graph showing the wOBAcon distribution for balls hit at different angles, with the "sweet-spot" region of 8° to 32° marked by the two red lines.

Figure 6: wOBAcon distribution for different launch angles

This range clearly captures the highest wOBAcon values, and it is interesting to see two peaks within it. However, this doesn't say much without knowing how hard each ball was hit, so I grouped the data into three buckets: soft-hit balls (60 – 75 mph), medium-hit balls (75 – 95 mph), and hard-hit balls (95+ mph).

Figure 7: wOBAcon distribution for launch angles, grouped by different exit velocity groups
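A sketch of the grouping behind this kind of plot: assign each ball to an exit velocity group, bin its launch angle, and average the per-ball wOBA weights in each (group, angle bin) cell. The half-open group boundaries (75 mph counts as medium, 95 mph as hard), dropping balls under 60 mph, and the 2-degree bin width are all assumptions; the source does not state them.

```python
from collections import defaultdict


def ev_group(ev_mph):
    """Assign the article's exit-velocity groups.

    ASSUMPTION: half-open intervals, so 75 mph counts as medium and
    95 mph as hard; balls under 60 mph are left unassigned (None).
    """
    if ev_mph >= 95:
        return "hard"
    if ev_mph >= 75:
        return "medium"
    if ev_mph >= 60:
        return "soft"
    return None


def wobacon_by_group_and_angle(balls, width=2):
    """balls: (ev_mph, launch_angle_deg, woba_weight). Mean weight per cell."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ev, angle, weight in balls:
        group = ev_group(ev)
        if group is None:
            continue  # outside the three buckets
        key = (group, int(angle // width) * width)
        sums[key] += weight
        counts[key] += 1
    return {k: sums[k] / counts[k] for k in sums}


balls = [(70, 19.0, 0.9), (72, 18.5, 0.0), (100, 18.0, 2.0), (50, 20.0, 0.9)]
print(wobacon_by_group_and_angle(balls))
```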

One observation caught my attention: all three exit velocity groups seem to intersect at around 18°, meaning that at this angle, all three groups can expect roughly the same wOBAcon (~.650). Perhaps this makes it a useful general point of comparison: moving away from it, each group's wOBAcon rises toward its own peak. Using this reference point and each group's peak region, I marked an approximate optimal zone for each group of balls:

Group 1 (Soft-hit balls, 60 – 75 mph): 18° – 28°

Group 2 (Medium-hit balls, 75 – 95 mph): 10° – 18° 

Group 3 (Hard-hit balls, 95+ mph): 8° – 32° (sweet-spot region)

Intuitively, the grouping makes sense: soft-hit balls need relatively high angles to creep over infielders' heads and fall into the shallow outfield. Balls hit with medium exit velocity need to be line drives, or else they end up as routine groundouts and flyouts. Hard-hit balls can fall for hits across a wide range of angles simply because they are hit hard.

Given these groupings, I defined a new variable, 'optimal angle rate', as the share of a player's balls in play that fall within the optimal zone for their exit velocity group. As in the previous section, I then ran a linear regression across players with wOBAcon as the dependent variable and found an R-squared of .312. Although this is still a moderately low correlation, it is much better than sweet-spot rate's R² of .07. In addition, unlike barrel rate, which only credits balls hit 98 mph or harder, this rate rewards hitters who may not hit the ball hard very often but still rack up hits at lower exit velocities. Although the rate is not perfect, it ties the target launch angle range to the velocity at which each ball was hit.
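A minimal sketch of optimal angle rate for one player, using the three zones listed above. Treating zone bounds as inclusive, using half-open exit velocity groups, and keeping balls under 60 mph in the denominator (they can never land in a zone) are assumptions, as are the function names.

```python
# Optimal launch-angle zones (degrees) from the article's three groups.
OPTIMAL_ZONES = {
    "soft": (18, 28),    # 60-75 mph
    "medium": (10, 18),  # 75-95 mph
    "hard": (8, 32),     # 95+ mph (sweet-spot region)
}


def ev_group(ev_mph):
    """ASSUMPTION: half-open EV groups; under 60 mph returns None."""
    if ev_mph >= 95:
        return "hard"
    if ev_mph >= 75:
        return "medium"
    if ev_mph >= 60:
        return "soft"
    return None


def optimal_angle_rate(balls):
    """balls: (ev_mph, launch_angle_deg) for one player's balls in play."""
    in_zone = 0
    for ev, angle in balls:
        group = ev_group(ev)
        if group is not None:
            low, high = OPTIMAL_ZONES[group]
            in_zone += low <= angle <= high
    return in_zone / len(balls)


print(optimal_angle_rate([(70, 20), (80, 15), (100, 40), (50, 20)]))
```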

Putting It Together: Generalized Additive Model

Finally, to combine these findings, I placed 85th percentile exit velocity and optimal angle rate in a generalized additive model (GAM) to capture their nonlinear relationships with wOBAcon.

wOBAcon = β₀ + s(85th Percentile EV) + s(Optimal Angle Rate)

(GAM formula, where 's' stands for a smooth, or spline, function)

From a technical standpoint, this model is flexible: it uses splines, smooth piecewise-polynomial functions, to capture nonlinear features of each predictor. Summing these splines is what makes the model so flexible, and for our data it lets the batted-ball metrics model wOBAcon without the rigidity of a linear regression or the complexity of machine learning models. According to the model summary, the adjusted R² was .49: still a moderate relationship, but higher than any of the previous linear regressions. This model also fared slightly better than other GAMs I tested with different combinations of exit velocity and launch angle variables, including hard-hit, sweet-spot, and barrel rate. As seen below, the model predicts that maximizing both metrics leads to a high wOBAcon, indicated by the bluest region in the top-right corner.

Figure 8: Predicting wOBAcon with 85th percentile EV and Optimal Angle Rate using GAM 
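Fitting a true GAM usually involves penalized smoothing (R's mgcv package, for example, selects smoothness automatically), and the source does not say which software was used. As a self-contained stand-in, the sketch below fits an additive model with unpenalized cubic regression splines using only NumPy, on synthetic data standing in for 85th percentile EV and optimal angle rate. The knot count, quantile-based knot placement, and function names are assumptions. Its adjusted R² counts every basis column as a parameter, which differs from the effective-degrees-of-freedom adjustment that GAM software reports.

```python
import numpy as np


def spline_basis(x, n_knots=4):
    """Truncated-power cubic spline basis for one standardized predictor."""
    knots = np.quantile(x, np.linspace(0, 1, n_knots + 2)[1:-1])
    cols = [x, x**2, x**3] + [np.maximum(x - k, 0.0) ** 3 for k in knots]
    return np.column_stack(cols)


def fit_additive_splines(X, y, n_knots=4):
    """Fit y ~ b0 + f1(x1) + f2(x2) + ... with unpenalized cubic splines.

    Returns fitted values and adjusted R^2 (every basis column counted).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    blocks = [np.ones((len(y), 1))]
    for j in range(X.shape[1]):
        col = X[:, j]
        z = (col - col.mean()) / col.std()  # standardize for numerical stability
        blocks.append(spline_basis(z, n_knots))
    B = np.hstack(blocks)
    coef, *_ = np.linalg.lstsq(B, y, rcond=None)
    fitted = B @ coef
    r2 = 1 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)
    n, p = B.shape[0], B.shape[1] - 1
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    return fitted, adj_r2


# Synthetic data for illustration only (not the article's data):
# column 0 mimics an EV percentile (mph), column 1 a rate in [0, 1].
rng = np.random.default_rng(1)
X = np.column_stack([rng.uniform(80, 110, 300), rng.uniform(0, 1, 300)])
y = 0.3 * np.sin(X[:, 0] / 5) + (X[:, 1] - 0.5) ** 2 + rng.normal(0, 0.02, 300)
fitted, adj = fit_additive_splines(X, y)
print(round(adj, 3))
```

Because each predictor gets its own spline block and the blocks are simply added, the fitted surface is additive, matching the "sum of smooths" structure described above; capturing interactions would require a tensor-product smooth instead.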

As a whole, seeing the stronger correlation gave some optimism that different metrics and models can better explain offensive performance in this type of run-scoring environment.

Conclusion

Overall, when analyzing college batted-ball data, one can draw general conclusions about launch angles and exit velocities similar to those from professional data. For exit velocity, looking at a percentile between the median and the 95th percentile can provide value. Raw power (95th percentile EV) and measures of central tendency (average or median) both matter, but choosing a point in between may capture the strengths of both. For launch angle, evaluating angles separately for different exit velocities rather than relying on one broad region, and accounting for hits at lower velocities, are worth considering when valuing players' offensive production. Further research should examine the year-to-year stability of these rates and test whether other distributions or metrics better suit the run-scoring environment of college baseball.