Written By: Forrest Allen

Why a New System?

In 1981, IBM introduced the IBM Personal Computer, MTV showed its first music video, and the Space Shuttle Columbia made its first flight around Earth. Additionally, the NCAA Men’s Basketball Committee introduced the Ratings Percentage Index (RPI) to help it determine which teams should participate in the postseason tournament. In 1999, NCAA baseball adopted the system, and over the last 24 seasons the formula has been adjusted only once, in 2013. Five years later, in 2018, the same group that introduced RPI to the world dropped it in favor of a “contemporary method of looking at teams analytically, using result-based and predictive metrics,” according to Dan Gavitt, senior vice president of basketball for the NCAA.

  

Figure 1: Timeline of the “Evolution” of RPI 

These facts, coupled with NCAA baseball selection committee chairman John Cohen’s comments on RPI after the 2023 selection show, prompted 6-4-3 Charts to see if there might be a better system for diamond sports. The effort culminated in the release of the Diamond Sports Ranking (DSR), a transparent and modern alternative that acknowledges the unique aspects of baseball and softball.

In contemplating a new ranking methodology, we wanted to address some specific criticisms of the current system:  

  1. Teams dropping in the rankings simply for taking the field against a bad team in RPI.
  2. The value of a win or loss in RPI changes based on what an opponent and the opponent’s opponents do weeks or months after the game is played.  
  3. Double-digit victories, stopped before a game’s normal length, are treated the same as a single-run victory in RPI.
  4. When comparing two teams, RPI doesn’t necessarily ensure the better team is ranked higher. 

Let us dive into each of these criticisms and discuss how DSR accounts for them.

1. Teams dropping in the rankings simply for taking the field against a bad team in RPI.

Because strength of schedule is so heavily weighted in RPI, defeating a team, even by a wide margin, can often cause the winning team to fall in the rankings. While strength of schedule is certainly an important metric for evaluating team quality (DSR uses it), it has led to several unintended consequences. Perhaps the most egregious is the cancellation of non-conference games late in the season to avoid the negative impact on strength of schedule. In addition, savvy teams have “scheduled,” rather than played, their way into the top 64. Because the outcome of a game matters less than the mere fact that it was played, teams with poor records can end up with very favorable RPI rankings. 

2. The value of a win or loss in RPI changes based on what an opponent and the opponent’s opponents do weeks or months after the game is played.  

Teams change over the course of the season, for better and worse. Some teams see players improve and hit their stride, while others suffer injuries that can decimate a roster. DSR accounts for these changes by valuing each outcome at the time the game is played rather than at some later point in the season, so every game is judged in the context of the part of the season in which it occurred. This feature also allows users to track DSR points over time and see how a team rose and fell over the course of the year (see Figures 7 and 8 for examples). 

3. Double-digit victories, stopped before a game’s normal length, are treated the same as a single-run victory in RPI. 

Excluding margin of victory may promote sportsmanship and prevent better teams from running up the score, but margin of victory is meaningful in determining how much better one team is than another, at least on a given day. DSR threads this needle by awarding more points for larger margins of victory while treating any margin greater than 10 runs as a 10-run victory. Said differently, all other things being equal, a nine-run victory is worth more than a one-run victory, but a 17-run victory awards the same number of points as a 10-run victory. 

4. When comparing two teams, RPI doesn’t necessarily ensure the better team is ranked higher.

Considering the purpose of a ranking system may initially seem simple, but on further reflection it becomes an interesting philosophical question. Is a system trying to order teams by who has the best resume, or by who would actually be favored to win a game on a neutral site? For the most part, ranking systems adopt the former approach. You hear phrases like, “They beat them in a head-to-head matchup, so they have to be ranked higher,” or “This team has so many more losses than another team; they can’t be ranked ahead.” DSR challenges this line of thinking by taking the latter approach: build a ranking system in which any team ranked ahead of another would be favored to win on a neutral site.  

So what exactly is DSR?

DSR is a points-exchange system in which the winning team takes points from the losing team, which is why no team can be penalized simply for playing. It is modeled after the Elo rating system, developed by Arpad Elo for chess and popularized in sports analytics by Nate Silver. Versions of this system are used extensively across a variety of sports, and in each, a team’s points drop only when it loses and rise only when it wins. The number of points in each exchange is calculated from four inputs:

  1. Win Quality
  2. Win Expectancy
  3. Margin of Victory
  4. Game Location
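The zero-sum mechanics described above can be sketched in a few lines of Python. This is a minimal sketch, not the actual DSR implementation. The `Team` class, `exchange_amount`, `record_result`, and the base amount `k` are hypothetical names. The idea that the four inputs multiply a base amount is our own assumption, because the article does not publish how they combine. The part the sketch does take from the text is that the winner gains exactly what the loser gives up.

```python
from dataclasses import dataclass


@dataclass
class Team:
    name: str
    points: float


def exchange_amount(k: float, win_quality: float, expectancy: float,
                    margin: float, location: float) -> float:
    """Points changing hands for one game.

    HYPOTHETICAL: treats the four DSR inputs as multipliers on a base
    amount k; the real combination is not published in the article.
    """
    return k * win_quality * expectancy * margin * location


def record_result(winner: Team, loser: Team, amount: float) -> None:
    """Zero-sum exchange: the winner gains exactly what the loser loses."""
    if amount < 0:
        raise ValueError("exchange amount must be non-negative")
    winner.points += amount
    loser.points -= amount


home = Team("Home", 1500.0)
visitor = Team("Visitor", 1500.0)
record_result(home, visitor, exchange_amount(20.0, 1.0, 0.5, 1.25, 1.0))
print(home.points, visitor.points)  # 1512.5 1487.5
```

The amount is never negative, so a win can never lower the winner’s total. This is the property that removes RPI’s penalty for playing a weak opponent.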

After the points change hands, DSR computes each team’s probability of beating an exactly average team. In so doing, the system ensures the rankings are ordered from best to worst by which team would be expected to win a head-to-head game. 

Figure 2: DSR Components, Softball 

Win Quality 

The most important factor in determining the number of points transferred from the losing team to the winning team is win quality. This is a quantitative measure of how good the defeated team is. Obviously, the better the defeated team, the more points will be awarded to the victor and taken from the loser. As mentioned earlier, this assessment is taken at the time the game is played and never adjusted based on anything that happens after this game. DSR calculates win quality based on two features: 

  • The defeated team’s Pythagorean winning percentage in conference play 
  • The cumulative Pythagorean winning percentage for all teams in their conference against all non-conference opponents. 

Before you stop reading on account of references to ancient Greek mathematicians, hang with me: it is a really cool concept that involves no geometry whatsoever. According to MLB.com,

“Pythagorean winning percentage is a formula developed by renowned statistician Bill James. The concept strives to determine the number of games that a team *should* have won — based on its total number of runs scored versus its number of runs allowed — in an effort to better forecast that team’s future outlook.”   

In the section titled “Why it’s useful,” the site says,  

“Pythagorean winning percentage can help to identify teams that have either overachieved or underachieved. When looking at a club with a surprisingly poor or surprisingly strong record early in the season, using the theory to determine a team’s ‘expected’ winning percentage for the remainder of the year can paint a more accurate picture of how things will play out than merely looking at actual winning percentage.” 
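The quoted definition translates directly into code. Bill James’s original formula squares runs scored and runs allowed, while analysts often use an exponent closer to 1.83. The article does not say which exponent DSR uses, so `exponent` is left as a parameter. Returning .500 for a team that has no runs scored or allowed yet is our own assumption.

```python
def pythagorean_pct(runs_scored: float, runs_allowed: float,
                    exponent: float = 2.0) -> float:
    """Expected winning percentage: RS^x / (RS^x + RA^x)."""
    if runs_scored < 0 or runs_allowed < 0:
        raise ValueError("run totals must be non-negative")
    if runs_scored == 0 and runs_allowed == 0:
        # Assumption: with no runs on either side, treat the team as average.
        return 0.5
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


# A team that has scored 300 runs and allowed 200 "should" win about 69%.
print(round(pythagorean_pct(300, 200), 3))  # 0.692
```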

In summary, DSR determines win quality by considering what a team’s winning percentage in conference should be and combines it with how we would expect the average team in the conference to perform against a non-conference opponent. The latter serves as a conference strength metric and is the same for every team in the conference. This metric is crucially important to contextualizing and valuing runs scored, runs allowed, wins and losses. 
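The article does not say how the two win-quality features are combined. One natural option, shown here purely as an assumption, is to combine them on the odds scale, which is the same logic behind Bill James’s Log 5 formula. Each input covers one step of the comparison:

- A team’s conference Pythagorean percentage measures it against an average conference opponent.
- The conference’s non-conference Pythagorean percentage measures that average conference team against everyone else.

`win_quality` and `odds` are hypothetical names.

```python
def odds(p: float) -> float:
    """Convert a probability to odds."""
    return p / (1.0 - p)


def win_quality(conf_pyth: float, conf_strength: float) -> float:
    """Estimate a defeated team's strength against an average opponent.

    conf_pyth:     the team's Pythagorean winning percentage in conference play
    conf_strength: its conference's combined Pythagorean winning percentage
                   against non-conference opponents

    HYPOTHETICAL: multiplies the two on the odds scale; DSR's actual
    combination is not published.
    """
    for p in (conf_pyth, conf_strength):
        if not 0.0 < p < 1.0:
            raise ValueError("percentages must lie strictly between 0 and 1")
    combined = odds(conf_pyth) * odds(conf_strength)
    return combined / (1.0 + combined)


# A .700 conference team in a conference that plays .400 ball out of conference
print(round(win_quality(0.700, 0.400), 3))  # 0.609
```

Because the conference term is identical for every team in a conference, it shifts the whole conference up or down together, which matches its role as a conference strength metric.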

Win Expectancy 

Win expectancy is the pre-game probability that the eventual winner would win. Win expectancy and the points exchanged in DSR are inversely related: the higher the win expectancy, the fewer points exchanged, and vice versa. This feature softens the blow of losing to a heavy favorite. It also gives favorites a reason not to take these games lightly. Because winning is paramount under DSR, this component presents coaches with an interesting dilemma. Do they use these games to develop depth players and risk losing both the game and a large number of points, or do they put their best team on the field to secure the win? Like win quality, win expectancy is made up of multiple components: 

  • Strength of schedule: The winning percentage of a team’s opponents at the time of the game 
  • Winning percentage: The percentage of games a team has won 
  • Pythagorean record: The percentage of games a team “should have” won based on its runs scored and allowed 
  • DSR point differential: The difference in DSR points entering the game 

For each of these inputs, the higher the values, the higher the win expectancy.  
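DSR does not publish how it weights these four inputs. The sketch below assumes an equal-weight average, purely for illustration. It converts the DSR point differential to a probability with the standard Elo logistic curve, and its 400-point scale is borrowed from chess Elo, which is also an assumption. The sketch satisfies the one property the text states: each input pushes win expectancy up. It also shows why a further step is needed, because the two teams’ raw values do not sum to 1.

```python
def elo_prob(point_diff: float, scale: float = 400.0) -> float:
    """Standard Elo logistic: chance the team with +point_diff wins."""
    return 1.0 / (1.0 + 10.0 ** (-point_diff / scale))


def raw_win_expectancy(sos: float, win_pct: float, pyth_pct: float,
                       dsr_diff: float,
                       weights: tuple = (0.25, 0.25, 0.25, 0.25)) -> float:
    """HYPOTHETICAL blend of the four win-expectancy inputs.

    Every input enters with a positive weight, so raising any one of them
    raises the result, as the article describes.
    """
    components = (sos, win_pct, pyth_pct, elo_prob(dsr_diff))
    return sum(w * c for w, c in zip(weights, components))


# Hypothetical inputs for two opponents; the DSR gap is +50 for A, -50 for B.
team_a = raw_win_expectancy(sos=0.55, win_pct=0.70, pyth_pct=0.65, dsr_diff=50)
team_b = raw_win_expectancy(sos=0.50, win_pct=0.60, pyth_pct=0.58, dsr_diff=-50)
print(round(team_a, 3), round(team_b, 3), round(team_a + team_b, 3))
```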

The process is repeated for both teams. Some readers may recognize that the win expectancies of two teams playing each other must sum to 1, and the methodology described above will almost certainly not produce this result. To fix this, we again call on Bill James. His Log 5 formula takes the winning percentages of two opposing teams as inputs and produces the probability that each team wins, and the two probabilities always sum to 1.
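Log 5 itself is short enough to write out. Given winning percentages p_A and p_B, the probability that A beats B is (p_A - p_A * p_B) / (p_A + p_B - 2 * p_A * p_B). Swapping the arguments gives B’s chance, and the two always add to 1. The function name `log5` is ours.

```python
def log5(p_a: float, p_b: float) -> float:
    """Bill James's Log 5: probability that team A beats team B."""
    denominator = p_a + p_b - 2.0 * p_a * p_b
    if denominator == 0:
        # Only happens when both teams are .000 or both are 1.000.
        raise ValueError("Log 5 is undefined for two .000 or two 1.000 teams")
    return (p_a - p_a * p_b) / denominator


p_a_wins = log5(0.700, 0.600)
p_b_wins = log5(0.600, 0.700)
print(round(p_a_wins, 3), round(p_b_wins, 3))  # 0.609 0.391
```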

In practice, the higher a team’s win expectancy, the fewer points it collects for a victory. Conversely, an underdog that springs an upset takes a larger number of points from the favorite, all other things being equal. The underlying philosophy is that a heavy favorite should win, because that is the outcome everyone expects. Rewarding underdogs with larger point totals gives upward mobility to teams held back by previous poor performance or a low preseason ranking. Similarly, if a team started the season ranked too high, or is a bit “out over its skis,” losing as a big favorite will bring it back toward the pack.   
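The inverse relationship between win expectancy and points exchanged is the core of any Elo-style system. In such systems, the winner’s gain is proportional to 1 minus its pre-game win probability. Whether DSR uses exactly this form is an assumption, and the base amount `k = 20` is a hypothetical value.

```python
def points_for_win(winner_expectancy: float, k: float = 20.0) -> float:
    """Elo-style exchange: the less likely the win, the more points it earns.

    HYPOTHETICAL: DSR's exact formula and base amount k are not published.
    """
    if not 0.0 <= winner_expectancy <= 1.0:
        raise ValueError("win expectancy must be between 0 and 1")
    return k * (1.0 - winner_expectancy)


favorite_wins = points_for_win(0.80)  # expected result: about 4 points move
underdog_wins = points_for_win(0.20)  # upset: about 16 points move
print(round(favorite_wins, 1), round(underdog_wins, 1))  # 4.0 16.0
```

When a favorite loses, it is the one handing over the large amount. That is how a team ranked too high gets pulled back toward the pack.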

Margin of Victory 

The inclusion of margin of victory is arguably DSR’s largest departure from RPI, which does not consider it at all. Like the NCAA Evaluation Tool (NET), the system that replaced RPI in men’s basketball, DSR does consider margin of victory when distinguishing teams. Intuitively, we know that a win by a larger margin is almost always more impressive than a win by a smaller one. 

In incorporating this feature, one must be careful not to create perverse incentives that justify or encourage running up the score. To address this, the margin-of-victory component has much less impact on the number of points exchanged than win quality or win expectancy, as seen in Figure 2. While a nine-run margin of victory is rewarded more than a one-run margin, DSR treats any margin above 10 the same as a 10-run victory. This cap limits the incentive to keep running up the score to climb in DSR. Beyond helping distinguish between teams, margin of victory may also improve the on-field product by making every inning matter. Teams on either side of a lopsided score have a continued incentive to keep playing hard late in the game, whether to extend a lead or shrink the gap. 
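The cap is the one hard rule the article gives for margin of victory: margins above 10 runs count as 10. The curve below that cap is not published. The sketch assumes a gently rising logarithmic multiplier, a common shape in Elo-style sports models. Its `weight` of 0.25 is a hypothetical value, chosen to keep this component small relative to win quality and win expectancy.

```python
import math

MOV_CAP = 10  # from the article: any margin above 10 runs counts as 10


def margin_multiplier(run_margin: int, weight: float = 0.25) -> float:
    """Multiplier on points exchanged for a win by `run_margin` runs.

    HYPOTHETICAL shape: 1 + weight * ln(margin), so a one-run win is 1.0 and
    each extra run adds less than the one before it, up to the cap.
    """
    if run_margin < 1:
        raise ValueError("a winning margin is at least one run")
    capped = min(run_margin, MOV_CAP)
    return 1.0 + weight * math.log(capped)


for margin in (1, 9, 10, 17):
    print(margin, round(margin_multiplier(margin), 3))
```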

Game Location 

Like RPI, DSR also considers the location of the game in determining the number of points exchanged. Because winning percentages are higher at home and lower on the road, wins on the road result in larger point values being exchanged than those at home, all other things being equal. Neutral-site games are worth the average of a home and road win.  
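The location rule can be expressed as a small lookup. The home and road multipliers below are hypothetical. Only two things come from the article: road wins are worth more than home wins, and a neutral-site win is worth the average of the two.

```python
# HYPOTHETICAL multipliers; the article gives their ordering, not their values.
HOME_WIN = 0.9
ROAD_WIN = 1.1

LOCATION_MULTIPLIER = {
    "home": HOME_WIN,
    "road": ROAD_WIN,
    # Per the article, a neutral-site win is worth the average of the two.
    "neutral": (HOME_WIN + ROAD_WIN) / 2,
}


def location_multiplier(site: str) -> float:
    """Multiplier based on where the winning team played."""
    try:
        return LOCATION_MULTIPLIER[site]
    except KeyError:
        raise ValueError(
            f"unknown site {site!r}; expected home, road, or neutral"
        ) from None
```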

Win Probability Against an Average Team

After applying the point exchange, DSR calculates each team’s win expectancy against an exactly average team. This calculation uses the same formula described above but skips the Log 5 step, because applying Log 5 against an exactly average (.500) opponent would simply return the input unchanged. This transformation ensures that any team ranked ahead of another would be favored on a neutral field. 
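Skipping Log 5 here loses nothing, because against a .500 opponent Log 5 returns a team’s own probability. The same formula also explains the ranking guarantee. Log 5 favors team A over team B exactly when A’s probability against an average team is higher, so sorting by that probability orders every pair correctly. The team names and probabilities below are hypothetical.

```python
from itertools import combinations


def log5(p_a: float, p_b: float) -> float:
    """Bill James's Log 5: probability that team A beats team B."""
    return (p_a - p_a * p_b) / (p_a + p_b - 2.0 * p_a * p_b)


# HYPOTHETICAL probabilities of beating an exactly average team
vs_average = {"Team A": 0.71, "Team B": 0.64, "Team C": 0.52, "Team D": 0.38}

ranking = sorted(vs_average, key=vs_average.get, reverse=True)

# Every higher-ranked team is favored over every lower-ranked team.
for higher, lower in combinations(ranking, 2):
    p = log5(vs_average[higher], vs_average[lower])
    print(f"{higher} over {lower}: {p:.3f}")
```

Algebraically, log5(p_a, p_b) > 0.5 simplifies to p_a > p_b. The ordering therefore holds for any set of teams, not just these four.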

Unique Features for each Sport 

While baseball and softball are often grouped together as a single sport, 6-4-3 Charts understands that each sport is unique and deserves a system tailored to its game rather than a one-size-fits-all approach. Because the games are similar, there is significant overlap, but there is one key difference between DSR for softball and DSR for baseball.

Considering the Starting Pitcher 

Arguably the biggest difference between the diamond sports is the role of the day’s starting pitcher. In general, baseball follows a pattern in which a team’s best starter pitches the first game of a weekend series and does not pitch again for another week. Consider the 2023 SEC Pitchers of the Year for baseball and softball, LSU’s Paul Skenes and Auburn’s Maddie Penta. In an 18-week, 71-game season, Skenes made 19 starts and threw 123 innings. On June 26, LSU faced Florida in a winner-take-all game for the national championship. Skenes had pitched two days earlier and never saw the field in his team’s most important game. Contrast this with Auburn’s Maddie Penta. Penta started 37 times for the Tigers, a total representing 60 percent of their games. In the 19 games she didn’t start, she still managed to make six appearances in the circle. Overall, her effort resulted in 202.2 innings pitched. 

Figure 3: Appearances and Innings Pitched, Total and Proportionally 

Skenes and Penta weren’t outliers. LSU had 11 pitchers throw at least 20 innings in 2023, compared with four for Auburn. While elite SEC pitchers make a convenient comparison, the trend holds across all teams and reveals a fundamental difference between the sports: baseball requires a much larger pitching staff than softball. From a DSR perspective, this means the baseball model should account for the day’s starting pitcher. The drop-off from a baseball team’s first pitcher to its 11th is greater than the drop-off from a softball team’s first pitcher to its fourth. 

DSR tackles this problem in the baseball model by using a proxy for starting pitcher quality: day of the week. This choice may sound strange initially, but the nature of college baseball lends itself nicely to this idea. For most teams, the weekend series are the most important games they play all week. Of those weekend games, the first game is typically viewed as most important, giving rise to the moniker “Friday starter” for a team’s best pitcher. The idea is to save the bullpen for later in the weekend and give the team a chance to take the series by splitting the next two.  

Knowing this, we make the first game of a weekend series (typically Friday, and more recently Thursday) worth the most points and slightly decrease each game’s value as the weekend continues. Because a “Friday starter” pitches on Fridays, a midweek opponent will rarely see the best version of a team, so midweek games carry the smallest values. Adding this starting-pitcher proxy makes the baseball version of the DSR component breakdown look slightly different from the softball version in Figure 2. 
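The day-of-week proxy amounts to a weight that depends on where a game falls in the week. The article gives only the ordering: the series opener is highest, later weekend games are slightly lower, and midweek games are lowest. The specific numbers and the names below are hypothetical, and the weight applies to the baseball model only.

```python
from typing import Optional

# HYPOTHETICAL values; only their ordering comes from the article.
OPENER_WEIGHT = 1.00   # first game of a weekend series (Friday or Thursday)
SERIES_STEP = 0.05     # slight decrease for each later game in the series
MIDWEEK_WEIGHT = 0.80  # midweek games rarely see a team's best starter


def pitcher_proxy_weight(series_game: Optional[int]) -> float:
    """Weight for a baseball game; series_game is 1, 2, 3... or None for midweek."""
    if series_game is None:
        return MIDWEEK_WEIGHT
    if series_game < 1:
        raise ValueError("weekend series games are numbered from 1")
    # Later weekend games are worth slightly less, but never as little as midweek.
    return max(OPENER_WEIGHT - SERIES_STEP * (series_game - 1),
               MIDWEEK_WEIGHT + SERIES_STEP)


print([pitcher_proxy_weight(g) for g in (1, 2, 3, None)])
```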

Figure 4: DSR Components, Baseball

The So-What of it All 

While the methodology and approach are important, the rankings must strike a balance between two goals: using data to challenge conventional assumptions about which teams are good, and conforming to what we see with our eyes. To test that balance, we start with how DSR would have ranked Division 1 softball teams in 2023. In addition to DSR, Figure 5 shows each team’s final RPI ranking and the difference between its DSR and RPI rankings.  

 

Figure 5: 2023 DSR and RPI Rankings 

However, before we get into how a season evolves over time, it’s worth a quick diversion to establish where a team starts its season and why. Before any games have been played, teams don’t have any of the components used in calculating the probability of defeating an average team.

After considering several options, we settled on a rolling 3-year total of runs scored and runs allowed in non-conference games to create a non-conference Pythagorean record. As you’ll recall, these values range from 0 to 1 and represent the percentage of games a team “should have won” based on the run differential. The range of these figures is huge, .902 to .062, reflecting the massive differences in talent across over 300 teams.
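Following the article’s description of a rolling three-year total, the sketch pools runs scored and allowed across the last three seasons of non-conference games before applying the Pythagorean formula. As a result, a season with more games counts for more. The exponent of 2 (James’s original) is an assumption. Treating a team with no runs as .500 is also an assumption, and `rolling_nonconf_pyth` is a hypothetical name.

```python
from typing import Sequence, Tuple


def rolling_nonconf_pyth(seasons: Sequence[Tuple[float, float]],
                         window: int = 3, exponent: float = 2.0) -> float:
    """Pythagorean record from the last `window` seasons of non-conference runs.

    seasons: (runs_scored, runs_allowed) in non-conference games, oldest first.
    The exponent of 2 is an assumption; the article does not state it.
    """
    recent = seasons[-window:]
    if not recent:
        raise ValueError("need at least one season of data")
    scored = sum(rs for rs, _ in recent)
    allowed = sum(ra for _, ra in recent)
    if scored == 0 and allowed == 0:
        return 0.5  # Assumption: no runs on either side means average.
    return scored ** exponent / (scored ** exponent + allowed ** exponent)


# Hypothetical history: only the three most recent seasons are used.
history = [(150, 160), (180, 120), (200, 100), (210, 90)]
print(round(rolling_nonconf_pyth(history), 3))  # 0.784
```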

Starting teams this far apart could have the unintended consequence of burying teams that have struggled in the past. Conversely, it could prop up previously successful teams so high that they never fall out of contention for an at-large bid. We address this by bringing every team to within 100 points of the average team: the team with the .902 Pythagorean record starts with 1600 points, while the .062 team starts with 1400. The remaining Pythagorean records are scaled to the same 1400-1600 range, with higher records starting closer to 1600 and lower records closer to 1400.
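The article pins down the endpoints (.902 maps to 1600 points and .062 to 1400) but does not say how the records in between are mapped. Straight linear interpolation is the simplest reading, and the sketch below assumes it. `starting_points` is a hypothetical name.

```python
def starting_points(pyth: float, lowest: float = 0.062, highest: float = 0.902,
                    floor: float = 1400.0, ceiling: float = 1600.0) -> float:
    """Map a preseason Pythagorean record onto the 1400-1600 starting scale.

    ASSUMPTION: linear interpolation between the article's two endpoints.
    """
    if highest <= lowest:
        raise ValueError("highest must exceed lowest")
    fraction = (pyth - lowest) / (highest - lowest)
    return floor + fraction * (ceiling - floor)


for record in (0.902, 0.500, 0.062):
    print(record, round(starting_points(record), 1))
```

Under this linear assumption, a .500 team would start at about 1504 points rather than exactly 1500. The difference arises because the two endpoints are not symmetric around .500.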

To assess this approach, we examined the relationship between starting points and finishing points for over 600 seasons, a season defined as a team in a year, across baseball and softball, and found the following.

Figure 6: Starting Points vs Finishing Points D1 Baseball & Softball Teams 2023 Season

Of note is the adjusted R² value of .74, meaning that roughly 74% of the variation in finishing points can be explained by starting points. The p-value of < .001 is a statement about chance. If there were no real relationship between starting and finishing points, a relationship this strong would arise by chance less than 0.1% of the time. For reference, .05 is the generally accepted threshold for statistical significance. The strong correlation between starting and ending points supports this approach.
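For readers who want to reproduce this kind of check, the sketch below fits a one-predictor least-squares line of finishing points on starting points and reports R² and adjusted R². The example data are made up for illustration and are not DSR results. `r2_and_adjusted` is a hypothetical name.

```python
from statistics import fmean
from typing import Sequence, Tuple


def r2_and_adjusted(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """R-squared and adjusted R-squared for a simple linear regression of y on x."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x, mean_y = fmean(x), fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or ss_tot == 0:
        raise ValueError("x and y must both vary")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r2 = 1.0 - ss_res / ss_tot
    # One predictor, so the residual degrees of freedom are n - 2.
    adjusted = 1.0 - (1.0 - r2) * (n - 1) / (n - 2)
    return r2, adjusted


# Made-up example: five teams' starting and finishing points.
start = [1400, 1450, 1500, 1550, 1600]
finish = [1440, 1480, 1500, 1480, 1500]
print(r2_and_adjusted(start, finish))
```

Adjusted R² penalizes additional predictors. With a single predictor and roughly 600 team-seasons, it is nearly identical to plain R².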

To provide additional context for DSR, we can examine each team’s ranking over the course of a season and compare it with other teams’ through an intuitive visualization. Figures 7 and 8 show two graphs, one for baseball and one for softball. Each graph displays four teams whose DSR and RPI rankings differ significantly.

Figure 7: DSR Rank Value by Date, Division 1 Baseball 2023

 

Figure 8: DSR Rank Value by Date, Division 1 Softball 2023

Figures 7 and 8 show how the DSR rank value of each team changed over the course of the 2023 season. When a team wins a game, its DSR points increase, and the size of that increase depends on the components described earlier in this post. This change in points affects the team’s DSR rank, which we track using DSR rank value. Conversely, when a team loses a game, its DSR points drop. Each line traces one of the teams of interest across the season, so you can visualize the type of season it had compared with other Division 1 baseball or softball teams. For example, a line that steps steadily upward reflects a team that won consistently throughout the season. That team therefore increased its DSR points and DSR rank value over time.

Perhaps the clearest takeaway from the graphs is that DSR prioritizes winning, irrespective of sport. In general, the teams DSR ranks higher than RPI does have very high winning percentages. Examples are Marshall Softball (.815) and Oral Roberts (ORU) Baseball (.788), whose winning percentages rank fifth and second, respectively, among DSR’s top 30 teams in their sport. On the other hand, Kentucky Softball and Coastal Carolina Baseball rank higher in RPI than in DSR. Again, winning percentage is largely responsible. Kentucky Softball’s winning percentage ranks 38th among DSR’s top 40 softball teams, and Coastal Carolina Baseball’s ranks 33rd among DSR’s top 35 baseball teams. That said, winning isn’t everything. Central Arkansas Softball finished with a .789 winning percentage, which ranked seventh among DSR’s top 25, yet finished only 23rd in DSR. On the baseball side, TCU finished in the DSR top 10 despite a winning percentage that ranked 26th among DSR’s top 30 teams.

The other interesting feature is how these teams achieved their rankings. In these examples, mid-majors got there through the quantity of their wins, while Power 5 schools got there through quality wins. Of the 22 teams in DSR’s top 11 for each sport, two are considered mid-majors: Indiana State Baseball and Louisiana Softball. Of the 128 games these two teams played combined, only nine resulted in a double-digit point exchange, about 7 percent. Their end-of-season rankings reflect a consistent ability to pile up wins throughout the season. By comparison, consider TCU Baseball and Oklahoma State Softball, a pair of Power 5 teams in the top 11. For them, 21 percent of games resulted in double-digit point exchanges, three times the rate for the two mid-majors. These examples show that there are multiple paths to a high DSR ranking.

Conclusion 

The Diamond Sports Ranking is intended to modernize the way we evaluate college baseball and softball teams. No model or computer system can ever replace the decades of expertise and knowledge of subject-matter experts, but our hope is that DSR becomes a useful tool for those charged with making these critical decisions. To make these rankings as transparent and accessible as possible, we are thrilled to announce that we will release the full Diamond Sports Rankings throughout the 2024 season in collaboration with D1Baseball & D1Softball. The rankings will be featured exclusively on d1baseball.com & d1softball.com for fans to explore. Additionally, fans will be able to interact with line graphs like those in Figures 7 and 8 to see how teams rise and fall in DSR throughout the season. 6-4-3 Charts and D1Baseball & D1Softball are pleased to offer the diamond sports community a new, transparent, and accessible tool for analyzing teams throughout and after the season. 

Still have questions? Contact info@643charts.com