Winning in the NBA - TLDR Version

Why this post

The biggest NBA transaction in the summer of 2014 was Kevin Love's trade to Cleveland. After it was announced, I tried to measure Love’s value by comparing him to other players at his position as well as analyzing his effect on Minnesota’s team-wide statistics.

My research returned mixed results: Love is undoubtedly one of the best power forwards to ever play the game, as measured by box score stats. But his ability to affect Minnesota’s team numbers was muted. In fact, I found that of the common box score stats, only half really vary from team to team: 3 point and free throw attempts, offensive rebounds, steals, and blocks. Love’s ability to gather defensive rebounds almost didn’t matter as the best rebounding teams are only a bit better than league average. The data brought up more questions than answers, pointing to the complex, team-oriented nature of basketball.

One of the most important questions left unanswered was: how do traditional box score stats (or even tertiary numbers using box score stats, such as True Shooting %) correlate with winning? For this post, I ran some correlation and regression analysis using Basketball-Reference’s stat database to see which recorded metrics are most closely associated with the following team strength indicators: Wins, Margin of Victory (MOV), and Playoff Wins.

Simple Correlation

I first ran a simple correlation test. I admit this isn’t super original – I assume Hollinger already did this when creating his proprietary metrics and the analytics-minded teams do this to death. But I wanted to understand the process behind the analysis. I started with correlating the gamut of stats to Wins and Margin of Victory (including only categories that had a >10% Standard Deviation impact):

All the data for this piece were gathered from the treasure trove at Basketball-Reference.com. I am very indebted to the owners/operators of that site, who have spent time, energy, and money to develop an easily searchable, one-stop reference point for almost any non-proprietary stat you can think of. I could not have made this post without them, and you really should go there to develop your own empirical understanding of the game.

Note that I took the absolute value of the correlations and sorted for the highest standard deviation impact. So the list of strong correlations includes things that move with the Y categories (like ORtg and Wins) as well as items that move in the opposite direction (DRtg).
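For anyone who wants to try the same ranking outside of Excel, here is a minimal Python sketch of the idea: compute a Pearson correlation between each stat and Wins, then sort by absolute value so that strong negative relationships (like DRtg) rank alongside strong positive ones. The team numbers below are made up purely for illustration, the function names are my own, and the "standard deviation impact" filter is not reproduced here.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def rank_by_abs_correlation(stats, target):
    """Return (name, r) pairs sorted by |r|, strongest first."""
    pairs = [(name, pearson(values, target)) for name, values in stats.items()]
    return sorted(pairs, key=lambda pair: abs(pair[1]), reverse=True)

# Made-up numbers for five hypothetical teams (NOT real NBA data).
wins = [60, 52, 41, 33, 20]
stats = {
    "ORtg": [112.0, 109.5, 106.0, 104.0, 101.0],
    "DRtg": [102.0, 104.0, 106.5, 107.0, 110.0],
    "Pace": [95.0, 92.0, 97.0, 91.0, 96.0],
}

ranking = rank_by_abs_correlation(stats, wins)
for name, r in ranking:
    # The sign is kept in the output so the direction is still visible.
    print(f"{name:5s} r = {r:+.3f}")
```

Sorting on the absolute value while printing the signed coefficient keeps the direction of each relationship visible in the final table.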

First, the two sets are fairly close. Analysts regard MOV as a better predictor of value, but the components that correlate with MOV are similar to those that relate to wins. Some of the stuff is in a different order, but it’s all generally there.

It’s also apparent that the categories that correlated best were synthetic stats built to measure team quality; of course figures like SRS, PW/PL (Pythagorean Wins/Losses, built from MOV), and ORtg/DRtg will correlate with wins, because they are designed to. ORtg ranks slightly higher in both charts than DRtg, which is interesting.
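To see why Pythagorean Wins track actual wins so closely, here is a rough Python sketch of the formula. It works from points scored and points allowed, the same ingredients as MOV. The exponent of 14 is the value commonly cited for Basketball-Reference's NBA version; treat it, and the function names, as assumptions rather than a definitive implementation.

```python
def pythagorean_win_pct(points_for, points_against, exponent=14.0):
    """Expected winning percentage from points scored and allowed."""
    # Using the ratio form avoids raising large point totals to a big power.
    ratio = (points_against / points_for) ** exponent
    return 1.0 / (1.0 + ratio)

def pythagorean_wins(points_for, points_against, games=82, exponent=14.0):
    """Expected wins over a season of `games` games."""
    return games * pythagorean_win_pct(points_for, points_against, exponent)

# Hypothetical per-game averages, for illustration only.
print(round(pythagorean_wins(105.0, 100.0), 1))
```

Because the formula is driven entirely by scoring margin, a high correlation with Wins and MOV is close to guaranteed by construction.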

The next part of the analysis was to remove synthetic stats and focus on their components: things like shooting percentage, rebounding, etc. that are the building blocks of ORtg and DRtg. I also added a correlation set focusing on Playoff Wins (adjusting pro forma for earlier seasons where first-round series lasted only 5 games). Finally, I filtered out numbers that describe the same thing, such as Points vs. Points/Game. This is how it turns out (includes only top quartile correlations):


The first thing I noticed was the difference between the Wins/MOV sets and the Playoff Wins set. Wins and MOV focus equally on offense and defense, with Opponent FG%, your own TS%, and similar stats interspersed. But the top of the Playoff Wins set almost exclusively focuses on defense numbers: preventing FG%, assists, and the like. I’ve highlighted in green the categories related to a team’s offense and red those associated with defense, to make this distinction clear. This is very rudimentary analysis, but it suggests the old adage that defense wins playoff games is correct. Also, it’s interesting that things like preventing turnovers and scoring a lot of points/game don’t show up as important in the playoffs, where anecdotally, the pace slows down. This all jibes with ORtg taking precedence in the previous charts.

Other things that contribute to winning:

  1. Opp FG% ranked higher than the other Opponent shooting stats, even the advanced ones. This seems to indicate that protecting the rim is more important than avoiding fouls (as FTs are captured in Opp TS%) or guarding the perimeter (3s are weighted in Opp eFG%).
  2. I think it’s interesting that Opp 2P% ranks higher than Opp 3P%. You’ll see this theme repeated throughout this post, but it seems like successful teams use 3s to space the floor for rim runs rather than as an end in themselves. In fact, Opponent 3P rate does not show up at all, indicating that efficiency, not volume, is key.
  3. Your own TS% is more important than other shooting metrics. So 3s and FTs are definitely important to your offense. 
  4. 2 point shooting (2P%) is more important than 3 point shooting (3P%) – teams that get to the rack prosper.
  5. Limiting opponent assists is key. It is telling that the Playoff Wins set includes limiting assists on 3s (the most efficient type of 3 is assisted) but not on 2s.
  6. Opponent TS% was not available. However, since Opp 3P% ranks higher than Opp FTr (free throw rate) or Opp FT%, it seems that after protecting the rim, preventing 3s is more important than avoiding fouls.
  7. Defensive rebounds are important, but not offensive ones. The Spurs are on to something.
  8. Opponent dunks are bad.
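
Since several items above hinge on what each shooting metric actually counts, here is a short Python sketch of the standard definitions: FG% treats every made field goal alike, eFG% gives extra credit for made 3s, and TS% also folds in free throws (the 0.44 factor is the usual approximation for free throw possessions). The box score line is hypothetical and the function names are my own.

```python
def fg_pct(fgm, fga):
    """Field goal percentage: every make counts the same."""
    return fgm / fga

def efg_pct(fgm, fg3m, fga):
    """Effective FG%: a made 3 is worth 1.5 made 2s."""
    return (fgm + 0.5 * fg3m) / fga

def ts_pct(pts, fga, fta):
    """True Shooting %: points per scoring attempt, including free throws."""
    return pts / (2 * (fga + 0.44 * fta))

# Hypothetical single-game team line (NOT real data).
fgm, fga, fg3m, ftm, fta = 40, 85, 10, 18, 24
pts = 2 * (fgm - fg3m) + 3 * fg3m + ftm

print(f"FG%  = {fg_pct(fgm, fga):.3f}")
print(f"eFG% = {efg_pct(fgm, fg3m, fga):.3f}")
print(f"TS%  = {ts_pct(pts, fga, fta):.3f}")
```

Comparing the three side by side shows how much of a team's efficiency comes from 3s (the gap between FG% and eFG%) and from the free throw line (the gap between eFG% and TS%).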

What Doesn't Work

We’ve identified which things correlate with winning. But which ones don’t? Off the top of my head, things like assists, turnovers, 3s and FT attempts, corner 3s, and other seemingly important things do not show up at the top of the correlation list. So what’s at the bottom? What doesn't really matter?


This is quite a list. These are things that don’t correlate positively or negatively to winning:
  1. It doesn’t seem to matter where you shoot. At all. You just have to shoot well. 
  2. Likewise it doesn’t matter if your opponent is taking 2s or 3s as long as they aren’t making them.
  3. Steals and generating turnovers don’t matter much. Not intuitive for a category with such wide standard deviations (http://youmakethecalls.blogspot.com/2014/11/kevin-love-part-3-does-love-make.html).
  4. Shockingly, generating free throws and making a lot of them… don’t matter?
  5. Pace doesn’t help or hurt.

Multi Variable Analysis

After going through the simple correlation, I went a step further and used Excel for some basic multi-variable regression. Although correlation explains how wins fluctuate with individual stats, many of these stats correlate with one another – for example, the TS% problem, which is an amalgamation of 2P%, 3P% and FT% as well as %3P (% of 3 pointers taken), %Corner 3PA, FTr, and the like. Multi-variable regression takes all of these into account and attempts to determine which individual component has the greatest effect (to be entirely correct, which has the least likelihood of not having an effect).
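
For readers without Excel, the core of a multi-variable regression can be sketched in plain Python. This is an ordinary least squares fit via the normal equations, which is a stand-in for, not a copy of, whatever Excel does internally. It only estimates coefficients and does not compute the p-values that tell you how likely each stat is to have no effect. The data are synthetic and noise-free (generated from coefficients I picked), and all names are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so row operations apply to both sides at once.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_ols(rows, y):
    """Least-squares coefficients [intercept, b1, b2, ...] for y given rows."""
    design = [[1.0] + list(r) for r in rows]  # leading 1.0 is the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Synthetic rows (NOT real NBA numbers): (TS%, Opp FG%) in percentage points.
rows = [(56.0, 44.0), (54.0, 45.0), (52.0, 47.0),
        (55.0, 46.0), (53.0, 43.0), (57.0, 46.0)]
# MOV is generated from known coefficients so the fit can be checked.
mov = [9.0 + 1.5 * ts - 2.0 * opp for ts, opp in rows]

intercept, b_ts, b_opp = fit_ols(rows, mov)
print(f"intercept={intercept:.2f}  TS%={b_ts:.2f}  OppFG%={b_opp:.2f}")
```

Because both predictors are fit at once, each coefficient describes the effect of one stat with the other held fixed, which is what separates this from the one-at-a-time correlations earlier in the post.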

I ran the multi-variable tests against several Y values: MOV, TS%, and Opp FG%. I won't include the full analysis in this TLDR version, but just know that it confirms a lot of the data from above: you need to shoot a good % from pretty much everywhere, draw fouls, and protect the rim. Pace doesn't really help, and neither do offensive rebounds. 

Conclusion

This is pretty messy analysis and does not come near to being complete. Having the SportVU data helps immensely, especially on the player level. And we did confirm some things as leading to wins: shooting good %s, getting to the rim, drawing fouls, and avoiding turnovers, while preventing those things on defense. Curiously, offensive rebounding doesn’t matter and neither does shot location – as long as you are converting good %s. And the analysis confirmed some old-school thinking, too: in the playoffs, pace goes out the window, defense becomes more important, and attacking/finishing at the rim is paramount to offensive success.

As far as applying this analysis to Kevin Love – his shooting ability near the rim and from 3 certainly help, but since defensive FG% is more important to playoff teams, he may be a slight net negative in that regard. But even in that analysis this data is woefully incomplete – for example, we don’t know how much one great shooter can juice an offense, or how much a subpar defender hurts the defense. We can infer that his gaudy rebounding figures (offensive and defensive) don’t really move the needle. The assists are nice, but that’s kind of what the data show Love to be – a nice player who can be a key starter on a competitive team, but isn’t “the guy.” Of course, old school evaluators who looked at his Minnesota teams’ records and labeled him as such didn’t need pages and pages of spreadsheets to know that. You make the call.

Email me if you’d like a copy of my Excel spreadsheet. Otherwise, go to Basketball-Reference to build your own!

#YMTCSports #YouMakeTheCalls
