Monday, March 16, 2015

Winning in the NBA

WARNING: This post is extremely long and deals with a very complicated subject - what it takes to win in the NBA. A TLDR version can be viewed here. For those that read on - I warned you!

What a season for the Cleveland Cavaliers. Early season excitement gave way to frustration as the team fell below .500 multiple times. When LeBron went to Miami for 2 weeks to fix his ailing back and ankles (the first such break since a wrist injury sidelined him at the end of his first Cleveland stint), it seemed to portend a lost season (and possibly more). Then LeBron came back, dropped 33 on the Suns (though in a loss, the 6th straight at the time), and generally looked like LeBron again. The Cavs acquired some defense (Timofey Mozgov), shooting (J.R. Smith) and depth (Iman Shumpert), Kyrie took a leap, and the team went on a tear. 

Not lost in the cycle has been the play of Kevin Love, who admitted to having to make big adjustments. The season has been a struggle, with him posting five-year lows in major statistical categories, missing the All Star game, being referenced in a bizarre tweet from LeBron, and getting benched in crunch time. There have even been some ridiculous rumors that Love may not re-sign in Cleveland this summer (ridiculous because it is not the right time to speculate about that, unless you’re Goran Dragic).

Why this post

The trade the Cavs made for Love last summer turned out to be immensely polarizing. Before the season I tried to determine Love’s value by comparing him to other players at his position as well as analyzing his effect on Minnesota’s team-wide statistics.

My research returned mixed results: Love is undoubtedly one of the best power forwards to ever play the game, as measured by box score stats. But his ability to affect Minnesota’s team numbers was muted. In fact, I found that of the common box score stats, only half really vary from team to team: 3 point and free throw attempts, offensive rebounds, steals, and blocks. Love’s ability to gather defensive rebounds almost didn’t matter, as the best rebounding teams are only a bit better than league average. The data brought up more questions than answers, pointing to the complex, team-oriented nature of basketball.

One of the most important questions left unanswered was: how do traditional box score stats (or even tertiary numbers using box score stats, such as True Shooting %) correlate to winning? For this post, I ran some correlation and regression analysis using Basketball-Reference’s stat database to see which recorded metrics associate most closely to the following team strength indicators: Wins, Margin of Victory (MOV), and Playoff Wins.

Simple Correlation

I first ran a simple correlation test. I admit this isn’t super original – I assume Hollinger already did this when creating his proprietary metrics and the analytics-minded teams do this to death. But I wanted to understand the process behind the analysis. I started with correlating the gamut of stats to Wins and Margin of Victory (including only categories that had a >10% Standard Deviation impact):

All the data for this piece are gathered from the treasure trove at Basketball-Reference. I am very indebted to the owners/operators of that site, who have spent time, energy, and money to develop an easily searchable, one-stop reference point for almost any non-proprietary stat you can think of. I could not have made this post without them, and you really should go there to develop your own empirical understanding of the game.

Note that I took the absolute value of the correlations and sorted for the highest standard deviation impact. So the list of strong correlations includes things that move with the Y categories (like ORtg and Wins) as well as items that move in the opposite direction (DRtg).
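For anyone who wants to try this outside of Excel, here is a minimal sketch of the ranking step described above: compute a correlation for each stat against Wins, then sort by absolute value so that negative movers like DRtg sit next to positive ones like ORtg. The team numbers are made up for illustration and are not Basketball-Reference data, and the function names are my own.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def rank_by_abs_correlation(stats, target):
    """Return (stat name, r) pairs sorted by |r|, strongest first."""
    pairs = [(name, pearson(values, target)) for name, values in stats.items()]
    return sorted(pairs, key=lambda p: abs(p[1]), reverse=True)

# Made-up numbers for five hypothetical teams, NOT real NBA data.
wins = [60, 52, 41, 33, 20]
stats = {
    "ORtg": [112.0, 109.5, 106.0, 104.0, 101.0],
    "DRtg": [102.0, 104.0, 106.5, 108.0, 111.0],
    "Pace": [95.0, 98.0, 93.0, 97.0, 94.0],
}

ranking = rank_by_abs_correlation(stats, wins)
for name, r in ranking:
    # The sign is kept in the output even though the sort ignores it.
    print(f"{name:5s} r = {r:+.3f}")
```

Sorting on the absolute value while printing the signed coefficient keeps the direction of each relationship visible in the final list.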

First, the two sets are fairly close. Analysts regard MOV as a better predictor of value, but the components that correlate with MOV are similar to those that relate to wins. Some of the stuff is in a different order, but it’s all generally there.

It’s also apparent that the categories that correlated best were synthetic stats built to measure team quality; of course figures like SRS, PW/PL (Pythagorean Wins/Losses, built from MOV), and ORtg/DRtg will correlate to wins because they are designed to. ORtg ranks slightly higher in both charts than DRtg, which is interesting especially when you look at the scatter plots of how these two correlate to wins:
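To make the "designed to" point concrete, here is a rough sketch of a Pythagorean wins calculation. The exponent of 14 is an assumption on my part (I believe it is the value Basketball-Reference uses, while other analysts prefer values such as 16.5), and the scoring figures are hypothetical.

```python
def pythagorean_wins(points_for, points_against, games=82, exponent=14.0):
    """Expected wins from points scored and points allowed.

    The exponent is an assumed value; different sources use different ones.
    """
    scored = points_for ** exponent
    allowed = points_against ** exponent
    return games * scored / (scored + allowed)

# Hypothetical team: scores 105 per game and allows 100 per game.
expected = pythagorean_wins(105.0, 100.0)
print(f"Expected wins: {expected:.1f}")
```

Since the only inputs are points scored and allowed, a stat like this is close to a restatement of MOV, so a high correlation with actual wins is expected.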

I drew little clusters around teams with 55 wins and above – a somewhat arbitrary figure, but one that the data show relates to playoff contention. It seems like 55+ win teams boast strong offenses with more varied defensive efficiencies. I've read that defense is the easiest thing for poor teams to improve, but perhaps offense is what separates the cream of the crop from the rest.

The next part of the analysis was to remove synthetic stats and focus on their components: things like shooting percentage, rebounding, etc. that are the building blocks of ORtg and DRtg. This is how the correlation tied out. I also added two correlation sets focusing on playoff performance based on playoff wins (adjusting for earlier seasons where first-round series lasted only 5 games). Finally, I filtered out numbers that describe the same thing such as Points vs. Points/Game. This is how it turns out (includes only top quartile correlations):

The first thing I noticed was the difference between the Wins/MOV sets and the Playoff Wins set. Wins and MOV focus equally on offense and defense, with Opponent FG%, your own TS%, and similar stats interspersed. But the top of the Playoff Wins set almost exclusively focuses on defensive numbers: preventing FG%, assists, and the like. I’ve highlighted in green the categories related to a team’s offense and in red those associated with defense, to make this distinction clear. This is very rudimentary analysis, but it indicates that the old adage that defense wins playoff games is correct. Also, it’s interesting that things like preventing turnovers and scoring a lot of points/game don’t show up as important in the playoffs, where anecdotally, the pace slows down. This all jibes with ORtg taking precedence in the previous charts.

Other things that contribute to winning:

  1. Opp FG% ranked higher than the other Opponent shooting stats, even the advanced ones. This seems to indicate that protecting the rim is more important than guarding the perimeter (the extra value of 3s is captured in Opp eFG%) or avoiding fouls (FTs only enter through TS%).
  2. I think it’s interesting that Opp 2P% ranks higher than Opp 3P%. You’ll see this theme repeated throughout this post, but it seems like successful teams use 3s to space the floor for rim runs rather than as an end in themselves. In fact, Opponent 3P rate does not show up at all, indicating that efficiency, not volume, is key.
  3. Your own TS% is more important than other shooting metrics. So 3s and FTs are definitely important to your offense. 
  4. 2 point shooting (2P%) is more important than (3P%) – teams that get to the rack prosper. 
  5. Limiting opponent assists is key. It is telling that the Playoff Wins set includes limiting assists on 3s (the most efficient type of 3 is assisted) but not 2s.
  6. Opponent TS% was not available. However, since Opp 3P% ranks higher than Opp FTr (free throw rate) or Opp FT%, it seems that after protecting the rim, preventing 3s is more important than avoiding fouls.
  7. Defensive rebounds are important, but not offensive ones. The Spurs are on to something.
  8. Opponent dunks are bad.

I was curious what would happen if we plotted a team’s TS% to Wins against Opp FG% to Wins (similar to the ORtg v. DRtg scatters):

The differences I’m talking about are very slight. I do think it’s interesting that, on the upper end of both charts, there are more teams under the trendline for Opp FG% than for TS%. The data seem to imply that elite defense doesn't translate to regular season success as much as offense. I admit that the sample set gets very dicey at this extreme. Here is how the two scatters change when compared to MOV, a more predictive stat than Wins (I tried Playoff Wins but there is too much noise and too few good data points from playoff teams):

The results tie out. There are a lot of fun scatters you can produce with this data. And again, the Playoff Wins set shows some opposing data. Let’s move on before we go too far down the rabbit hole. We’ve identified which things correlate with winning. But which ones don’t? Off the top of my head, things like assists, turnovers, 3s and FT attempts, corner 3s, and other seemingly important things do not show up at the top of the correlation list. So what’s at the bottom? What doesn't really matter?

This is quite a list. These are things that don’t correlate positively or negatively to winning:
  1. It doesn’t seem to matter where you shoot. At all. You just have to shoot well. 
  2. Likewise it doesn’t matter if your opponent is taking 2s or 3s as long as they aren’t making them.
  3. Steals and generating turnovers don’t matter much. Not intuitive for a category with such wide standard deviations (
  4. Shockingly, generated free throws and making a lot of them… don’t matter?
  5. Pace doesn’t help or hurt.

Multi Variable Analysis

After going through the simple correlation, I went a step further and used Excel for some basic multi-variable regression. Although correlation explains how wins fluctuate with individual stats, many of these stats correlate with one another – take the TS% problem: TS% is an amalgamation of 2P%, 3P% and FT% as well as %3P (% of 3 pointers taken), %Corner 3PA, FTr, and the like. Multi-variable regression takes all of these into account and attempts to determine which individual component has the greatest effect (to be entirely correct, which has the least likelihood of not having an effect).
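For readers who would rather not use Excel, the same kind of fit can be sketched in plain Python. This is an illustration under assumptions, not a copy of my spreadsheet: it solves the ordinary least squares normal equations directly, reports Adjusted R Squared, skips the P values that Excel also prints, and assumes no predictor is an exact linear combination of the others. The team rows are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations update both sides at once.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_ols(rows, y):
    """Fit y = b0 + b1*x1 + ... by least squares.

    Returns (coefficients, adjusted R squared); coefficients[0] is the intercept.
    """
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    n = len(y)
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(bi * di for bi, di in zip(beta, d)) for d in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    p = k - 1  # number of predictors, excluding the intercept
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    return beta, adj_r2

# Hypothetical teams: (TS% in points, Opp FG% in points) and MOV. Not real data.
rows = [(57.0, 44.0), (55.0, 45.0), (54.0, 46.0),
        (53.0, 44.0), (52.0, 47.0), (51.0, 46.0)]
mov = [6.5, 3.0, 0.5, 1.5, -3.5, -4.0]

beta, adj_r2 = fit_ols(rows, mov)
print("intercept and slopes:", [round(b, 3) for b in beta])
print("adjusted R squared:", round(adj_r2, 3))
```

The adjustment penalizes every extra predictor, which is why Adjusted R Squared is the better number to compare when the set of x categories changes from one run to the next.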

I first ran a regression against MOV. I sorted out the synthetic stats and ORtg / DRtg as I’m trying to drill down into components. I also removed attendance since I’m pretty sure that is a lagging metric. Finally, I removed PTS/G and Opp PTS/G – those two are basically proxies for MOV. This is the result:

MOV Multi-Variable Analysis

The trick with these types of regressions is knowing what categories to keep, since Excel can only do 16 at a time. The Adjusted R Squared was .867, not terrible. The stats outlined in red are the ones that really stood out in this sample set. It’s pretty simple: take care of the ball. Score at the rim and shoot good percentages from 3 and from the line. Protect the rim on the other side and disrupt passing lanes.

I was surprised to see that Opp FG% wasn’t higher after it ranked highly in the correlation tables. Overall, the P values explode after Opp Ast. This is where the x value set that you choose gets extremely important – things can get thrown off a lot just by including or excluding a particular test category.

True Shooting % and Defensive Field Goal %

I decided to break down the regression further. From the correlation list, we know that Opp FG% and TS% are the top defensive and offensive correlations to Wins, MOV, and Playoff Wins. So I ran two separate regressions to see which metrics these correlate to. 

TS% Multi-Variable Regression

The Adjusted R Squared was an impressive .995, indicating a very good fit. Consider this a win for statistical analysis (sort of – it only really counts if you believe TS% in the first place, I guess). Good offensive teams shoot well from everywhere: 2s, 3s, FTs, it doesn’t matter. But they also take a lot of freebies and 3s. In essence, good offensive teams look like the Rockets.
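Part of the reason the fit is so tight may simply be that TS% is computed from these same ingredients. A small sketch using the standard definitions of TS% and eFG% shows how makes and attempts feed straight into the number. The per-game line is hypothetical, and the function names are my own.

```python
def true_shooting_pct(made_2p, made_3p, made_ft, fga, fta):
    """TS% = PTS / (2 * (FGA + 0.44 * FTA))."""
    points = 2 * made_2p + 3 * made_3p + made_ft
    return points / (2 * (fga + 0.44 * fta))

def effective_fg_pct(made_2p, made_3p, fga):
    """eFG% = (FG + 0.5 * 3P) / FGA, where FG counts all made field goals."""
    made_fg = made_2p + made_3p
    return (made_fg + 0.5 * made_3p) / fga

# Hypothetical per-game team line, not real data.
ts = true_shooting_pct(made_2p=30, made_3p=8, made_ft=18, fga=85, fta=24)
efg = effective_fg_pct(made_2p=30, made_3p=8, fga=85)
print(f"TS%  = {ts:.3f}")
print(f"eFG% = {efg:.3f}")
```

Free throws appear only in TS%, while eFG% just gives made 3s their extra half point, which is worth keeping in mind when comparing how the two stats rank.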

Things that don’t matter? I was shocked to see Pace at the very end. This is somewhat counterintuitive – shouldn’t a fast pace benefit stronger teams who can take advantage of their talent over a larger number of possessions? Don't good offensive teams push the pace for easy transition buckets? Maybe Pace is affected by team depth (e.g. the LeBron Heat teams were good but lacked depth) or is just better suited to certain player types. Maybe it’s just that basketball is a complicated game and these regressions are not easy to understand. But next time you hear of your favorite team pushing the pace, you might want to wonder if that's a good thing.

Assist metrics were also curiously low. Most people understand that catch-and-shoot attempts (which presumably lead to assists if the basket is good) are superior to other types of shots. But why don’t assists correlate more highly with TS%? Again, complexity, but it may be that shot making takes precedence over passing prowess.

I ran the regression one more time, removing the shooting %s and substituting shot location, to see if anything is different. Basically, I’m distilling this into a coaching question – we know that having great shooters helps, but with the players a team has, can it juice its offense by encouraging certain acts (like certain shot locations)? I know that this is woefully incomplete without the new stats such as passes/possession, hockey assists, anything from SportVU. But let’s try!

TS% Multi-Variable Regression 2

Unfortunately the Adjusted R Squared dropped to .539. The poor fit of the x values to the y values indicates that a huge part of offensive success is having good players (esp. good shooters). Sure, drawing fouls, shooting near the rim, and taking care of the ball are important, but this whole data set is not terribly valuable to the analysis. In this case, Pace may show an outsized effect on TS% in this cohort, but we’ve already proven that good shooting is more important to offensive and overall team success than pace. Just to prove this theory:

Look how weak these Pace correlations are. If anything, Pace reduces MOV slightly.

Not surprisingly, 2P% has a huge influence on TS% as the multi-variable regression revealed. 

Defensive Multi-Variable Regression

Lastly, let’s take a quick look at defense by running metrics against Opp FG%, again leaving out opponent shooting %s since those are largely determined by opponent skill:

The Adjusted R Squared was only .670, indicating again that the quality of your opponent’s players has a lot to do with defensive performance. It also indicates that this chart, like the previous one, isn’t terribly informative. Still, we can glean some stuff: protecting the rim is important, perhaps since the threat of the block dissuades shots near the rim. But you have to do so while avoiding fouls. I also chose to include a few offensive stats, like Pace and Offensive Rebound % (ORB%). It’s interesting (and intuitively sensible) that pushing the tempo and going for offensive boards will increase your opponent’s offensive efficiency. That’s why Pace has such a complicated relationship with team quality – it juices your offense, but also your opponents’.


This is pretty messy analysis and does not come near to being complete. Having the SportVU data would help immensely, especially on the player level. But we did confirm some things as leading to wins: shooting good %s, getting to the rim, drawing fouls, and avoiding turnovers, while preventing those things on defense. Curiously, offensive rebounding doesn’t matter and neither does shot location – as long as you are converting good %s. And the analysis confirmed some old-school thinking, too: in the playoffs, pace goes out the window, defense becomes more important and attacking/finishing at the rim is paramount to offensive success.

As far as applying this analysis to Kevin Love – his shooting ability near the rim and from 3 certainly help, but since defensive FG% is more important to playoff teams, he may be a slight net negative in that regard. But even in that analysis this data is woefully incomplete – for example, we don’t know how much one great shooter can juice an offense, or how much a subpar defender hurts the defense. We can infer that his gaudy rebounding figures (offensive and defensive) don’t really move the needle. The assists are nice, but that’s kind of what the data show Love to be – a nice player who can be a key starter on a competitive team, but isn’t “the guy.” Of course, old school evaluators who looked at his Minnesota teams’ records and labeled him as such didn’t need pages and pages of spreadsheets to know that. You make the call.

Email me if you’d like a copy of my Excel spreadsheet. Otherwise, go to Basketball-Reference to build your own!

#YMTCSports #YouMakeTheCalls
