The ultimate guide to predictive college basketball analytics

You’re smarter than the typical college basketball fan. You hear about sports analytics and want to know more about how it applies to college basketball.

Maybe it’s November, and you’re looking to bet a few games as states continue to legalize sports betting. Maybe it’s March, and you need an edge in winning your NCAA tournament pool.

This article covers the basic concepts in college basketball analytics such as points per possession and the four factors. It does so by asking two questions:

  • How do you make a prediction for a game?
  • Do matchups matter in predicting a game?

The article ends with a curated list of further reading. This list covers the randomness of the three point shot, ensemble methods for making predictions, and how to win your March Madness pool.

Let’s dig in.

1. How to make predictions based on points per possession

Let’s suppose Gonzaga and Michigan State face off in the NCAA tournament championship game. You want to make a prediction for this game based on data.

The first thing to consider in making a prediction is the pace of play. Gonzaga likes to get up and down the floor, while Michigan State prefers the half court game.

Based on this style difference, it doesn’t make sense to look at points per game. For a more accurate estimate of team strength that considers pace, the analytics community looks at points per possession.

While it’s trivial to get the points in a game, the box score does not provide the number of possessions. Let’s look at how to estimate this number.

1a. Estimating possessions

To estimate the number of possessions in the game from the box score, let’s consider how a possession can end.

One way for a possession to end is a turnover (TO). The box score tracks each turnover, so each turnover counts as one possession.

Next, consider possessions with a field goal attempt (FGA). A possession ends after a field goal attempt in two situations:

  • the offense makes the shot
  • the offense misses the shot and the defense gets the rebound

The possession could get extended if the offense grabs the rebound (OREB). To account for these 3 situations, we estimate possessions with a field goal attempt by FGA – OREB. Offensive rebounds get subtracted because the offense can take two shots on one possession if they get an offensive rebound.

A possession can also end with free throws. If every free throw came in a pair because of a shooting foul, the number of possessions would be half the free throw attempts (FTA). Unfortunately that’s not always the case, as the offense can:

  • get one free throw after a made basket
  • get three free throws when fouled on a three pointer
  • miss the front end of a one-and-one

Instead of a factor of a half, you need a different factor that accounts for these situations. Ken Pomeroy uses 0.475, and I use this factor in my calculations as well.

To estimate possessions (POSS) from the box score:

POSS = FGA – OREB + TO + (0.475 * FTA)

In college basketball, the two teams’ possession counts can differ by as much as two if one team gets an extra possession in each half. To estimate the possessions in one game, you apply this formula to both teams and take the average.
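Here is a minimal sketch of this calculation in Python. The function names and the box score numbers are made up for illustration; only the formula and the 0.475 factor come from the text above.

```python
FT_FACTOR = 0.475  # free throw factor quoted in the text


def estimate_possessions(fga, oreb, to, fta, ft_factor=FT_FACTOR):
    """Estimate one team's possessions from its box score totals."""
    return fga - oreb + to + ft_factor * fta


def game_possessions(team_a, team_b):
    """Average the two team estimates to get one number for the game."""
    return (estimate_possessions(**team_a) + estimate_possessions(**team_b)) / 2


# Hypothetical box score lines, not from a real game.
home = {"fga": 60, "oreb": 10, "to": 12, "fta": 20}
away = {"fga": 58, "oreb": 8, "to": 14, "fta": 16}

print(estimate_possessions(**home))  # 60 - 10 + 12 + 0.475 * 20
print(game_possessions(home, away))  # average of the two team estimates
```

Keeping the free throw factor as a parameter makes it easy to try a value other than 0.475.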

During the 2019-20 season, college basketball teams averaged about 70 possessions per game. Gonzaga was one of the faster teams, as they averaged 74 possessions per game. Michigan State played an average pace of 70 possessions.

It’s possible to get a more accurate possession count through play by play data. This also presents an interesting opportunity to eliminate possessions at the end of games with intentional fouls, as they don’t reflect the normal flow of play.

1b. Ranking college basketball teams

To evaluate college basketball teams on offense and defense, we use points per possession as an efficiency metric. The last section showed how to estimate possessions from the box score.

During the 2019-20 season, college basketball teams averaged 100.5 points per 100 possessions. The offense gets about a point each time they have the ball.

To get from points per possession to college basketball rankings, you need to adjust for strength of schedule. College basketball has a wide range of teams, and every fan recognizes the difference in playing Michigan State versus Mississippi Valley State.

There are many ways to adjust for strength of schedule. Here, I’ll focus on the methods of Ken Pomeroy, as his college basketball rankings are the most widely known.

To adjust for strength of schedule, Pomeroy uses the least squares method. This is also the basic idea behind linear regression, the data science technique most often used to fit a line that relates one variable to another. For a visual primer on regression, click here.

This least squares method also drives the team rankings on the Sports Reference sites. They call it the Simple Rating System (SRS), and this method assigns a rating to each team in college basketball. The difference in the rating between two teams gives a prediction for a future game.

To perform this calculation, the computer changes the ratings for all 353 teams until the ratings meet a criterion. This criterion is that the ratings minimize the error between the prediction from the ratings and actual game results.
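The least squares idea can be sketched in a few lines of Python with NumPy. This is an illustration under simplifying assumptions, not the code behind any real ranking site: it ignores home court, treats every game equally, and uses three made-up teams and margins. The function name and data layout are hypothetical.

```python
import numpy as np


def least_squares_ratings(games, n_teams):
    """Fit one rating per team so rating[a] - rating[b] best matches the margins.

    games: list of (team_a, team_b, margin) with margin = points_a - points_b.
    """
    X = np.zeros((len(games), n_teams))
    y = np.zeros(len(games))
    for row, (a, b, margin) in enumerate(games):
        X[row, a] = 1.0
        X[row, b] = -1.0
        y[row] = margin
    # Ratings are only defined up to a constant shift. lstsq returns the
    # minimum-norm solution, which averages to zero when every team is
    # connected to the others through the schedule.
    ratings, *_ = np.linalg.lstsq(X, y, rcond=None)
    return ratings


# Hypothetical results for three teams (indices 0, 1, 2).
games = [(0, 1, 5), (1, 2, 3), (0, 2, 10)]
ratings = least_squares_ratings(games, n_teams=3)
print(ratings)
print(ratings[0] - ratings[1])  # predicted margin for team 0 over team 1
```

Notice that the three results are not consistent with each other (5 plus 3 is not 10), so no set of ratings fits every game. Least squares finds the ratings with the smallest total squared error.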

Pomeroy takes it one step further as he considers offense and defense for each college basketball team. Instead of 353 variables, his code changes 706 variables to minimize the error between the predicted and actual efficiency, in points per possession, in each game.

Since these variables get solved for simultaneously, the offensive rating for Gonzaga depends on 705 other offensive and defensive ratings. Michigan State’s defensive rating matters to Gonzaga’s offense, even if Gonzaga and Michigan State have yet to play.

In his calculations, Pomeroy puts more weight on recent games. After performing these least squares calculations, you get the offensive and defensive rankings on kenpom.com. These two numbers get combined into his team rankings.
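To make the offense and defense version concrete, here is a rough sketch with two ratings per team. It is not Pomeroy's code: it has no recency weighting or home court adjustment, and the model (game efficiency equals the average plus the offense's deviation plus the defense's deviation) is an assumption chosen for simplicity. The three teams and their efficiencies are made up.

```python
import numpy as np


def offense_defense_ratings(games, n_teams):
    """games: list of (offense_team, defense_team, points_per_100).

    Returns (offense, defense, average) in points per 100 possessions.
    Lower defensive ratings are better.
    """
    y = np.array([eff for _, _, eff in games], dtype=float)
    average = y.mean()
    X = np.zeros((len(games), 2 * n_teams))
    for row, (off, dfn, _) in enumerate(games):
        X[row, off] = 1.0             # offensive deviation of the team with the ball
        X[row, n_teams + dfn] = 1.0   # defensive deviation of the opponent
    # All 2 * n_teams deviations are solved for at the same time.
    deviations, *_ = np.linalg.lstsq(X, y - average, rcond=None)
    offense = average + deviations[:n_teams]
    defense = average + deviations[n_teams:]
    return offense, defense, average


# Hypothetical efficiencies: (offense, defense, points per 100 possessions).
games = [(0, 1, 110), (0, 2, 115), (1, 0, 95),
         (1, 2, 105), (2, 0, 85), (2, 1, 90)]
offense, defense, average = offense_defense_ratings(games, n_teams=3)
print(offense)
print(defense)
```

Because every rating is solved for together, changing one game result can shift the ratings of teams that did not play in it.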

Now let’s see how these adjusted efficiency numbers imply a prediction for a game.

1c. Making a prediction

With offensive and defensive ratings based on points per possession, we can now make predictions for games. Let’s use Gonzaga against Michigan State as an example.

First, consider what the offensive and defensive ratings mean. For example, if Gonzaga has a rating of 115 points per 100 possessions, then they are expected to score 115 points per 100 possessions against an average college basketball defense.

As another example, Michigan State might have a defensive rating of 90 points per 100 possessions. This means that they’re expected to allow 90 points per 100 possessions against an average college basketball offense.

To make a prediction between Gonzaga’s offense and Michigan State’s defense, you have to consider that Michigan State’s defense is much better than average.

To do this, consider the deviation of a team’s rating from average. To simplify the math, let’s use an average efficiency of 100 points per 100 possessions.

Gonzaga’s offense is 15 points better than college basketball average, but Michigan State’s defense is 10 points better than average, both per 100 possessions. Better defenses have lower ratings.

A common way to make a prediction is that Gonzaga’s offense will score 5 points per 100 possessions better than average. This is because 15 (Gonzaga’s deviation from average on offense) minus 10 (Michigan State’s deviation from average on defense) is 5. This is the same method I use with yards per play and success rate in football predictions at The Power Rank.

Gonzaga is predicted to score 105 points per 100 possessions against Michigan State. If you scale this efficiency to 70 possessions for a game, this implies Gonzaga will score 73.5 points.

You can do the same calculation for the other matchup. Suppose Michigan State’s offense has a rating of 111 while Gonzaga’s defense has a rating of 93 (both measured by points per 100 possessions). Michigan State’s offense is 11 points better than average and Gonzaga’s defense is 7 points better, so Michigan State’s offense is predicted to be 4 points better than average per 100 possessions. This implies 72.8 points in a game with 70 possessions.

Based on these hypothetical numbers, Gonzaga would be predicted to win by 0.7 points.

While I have assumed 70 possessions in this game, you could assume a different number, especially if Gonzaga plays faster than average. With this method, there is clear freedom to adjust for pace.
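The arithmetic in this section fits in one small function. This sketch uses the hypothetical ratings from the example above; the function name and its default arguments are my own choices for illustration.

```python
def predicted_points(offense, defense, average=100.0, possessions=70):
    """Expected points, with ratings in points per 100 possessions."""
    # Add each unit's deviation from average. A good defense has a rating
    # below average, so its deviation is negative and lowers the prediction.
    efficiency = average + (offense - average) + (defense - average)
    return efficiency * possessions / 100


gonzaga = predicted_points(offense=115, defense=90)         # vs Michigan State's defense
michigan_state = predicted_points(offense=111, defense=93)  # vs Gonzaga's defense
print(gonzaga, michigan_state, gonzaga - michigan_state)

# A faster game changes the point totals but uses the same efficiencies.
print(predicted_points(offense=115, defense=90, possessions=74))
```

Passing a different value for possessions is how you would adjust the prediction for pace.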

Get my March Madness cheat sheet

At The Power Rank, I use data and analytics to make better football and March Madness predictions.

If you sign up for my free email newsletter, you’ll get:

  • my March Madness cheat sheet that makes it drop dead easy to fill out your bracket
  • a sample of my best football and college basketball predictions usually saved for paying members of The Power Rank
  • updates on content like this guide to college basketball analytics

To sign up, enter your best email and click on “Sign up now!”

2. Do matchups matter?

During the 2019-20 season, West Virginia was an elite offensive rebounding team. In contrast, Texas was awful at defensive rebounding, the worst in the Big 12.

When West Virginia plays Texas, do they have an edge due to this matchup? Can we use this to make a better prediction?

Jordan Sperber of Hoop Vision has done some excellent work on matchups. To understand his results, let’s look at the four factors of basketball, which provide a quantitative method to look at matchups.

2a. Four factors

Dean Oliver was a pioneer in basketball analytics. In 2003, he first published his book Basketball on Paper that laid the groundwork for future work in basketball analytics.

In the book, he wondered what factors made an offense great. Shooting is an obvious asset, but what else matters? Oliver wrote down four factors:

  1. shooting
  2. offensive rebounding
  3. turnovers
  4. getting to the foul line

Let’s examine these four factors in more detail and how to define a rate statistic for each.

The first of the four factors is shooting, as an offense can’t score without making baskets. The simplest measure of shooting is field goal percentage, or field goals made divided by field goal attempts.

A better formula for shooting gives the offense more credit for a three point shot. Effective field goal percentage assigns an extra 50% credit for each made three: field goals made plus half of three pointers made, divided by field goal attempts. In college basketball, the average effective field goal percentage is about 50%.

The second factor is offensive rebounding, as the offense keeps a possession alive with an offensive rebound. Total offensive rebounds is not a good measure though, as it depends on how many shots the team misses.

Instead, consider the offensive rebounding rate, or the fraction of rebounds the offense gets on that end of the court. This offensive rebounding rate is offensive rebounds divided by the sum of offensive rebounds plus the opponent’s defensive rebounds.

In college basketball, the average offensive rebounding rate is about 28%. Since the defense grabs the other rebounds, the defensive rebounding rate is 1 minus the opponent’s offensive rebounding rate.

The third factor is turnovers. A team can’t score if they commit a turnover before taking a shot.

To measure turnovers, consider turnover rate, or turnovers divided by possessions as estimated from the box score. On average, college basketball teams turn over the ball on about 19% of possessions.

The final factor is getting to the foul line. Since college basketball averages a 70% free throw percentage, taking two free throw attempts is an efficient means to score points.

To measure getting to the foul line, one metric is free throw attempts divided by field goal attempts. In college basketball, this rate is about 32%.

It’s also reasonable to define this factor as free throws made divided by field goal attempts. This definition includes the ability to make free throws in addition to getting to the foul line. However, I’ll use free throw attempts to isolate the ability to get to the foul line.
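The four rate statistics defined above can be computed from box score totals as in this sketch. The function name, argument names and box score numbers are made up for illustration. The possession estimate reuses the formula from section 1a.

```python
def four_factors(fgm, fg3m, fga, oreb, opp_dreb, to, fta):
    """Rate statistics for one team's offense from box score totals."""
    possessions = fga - oreb + to + 0.475 * fta
    return {
        "efg": (fgm + 0.5 * fg3m) / fga,        # effective field goal percentage
        "oreb_rate": oreb / (oreb + opp_dreb),  # offensive rebounding rate
        "to_rate": to / possessions,            # turnover rate
        "ft_rate": fta / fga,                   # free throw attempts per field goal attempt
    }


# Hypothetical box score, not from a real game.
stats = four_factors(fgm=25, fg3m=8, fga=60, oreb=10, opp_dreb=25, to=12, fta=20)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

To use free throws made instead of attempts for the last factor, you would pass made free throws in place of fta in that one ratio.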

Oliver’s four factors explain offensive efficiency, or points per possession, almost exactly. To show this, I’ve run a linear regression on the four factors to explain points per possession on the team level. This process assigns a weight to each of the four factors.

When you do this analysis for college basketball, the four factors explain 98% of the variance in offensive efficiency.

Based on this regression analysis, which of the four factors is the most important? Shooting is the most important of the four factors, which is no surprise. Offensive rebounding and turnovers have about the same importance but less than shooting. The least important factor is getting to the foul line.
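A regression like this can be sketched with NumPy. The function below is a generic ordinary least squares fit that also reports the fraction of variance explained. The check at the bottom uses synthetic numbers with made-up weights, purely to confirm the function recovers them; it is not real basketball data and says nothing about the actual weights of the four factors.

```python
import numpy as np


def fit_four_factors(factors, efficiency):
    """Ordinary least squares of efficiency on the four factors.

    factors: array of shape (n_teams, 4); efficiency: array of shape (n_teams,).
    Returns (intercept, weights, r_squared).
    """
    factors = np.asarray(factors, dtype=float)
    efficiency = np.asarray(efficiency, dtype=float)
    X = np.column_stack([np.ones(len(factors)), factors])  # add intercept column
    coef, *_ = np.linalg.lstsq(X, efficiency, rcond=None)
    residuals = efficiency - X @ coef
    total = efficiency - efficiency.mean()
    r_squared = 1.0 - (residuals @ residuals) / (total @ total)
    return coef[0], coef[1:], r_squared


# Synthetic check with made-up weights, not real data.
rng = np.random.default_rng(0)
fake_factors = rng.normal(size=(50, 4))
fake_weights = np.array([1.0, 2.0, 3.0, 4.0])
fake_efficiency = 2.0 + fake_factors @ fake_weights
intercept, weights, r_squared = fit_four_factors(fake_factors, fake_efficiency)
print(weights, r_squared)
```

One caveat on comparing importance: raw weights depend on the scale of each factor. A common approach is to standardize each factor before fitting so the weights are comparable, though that is an assumption on my part and not a step described in the text.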

2b. Extremes in matchups

With these four factors, let’s get back into the question of whether matchups matter in making predictions. In 2013, Jordan Sperber wondered whether a team that excelled in one of the four factors would have an advantage over an opponent weak in the opposing factor.

In particular, can you make a better prediction based on West Virginia’s excellence in offensive rebounding and Texas’s weakness at defensive rebounding?

To study this, Sperber isolated games in which teams had extremes in rebounding. He defined an extreme as a team in the top or bottom 10% in offensive or defensive rebounding rate.

With both elite and awful units, there are four types of games:

  • an elite offense versus an elite defense
  • an elite offense versus an awful defense
  • an awful offense versus an elite defense
  • an awful offense versus an awful defense

Sperber isolated games with these matchups and asked how well adjusted offensive and defensive efficiency can make a prediction in each game, as discussed in the previous section. He compared this prediction with the actual efficiency in the game.

For example, his data set had 311 games with an elite offensive rebounding team versus an awful defensive rebounding team. In comparing the efficiency prediction with the actual game efficiency, the average difference was less than a point per 100 possessions.

The prediction based on offensive and defensive efficiency was able to explain the outcome of these games. Here is the main result from his study: Sperber found the same predictive accuracy in all four types of matchups.

He repeated the study on the other three factors and found the same result. The efficiency prediction was equally accurate in each of the four types of matchups.
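The shape of this kind of study can be sketched as follows. This is my own reconstruction from the description above, not Sperber's code. The 10% cutoff comes from the text, while the function names, data layout and all numbers are hypothetical.

```python
import numpy as np


def label_extremes(rates, fraction=0.10):
    """Label each team 'elite', 'awful' or None, for a rate where higher is better."""
    rates = np.asarray(rates, dtype=float)
    low, high = np.quantile(rates, [fraction, 1 - fraction])
    return ["elite" if r >= high else "awful" if r <= low else None for r in rates]


def mean_error_by_matchup(games, off_labels, def_labels):
    """games: list of (offense_team, defense_team, predicted_eff, actual_eff)."""
    errors = {}
    for off, dfn, predicted, actual in games:
        key = (off_labels[off], def_labels[dfn])
        if None in key:
            continue  # keep only games where both units are extreme
        errors.setdefault(key, []).append(actual - predicted)
    return {key: float(np.mean(vals)) for key, vals in errors.items()}


# Hypothetical rebounding rates for ten teams (index = team).
off_rates = [0.20, 0.22, 0.24, 0.26, 0.28, 0.30, 0.32, 0.34, 0.36, 0.40]
def_rates = [0.70, 0.80, 0.60, 0.71, 0.72, 0.73, 0.69, 0.68, 0.74, 0.67]
off_labels = label_extremes(off_rates)
def_labels = label_extremes(def_rates)

# Hypothetical games: (offense, defense, predicted, actual) per 100 possessions.
games = [(9, 2, 110.0, 112.0), (9, 2, 108.0, 106.0),
         (9, 1, 100.0, 101.0), (4, 2, 105.0, 120.0)]
result = mean_error_by_matchup(games, off_labels, def_labels)
print(result)
```

If matchups mattered, the average error would drift away from zero in groups like elite offense versus awful defense. The study described above found no such drift.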

Here is the take home message: team level matchups in the four factors do not help in predicting the outcome of a college basketball game. In some sense, the matchup is already considered in the efficiency numbers. You can see this from how well the four factors explain efficiency.

Don’t extrapolate these results too far. If your team plays a six foot three inch center, he’s probably going to get killed by Joel Embiid. However, based on the four factors, matchups do not help you make better predictions. Offensive and defensive efficiency by adjusted points per possession does an excellent job.

3. Further reading

Need more college basketball analytics? Check out these resources for further reading.

3a. The three point shot

The three point shot is a powerful weapon. It gives the underdog an opportunity to get hot and pull off the upset. It has also propelled a favorite like Villanova to two NCAA tournament championships.

Ken Pomeroy wondered whether the offense or defense has control over the three point shot. To study this, he looked at the correlation from early to late season statistics in conference.

He found that the defense has the ability to control what type of shots an opposing offense takes. Defenses can limit the fraction of shots an opponent takes from three.

However, the defense has no control once the offense puts up a three point shot. Randomness plays a big role in determining three point percentage allowed.

Even more surprising, randomness also plays a big role in an offense’s three point percentage. While shooting is a clear skill, the data shows regression to the mean in three point percentage.

To read Pomeroy’s article on the three point lottery, click here.

3b. Ensemble methods to predict the tournament

Nate Silver has published predictions for the NCAA tournament at both the New York Times and his own site FiveThirtyEight. The key to accurate predictions is ensemble methods that combine many predictors.

First, Silver combines 6 different power ratings to get an estimate of team strength. Each system has its weaknesses, but the combination provides a powerful predictor.
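As a rough illustration of combining rating systems, the sketch below puts each system on a common scale and takes a plain average. The text does not say how Silver weights or scales his inputs, so the equal weighting and the standardization step are assumptions, and the two rating systems shown are made up.

```python
import numpy as np


def ensemble_rating(ratings_by_system):
    """Average several rating systems after putting each on a common scale.

    ratings_by_system: array of shape (n_systems, n_teams). Assumes each
    system rates at least two teams differently, so its spread is not zero.
    """
    ratings = np.asarray(ratings_by_system, dtype=float)
    centered = ratings - ratings.mean(axis=1, keepdims=True)
    scaled = centered / ratings.std(axis=1, keepdims=True)
    return scaled.mean(axis=0)  # equal weight for every system


# Two hypothetical systems rating the same three teams on different scales.
combined = ensemble_rating([[10, 20, 30], [1, 2, 3]])
print(combined)
```

Standardizing first keeps a system with large numbers from dominating the average.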

In addition, Silver adds an unexpected component: the preseason AP poll. It might seem strange to add a predictor that has no access to data from the current season.

However, the preseason AP poll is a powerful predictor of tournament performance that harnesses the wisdom of crowds. No one sports writer submits the perfect ballot, but the combination of many sports writers gives an accurate assessment of team strength.

In fact, my article on FiveThirtyEight has shown that the preseason polls are a better predictor than the RPI, the outdated computational method the selection committee previously used to seed the field.

An earlier article on these NCAA tournament predictions inspired the ensemble approach I use for my member predictions at The Power Rank. This includes college football and the NFL in addition to college basketball.

To check out the methods behind Nate Silver’s NCAA tournament predictions, click here.

3c. How to win your March Madness pool

Armed with analytics and win probabilities, you’re ready to win your March Madness pool. However, you should not simply pick the higher ranked team in every game.

This favorites strategy gives you the highest win probability for small pools. But in some years, there is a better strategy for intermediate sized pools.

Sometimes, the public gets overexcited about a team, such as the 2015 Kentucky team that entered the tournament undefeated. Suppose you follow the numbers and also pick this team as champion.

If this favorite wins, you and many others will get the 32 points for picking the correct champion. With so many other people in contention, it’s likely someone will get lucky in earlier rounds and beat you.

Instead, you should make a contrarian choice: a different team with a high win probability that few people in your pool are picking. If this team wins, you have a great chance to win your pool.

I explain these ideas in my book How to Win Your NCAA Tournament Pool. I’ve posted the Introduction here at The Power Rank. To check out the entire book for less than the cost of a latte, click here.
