The Reason You Can’t Avoid The Curse of Small Sample Size

immaculate_deflection

In 2009, the Denver Broncos got off to a magical start. Under new coach Josh McDaniels, they won their first 6 games. On ESPN, commentator Tom Jackson called McDaniels “one of the great ones.”

I still remember the shock I felt listening to Jackson ignore the small sample size of 6 games. He might as well have cut a hole in my skull and spit on my brain.

This was probably before the time one could say “small sample size” on ESPN. But Jackson should have seen the warning signs.

In the first game of the season, Denver faced a 7-6 deficit late in the fourth quarter, as the offense could only muster two field goals against Cincinnati. From their own 13-yard line, Denver’s Kyle Orton threw a pass up the left sideline.

The pass got tipped by a defender before landing in the arms of Denver’s Brandon Stokley, who rumbled into the end zone for a go-ahead touchdown. The play became immortalized as the Immaculate Deflection.

Orton to Stokley. Maybe Jackson thought this duo would become the new Montana to Rice when he proclaimed McDaniels as one of the great ones.

After their 6-0 start, the Denver Broncos went 2-8 over the rest of the 2009 season. They missed the playoffs when they lost to the Kansas City Chiefs, a 4-12 team, in the last week of the season.

Denver started the 2010 season 3-9. Coach McDaniels got fired.

As a smart football fan, you know better than to make decisions based on a small sample size. But have you ever wondered why humans get fooled by randomness?

In this article, I discuss the brain science behind our uneasy relationship with randomness. The trait that makes us human can fool us when dealing with uncertainty.

But first, let’s show the fallacy of judging based on small sample size with a simple experiment.

The flipping of a fair coin

Randomness plays a big role in football. Sometimes, a tipped pass lands in the hands of the receiver who scores a winning touchdown. Other times, the ball lands in the hands of a defender, ending the game.

The outcome of a football game is not that different from the flipping of a coin. To show the fallacy of judging based on small sample size, let’s use randomness to model the wins and losses of Coach Average.

The visual shows the results for Coach Average in his first fifty games, based on a 50% chance to win each game.

actual_random_sequence

Coach Average gets off to a hot start. In his first college season, he goes 10-3 and becomes the savior of the program.

However, there is no coaching acumen in our randomness experiment. Each of those first 13 games had an equal chance of a win or a loss. Because of the small sample size, you shouldn’t judge Coach Average on his 77% win percentage through 13 games.

With more games, we start to see the true skill, or lack thereof, of Coach Average. He goes 4-8 in his second and third seasons. After 50 games, he has a 46% win percentage, close to the expected 50%.

The convergence of this win percentage towards 50% with an increasing number of games is the Law of Large Numbers, the mathematical reason you should never make a judgment based on small sample size.
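You can see the Law of Large Numbers at work with a few lines of simulation. Below is a minimal sketch of the Coach Average experiment; the function name `simulate_coach` and the seed are my own choices for illustration.

```python
import random

def simulate_coach(n_games, win_prob=0.5, seed=None):
    """Simulate a coach whose games are independent coin flips,
    returning the running win percentage after each game."""
    rng = random.Random(seed)
    wins = 0
    pct = []
    for g in range(1, n_games + 1):
        wins += rng.random() < win_prob
        pct.append(wins / g)
    return pct

# The win percentage is noisy early and settles near 50% as games pile up.
trajectory = simulate_coach(5000, seed=1)
print(f"After   13 games: {trajectory[12]:.0%}")
print(f"After   50 games: {trajectory[49]:.0%}")
print(f"After 5000 games: {trajectory[-1]:.0%}")
```

Run it with different seeds and the early win percentages swing wildly while the long-run value always drifts toward 50%.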

The human desire for patterns

The reason humans get fooled by small sample size is that we see patterns in randomness.

Coach Average’s record was produced from one of the randomness demonstrations I’ve done at the Summers-Knoll School in Ann Arbor, Michigan. I put a black and a white chess piece in a bag, and the students draw one of the pieces from the bag.

Before we do the randomness demo, I ask the students what they expect to see in a sequence of pieces pulled from the bag. Here are the typical results from a group of 7th and 8th grade students.

predicted_random_sequence

Humans expect a random sequence to alternate between wins and losses more than it actually does. The longest streak of the same color that any student predicted was 6, and only two of the 12 students went that long. In reality, randomness produces longer streaks than we expect.
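A quick simulation makes the point. The sketch below (the helper `longest_run` is my own) estimates how often a 50-draw sequence of two equally likely colors contains a streak of 5 or more, something most students' predicted sequences avoid.

```python
import random

def longest_run(seq):
    """Length of the longest run of identical items in seq."""
    best = cur = 1
    for a, b in zip(seq, seq[1:]):
        cur = cur + 1 if a == b else 1
        best = max(best, cur)
    return best

rng = random.Random(42)
trials = 10_000
runs = [longest_run([rng.choice("BW") for _ in range(50)])
        for _ in range(trials)]

# Streaks of 5+ show up in the large majority of 50-draw sequences,
# longer than most students predict.
share = sum(r >= 5 for r in runs) / trials
print(f"Chance of a streak of 5+ in 50 draws: {share:.0%}")
```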

Humans have an uncomfortable relationship with randomness because we are wired to see patterns, as Daniel Bor explains in his book The Ravenous Brain. It starts with counting by 2s and 5s in the earliest years of grade school and culminates in the technology that puts a computer in our pocket. Humans have this remarkable ability to find patterns even though we can only hold 4 items in our working memory, the same number as a monkey.

Bor notes that this desire for patterns extends further than any survival instinct.

And we really are a decidedly strange species for actively seeking out games with patterns in them, when such activities seem to serve no biological function whatsoever, at least not in any direct way. It’s as if we were addicted to searching for and spotting structures of information, and if we do not exercise this yearning in our normal daily lives, we then experience a deep pleasure in artificially finding them.

This sounds like sports to me.

However, this ravenous desire for patterns gets us in trouble when we look at random sequences. As an example, the students drew 9 straight white chess pieces starting on the 19th trial. They thought it was a magic trick. Humans not only find patterns but then tell stories to explain them.

This pattern searching also gets us in trouble with sports. Let me show you a drastic example.

Ohio State in 2015

The Buckeyes entered the 2015 season as the nation’s consensus number one team. They had won the first College Football Playoff the previous year and had so many returning starters they needed to pick among three quarterbacks.

In late November, Michigan State came to Columbus to play Ohio State, and it looked like an easy win for the Buckeyes. Connor Cook, the best quarterback in Michigan State history, would not play due to an injury.

The markets closed with Ohio State as a 14.5 point favorite. This wasn’t far from the 14 point spread the markets set for this game in June.

Then the unthinkable happened. Ohio State played an awful game. The offense never tested a weak Michigan State secondary and insisted on running into a strong front seven. Michigan State won 17-14 on a late field goal.

The next week, Ohio State traveled to Ann Arbor to play Michigan. The markets closed with Ohio State as a 1.5 point favorite, a drastic change from the 16 point spread set in June.

There were reasons to deviate from the preseason spread. Michigan had shown improvement under new coach Jim Harbaugh. Ohio State had their struggles, like a 20-13 home win over Northern Illinois. However, a spread of 1.5 in favor of Ohio State made no sense and seemed like an insane reaction to one game against Michigan State.

Ohio State beat Michigan 42-13.

The Light Detection Experiment

The human relationship with randomness gets worse.

In Bor’s book, he discusses a light detection experiment in which participants are asked to predict whether the left or right light will flash. The left light flashes 80% of the time at random, while the right light flashes the other times.

The optimal strategy requires picking the left light each time, which leads to an 80% prediction accuracy. Rats figure this out.

However, humans do not follow this optimal strategy. We try to guess the randomness, picking the left light about 80% of the time. This suboptimal strategy reduces accuracy by 12 percentage points, from 80% to 68%.
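The 68% figure falls out of a one-line calculation: if you guess each light at its true frequency, your accuracy is the chance you guess left and left flashes, plus the chance you guess right and right flashes. A minimal sketch, with `matching_accuracy` as my own helper name:

```python
def matching_accuracy(p):
    """Expected accuracy when you guess each side at its true frequency:
    p*p (guess left, left flashes) + (1-p)*(1-p) (guess right, right flashes)."""
    return p * p + (1 - p) * (1 - p)

p_left = 0.80
print(f"Always pick left:      {p_left:.0%}")                    # 80%
print(f"Guess at the pattern:  {matching_accuracy(p_left):.0%}")  # 68%
```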

I find this experiment horrifying. Suppose you had a system that picked 56% of winners against the spread. With this win rate and proper bankroll management, you could grow your wealth at an impressive rate.
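To put a number on "proper bankroll management," here is a sketch using the Kelly criterion, assuming even-money payouts for simplicity (real spread bets typically pay around -110, which shrinks the stake and the growth rate); the function names are my own.

```python
import math

def kelly_fraction(p, b=1.0):
    """Kelly stake as a fraction of bankroll for win probability p
    and net odds b (profit per unit staked on a win)."""
    return (b * p - (1 - p)) / b

def growth_per_bet(p, b=1.0):
    """Expected log-growth per bet when staking the Kelly fraction."""
    f = kelly_fraction(p, b)
    return p * math.log(1 + b * f) + (1 - p) * math.log(1 - f)

# Even-money odds: a 56% system suggests staking 12% of bankroll.
print(f"Kelly stake: {kelly_fraction(0.56):.0%} of bankroll")
print(f"Growth rate: {growth_per_bet(0.56):.2%} per bet")
```

Even a fraction of a percent of compound growth per bet adds up quickly over hundreds of bets, which is why a genuine 56% edge is so valuable.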

However, the light detection experiment shows we can second-guess this winning system. We get greedy and want to win every game. At the extreme, let’s assume you bet with your system 56% of the time at random, choosing to go against it the other times. You would win at a 50.7% rate, an unprofitable winning percentage.
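You can verify this by simulation. The sketch below randomly decides each game whether to follow or fade a 56% system, mirroring the probability matching from the light experiment; the variable names and seed are my own.

```python
import random

rng = random.Random(7)
p_system, p_follow = 0.56, 0.56
n = 200_000

wins = 0
for _ in range(n):
    system_right = rng.random() < p_system   # would the system's pick win?
    follow = rng.random() < p_follow         # do we follow the system this game?
    wins += system_right if follow else not system_right

# Second-guessing at random drags the win rate to about
# 0.56*0.56 + 0.44*0.44 = 50.7%.
print(f"Win rate when second-guessing: {wins / n:.1%}")
```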

On a more concrete level, consider these simulated wins and losses for a bettor with a 56% win rate.

bettor_56winrate

Do you keep faith in the system after it goes 2-8 starting at game 16? Or do you make up a story to explain the losing stretch? Despite our innate desire for patterns, try your best to avoid inventing false explanations for randomness.
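Cold stretches like that are not a sign the system is broken; they are expected. A final sketch (the helper `has_cold_stretch` is my own) estimates how often a genuinely 56% system produces a 2-8 (or worse) stretch somewhere in a 50-game run:

```python
import random

rng = random.Random(3)
p, n_games, trials = 0.56, 50, 10_000

def has_cold_stretch(results, span=10, max_wins=2):
    """True if any `span`-game window contains `max_wins` wins or fewer."""
    return any(sum(results[i:i + span]) <= max_wins
               for i in range(len(results) - span + 1))

cold = sum(
    has_cold_stretch([rng.random() < p for _ in range(n_games)])
    for _ in range(trials)
)
print(f"Share of 50-game runs with a 2-8 or worse stretch: {cold / trials:.0%}")
```

A meaningful fraction of 50-game runs from a true 56% system contains a stretch this ugly, so surviving one is part of the price of the edge.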