When to Pull the Goalie: Running the Numbers on NHL Goalie Pulls

Alex Galea

Jan 15, 2021


For hockey fans, it’s a familiar story. As the clock runs down in the final (3rd) period, teams losing by a goal or two will look to pull their goalie and send out an extra skater in their place. This usually results in a 5 on 6 player situation, leading to offensive pressure and generating a late game push.

This move can be effective, but it dramatically increases the chance of the opposition scoring, since they get to shoot on an empty net. Usually it’s just a matter of time until this happens, at which point it’s pretty much game over. But this is a smart risk to take, given that losing has high odds anyway if the game is played out even strength.

The question isn’t whether to pull the goalie, but when. Too early and there’s a big chance of being scored on and missing out on some 5-on-5 opportunities to score. Too late and you won’t maximize the potential of your 5-on-6 advantage.

In this post I look at indicators for optimal goalie pull times. Using historical data I model the odds of scoring as a function of the time when the goalie was pulled in the 3rd period.

I start by discussing some previous work done on this problem.

Then I explain how my training dataset was created, and I’ll walk through some technical details of the models (including some Python code).

Lastly, I discuss the findings.

TL;DR

As discussed in the results section of this post, I found that it’s optimal to pull an NHL goalie when there’s 3:00 left in the period. In this case, you would have 1 in 4 odds of scoring.

Source Code

The source code for this project is available on GitHub. If you notice something wrong below, then you can submit a ticket on the repo or open a pull request.


Analysts generally agree [TSN, WSJ, Sportsnet] that pulling the goalie early tends to result in better outcomes. However, many of the popular media articles rely on small data sets and report raw statistics instead of building models.

For example, the Sportsnet article reports:

During the 2015–16 NHL regular season …
Pull between 1:30–5:00 remaining: 16% chance of success
Pull < 1:30 remaining: 10% chance of success

It would be nice to know the error on each statistic. Assuming N=700 goalie pulls in a season (where 600 of those are in the last 1:30) I can add binomial error estimates:

16 ± 3% chance of success with 1:30–5:00 remaining
10 ± 1% chance of success with < 1:30 remaining

This suggests good confidence that it’s better to pull the goalie before the 1:30 mark.
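As a rough check on those error bars, here is a minimal sketch of the binomial standard error calculation. It assumes, as above, N=700 pulls with 600 in the last 1:30, and additionally assumes that the remaining 100 all fall in the 1:30–5:00 window. The function name is illustrative only.

```python
import math

def binomial_standard_error(p, n):
    """Standard error of an observed proportion p over n independent trials."""
    return math.sqrt(p * (1.0 - p) / n)

# Assumed counts (not measured): 700 pulls in the season, 600 of them late.
n_total = 700
n_late = 600
n_early = n_total - n_late  # assumption: all others are in the 1:30-5:00 window

se_early = binomial_standard_error(0.16, n_early)
se_late = binomial_standard_error(0.10, n_late)

print(f"1:30-5:00 remaining: 16% +/- {100 * se_early:.1f}%")
print(f"< 1:30 remaining:    10% +/- {100 * se_late:.1f}%")
```

This gives standard errors of roughly 0.037 and 0.012, close to the rounded figures quoted above. The two intervals do not overlap, which is what supports the conclusion.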

Asness and Brown [2018] have published a model that suggests 6:10 is the optimal goalie pull time for a one-goal deficit.

Their paper also includes a literature review of earlier studies.

Overall it seems that previous works are lacking in interpretability through visual aids. Charts can also help us identify trends.

Additionally, the datasets these studies relied upon were mostly not accessible. Since I couldn’t find a good training dataset for my model, I went in search of goalie pull times.

The next section describes how I generated the Goalie Pull Dataset. For results, skip down to Pulling Earlier and More Often.

Here’s a link to the goalie pull game time data set from 2003–2019, which was used for the analysis below.

In this section, I’ll explain how I created this dataset, including some of the assumptions that went into my algorithm.

To obtain a suitable data source, I parsed goalie pull information directly from the play-by-play game sheets on NHL.com.

Here’s a sample of the result (full file at link above):

From 2003–2007, a goalie pull was recorded as a timestamped row with the description "<#> <NAME>, Goalie Pulled", where <#> and <NAME> identify the goalie that was pulled [example].

After finding a row like this, we scan the remainder of the table looking for a goal.

If a goal is found then we cross reference the players on the ice to make sure the pulled goalie is not present. This step is important because there’s no data on when goalies return to the net, which happens quite commonly, e.g. in the case of a defensive zone face-off.

Also, I noticed that goalie pulls are being recorded when penalties are called, where the goalie goes to the bench for what is usually just a few seconds. To minimize these false positives, I only searched for goalie pulls in the last 5 minutes of game time.

From 2007 onwards, goalie pulls were no longer recorded explicitly on the game sheet [example]. For these games, each row is labeled with the players on ice, so I used this to infer the estimated pull time.

Here’s the simplified algorithm:

    pulls = []
    for season in seasons:
        for game in season.games:
            goal_scan = False
            for row in game.game_sheet_rows:
                # Look for a goalie pull
                if row.is_goalie_pull:
                    goal_scan = True
                    pulled_goalie = row.pulled_goalie
                    pulled_time = row.time
                # There has been a pull, so scan for a goal
                if goal_scan and row.is_goal:
                    if pulled_goalie not in row.players_on_ice:
                        # We have found a goal scored while the
                        # pulled goalie was off the ice
                        pulls.append({
                            "season_game": [season, game],
                            "pulled_time": pulled_time,
                            # The goal row's own timestamp is the goal time
                            "goal_time": row.time,
                        })

Additionally, I record which team scored the goal (for / against) and track goalie pulls that result in no goal.
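For the game sheets from 2007 onwards, where the pull has to be inferred from the players on ice, the inference step might look something like the sketch below. This is not the project's actual code: the `Row` structure, the field names and the sample data are all hypothetical, and the only rules taken from the text are "goalie missing from the players on ice" and "only scan the last 5 minutes".

```python
from dataclasses import dataclass
from typing import Optional

GAME_LENGTH = 60 * 60  # regulation game length, in seconds
SCAN_WINDOW = 5 * 60   # only scan the last 5 minutes, as described above

@dataclass
class Row:
    """One play-by-play row (hypothetical structure for this sketch)."""
    seconds_elapsed: int       # game time elapsed, in seconds
    players_on_ice: frozenset  # names of the trailing team's players on the ice

def infer_pull_time(rows, goalie) -> Optional[int]:
    """Return the elapsed time of the first late-game row without the goalie."""
    for row in rows:
        if row.seconds_elapsed < GAME_LENGTH - SCAN_WINDOW:
            continue  # too early; likely a delayed-penalty bench trip
        if goalie not in row.players_on_ice:
            return row.seconds_elapsed
    return None  # the goalie was never off the ice in the scan window

# Made-up rows: the goalie is on the ice at 55:00 and gone by 58:20.
rows = [
    Row(3300, frozenset({"Goalie", "A", "B", "C", "D", "E"})),
    Row(3500, frozenset({"A", "B", "C", "D", "E", "F"})),
    Row(3560, frozenset({"A", "B", "C", "D", "E", "F"})),
]
print(infer_pull_time(rows, "Goalie"))
```

Because the estimate is tied to the first row where the goalie is missing, the inferred pull time can only be as precise as the spacing of events on the game sheet.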

Let’s start with looking at some overall trends and statistics in our newly obtained data set.

The full source code for this analysis is available in a Jupyter Notebook here: https://nbviewer.jupyter.org/github/agalea91/nhl-goalie-pull-optimization/blob/master/notebooks/src/3_exploratory_analysis_2003-2019.ipynb

Goalie Pulls Trending Up

Plotting the goalie pull count histogram over time, we see a pretty even distribution through each season (as would be expected) with gaps during the off-seasons and some lockout years.

[Figure: histogram of goalie pull counts over time, 2003–2019]

We see a gradual increase in total number of goalie pulls over this time. Expansion teams entering the league would naturally push the total counts up, but we also see the average number of pulls per game increasing:

[Figure: average number of goalie pulls per game, by season]

Marginal Gains on Positive Outcomes

Below I split this chart out based on the outcome: goal for, goal against or no goal.

[Figure: average goalie pulls per game by season, split by outcome (goal for, goal against, no goal)]

The increase in goalie pulls over the years has only resulted in slight increases of goals for, with most of the additional pulls resulting in a goal against.

You might have noticed the blue outlier point for the 2015/2016 season in the previous chart. Here we see that it is mostly due to an unusually high number of goals against (red), as opposed to goals for (blue) or no goal outcomes (yellow). Perhaps this poor return explains why average pulls per game return to the trend the following year.

Interestingly, we see a downwards trend in goalie pulls where no goal is scored (i.e. the game ends).

Emerging Trend to Pull Goalies Earlier

From 2003–2013, average goalie pull times gradually increased from about 1.2 to 1.3 minutes remaining in the game. Now, as of 2019, goalies are being pulled an average of 45% sooner at 1.9 minutes remaining.

This is illustrated in the following box plot of average goalie pull times:

[Figure: box plot of goalie pull times, by season]

The average time remaining for each season is marked with a solid black line through the middle of the bar, while the upper whiskers give a sense of the variation in each segment.

In recent years we can see an increasingly large contribution of relatively early goalie pulls (e.g. above 3 minutes remaining). Historically, these points were just outliers.

Goalie Pulls are Left Skewed

As would be expected, goalie pulls occur more frequently as the game clock winds down.

[Figure: histogram of goalie pull times in the 3rd period]

Labelling by outcome, we see that late game pulls tend to have no-goal outcomes (yellow). Because the histograms are not normalized, we can also see the high likelihood of a goal against (red) compared to a goal for (green).

[Figure: histogram of goalie pull times in the 3rd period, labelled by outcome]

Note the sparsity of data below ~17.5 mins and above ~19.5 mins. This will end up leading to huge uncertainty in the likelihood calculations below.

“Goals for” Lead “Goals Against”

Whereas the charts above represent goalie pull times, below we look at the times when goals were scored following a pull.

[Figure: histogram of goal times following a goalie pull, by outcome]

These tend to occur very late in the game, with goals against (red) slightly lagging goals for (green). This is logical given that teams intentionally pull the goalie when they are in a strong offensive position and usually get a scoring opportunity before the opposition does.

Overall, exploratory analysis reveals that we have a highly noisy dataset where statistically significant optimization will be difficult, especially due to a lack of data for early pulls (prior to ~2.5 min remaining).

Despite this, I feel that our dataset showed some interesting trends and yielded valuable insights. It is also open source and much larger than the data sets used in other studies.

I encourage others to study and help validate the dataset, which is available on GitHub.

New seasons can be added to the analysis by forking the project and expanding the source code.
⭐ Pull requests are welcome
⚠️ Please respect the API quotas when using this code

Bayesian statistics tends to be well suited for sports modeling, in part because of the scarcity of training data. It is broadly applicable, relatively easy to work with and helpful for error estimation (as seen below).

The full source code for this modeling is available in a Jupyter Notebook here: https://nbviewer.jupyter.org/github/agalea91/nhl-goalie-pull-optimization/blob/master/notebooks/src/4_bayes_gamma.ipynb

Since I am interested in the optimal pull time, I’ll first fit the outcome (goal for, goal against, no goal) distributions above. The Gamma distribution is a suitable choice for modeling the data:

$$P(t; \alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} \, t^{\alpha - 1} e^{-\beta t}$$

where t is the time elapsed in the 3rd period, alpha and beta are parameters to be determined using Bayes’ rule, and P will be the posterior probability of an outcome.

Using our full 2003–2019 goalie pull dataset X, I’ll solve for the probability of the outcome y, i.e. P(y|X; t). This is done computationally using PyMC3’s Markov Chain Monte Carlo (MCMC) algorithm. The outcomes of interest are y={goal for, goal against, no goal}.

I set up uniform priors on the Gamma parameters alpha and beta, and solve for them using MCMC with a Gamma likelihood on our observations. With PyMC3 handling the heavy lifting, the code for this is deceptively simple. For more details about the calculation, you can check out the source code.
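To give a feel for what the sampler is doing, here is a self-contained sketch of the same idea using only the standard library. It is not the PyMC3 code from the notebook: it swaps in a hand-rolled random-walk Metropolis sampler, runs on synthetic data rather than the goalie pull dataset, and treats beta as a rate parameter. The prior bounds, step size and "true" parameter values are arbitrary assumptions.

```python
import math
import random

def log_posterior(alpha, beta, n, sum_t, sum_log_t, upper=50.0):
    """Log posterior of Gamma(alpha, rate=beta) under uniform priors on (0, upper)."""
    if not (0.0 < alpha < upper and 0.0 < beta < upper):
        return -math.inf  # outside the uniform prior's support
    # Gamma log-likelihood written in terms of sufficient statistics.
    return (n * (alpha * math.log(beta) - math.lgamma(alpha))
            + (alpha - 1.0) * sum_log_t - beta * sum_t)

def metropolis(times, n_steps=20000, burn_in=5000, step=0.1, seed=0):
    """Random-walk Metropolis sampler for (alpha, beta)."""
    rng = random.Random(seed)
    n = len(times)
    sum_t = sum(times)
    sum_log_t = sum(math.log(t) for t in times)
    alpha, beta = 1.0, 1.0  # arbitrary starting point
    current = log_posterior(alpha, beta, n, sum_t, sum_log_t)
    samples = []
    for i in range(n_steps):
        a_new = alpha + rng.gauss(0.0, step)
        b_new = beta + rng.gauss(0.0, step)
        proposed = log_posterior(a_new, b_new, n, sum_t, sum_log_t)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, proposed - current)):
            alpha, beta, current = a_new, b_new, proposed
        if i >= burn_in:
            samples.append((alpha, beta))
    return samples

# Synthetic observations with known parameters (alpha=4, rate beta=2).
# random.gammavariate takes a scale parameter, hence 1 / beta.
data_rng = random.Random(1)
times = [data_rng.gammavariate(4.0, 1.0 / 2.0) for _ in range(500)]

samples = metropolis(times)
alpha_mean = sum(a for a, _ in samples) / len(samples)
beta_mean = sum(b for _, b in samples) / len(samples)
print(f"alpha ~ {alpha_mean:.2f}, beta ~ {beta_mean:.2f}")
```

The spread of the retained samples is what provides the standard deviations on alpha and beta that are used for error estimation later on.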

MCMC Samples

Looking at the trace of our MCMC calculation for alpha and beta, we see convergence rather quickly:

[Figures: MCMC trace plots for alpha and beta]

When performing this calculation, PyMC3 also samples P(y|X; t) for us. Below I plot those samples along with the theoretical distributions (i.e. using values I calculated for alpha and beta).

[Figure: MCMC samples of P(y|X; t) with the fitted Gamma distributions]

Normalizing these as per population sizes in the training data, we see the following charts:

[Figures: posterior distributions for each outcome, normalized by population size]

Here we can see maxima to the left of the 19 minute mark for goals and to the right of the 19 minute mark for no goals. These represent the most common times goalies are pulled for each outcome. The exact values are:

+--------------------+----------+--------------+---------+
|                    | Goal For | Goal Against | No Goal |
+--------------------+----------+--------------+---------+
| Time Elapsed (min) | 18.6     | 18.7         | 19.3    |
+--------------------+----------+--------------+---------+
| Game Clock (mm:ss) | 01:24    | 01:19        | 00:41   |
+--------------------+----------+--------------+---------+
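Since the charts use time elapsed in the 3rd period while the discussion uses the game clock, a small conversion helper can be handy. This is a minimal sketch that assumes a 20 minute period; the function name is my own.

```python
PERIOD_MINUTES = 20  # length of an NHL regulation period

def elapsed_to_game_clock(minutes_elapsed):
    """Convert minutes elapsed in the period to a mm:ss game clock string."""
    seconds_left = round((PERIOD_MINUTES - minutes_elapsed) * 60)
    minutes, seconds = divmod(seconds_left, 60)
    return f"{minutes:02d}:{seconds:02d}"

for elapsed in (17.0, 18.6, 19.0):
    print(elapsed, "->", elapsed_to_game_clock(elapsed))
```

Note that converting the rounded elapsed values in the table can differ from the listed game clock by a second or so, since the table was presumably computed from unrounded values.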

Looking at the cumulative distributions tells us about the average outcome rates:

[Figure: cumulative distributions for each outcome]

On the right hand side of the chart, we see that no goal outcomes are about 1.6 times as likely as goal against outcomes, which in turn are about 2.5 times as likely as a goal for (the success case). This is summarized as follows:

+------------------+----------+--------------+---------+
|                  | Goal For | Goal Against | No Goal |
+------------------+----------+--------------+---------+
| Mean Probability | 0.13     | 0.33         | 0.53    |
+------------------+----------+--------------+---------+

In order to determine the optimal pull time, I re-normalize the posterior probabilities such that each time slice (along the y-axis) adds up to one. This way I can see how the odds of each outcome fluctuate over time.

Mathematically this is done by multiplying P(y|X; t) with a function c(t), as defined by:

$$c(t) \sum_{y} P(y \mid X; t) = 1, \qquad y \in \{\text{goal for}, \text{goal against}, \text{no goal}\}$$
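The re-normalization can be sketched numerically as follows. The alpha and beta values here are hypothetical placeholders (chosen only so that each curve peaks near the times in the table above), not the fitted values from the notebook. The weights reuse the mean probabilities from the summary table as a stand-in for the population sizes.

```python
import math

def gamma_pdf(t, alpha, beta):
    """Gamma density with shape alpha and rate beta, computed in log space."""
    log_p = (alpha * math.log(beta) - math.lgamma(alpha)
             + (alpha - 1.0) * math.log(t) - beta * t)
    return math.exp(log_p)

# Hypothetical parameters for illustration only.
OUTCOMES = {
    "goal_for":     {"alpha": 400.0, "beta": 21.45, "weight": 0.13},
    "goal_against": {"alpha": 400.0, "beta": 21.34, "weight": 0.33},
    "no_goal":      {"alpha": 900.0, "beta": 46.58, "weight": 0.53},
}

def normalized_odds(t):
    """Scale the weighted densities at time t so that they sum to one."""
    raw = {
        name: p["weight"] * gamma_pdf(t, p["alpha"], p["beta"])
        for name, p in OUTCOMES.items()
    }
    c_t = 1.0 / sum(raw.values())  # the normalizing function c(t)
    return {name: c_t * value for name, value in raw.items()}

for t in (17.0, 18.0, 19.0, 19.5):
    odds = normalized_odds(t)
    print(t, {name: round(value, 2) for name, value in odds.items()})
```

Dividing by the sum at each t is what turns three separate densities over pull time into odds that can be compared across outcomes for a given pull time.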

The result is as follows. Keep in mind that the x-axis corresponds to the time when the goalie is pulled. For example, if pulling the goalie at t=19 min (01:00 game clock) there’s a 30% chance of a goal against outcome.

[Figure: re-normalized odds of each outcome as a function of goalie pull time]

This chart leads to several interesting observations:

  • The odds of a goal for are ~20% up until the 02:00 mark (peaking at 03:00). Then they approach zero gradually through 02:00–01:00 remaining, and more rapidly in the final minute.
  • Odds of a goal against drop off linearly up to the 02:00 mark, dropping from a high of ~60% to ~40%. From 02:00 onwards it follows the same trend as goals for.
  • Odds of no goal start low and increase exponentially as the game clock ticks down.
  • If pulling the goalie with 30 seconds left, the odds are 5% goal for, 15% goal against and 80% no goal.

When interpreting this chart we must be careful to think about the high statistical uncertainty associated with earlier pulls. Using the standard deviation of the alpha and beta MCMC samples (seen above in the trace plots), we can perform error propagation to estimate these uncertainties:

$$\sigma_P^2 = \left( \frac{\partial P}{\partial \alpha} \right)^2 \sigma_\alpha^2 + \left( \frac{\partial P}{\partial \beta} \right)^2 \sigma_\beta^2$$
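As a rough illustration of this kind of error propagation, the sketch below applies the standard first-order formula to a single Gamma density, using numerical partial derivatives. The parameter values and standard deviations are made up, and the sketch ignores any correlation between alpha and beta, so treat it as an outline rather than the calculation used for the chart.

```python
import math

def gamma_pdf(t, alpha, beta):
    """Gamma density with shape alpha and rate beta, computed in log space."""
    log_p = (alpha * math.log(beta) - math.lgamma(alpha)
             + (alpha - 1.0) * math.log(t) - beta * t)
    return math.exp(log_p)

def propagated_sigma(t, alpha, beta, sigma_alpha, sigma_beta, eps=1e-5):
    """First-order uncertainty on P(t) from uncertainties on alpha and beta."""
    # Central finite differences approximate the partial derivatives.
    dp_dalpha = (gamma_pdf(t, alpha + eps, beta)
                 - gamma_pdf(t, alpha - eps, beta)) / (2.0 * eps)
    dp_dbeta = (gamma_pdf(t, alpha, beta + eps)
                - gamma_pdf(t, alpha, beta - eps)) / (2.0 * eps)
    # Assumes alpha and beta are uncorrelated (a simplification).
    return math.sqrt((dp_dalpha * sigma_alpha) ** 2
                     + (dp_dbeta * sigma_beta) ** 2)

# Hypothetical values for illustration only.
sigma_p = propagated_sigma(18.6, alpha=400.0, beta=21.45,
                           sigma_alpha=5.0, sigma_beta=0.3)
print(f"P = {gamma_pdf(18.6, 400.0, 21.45):.3f} +/- {sigma_p:.3f}")
```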

This results in the following error band estimates:

[Figure: odds of each outcome with error bands, as a function of goalie pull time]

As expected, uncertainty is a large factor for early pull times, and the odds for pulls earlier than 03:45 on the game clock cannot be reliably distinguished. Note that the singular points are a result of error propagation with partial derivatives and should not be interpreted literally.

Following from the result above, we can calculate the odds of success when pulling the goalie at time t in the 3rd period.

The maximum likelihood is 26% ± 4% at the 03:00 mark on the game clock. In other words, pulling the goalie with 3 mins left in the 3rd period has historically yielded a 1/4 chance of success.

Following the line over to the right, we see the odds of success drop to zero as the game clock winds down. Like the chart above, we have very little statistical confidence in our model for earlier goalie pulls, due to a lack of training data.

[Figure: odds of success (goal for) as a function of goalie pull time]

It’s generally well accepted that so-called “analytics” tells us we should be trying to pull the goalie earlier than was previously done.

This work supports this view through use of visual aids and models of goalie pull results that vary as a function of time left in the game.

The dataset and statistical method used for this work are open source, and I hope they can influence future research on the subject.

Thanks for reading 🏒
- Alex
alexgalea.ca

Special thanks to Willem Klumpenhouwer @wklumpen for reviewing this work and offering very helpful advice.

Please direct technical questions, comments or concerns through GitHub’s issue tracker.
