Ice hockey analytics have improved by leaps and bounds in recent years. Many of the best new statistics (like Corsi and Fenwick) measure different kinds of shot attempts. Another newer statistic, PDO, is supposedly a measure of luck.
PDO was originally introduced by Brian King on a now-defunct blog. It doesn't stand for anything: it was one of King's online screen names. A team's PDO is defined to be its shooting percentage plus its save percentage. PDO is usually computed from 5v5 play only, so power-play and shorthanded shots are excluded. A player's individual PDO is the sum of his team's shooting and save percentages while that player is on the ice.
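As a minimal sketch of the definition (not the author's code), the hypothetical helper `pdo` below computes SH%, SV%, and PDO from raw counts, written as fractions so that a neutral PDO is 1.000. All counts are made up for illustration. The second part shows why 1.000 is the natural baseline: pooled over a whole league, every goal scored is a goal allowed and every shot taken is a shot faced, so league-wide shooting and save percentages sum to exactly 1.

```python
def pdo(goals_for, shots_for, goals_against, shots_against):
    """Return (SH%, SV%, PDO) as fractions, so a neutral PDO is 1.000."""
    sh = goals_for / shots_for
    sv = (shots_against - goals_against) / shots_against
    return sh, sv, sh + sv

# Hypothetical single game: 6 goals on 30 shots, 2 goals allowed on 28 shots.
sh, sv, team_pdo = pdo(6, 30, 2, 28)
print(f"SH% = {sh:.3f}, SV% = {sv:.3f}, PDO = {team_pdo:.3f}")

# Hypothetical two-team league where A and B only play each other.
# Tuples are (GF, SF, GA, SA); A's goals for are B's goals against, and so on.
league = {"A": (3, 30, 2, 25), "B": (2, 25, 3, 30)}
gf = sum(t[0] for t in league.values())
sf = sum(t[1] for t in league.values())
ga = sum(t[2] for t in league.values())
sa = sum(t[3] for t in league.values())

# Because total GF == total GA and total SF == total SA, pooled PDO is 1.
league_pdo = pdo(gf, sf, ga, sa)[2]
print(f"League PDO = {league_pdo:.3f}")
```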
In theory, PDO measures luck. The idea is that teams can’t actually do much to change their shooting percentage and save percentage. If a team scores 6 goals on 30 shots, they got lucky and aren’t expected to maintain such a high shooting percentage.
Extreme values of PDO over short periods of time can certainly be explained by random chance, but some teams maintain moderately high or low PDO over the course of an 82-game season. The 2018-19 New York Islanders, for example, finished the season with a PDO of 1.022 and a league-leading .937 save percentage in 5v5 situations. Can this be reasonably explained by chance error, or did the Islanders benefit from elite goaltending and defensive play?
The Data
I downloaded 2018-19 statistics from Natural Stat Trick. I only considered 5v5 situations.
import pandas as pd
df = pd.read_csv('Team Season Totals - Natural Stat Trick.csv', index_col='Team')
df.head()
Some of the numbers (including PDO) are rounded to two or three decimal places. We’ll compute more precise values directly from the SF, SA, GF, and GA columns (that’s shots for, shots against, goals for, and goals against). We’ll drop each of the other columns and recalculate SH% (shooting percentage), SV% (save percentage), and PDO.
df.columns
# Keep only the raw counts; every rate is recomputed from these below.
df = df[['GF', 'GA', 'SF', 'SA']].copy()
df['SH%'] = df['GF']/df['SF']
df['SV%'] = (df['SA']-df['GA'])/df['SA']
df['PDO'] = df['SH%']+df['SV%']
Significance Testing
In order to do a statistical test on each team’s PDO, we need to know how PDO is distributed. In order to know how PDO is distributed, we need to know how SH% and SV% are distributed.
If $p$ is the probability of scoring a goal on a single shot, then the number of goals scored on $n$ shots is binomially distributed, so shooting percentage after $n$ shots is approximately normally distributed when $n$ is large, with mean $p$ and standard deviation $\sqrt{p(1-p)/n}$. Similarly, save percentage on $m$ shots against will be approximately normally distributed with mean $1-p$ and standard deviation $\sqrt{p(1-p)/m}$ (as long as $m$ is large enough). I'll estimate $p$ by taking the league-wide average shooting percentage, and then compute SH% and SV% standard deviations for each team.
p = sum(df['GF'])/sum(df['SF'])
p
df['SH% std dev'] = (p*(1-p)/df['SF'])**0.5
df['SV% std dev'] = ((1-p)*p/df['SA'])**0.5
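To sanity-check the binomial standard deviation, the following sketch simulates many hypothetical seasons of independent shots with a fixed scoring probability $p$ and compares the spread of the resulting shooting percentages over $n$ shots with $\sqrt{p(1-p)/n}$. The helper name, $p = 0.08$, and 1000 shots per season are illustrative assumptions, not values from the data.

```python
import math
import random
import statistics

def simulated_sh_sd(p, n_shots, n_seasons, seed=0):
    """Simulate n_seasons of n_shots independent shots; return the SD of SH%."""
    rng = random.Random(seed)
    pcts = []
    for _ in range(n_seasons):
        # Each shot is a goal with probability p (a Bernoulli trial).
        goals = sum(rng.random() < p for _ in range(n_shots))
        pcts.append(goals / n_shots)
    return statistics.stdev(pcts)

p, n = 0.08, 1000  # hypothetical scoring probability and shots per season
theory = math.sqrt(p * (1 - p) / n)
sim = simulated_sh_sd(p, n, n_seasons=2000)
print(f"theoretical SD = {theory:.5f}, simulated SD = {sim:.5f}")
```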
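PDO is the sum of SH% and SV%, and the sum of two jointly normal (in particular, independent normal) random variables is also normal. If $X$ is normal with mean $\mu_X$ and standard deviation $\sigma_X$ and $Y$ is normal with mean $\mu_Y$ and standard deviation $\sigma_Y$, then $X+Y$ is normal with mean $\mu_X+\mu_Y$ and standard deviation $\sqrt{\sigma_X^2+\sigma_Y^2+2\rho\sigma_X\sigma_Y}$, where $\rho$ is the correlation between $X$ and $Y$. For PDO, the mean is $p + (1-p) = 1$.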
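The sum formula can also be checked numerically. This sketch (standard library only, with hypothetical standard deviations) builds pairs of normal variables with a chosen correlation $\rho$ by mixing two independent standard normals, then compares the sample standard deviation of their sum with $\sqrt{\sigma_X^2+\sigma_Y^2+2\rho\sigma_X\sigma_Y}$, where $\sigma_X$ and $\sigma_Y$ are the two standard deviations.

```python
import math
import random
import statistics

def sum_sd_formula(sx, sy, rho):
    """Standard deviation of X + Y from the variance-of-a-sum formula."""
    return math.sqrt(sx**2 + sy**2 + 2 * rho * sx * sy)

def simulated_sum_sd(sx, sy, rho, n=20000, seed=1):
    """Sample SD of X + Y where corr(X, Y) = rho by construction."""
    rng = random.Random(seed)
    sums = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = sx * z1
        # Sharing z1 between x and y gives corr(x, y) = rho.
        y = sy * (rho * z1 + math.sqrt(1 - rho**2) * z2)
        sums.append(x + y)
    return statistics.stdev(sums)

sx, sy = 0.0065, 0.0060  # hypothetical SH% and SV% standard deviations
results = {rho: (sum_sd_formula(sx, sy, rho), simulated_sum_sd(sx, sy, rho))
           for rho in (0.0, -0.5, 0.5)}
for rho, (formula, simulated) in results.items():
    print(f"rho = {rho:+.1f}: formula {formula:.5f}, simulated {simulated:.5f}")
```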
Intuitively, shooting percentage and save percentage should be uncorrelated. In other words, $\rho$ should be 0. SciPy's pearsonr function can test this.
from scipy import stats
stats.pearsonr(df['SH%'],df['SV%'])
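For reference, under the usual assumption that the data are bivariate normal, the two-sided p-value from pearsonr is equivalent to a t-test on the sample correlation $r$ with $n-2$ degrees of freedom, where $t = r\sqrt{(n-2)/(1-r^2)}$. A quick sketch using the values reported below ($r \approx -0.148$ and $n = 31$ teams) shows the statistic sits well inside the two-sided 5% critical value of about 2.045 for 29 degrees of freedom.

```python
import math

r, n = -0.148, 31   # sample correlation and number of teams, as reported in the text
T_CRIT_29 = 2.045   # two-sided 5% critical value of the t-distribution, 29 df

# t-statistic for testing rho = 0 from a sample correlation r.
t = r * math.sqrt((n - 2) / (1 - r**2))
print(f"t = {t:.3f} with {n - 2} degrees of freedom; |t| < {T_CRIT_29}: {abs(t) < T_CRIT_29}")
```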
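The sample correlation between SH% and SV% is about -0.148 and the p-value is about 0.426. This means that if SH% and SV% were truly uncorrelated, there would be a 42.6% chance of getting a sample correlation at least that far from 0, so it seems reasonable to assume that they are uncorrelated. With $\rho = 0$ the covariance term drops out, and a team's PDO standard deviation is just the square root of the sum of its SH% and SV% variances. Now we're ready to compute PDO standard deviations and z-scores for each team.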
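# With rho = 0, the covariance term drops out of the PDO standard deviation.
df['PDO std dev'] = (df['SH% std dev']**2+df['SV% std dev']**2)**0.5
# Under the null hypothesis, a team's expected PDO is p + (1 - p) = 1.
df['PDO z-score'] = (df['PDO']-1)/df['PDO std dev']
df.head()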
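To make the z-score computation concrete, here is a sketch for a single hypothetical team: 2000 shots for, 2000 shots against, an assumed league shooting percentage of 8%, and a PDO of 1.010. None of these numbers come from the 2018-19 data. The helper `pdo_z` is illustrative and, like the DataFrame code above, assumes SH% and SV% are uncorrelated.

```python
import math

def pdo_z(pdo, shots_for, shots_against, p):
    """Return (PDO SD, z-score vs. 1.000), assuming SH% and SV% are uncorrelated."""
    sh_sd = math.sqrt(p * (1 - p) / shots_for)
    sv_sd = math.sqrt(p * (1 - p) / shots_against)
    pdo_sd = math.sqrt(sh_sd**2 + sv_sd**2)
    return pdo_sd, (pdo - 1) / pdo_sd

# Hypothetical inputs, chosen only for illustration.
sd, z = pdo_z(1.010, 2000, 2000, 0.08)
print(f"PDO SD = {sd:.4f}, z = {z:.2f}")
```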
For any given team, we could run a z-test to determine whether its PDO differs significantly from 1.000. Each null hypothesis is that a team's PDO will converge to 1.000. Of course, the more tests we run, the more likely we are to get a false-positive result. Running 31 tests (one for each team) makes at least one false positive very likely.
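How likely is at least one false positive? If the 31 tests were independent (an approximation, since teams play one another) and each were run at $\alpha = 0.05$, the family-wise error rate would be $1-(1-\alpha)^{31}$. The sketch below computes it, along with the same quantity when each test instead uses the stricter Bonferroni threshold $\alpha/31$.

```python
alpha, m = 0.05, 31  # significance level and number of teams/tests

# P(at least one false positive) for m independent tests, each at level alpha.
fwer = 1 - (1 - alpha) ** m

# Same quantity when every test uses the Bonferroni threshold alpha / m.
fwer_bonf = 1 - (1 - alpha / m) ** m

print(f"uncorrected: {fwer:.3f}, Bonferroni: {fwer_bonf:.3f}")
```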
We'll use the Holm-Bonferroni method to control the probability of getting a false-positive result. We'll sort our DataFrame by PDO p-value and add a column holding each team's Holm-Bonferroni threshold $\alpha/(31+1-k)$, where 31 is the number of tests and $k$ is the team's rank in the sorted order. Working down from the top, we reject the null hypothesis for each team whose PDO p-value is less than its threshold, and we stop at the first team whose p-value is not; that team and every team below it are not rejected.
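The step-down rule is easy to get wrong, so here is a minimal plain-Python sketch of Holm-Bonferroni on a list of p-values. The function name `holm_reject` and the four p-values are hypothetical. In the example, the largest p-value (0.04) is below its own threshold of 0.05 but is still not rejected, because the procedure already stopped at 0.03, whose threshold is 0.025.

```python
def holm_reject(p_values, alpha=0.05):
    """Holm-Bonferroni step-down: return reject flags in the original order."""
    m = len(p_values)
    # Indices of the tests, from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for k, i in enumerate(order, start=1):
        if p_values[i] < alpha / (m + 1 - k):
            reject[i] = True
        else:
            break  # first failure: this and all larger p-values are retained
    return reject

# Hypothetical p-values for four tests.
print(holm_reject([0.04, 0.001, 0.03, 0.01]))
```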
df['PDO p-value'] = 2*stats.norm.cdf(-abs(df['PDO z-score']))
df.sort_values('PDO p-value', inplace=True)
alpha = 0.05
df['k'] = range(1,len(df)+1)
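df['Holm-Bonferroni'] = alpha/(len(df)+1-df['k'])
# Holm is a step-down procedure: reject from the top until the first
# p-value that fails to beat its threshold, then stop.
passes = (df['PDO p-value'] < df['Holm-Bonferroni']).astype(int)
df['Reject'] = passes.cumprod().astype(bool)
df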
In every case, we fail to reject the null hypothesis. This means we can't conclude that any individual team had a statistically significant effect on its PDO. This doesn't mean that there is no effect; it just means this test can't detect one.
The upshot of all this is that PDO really is a robust measure of luck. The Islanders' PDO of 1.022 is only about 2.5 standard deviations away from 1.000. A z-score of 2.5 is fairly large, but it's not unusual to see one that large in a sample of 31 teams. I still suspect that some teams can affect their PDO without getting lucky, but not by much. Sooner or later, luck will even things out.
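To put the Islanders' z-score in context, this sketch converts $z = 2.5$ into a two-sided p-value with the standard library's `math.erfc`, compares it with the strictest Holm-Bonferroni threshold $0.05/31$, and estimates how often at least one of 31 teams would reach $|z| \ge 2.5$ by luck alone. Treating the teams as independent is a simplifying assumption.

```python
import math

z, m, alpha = 2.5, 31, 0.05

# Two-sided normal p-value: P(|Z| >= z) = erfc(z / sqrt(2)).
p_two_sided = math.erfc(z / math.sqrt(2))

# The smallest p-value must beat alpha / m under Holm-Bonferroni.
strictest_holm = alpha / m

# Chance that at least one of m independent teams has |z| >= 2.5 by luck alone.
p_any = 1 - (1 - p_two_sided) ** m

print(f"p = {p_two_sided:.4f}, Holm threshold = {strictest_holm:.4f}, "
      f"P(any |z| >= {z}) = {p_any:.2f}")
```

Under these assumptions the chance comes out to roughly one in three, while the single-team p-value is still several times larger than the strictest threshold.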