Inspired by FiveThirtyEight's CARMELO, a tool for projecting the careers of NBA players, I set out to see if I could reproduce something similar for NHL forwards.
Trying to build this in R has proven to be quite a project. I expected that, but nearly a month in (working on it when I can find the time), I'm still nowhere close to finished. Below is my sample for Chris Kreider. It also shows just how far from done I am: I had to stop halfway through the analysis in R, pull the data into Excel, clean it there, and then go back into R to produce the chart.
The first step in this process is to evaluate player similarities (tools like this exist on War-On-Ice and on the new hockey advanced stats site, Corsica). My system for finding similar players was as follows:
- Player needs to be within +/- 1 year of the evaluated player's age in his most recent season
- Player must have appeared in at least 50% of games to be included in the evaluation
- The metrics I chose to evaluate similar players are:
  - Individual shot attempts per 60
  - Shot attempts by the player divided by total shot attempts for while he is on the ice (Percent Player Shots taken)
  - Unblocked shot attempt success (Individual Fenwick divided by Individual Corsi)
  - Scoring chances for relative to teammates
  - Time on ice per game
  - Teammate Corsi for per 60
  - Competition Corsi for per 60
  - Primary assists
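To make the two filters and the ratio metrics in the list concrete, here is a minimal sketch, written in Python rather than R purely for illustration. The `Season` record, its field names, and the use of team games as the denominator for the 50% rule are all assumptions made for this example, not the actual data layout from War-On-Ice.

```python
from dataclasses import dataclass


@dataclass
class Season:
    # Hypothetical record layout; field names are illustrative only.
    player: str
    age: int
    games_played: int
    team_games: int      # assumed denominator for the 50% rule
    toi_minutes: float   # 5v5 time on ice, in minutes
    icorsi: int          # individual shot attempts
    ifenwick: int        # individual unblocked shot attempts
    on_ice_cf: int       # team shot attempts for while the player is on the ice


def is_candidate(season, target_age, min_share=0.5):
    """Apply the age (+/- 1 year) and games-played (>= 50%) filters."""
    within_age = abs(season.age - target_age) <= 1
    enough_games = season.games_played >= min_share * season.team_games
    return within_age and enough_games


def derived_metrics(s):
    """Compute the ratio-style metrics from raw counts."""
    return {
        "icorsi_per60": 60.0 * s.icorsi / s.toi_minutes,
        "pct_player_shots": s.icorsi / s.on_ice_cf,
        "unblocked_share": s.ifenwick / s.icorsi,
        "toi_per_game": s.toi_minutes / s.games_played,
    }
```

The remaining metrics (scoring chances relative to teammates, teammate and competition Corsi, primary assists) would be read straight from the source tables rather than derived this way.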
The thought process behind choosing the above metrics was to find players with similar tendencies, usage, and slash stats from which to draw comparables. Each metric is weighted to a certain extent (e.g., goals are more important than competition Corsi for per 60).
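One way a weighted comparison like this could be scored is sketched below in Python. This is not the exact formula behind the results in this post: the z-score standardization, the weighted Euclidean distance, the `cutoff` parameter, and every function name are assumptions chosen for illustration. The idea is that a score of 1 means an identical profile, and the score turns negative once the weighted distance exceeds `cutoff`.

```python
import math
from statistics import mean, pstdev


def zscores(values):
    """Standardize a list of values so metrics on different scales are comparable."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def standardize(pool, metrics):
    """pool maps player name -> {metric: value}; returns the same shape in z-scores."""
    names = list(pool)
    out = {name: {} for name in names}
    for m in metrics:
        for name, z in zip(names, zscores([pool[n][m] for n in names])):
            out[name][m] = z
    return out


def similarity(target, other, weights, cutoff=3.0):
    """1.0 for identical profiles; falls below 0 once the distance passes `cutoff`."""
    total_w = sum(weights.values())
    sq = sum(w * (target[m] - other[m]) ** 2 for m, w in weights.items())
    distance = math.sqrt(sq / total_w)
    return 1.0 - distance / cutoff


def rank_similar(pool, target_name, weights):
    """Return (player, score) pairs, most similar first."""
    z = standardize(pool, list(weights))
    scores = {
        name: similarity(z[target_name], z[name], weights)
        for name in pool
        if name != target_name
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

With a heavier weight on goals, a player who matches the target's goal total but faces different competition ranks ahead of one who faces the same competition but scores far less; flipping the weights flips the ranking.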
Let's get to the meat of it...
All statistics are 5v5 from 2007-08 onward, provided by War-On-Ice.
The ten players most similar to Chris Kreider's 2014-2015 season ended up being:
- Thomas Vanek
- Mason Raymond
- Jamie McGinn
- Zach Parise
- Andrei Kostitsyn
- Blake Wheeler
- Andrew Ladd
- Niclas Bergfors
- Jordan Staal
- Ondrej Palat
Now, honestly, and I know I'm biased, I don't hate these comparisons at all. What I'm not in love with, as you'll see shortly, are the stats that get returned when I try to map out the player's career. There are 213 players with a positive similarity score to Chris Kreider's 2014-2015 season, from Vanek at #1 to Kyle Clifford at #213.
When I go ahead and take the weighted average of these players' goal totals, this is the chart I get for Kreider:
Note: The error bars on the chart are simply +/- two goals, as I have yet to calculate 90% confidence intervals. Again, it's limited as it stands now.
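For anyone wanting to follow the arithmetic, here is a minimal Python sketch of a weighted-average projection at a given age. It assumes the weights are the similarity scores themselves, which is an assumption about the weighting. It also shows one possible stand-in for the flat +/- two goals: the 5th and 95th weighted percentiles of the comparables' goal totals. That band describes the spread of the comparables and is not a true confidence interval. All names here are hypothetical.

```python
def weighted_mean(values, weights):
    """Average of values, each counted in proportion to its weight."""
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total


def weighted_quantile(values, weights, q):
    """Smallest value whose cumulative share of the total weight reaches q."""
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    running = 0.0
    for value, weight in pairs:
        running += weight
        if running / total >= q:
            return value
    return pairs[-1][0]


def project_goals(comps, age):
    """comps: list of (similarity, {age: goals}) pairs.

    Uses only comparables with a positive similarity score that
    actually played a season at the requested age.
    """
    rows = [
        (goals_by_age[age], sim)
        for sim, goals_by_age in comps
        if sim > 0 and age in goals_by_age
    ]
    if not rows:
        return None
    values = [goals for goals, _ in rows]
    weights = [sim for _, sim in rows]
    return {
        "mean": weighted_mean(values, weights),
        "low": weighted_quantile(values, weights, 0.05),
        "high": weighted_quantile(values, weights, 0.95),
    }
```

Calling `project_goals` once per age from the current season onward would give the points and band for a career curve.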
Now, at first glance, Kreider never scoring more than 14 5v5 goals, at least until age 30, is the most disheartening thing of all time. And as much as I hate the numbers I'm currently getting, they make sense considering the data I am working with.
Last season, 57 NHL players scored more than 14 goals. Is Kreider a top-60 5v5 goal scorer in the NHL? Well, I don't know. It does explain why the numbers I'm getting never seem to go that high: there is a lot of player depth in the NHL, and averaging over that many comparables keeps the projection "grounded", for lack of a better term.
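A toy Python example of that "grounding" effect follows. The similarity scores and goal totals are made up purely for illustration and are not real player data; the function name and `top_n` parameter are also hypothetical. A handful of close, high-scoring comparables get swamped by a long tail of distant, low-scoring ones once everyone with a positive score is averaged in.

```python
def weighted_goal_projection(comps, top_n=None):
    """comps: list of (similarity, goals); optionally keep only the top_n most similar."""
    ranked = sorted(comps, key=lambda c: c[0], reverse=True)
    if top_n is not None:
        ranked = ranked[:top_n]
    total = sum(sim for sim, _ in ranked)
    return sum(sim * goals for sim, goals in ranked) / total


# Made-up pool: 3 close comparables who score a lot, 30 distant depth players.
toy_pool = [(0.9, 22), (0.85, 20), (0.8, 18)] + [(0.3, 8)] * 30

top3_only = weighted_goal_projection(toy_pool, top_n=3)
everyone = weighted_goal_projection(toy_pool)
print(round(top3_only, 1), round(everyone, 1))
```

Limiting the pool, or letting the weights fall off more steeply with distance, are two knobs that might be worth experimenting with here.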
Anyway, again, this post serves as an update on where I stand with this. The goal, obviously, is to make this fully workable (and reproducible) in R. It's also a way to get this in front of some other eyes. Any and all ideas would be wildly appreciated.
And hopefully, a GM targeting Kreider this deadline will look at this post, and decide that maybe it's not such a good idea.