Well, this was disappointing. The same model showed some pretty good success in predicting the 15-16 season, but totally botched the 16-17 season.
Running the model on the 15-16 season, predicted goals had an r^2 of 0.3037 and predicted primary points had an r^2 of 0.3421. In that iteration, 80 forwards had their goals predicted correctly, 77 forwards had their primary points predicted correctly, and 36 players had both their goals and primary points predicted correctly.
For the 16-17 season, though, the model essentially failed. With the same weighting parameters and system, this season didn't seem to cooperate. Goals had an r^2 of 0.1982, and primary points had an r^2 of 0.2952. Of the 298 players evaluated, the model hit on just 51 players for goals, 66 for primary points, and a measly 25 for both.
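For anyone who wants to run the same kind of check on their own projections, here is a minimal sketch of an r^2 calculation. It assumes r^2 means the squared Pearson correlation between predicted and actual totals, which may not be exactly how the numbers above were produced. The function name and the sample values are made up for illustration.

```python
def r_squared(predicted, actual):
    """Squared Pearson correlation between predictions and outcomes.

    Assumption: this is one common definition of r^2; the post does not
    say which definition was used for the figures it reports.
    """
    n = len(predicted)
    if n != len(actual) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    if var_p == 0 or var_a == 0:
        raise ValueError("r^2 is undefined when either series is constant")
    return (cov * cov) / (var_p * var_a)


# Invented numbers, not real player data.
predicted_goals = [10.0, 12.0, 8.0, 15.0, 9.0]
actual_goals = [12.0, 11.0, 6.0, 20.0, 10.0]
print(round(r_squared(predicted_goals, actual_goals), 4))
```

Running it once per season on the same set of forwards would give a like-for-like comparison between 15-16 and 16-17.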
Issues with the model:
I think it's too lenient. The way the predictions work, if you recall, is similar to 538's CARMELO projections. The issue, though, is that I don't think I'm getting enough separation to make each player unique. Each player, with the current weighting system, is coming away with too many comparable players. Throw all of these guys into the mix, and the model ends up projecting very close to league average, even for the league's better players.
For example, when running the projections for Sidney Crosby, 87 comes away with 144 player seasons with a positive similarity score. That's too many for Sid, who, even I can admit, is in a league of his own. A good sign is that no one has a similarity score over 57 (the highest is Datsyuk's age-29 season, a 97-point campaign for PD). But 144 players, even indexed by their similarity score, are going to bring Crosby's projections down, and they do: the model predicts a high of 13.7 5v5 goals for Sid this year.
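To show why a long tail of weak comparables drags a projection toward average, here is a rough sketch. It assumes the projection is a similarity-weighted mean of each comparable's following-season goals, which is my simplification and not necessarily the model's actual formula. Every score and goal total below is invented; only the count of 144 comparables comes from the Crosby example.

```python
def weighted_projection(comps):
    """Similarity-weighted mean of goals over comps with a positive score.

    comps: list of (similarity_score, next_season_goals) pairs.
    Assumption: a plain weighted average, used here only for illustration.
    """
    positive = [(score, goals) for score, goals in comps if score > 0]
    total_weight = sum(score for score, _ in positive)
    if total_weight == 0:
        raise ValueError("no comparables with a positive similarity score")
    return sum(score * goals for score, goals in positive) / total_weight


# Invented data: three strong comps who scored a lot, plus a long tail of
# weak comps who scored at a roughly average rate (144 comps in total).
close_comps = [(57, 25.0), (50, 22.0), (45, 24.0)]
weak_comps = [(10, 11.0)] * 141

print(round(weighted_projection(close_comps), 1))
print(round(weighted_projection(close_comps + weak_comps), 1))
```

Even though each weak comparable carries a small weight, their combined weight outstrips the three close matches, so the second projection lands much nearer the tail's average than the first.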
To counter this, I think I'm going to need to add more constraints to how the similarity scores are calculated.
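Two simple constraints that could be tried are a minimum similarity score and a cap on the number of comparables. The sketch below is hypothetical: `select_comps`, the cutoff of 40, and the cap of 2 are placeholders I made up, not tuned values, and the data is invented.

```python
def select_comps(comps, min_score=0.0, max_comps=None):
    """Keep comps scoring above min_score, best first, optionally capped."""
    kept = sorted(
        (c for c in comps if c[0] > min_score),
        key=lambda c: c[0],
        reverse=True,
    )
    return kept if max_comps is None else kept[:max_comps]


def weighted_projection(comps):
    """Similarity-weighted mean of goals (an assumed, simplified formula)."""
    total_weight = sum(score for score, _ in comps)
    if total_weight <= 0:
        raise ValueError("no usable comparables left after filtering")
    return sum(score * goals for score, goals in comps) / total_weight


# Invented (similarity_score, next_season_goals) pairs.
comps = [(57, 25.0), (50, 22.0), (45, 24.0)] + [(10, 11.0)] * 141

loose = select_comps(comps)                 # current behaviour: any positive score
strict = select_comps(comps, min_score=40)  # hypothetical cutoff
capped = select_comps(comps, max_comps=2)   # hypothetical cap

for label, chosen in (("loose", loose), ("strict", strict), ("capped", capped)):
    print(label, len(chosen), round(weighted_projection(chosen), 1))
```

The tradeoff is that stricter settings leave fewer comparables per player, so the projection leans harder on a handful of careers and gets noisier.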
But still, I think the power of this model comes from the way we can view a player's career trends, and not necessarily the actual numbers it produces. That is, it shows which players' career goal and point totals are trending up based on similar players, and which are trending down. That's especially useful for upcoming free agents.
For example, a team may find itself in the market for a UFA winger this year like Vanek or Stafford. Using these projections, it could get a baseline impression of how each player's career will progress based on the previous careers of similar players...
(The first line in the below images is still the 16-17 season).
It's no surprise that the model is more optimistic about Vanek's future point totals, as Vanek has always been the more offensive player of the two, but it is interesting that the model sees a huge dropoff when Stafford hits age 34 that it doesn't see for Vanek. As always with signing players over 30 to long-term contracts, buyer beware.
Next, I'll continue to tinker. The model was only decent for the 15-16 season and was truly horrid for the 16-17 season. This probably means making the parameters stricter in order to get fewer, more accurate comparables for each player, which, I think, means adding more parameters.
I plan to launch a Shiny app for the 17-18 season, but haven't gotten around to that yet.