Projects
Predicting NHL Goal Probabilities with Machine Learning
Project Overview
-
Title: Predicting NHL Goal Probabilities with Machine Learning
-
One-liner: Using NHL shot data from MoneyPuck.com to model and predict goal probabilities.
-
Short Description:
-
Exploratory analysis of NHL shot data to uncover patterns and scoring trends.
-
Development of logistic regression models with feature engineering.
-
Ongoing development of nonlinear models to improve predictive accuracy.
-
Highlights end-to-end ML workflows, data visualization, and reproducibility.
-
Skills and Techniques Used
-
Python, pandas, NumPy, scikit-learn, matplotlib
-
Exploratory Data Analysis (EDA)
-
Logistic regression and feature engineering
-
Model evaluation metrics
-
Data wrangling and cleaning
-
Visualization of complex datasets
Key Findings
-
Built a base logistic regression model using shot location, shot type, and context features. It achieved an AUC of 0.72 and a Brier score of 0.062.
-
Compared the base model's predictions to MoneyPuck.com's xG values; the model systematically underestimated goal probability relative to xG, and the error patterns pointed to clear ways to improve it.
-
Engineered additional features (shooter, goalie, team, and game-situation information). This improved performance to an AUC of 0.785 and a Brier score of 0.059.
-
Visualizations show that the combined model aligns more closely with MoneyPuck.com's expected goal probabilities, with prediction errors centered around zero.


Prediction Error Distributions: Base Model (Left) vs Combined Model (Right)
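The comparison described in the findings can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the project's code: the feature names, coefficients, and the `p_ref` reference probability (standing in for an external xG value) are all assumptions, and the numbers it prints will not match the results reported above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 40_000

# Synthetic stand-ins for shot features (hypothetical, not MoneyPuck columns).
distance = rng.uniform(5, 60, n)       # feet from the net
angle = rng.uniform(0, 80, n)          # absolute shot angle in degrees
rebound = rng.binomial(1, 0.1, n)      # 1 if the shot follows a rebound
shooter_skill = rng.normal(0, 1, n)    # made-up shooter quality score

# Assumed "true" goal probability, playing the role of a reference xG value.
logit = -0.5 - 0.05 * distance - 0.01 * angle + 1.0 * rebound + 0.8 * shooter_skill
p_ref = 1 / (1 + np.exp(-logit))
goal = rng.binomial(1, p_ref)

X = np.column_stack([distance, angle, rebound, shooter_skill])
X_train, X_test, y_train, y_test, _, p_ref_test = train_test_split(
    X, goal, p_ref, test_size=0.25, random_state=0, stratify=goal
)


def fit_and_score(columns):
    """Fit a logistic regression on the chosen columns and score it on the test split."""
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X_train[:, columns], y_train)
    pred = model.predict_proba(X_test[:, columns])[:, 1]
    return {
        "auc": roc_auc_score(y_test, pred),
        "brier": brier_score_loss(y_test, pred),
        # Spread of the prediction error relative to the reference probability.
        "error_sd": float(np.std(pred - p_ref_test)),
    }


base = fit_and_score([0, 1])            # location-style features only
combined = fit_and_score([0, 1, 2, 3])  # plus context and shooter features

print("base    ", base)
print("combined", combined)
```

A higher AUC means better ranking of goals over non-goals, while a lower Brier score means the predicted probabilities sit closer to the observed outcomes. The error spread against a reference probability is one simple way to summarize the kind of error distribution shown in the figure.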
Project Status
-
Started nonlinear modeling
-
Applying Random Forest and XGBoost to capture more complex patterns in gameplay.
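To illustrate why a nonlinear model may help, here is a small sketch on synthetic data in which goal probability depends on distance to the net, a nonlinear function of the raw coordinates. The coordinate ranges, coefficients, and hyperparameters are assumptions chosen for illustration. It uses scikit-learn's `RandomForestClassifier` only; XGBoost is left out to avoid an extra dependency.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 20_000

# Hypothetical rink coordinates with the net at the origin.
x = rng.uniform(0, 60, n)     # distance out from the goal line
y = rng.uniform(-40, 40, n)   # lateral offset from the center line
dist = np.hypot(x, y)

# Assumed goal probability that decays with straight-line distance to the net.
p_goal = 1 / (1 + np.exp(-(2.0 - 0.12 * dist)))
goal = rng.binomial(1, p_goal)

coords = np.column_stack([x, y])
X_train, X_test, y_train, y_test = train_test_split(
    coords, goal, test_size=0.25, random_state=1, stratify=goal
)

# A linear model on raw coordinates cannot express the symmetric effect of y.
linear = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# A tree ensemble can approximate the distance effect without a hand-built feature.
forest = RandomForestClassifier(
    n_estimators=100, min_samples_leaf=25, random_state=1
).fit(X_train, y_train)

auc_linear = roc_auc_score(y_test, linear.predict_proba(X_test)[:, 1])
auc_forest = roc_auc_score(y_test, forest.predict_proba(X_test)[:, 1])

print(f"logistic regression AUC: {auc_linear:.3f}")
print(f"random forest AUC:       {auc_forest:.3f}")
```

In this toy setup, adding `dist` as an engineered feature would let the logistic regression close most of the gap, which is the usual trade-off between feature engineering and more flexible models.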
View the project on GitHub: