Using Machine Learning to Predict National Hockey League Average Home Game Attendance

Barry E King
Jennifer L Rice
Julie Vaughan

Abstract

This research predicts National Hockey League average home game attendance. The seasons examined run from the 2013 hockey season through the beginning of the 2017 hockey season. Multiple linear regression and three machine learning algorithms (random forest, M5 prime, and extreme gradient boosting) are employed to predict out-of-sample average home game attendance. Extreme gradient boosting generated the lowest out-of-sample root mean square error. The team identifier (team name), the number of Twitter followers (a surrogate for team popularity), median ticket price, and arena capacity emerged as the top four predictor variables.
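As a minimal sketch of how competing models can be ranked by out-of-sample root mean square error, the snippet below scores two sets of predictions against held-out values. The attendance figures, the names `model_a` and `model_b`, and the helper `rmse` are hypothetical and are not taken from the article's data or code.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical held-out average home attendance values (illustrative only).
actual_attendance = [18000.0, 17500.0, 19000.0, 15000.0]

# Hypothetical predictions from two competing models on the same held-out rows.
model_a = [17800.0, 17900.0, 18500.0, 15600.0]
model_b = [18100.0, 17400.0, 19200.0, 14900.0]

# Score each model on the held-out data and keep the one with the lowest error.
scores = {
    "model_a": rmse(actual_attendance, model_a),
    "model_b": rmse(actual_attendance, model_b),
}
best_model = min(scores, key=scores.get)
print(scores, best_model)
```

The model with the smaller value is preferred because its predictions deviate less, on average, from attendance figures it did not see during fitting.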

Author Biographies

Barry E King, Butler University

Associate Professor of Information Systems and Operations Management

Jennifer L Rice, Butler University

Assistant Professor of Economics

Julie Vaughan, Butler University

Research Assistant, College of Pharmacy and Health Sciences

References

Borland, J. and MacDonald, R., ‘Demand for sport’ (2003) Oxford Review of Economic Policy, 19(4), 478-502.

Boyd, T., and Krehbiel, T., ‘The effect of promotion timing on major league baseball attendance’ (1999) Sport Marketing Quarterly, 8(4), 23-34.

Breiman, L., ‘Random forests’ (2001) Machine Learning, 45(1), 5-32. doi.org/10.1023/A:1010933404324.

Deshpande, S., and Jensen, S., ‘Estimating an NBA player’s impact on his team’s chances of winning’ (2016) Journal of Quantitative Analysis in Sports, 12(2), 51-72.

Douvis, J., ‘What makes fans attend professional sporting events? A review’ (2014) Advances in Sport Management Research Journal, 1, 40-70.

ESPN, (n.d.). Retrieved December 7, 2017, from www.espn.com/nhl/attendance.

Gitter, S., and Rhoads, T., ‘Determinants of minor league baseball attendance’ (2010) Journal of Sports Economics, 11(6), 614-628.

Gladden, J., and Funk, D., ‘Understanding brand loyalty in professional sport: examining the link between brand associations and brand loyalty’ (2001) International Journal of Sports Marketing and Sponsorship, 3(1), 54-81.

The Hockey News, (n.d.). Retrieved December 7, 2017 from www.thehockeynews.com/news/article/ranking-the-nhls-fan-base-from-1-to-30.

Jewell, R. and Molina, D., ‘An evaluation of the relationship between Hispanics and major league soccer’ (2005) Journal of Sports Economics, 6(2), 160-177.

Jane, W., ‘The effect of star quality on attendance demand’ (2016) Journal of Sports Economics, 17(4), 396-417.

Kakoty, S., (n.d.). What is the simple explanation of M5P (M5 model trees) algorithm in machine learning/data mining? Retrieved December 21, 2017 from www.quora.com/What-is-the-simple-explanation-of-M5P-M5-model-trees-algorithm-Machine-Learning-Data-Mining.

Liu, J., (n.d.). Updated: xgboost with parameter tuning. Retrieved December 21, 2017 from www.kaggle.com/jashenliu/updated-xgboost-with-paramter-tuning.

Mongeon, K., and Winfree, J., ‘Comparison of television and gate demand in the National Basketball Association’ (2012) Sport Management Review, 15(1), 72-79.

Nishad, (n.d.). What do we mean by node impurity ref-random forest? Retrieved December 21, 2017 from stats.stackexchange.com/questions/223109/what-do-we-mean-by-node-impurity-ref-random-forest.

Nishida, K. (2017) Retrieved from https://blog.exploratory.io/introduction-to-extreme-gradient-boosting-in-exploratory-7bbec554ac7.

Paul, R., and Weinbach, A., ‘Determinants of attendance in the Quebec major junior hockey league: role of winning, scoring, and fighting’ (2011) Atlantic Economic Journal, 39(3), 303-311.

Peters, D., (1999). Winning percentage and attendance in the NHL. (Unpublished undergraduate project). St. John Fisher College, Rochester, NY.

Polamuri, S., (n.d.). How the random forest algorithm works in machine learning. Retrieved December 21, 2017 from dataaspirant.com/2017/random-forest-algorithm-machine-learning.

Quinlan, J., ‘Learning with continuous classes’ (1992) Proceedings AI’92 (Adams & Sterling, eds.), 343-348, World Scientific, Singapore.

Raut, S., (n.d.). Want to know how to choose machine learning algorithm? Retrieved December 21, 2017 from www.datasciencecentral.com/profiles/blogs/want-to-know-how-to-choose-machine-learning-algorithm.

Statshockey.net, (n.d.). Retrieved December 7, 2017, from statshockey.homestead.com/info/nhlcities.html.

Statshockey.net, (n.d.). Retrieved December 7, 2017, from statshockey.homestead.com/info/nhlarenas.html.

Trail, G., Anderson, D., and Lee, D., ‘A longitudinal study of team-fan role identity on self-reported attendance behavior and future intentions’ (2017) Journal of Amateur Sport, 3(1), 27-49.

Trawiński, B., Smętek, M., Telec, Z., and Lasota, T., ‘Nonparametric statistical analysis for multiple comparison of machine learning regression algorithms’ (2012) International Journal of Applied Mathematics and Computer Science, 22(4), 867-881. Retrieved December 20, 2017 from doi:10.2478/v10006-012-0064-z.

VividSeats, (n.d.). Retrieved December 7, 2017 from www.vividseats.com/blog/nhl-tickets-prices.

Welling, S.H., (2015) Retrieved from https://stats.stackexchange.com/questions/162465/in-a-random-forest-is-larger-incmse-better-or-worse?utm_medium=organic&utm_source=google_rich_qa&utm_campaign=google_rich_qa.

Wiedecke, J., (1999). Factors affecting attendance in the National Hockey League: a multiple regression model. (Unpublished master’s thesis). University of North Carolina, Chapel Hill.

Wikipedia (n.d.) Retrieved from https://en.wikipedia.org/wiki/Random_forest.

Zhu, N., and Chen, T., (2016). XGBoost: implementing the winningest Kaggle algorithm in Spark and Flink. Retrieved December 13, 2017 from www.kdnuggets.com/2016/03/xgboost-implementing-winningest-kaggle-algorithm-spark-flink.html.