Main Article Content

Samrat Gupta
Saurabh Kumar
Pradeep Kumar


The Indian motion picture industry has grown rapidly over the last few decades and plays an important role in India's emerging economy. This paper integrates three analytical models to address the problem of predicting movie revenue in the Indian film industry, and investigates the determinants of success for domestically produced movies in the Indian context. An ensemble model is constructed by combining the three analytical models (Neural Network, Classification and Regression Tree, and Robust Regression) using a linear optimization approach. A four-way comparative analysis of the three individual models and the ensemble model is then carried out. The predictive power of the models is evaluated using four performance metrics: root mean squared error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE) and large prediction error (LPE). The analysis draws on an original dataset of 120 Indian movies released between August 2006 and October 2015. The study finds that the hype a movie generates on the web, the number of screens on which it is released, the rating it receives and its genre are the most influential variables in determining its box-office performance. The neural network model closely competes with the ensemble model in terms of predictive accuracy. The ensemble model considerably reduces prediction errors and yields better results on two of the four performance metrics.
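As an illustration of the evaluation described above, the following minimal Python sketch computes the four metrics and forms a weighted combination of three models' predictions. It rests on several assumptions that are not taken from the paper: LPE is interpreted here as the share of predictions whose absolute percentage error exceeds a threshold (100% by default), the ensemble weights are constrained to be non-negative and sum to one, and a coarse grid search minimizing MAE stands in for the linear optimization approach, whose exact formulation the abstract does not give. All function names and the sample figures are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def lpe(actual, predicted, threshold=1.0):
    """Assumed definition: fraction of predictions whose absolute
    percentage error exceeds `threshold` (1.0 means 100%)."""
    large = sum(1 for a, p in zip(actual, predicted) if abs((a - p) / a) > threshold)
    return large / len(actual)


def combine(predictions, weights):
    """Weighted sum of several models' prediction lists."""
    return [
        sum(w * model[i] for w, model in zip(weights, predictions))
        for i in range(len(predictions[0]))
    ]


def best_weights(actual, predictions, steps=10):
    """Grid search over convex weights for exactly three models, minimizing MAE.
    This is a stand-in for the paper's linear optimization, not a reproduction of it."""
    best_err, best_w = None, None
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            w = (i / steps, j / steps, (steps - i - j) / steps)
            err = mae(actual, combine(predictions, w))
            if best_err is None or err < best_err:
                best_err, best_w = err, w
    return best_w, best_err


# Made-up revenue figures for illustration only; not data from the study.
actual = [100.0, 200.0, 50.0, 400.0]
nn_pred = [110.0, 190.0, 60.0, 380.0]    # hypothetical neural network output
cart_pred = [90.0, 220.0, 40.0, 450.0]   # hypothetical CART output
rr_pred = [130.0, 150.0, 120.0, 300.0]   # hypothetical robust regression output

weights, ensemble_mae = best_weights(actual, [nn_pred, cart_pred, rr_pred])
ensemble_pred = combine([nn_pred, cart_pred, rr_pred], weights)
```

Because the grid includes the three corner cases in which a single model receives all the weight, the ensemble's in-sample MAE in this sketch can never exceed that of the best individual model. That guarantee holds only for the metric being minimized, which is one way an ensemble can lead on some metrics while a single model remains competitive on others.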

Article Details

Author Biographies

Samrat Gupta, Indian Institute of Management Lucknow

Doctoral Student, Indian Institute of Management Lucknow

Saurabh Kumar, Indian Institute of Management Lucknow

Doctoral Student, Indian Institute of Management Lucknow

Pradeep Kumar, Indian Institute of Management Lucknow

Professor, Indian Institute of Management Lucknow

