For this project, I am trying to decide which actor or actress to cast in a hypothetical film so that the film makes the most money. Specifically, is there a better way to choose actors than relying solely on their previous box office totals? Previous studies have found that an actor’s or actress’s star power is a predictive factor in a film’s box office. But can we quantify “star power” in some way other than past box office performance?
In the age of the Internet, we can track users’ interest in a given topic in more ways than before. Previous studies have tried to predict box office success using Twitter or web searches. For celebrities, these methods could help us identify “rising stars” much more quickly, or more accurately gauge which stars in a given movie are driving that film’s performance. By previous box office, Samuel L. Jackson is the most bankable star. However, by Wikipedia page views, I found that Jennifer Lawrence reigns.
I felt Wikipedia was a good choice for this kind of study because it is the largest reference site on the Internet and the seventh overall in Alexa rankings. It is the go-to site for information. It also makes its page view information freely available.
I took all films from 2012-2013, except animated films and documentaries, from the list of US films on boxofficemojo.com. I then took the list of actors in each film (again, according to boxofficemojo) and cross-referenced them with Wikipedia. Additionally, I collected page view statistics for every actor in one of these movies (as listed on boxofficemojo) for the years 2012 and 2013 from http://stats.grok.se/.
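As a rough illustration of how daily page view counts can be reduced to a single number per actor and film, here is a minimal sketch that averages views over a window of days before a release date (for example, 30 days). It assumes the daily counts have already been loaded into a mapping from date to view count. The function name, the window_days parameter and the choice to count missing days as zero are hypothetical and are not necessarily what the project code does.

```python
from datetime import date, timedelta


def average_views_before_release(daily_views, release_date, window_days=30):
    """Mean daily page views over the `window_days` days before `release_date`.

    `daily_views` maps datetime.date -> view count (an assumed layout).
    The release day itself is excluded, and days missing from the mapping
    are counted as zero views (an assumption; they could also be skipped).
    """
    days = [release_date - timedelta(days=offset)
            for offset in range(1, window_days + 1)]
    return sum(daily_views.get(day, 0) for day in days) / window_days


# Hypothetical counts for one actor's page; not real data.
example_views = {date(2013, 1, d): 300 for d in range(1, 11)}
print(average_views_before_release(example_views, date(2013, 1, 11)))
```

For a film with several actors, the per-actor averages could then be combined (summed, averaged, or limited to the top-billed actor), which is a modelling choice in its own right.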
I tried several predictive models using scikit-learn; the one that worked best was a linear regression using average page views for the 30 days leading up to a film’s release, budget, and rating. The model using Wikipedia page views performed slightly worse than the one using previous box office totals, with R-squared values of 0.614 and 0.64 respectively. However, the page views are a significant predictor, and future work could be done to improve the model.
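A minimal sketch of this kind of model, using scikit-learn’s LinearRegression, might look like the following. The feature layout, the rating categories, the helper name build_features and every number in the toy data are made up for illustration. None of it is data or code from this project.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Assumed rating categories; the first one acts as the reference level.
RATINGS = ["PG", "PG-13", "R"]


def build_features(films):
    """Build rows of [avg_page_views, budget, is_PG-13, is_R] from film dicts."""
    rows = []
    for film in films:
        rating_dummies = [1.0 if film["rating"] == r else 0.0 for r in RATINGS[1:]]
        rows.append([film["avg_page_views"], film["budget"]] + rating_dummies)
    return np.array(rows, dtype=float)


# Made-up toy films (budget and gross in millions); NOT data from this project.
toy_films = [
    {"avg_page_views": 1000, "budget": 20, "rating": "PG", "gross": 42.0},
    {"avg_page_views": 5000, "budget": 50, "rating": "PG-13", "gross": 132.0},
    {"avg_page_views": 2000, "budget": 10, "rating": "R", "gross": 34.0},
    {"avg_page_views": 8000, "budget": 100, "rating": "PG-13", "gross": 237.0},
    {"avg_page_views": 3000, "budget": 40, "rating": "R", "gross": 89.0},
    {"avg_page_views": 500, "budget": 5, "rating": "PG", "gross": 14.5},
    {"avg_page_views": 12000, "budget": 150, "rating": "PG", "gross": 347.0},
    {"avg_page_views": 4000, "budget": 80, "rating": "R", "gross": 159.0},
]

X = build_features(toy_films)
y = np.array([film["gross"] for film in toy_films])

model = LinearRegression().fit(X, y)
r_squared = model.score(X, y)  # R-squared measured on the training rows

print("coefficients:", model.coef_)
print("intercept:", model.intercept_)
print("R-squared:", r_squared)
```

The toy gross values were generated from an exact linear rule, so the fit here is perfect by construction, and real box office data will not behave this way. Scoring on held-out films rather than on the training rows would also give a fairer comparison between the page view model and the box office model.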
My code for this project can be found here