Successfully predicting which product innovation will become the next big hit is impressive. Entrepreneurs who can see future success in their crystal balls become rich and end up on the cover of Time magazine. The underlying assumption is that managers and entrepreneurs who correctly predicted that a new product would be successful have better intuitive judgment and a superior evaluation of the situation at hand. Is this really the case, or does data science tell us otherwise?



There is a simple reason why this intuition may be wrong. Rather than being an indication of good judgment, accurately forecasting a rare event such as a new hit may be an indication of poor judgment. To put it differently: poor forecasters are more likely to make extreme predictions. Because extreme outcomes are scarce, managers who take into account all the available information are less likely to make such extreme predictions. Managers who rely on heuristics and intuition, on the other hand, are more likely to make them.1
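A small simulation can make this effect tangible. The sketch below is our own illustration, not the model from the cited paper: the normal distributions, the threshold of 2 and the two forecaster types are all assumptions chosen for simplicity. One forecaster shrinks a noisy signal toward the average (the statistically sound thing to do here), the other takes the signal at face value.

```python
import random

def simulate(n_trials=100_000, threshold=2.0, seed=42):
    """Compare a base-rate-aware forecaster with one who trusts raw signals.

    All distributions and parameters are illustrative assumptions.
    """
    rng = random.Random(seed)
    names = ("rational", "intuitive")
    stats = {
        name: {"extreme_calls": 0, "correct_extreme_calls": 0, "sq_error": 0.0}
        for name in names
    }
    for _ in range(n_trials):
        outcome = rng.gauss(0, 1)            # true result (standardized)
        signal = outcome + rng.gauss(0, 1)   # noisy early evidence
        forecasts = {
            # With equal outcome and noise variance, E[outcome | signal] = signal / 2.
            "rational": 0.5 * signal,
            # The intuitive forecaster takes the signal at face value.
            "intuitive": signal,
        }
        for name, forecast in forecasts.items():
            s = stats[name]
            s["sq_error"] += (forecast - outcome) ** 2
            if forecast > threshold:         # forecaster calls a "big hit"
                s["extreme_calls"] += 1
                if outcome > threshold:      # and the hit actually happens
                    s["correct_extreme_calls"] += 1
    for s in stats.values():
        s["mse"] = s.pop("sq_error") / n_trials
    return stats

results = simulate()
for name, s in results.items():
    print(name, s)
```

Under these assumptions the intuitive forecaster collects far more correct "big hit" calls, simply because it makes extreme calls much more often, while its overall error (mean squared error) is clearly worse than that of the rational forecaster.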

To illustrate how this effect works, we turn to the Wall Street Journal. Every six months, the Wall Street Journal asks about 50 economists and analysts to forecast a set of macroeconomic statistics for the next six months. Dr. Sung Won Sohn, CEO of Hanmi Financial Group, achieved first place by being one of a few who correctly predicted a high inflation rate when the consensus forecast was low. Dr. Sohn credited his unusually high but accurate inflation forecast to an intuition he developed after visiting a California jeans producer. The producer could not keep up with demand for its $250 jeans. According to the Wall Street Journal, “He figured ‘there must be money out there if people are willing to pay that much’ for blue jeans.” Such methods do not always work; in the preceding two surveys, Dr. Sohn was ranked 43 and 49 out of 55.


Implications for innovation

The fallacy of prediction accuracy may have profound implications for innovation. Consider the discussions of the failures of incumbent firms to predict and react to new “disruptive” technologies. Few emerging technologies or business models are disruptive, and those that are disruptive are not easy to detect.2

Because the base rate, the frequency of disruptive innovations, is low, rational forecasters will rarely bet that a new technology is disruptive. Irrational forecasters, who ignore the base rate and evaluate signals differently, are more likely to make such calls.
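The role of the base rate can be worked out with Bayes’ rule. In the sketch below all numbers are hypothetical and only serve as an illustration: we assume 2% of new technologies are disruptive, that a "promising" signal shows up for 80% of the disruptive ones, and that it also shows up for 10% of the non-disruptive ones.

```python
def posterior_disruptive(base_rate, hit_rate, false_alarm_rate):
    """P(disruptive | promising signal), computed with Bayes' rule."""
    true_positives = base_rate * hit_rate
    false_positives = (1 - base_rate) * false_alarm_rate
    return true_positives / (true_positives + false_positives)

# Hypothetical inputs, chosen for illustration only.
p = posterior_disruptive(base_rate=0.02, hit_rate=0.8, false_alarm_rate=0.1)
print(f"{p:.1%}")  # 14.0%
```

Even with a fairly reliable signal, the chance that the technology is truly disruptive stays around one in seven under these assumptions, so a forecaster who respects the base rate will usually not make the call.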

This line of reasoning suggests that the failure to predict which technologies will become disruptive is not necessarily a sign of poor judgment, flawed mental models, or inertia. Instead, it may be an indication of sound judgment. In such situations, rational individuals will appear inert and non-responsive, whereas irrational individuals will appear agile and responsive.

A prediction backed by data science

Rather than relying on human intuition, we can use data to make sounder decisions and to curb overreaction to weak signals. To illustrate how data science may help, let’s try to predict the next big music hit.

We gathered a dataset of 3,000 songs. Some of these songs made it to the Billboard Top 100, others did not. Why? We took a closer look at the properties of the songs themselves and of the artists, to see if they might help us predict what will be the next hit on the Billboard Top 100. We used the song properties as provided by Spotify.

How to predict a hit with data science?

With a couple of models and analyses, we tested which variables are important for predicting a hit song. The methods provided us with a robust way of assessing the importance of each variable in our model. The plot below shows the importance of each variable. We also tested whether each variable made a significant and relevant contribution.3

Figure: How to predict a hit with data science? Importance graph
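To give a feel for how such a screening works, here is a stripped-down sketch of the shadow-feature idea behind Boruta. It is not our actual analysis: the data are synthetic, the feature names and parameters are made up for illustration, and absolute correlation stands in for the random forest importance that the real method uses. Each feature gets a shuffled copy (a "shadow"), and a feature only counts as relevant in a round when it beats the best shadow.

```python
import random

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def shadow_feature_screen(features, labels, rounds=50, seed=0):
    """Share of rounds in which each feature beats the best shuffled copy."""
    rng = random.Random(seed)
    # Stand-in importance score: absolute correlation with the label.
    importance = {name: abs(pearson(col, labels)) for name, col in features.items()}
    hits = dict.fromkeys(features, 0)
    for _ in range(rounds):
        shadow_scores = []
        for col in features.values():
            shadow = col[:]          # copy, so the real column stays intact
            rng.shuffle(shadow)      # shuffling destroys any link with the label
            shadow_scores.append(abs(pearson(shadow, labels)))
        best_shadow = max(shadow_scores)
        for name in features:
            if importance[name] > best_shadow:
                hits[name] += 1
    return {name: hits[name] / rounds for name in features}

# Synthetic data: danceability drives the hit probability, key does not.
data_rng = random.Random(1)
n_songs = 2000
danceability = [data_rng.random() for _ in range(n_songs)]
key = [data_rng.randrange(12) for _ in range(n_songs)]
is_hit = [1 if data_rng.random() < 0.2 + 0.6 * d else 0 for d in danceability]

shares = shadow_feature_screen({"danceability": danceability, "key": key}, is_hit)
print(shares)
```

A feature with a real effect should beat its shadows in every round, whereas a pure-noise feature will only do so by luck. The real Boruta method goes further and applies a statistical test over many iterations before confirming or rejecting a variable.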

It might seem obvious, but having scored a previous Billboard hit increases the probability of scoring another hit (the artist score). Success breeds success! Fortunately, previous success is not all there is, for that would limit the chance for any new musician or band to become successful in the future. Instrumentalness, danceability, acousticness and loudness are also strong song properties for explaining future success. Contrary to what you might expect, the key and mode of a song have no significant predictive value.

The effect of these properties is not constant. Among the top four, we saw for example that more instrumentalness significantly decreased the chance of scoring a hit, whereas more danceability made the charts explode ;-)

Figure: How to predict a hit with data science? Probability graph

How did we do?

To see how well our model performs, we used the test data to compare the model’s predictions with the actual results. From this comparison we can calculate how accurate our model is. Our so-called F1-score4 is 0.81, which is much better than random chance (50/50). Obviously a score of 0.98 would have been even better. But predicting a hit is never easy, not even for data science. We are still working on it ;-)
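For readers who have not met the F1-score before: it is the harmonic mean of precision (which share of predicted hits were real hits) and recall (which share of real hits we caught). The sketch below computes it on a handful of made-up labels; the function name and the example data are ours and have nothing to do with the actual test set.

```python
def f1_from_labels(y_true, y_pred):
    """F1-score for binary labels, where 1 = hit and 0 = no hit."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # hits we predicted
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # hits we missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Made-up example: 5 real hits, 5 predicted hits, 4 of them in common.
actual    = [1, 1, 1, 0, 0, 1, 0, 1]
predicted = [1, 1, 0, 0, 1, 1, 0, 1]
print(round(f1_from_labels(actual, predicted), 2))  # 0.8
```

One caveat when comparing against chance: the F1-score of a coin-flip guesser depends on the share of hits in the data, and it equals 0.5 only when hits and non-hits are equally common.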

Our analysis does, however, provide insights into which factors are important, thereby reducing some of the uncertainty. It even provides guidelines that musicians (or machines) may use to compose great hits. Or, in management language: better decisions.

Takeaway

Innovation has a lot to do with making decisions under uncertainty. While some gurus and innovators may appear to have unusually good business sense, such a track record is unlikely to be sustainable over the long run. Very few people get it right every time.

Taking a data-driven approach can help determine, in a systematic way, which signals are important and which are not. Forecasters who often make bold predictions based on intuition are likely to be the ones who overreact to weak signals. Next time, don’t listen only to self-righteous innovation gurus, but apply data science to reduce uncertainty and make bets that are based on data instead of the loudest voice in the room.


Denrell, Jerker, and Christina Fang. 2010. “Predicting the Next Big Thing: Success as a Signal of Poor Judgment.” Management Science 56 (10). INFORMS: 1653–67.

Kursa, Miron, and Witold Rudnicki. 2010. “Feature Selection with the Boruta Package.” Journal of Statistical Software, Articles 36 (11): 1–13.

Sood, Ashish, and Gerard J. Tellis. 2011. “Demystifying Disruption: A New Model for Understanding and Predicting Disruptive Technologies.” Marketing Science.

  1. Please see (Denrell and Fang 2010) for a complete account of this phenomenon.
  2. Please see (Sood and Tellis 2011) on disruptive technologies.
  3. We use a method called Random Forest analysis to predict a hit song. To understand which variables are important for the model and which are not, we use the Boruta method (Kursa and Rudnicki 2010).