Yum. Maybe I’m just channeling Thanksgiving — which is only a few days away, but this title just jumped into my head as I was working on today’s edition of Analytics in Action. What do predictive analytics and stuffing have in common — probably nothing, but you know the yummy goodness of stuffing with your Thanksgiving turkey, right? It just makes you all warm and fuzzy inside (not to mention sleepy because of all the tryptophan in the turkey).
Well, that’s the awesome goodness you get when doing predictive analytics.
Predictive analytics recipe
Just as everyone has a favorite stuffing recipe, every business has its own recipe for predictive analytics: what to measure and how.
If you want to learn more about putting together a predictive analytics recipe for your business, I’ve collected some resources for you:
- HBR’s Predictive Analytics Primer
- Information Week’s Big Data Analytics: Descriptive versus Predictive versus Prescriptive
- Forbes: How Big Data Helps Stores Like Macy’s and Kohl’s Track You Like Never Before
- Forbes: 5 Steps to Master Big Data and Predictive Analytics in 2014
That’s because businesses have different goals, different available metrics, and different customer types. So, what do you need for your predictive analytics recipe?
Data is the bread that forms the bulk of your predictive analytics. Without good quality data, your predictive analytics recipe falls short, just like your stuffing isn’t very good without good quality bread.
To predict what customers WILL do, you need to understand what they ARE doing and why they’re doing it. I think an example helps, so here’s my 4-factor model of social media performance:
Social media performance = amplification × sentiment × marketing intensity × close rate
Using this model, a firm gathers metrics for each factor:
- social media performance (achieving conversion goals like clicks, downloads, purchases)
- amplification (engagement such as shares, likes, and retweets that spreads your message to new audiences)
- sentiment (how customers and prospects “feel” about your brand, usually coded as positive, negative, or neutral)
- marketing intensity (assesses your digital marketing efforts, e.g., # of posts, $ spent on digital advertising)
- close rate (pretty self-explanatory: what % of visitors to your site buy or subscribe, depending on your goals; what % of email readers click on a link; etc.)
Notice some metrics are actually computed from a variety of sources, rather than a single metric from a report. For instance, to calculate amplification, you’ll need to add up engagement on whatever social media platforms you use plus the shares directly from your website.
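To make the computation concrete, here’s a minimal sketch of rolling per-platform engagement into a single amplification metric and then scoring the multiplicative 4-factor model. The platform names and all the numbers are illustrative assumptions, not real data.

```python
# Illustrative engagement pulled from each platform plus on-site share buttons.
engagement = {
    "facebook": {"shares": 40, "likes": 300, "comments": 25},
    "twitter":  {"retweets": 55, "likes": 120, "replies": 15},
    "website":  {"shares": 30},  # shares directly from your own site
}

# Amplification = total engagement actions summed across every source.
amplification = sum(sum(actions.values()) for actions in engagement.values())

sentiment = 0.7            # net positive share of coded mentions (0 to 1)
marketing_intensity = 12   # e.g., posts published this week
close_rate = 0.025         # fraction of visitors who convert

# The 4-factor model is multiplicative: a weak factor drags everything down.
performance = amplification * sentiment * marketing_intensity * close_rate
```

Because the model multiplies the factors together, a near-zero close rate or sentiment score sinks performance no matter how strong the other factors are.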
We commonly refer to these metrics as descriptive statistics because they describe what occurred. While descriptive statistics are useful, they’re just not enough to optimize your business strategy. Collecting descriptive statistics related to amplification (engagement across social networks) helps show which times of day and days of the week perform best for creating engagement, especially when you use systematic testing. But they’re not as powerful as using those same metrics as input for predictive analytics.
Sure, you can build a predictive model directly from the data through a process of data mining, especially when you have so-called “big data.” But these models often perform poorly — stay tuned for my next post for more on why.
Data mining is a process of data reduction based on correlations. If you remember your college stats class, you’ll recall that correlation is NOT causation: just because two things change together doesn’t mean one causes the other. A perfect example is the old adage about the economy and hem lengths — a robust economy tends to feature shorter hemlines, while a poor economy features longer skirts.
If hem lengths actually caused changes in the economy, we could fix the economy by requiring shorter skirts. Unfortunately, hem lengths don’t cause changes in the economy, nor does the economy affect hem lengths. Instead, the popular notion is that both trends result from optimism, a factor that isn’t part of the model.
Another problem with data mining is spurious correlations: relationships that have no meaning at all, where two unrelated statistics happen to rise and fall together purely by chance.
Instead, we need to develop inferences based on our understanding of consumer behavior. For instance, my 4-factor model is based on a similar model used to evaluate salesperson performance that’s been tested many times. Since salesperson performance is somewhat like social media performance, it’s a reasonable inference. Testing will show how well the 4-factor model actually predicts social media performance.
For instance, Vera Wang uses a predictive model to send more highly targeted emails, resulting in 63% fewer emails, 101% higher CTR (click-through rates), and 275% higher conversion rates. Not bad for a set of inferences.
Next, you take your inferences and build statistical models — usually employing regression analysis (or related techniques like logistic regression, or logit). Statistical models turn inferences into useful insights to guide strategic planning. Basically, regression uses historical data to assign beta weights to the individual factors (variables) in the model. These weights not only let you know which variables have the greatest impact on your outcome variable (goal), but also allow you to predict goal achievement under various strategy scenarios.
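The regression step can be sketched in a few lines. This fits beta weights for the 4-factor model with ordinary least squares, then predicts goal achievement for a hypothetical strategy scenario; every number here is made up for illustration, and a real analysis would use weeks or months of metrics from your own reports.

```python
import numpy as np

# Historical observations. Columns: amplification, sentiment,
# marketing intensity, close rate (all illustrative values).
X = np.array([
    [120, 0.60, 10, 0.020],
    [150, 0.70, 12, 0.025],
    [ 90, 0.50,  8, 0.015],
    [200, 0.80, 15, 0.030],
    [110, 0.55,  9, 0.018],
    [170, 0.65, 11, 0.022],
])
y = np.array([30, 42, 20, 60, 26, 44])  # conversions (the performance goal)

# Add an intercept column, then solve for the beta weights.
X1 = np.column_stack([np.ones(len(X)), X])
betas, *_ = np.linalg.lstsq(X1, y, rcond=None)

# Predict goal achievement under a hypothetical "what if" scenario.
scenario = np.array([1, 160, 0.75, 13, 0.027])
predicted = scenario @ betas
```

The relative size of the beta weights tells you which factor to invest in first; rerunning the prediction with different scenario values lets you compare strategies before spending a dime.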
Going back to the Vera Wang example, they likely looked at goal achievement (CTR and conversion) across recent emails sent to their list, then predicted which types of list members responded to which types of messaging. Finally, they customized their messaging, sending each member the type of message most likely to motivate a positive response.
Car dealers use predictive analytics in the same way. They know how long owners of particular models keep their cars. Using this data, they purchase vehicle registration data for owners who are nearing the time when they’ll trade in their existing car for a new one. By selectively reaching car owners likely shopping for a new car, they reduce costs and increase returns from their marketing efforts.
A key element in your predictive analytics recipe is sharing insights. Statistics do NO good if they’re stuck with data analysts rather than reaching decision-makers — and decision-makers are where predictive analytics turn into profits. Otherwise, your predictive analytics are just novelties.
Changes to predictive analytics in 2014
If I were writing this post a few years ago, I’d finish without adding this section. Predictive analytics used to depend entirely on quantitative data — numbers. Now, with the explosion of social media, we need to think about how qualitative data — words — predict the future of a company. Even my 4-factor model contains a qualitative component — sentiment. The sentiment comes from coding utterances from forums, social networks, even customer service calls to score the overall sentiment related to your brand.
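Here’s a bare-bones sketch of what coding utterances into positive, negative, and neutral looks like, rolled up into a net sentiment score. The word lists and example mentions are invented stand-ins for a real sentiment lexicon or a trained human coder.

```python
# Tiny illustrative lexicons -- a real project would use a full sentiment
# lexicon or trained coders, not a handful of keywords.
POSITIVE = {"love", "great", "amazing", "recommend"}
NEGATIVE = {"hate", "broken", "awful", "refund"}

def code_utterance(text):
    """Code a single utterance as positive, negative, or neutral."""
    words = set(text.lower().split())
    if words & POSITIVE and not words & NEGATIVE:
        return "positive"
    if words & NEGATIVE and not words & POSITIVE:
        return "negative"
    return "neutral"

def net_sentiment(utterances):
    """(# positive - # negative) / total, ranging from -1 to 1."""
    codes = [code_utterance(u) for u in utterances]
    return (codes.count("positive") - codes.count("negative")) / len(codes)

mentions = [
    "love this brand, highly recommend",
    "arrived broken, want a refund",
    "it is fine I guess",
    "great customer service",
]
score = net_sentiment(mentions)
```

The resulting score is the kind of single sentiment number that slots directly into the 4-factor model above.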
But you can derive even better predictions without hand-coding qualitative data — using tools like NVivo and HyperRESEARCH to create insights directly from it. For instance, I might use something like cluster analysis to create groups of customers based on similar utterances. Groups who buy my product for gifts or special occasions require different marketing tactics than groups who buy my product as a routine part of their weekly shopping trip. And they represent distinctly different CLVs for my business.
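To show the idea of clustering utterances, here’s a toy sketch with no external libraries: each utterance becomes a tiny feature vector (gift-word count, routine-word count), and a minimal 2-cluster k-means separates gift buyers from routine buyers. The keyword lists and utterances are invented for illustration; real work would use a qualitative-analysis tool or a proper NLP library.

```python
# Invented keyword groups standing in for themes a tool like NVivo
# might surface from real utterances.
GIFT_WORDS = {"gift", "birthday", "wedding", "present", "anniversary"}
ROUTINE_WORDS = {"weekly", "grocery", "restock", "usual", "every"}

def features(utterance):
    """Map an utterance to a 2-D point: (gift-word count, routine-word count)."""
    words = utterance.lower().split()
    return (sum(w in GIFT_WORDS for w in words),
            sum(w in ROUTINE_WORDS for w in words))

def kmeans_2(points, iters=10):
    """Minimal 2-cluster k-means on 2-D points."""
    centers = [points[0], points[-1]]  # naive initialization
    groups = [[], []]
    for _ in range(iters):
        groups = [[], []]
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            groups[dists.index(min(dists))].append(p)
        centers = [tuple(sum(x) / len(g) for x in zip(*g)) if g else c
                   for g, c in zip(groups, centers)]
    return centers, groups

utterances = [
    "bought it as a wedding gift for my sister",
    "perfect birthday present",
    "part of my weekly grocery run",
    "restock every month, the usual order",
]
points = [features(u) for u in utterances]
centers, groups = kmeans_2(points)
```

Each resulting cluster can then get its own marketing tactics and its own CLV estimate, which is exactly the payoff the post describes.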