Are you tired of hearing about machine learning and “the artificial intelligence revolution”? Or perhaps just dismissive of the topic, thinking it’s not relevant outside of the realm of Amazon and Google or a team of data scientists working for a Fortune 500 company? Well, it’s time to change your mindset and recognize that machine learning is another powerful tool that business analysts can add to their “bag of tricks” as they look for new ways to deliver value to their organization.
With the proliferation of open source tools, as well commercially available applications that hide most of the complexity of building a prediction model, it’s getting easier and easier to use machine learning to help solve all sorts of business problems. The applications are endless. To preserve the confidentiality of proprietary information, I can’t use any real-life example from my consulting practice. Instead, I’ll illustrate the possibilities using a simulated project based on a real dataset released as a companion for the paper Explaining the Product Range Effect in Purchase Data by Pennacchioli, D., Coscia, M., Rinzivillo, S., Pedreschi, D. and Giannotti, F., . In BigData, 2013. (If you’d like to play with the same data, you can find it here.)
The dataset contains purchase data aggregated by customer from January 2007 to December 2011 for five supermarkets owned by Coop, one of the largest Italian retail distribution company. It includes variables such as the distance between the customer’s house and the closest and farthest store locations (which may carry different items), the average unit prices of the products purchased by the customer, the total amount of items purchased, and the price paid for each purchase during the time span covered.
Simulated Case Study
A multi-unit enterprise is interested in expanding its line of products to attract more “high value customers” in a specific geographic area where there it has five stores. The company uses a loyalty card to track the patterns of consumption of each registered customer and classify as “high value” the ones who spend $5,000 or more per year across the five stores, assigning them a “VIP” status.
A business analyst on the IT team is assigned to write the requirements for a solution that will allow the company to create and execute a survey to collect the preferences from the “VIP” customers of the five stores. These preferences will inform the decision of which new items to carry in the stores to increase customer loyalty and share of wallet among those top customers. As the BA gathers information from the stakeholders and internal databases to inform the requirements for the project, she learns that the company has 60,365 registered customers with a label of “VIP” or “Regular” assigned to them. Out of these 60,365 labeled customers, 18,278 have “VIP” status. An additional set of 8,103 customers are still unlabeled, as their use of the loyalty card is more recent. This set includes both brand new customers and existing customers convinced by a recent marketing campaign to register to the loyalty program for which the company doesn’t have yet enough purchase history collected for label assignment based on the existing rule.
The BA decides to see if it’s possible to find any reliable patterns to identify “VIPs” among these 8,103 customers with a shorter purchase history in order to expand the number of people invited to take the customer survey, so more data is available to generate insights and the preferences of top customers just adhering to the loyalty program can be taken into account. After extracting the records of the 60,365 labeled customers from the loyalty system, she uses R Studio (an open source tool that makes it easier to use the open source statistical language R) to make sense of the data.
She does some variable selection and transformation, and splits the resulting dataset into two sets, one for “training” the model (42,255 records, 70% of the available data), and another for testing the model accuracy with “unseen data”(18,109 records). She then uses a supervised learning algorithm to analyze the training data and produce an inferred function that can be used for mapping new examples of “VIP” customers.
To find out if she trained a good model, the BA looks at statistics provided by R including Accuracy (90%) and Kappa (77%), and checks the model against the training and test datasets. The following table shows how well the model performed with new observations (the test set with labeled customers not used to train the model). The correct predictions are the ones in the diagonal line from top-left to bottom-right. Our test set has 12,619 “Regular” customers, with 11,729 (92%) being correctly labeled by the model. The test set also has 5,490 “VIP” customers, with 4,624 (84%) classified as such. The error (misclassification rate) is 9.7%, consistent with the performance achieved by the model with the training data.
Additional steps could have been taken to improve the model: test a different algorithm (extremely easy to do using the R package caret), find a better combination of predictors to feed into the model, and so forth. But since the business is not dealing with a problem in which false positives or false negatives would cause any harm, the BA decides that the model is good enough for the intended purpose. She can now apply the model to predict the status of customers in the unlabeled set using predictors like average purchase cost, and flag the ones assigned the “VIP” class to participate on the survey.
The figure below shows a partial depiction of the resulting decision tree used to classify customers as “Regular” or “VIP”. The number within parentheses in the end nodes represent # of observations / # of observations in the wrong class in the training set of 42,255 observations. For example, the end node with a red border at the center of the diagram has 7,590 customers assigned to the class “VIP”, with 546 of those incorrectly assigned to that class.
Note how not all predictors of “VIP” (high spending) status are obvious. Even though we’d expect customers with a lower average purchase (less than $7.91, top left node of the tree) to be “Regular” (lower spending), some of the behavior of “Regular” customers is far from intuitive. For example, one might expect that customers with an average purchase greater than $7.91 and average product price above $ 7.13 would be higher spenders (“VIP” class), but in the training set, 1,204 of those are “Regular” customers, and only 33 are “VIP” (see far left end node with red border). In the full dataset of labeled customers, 89% of customers with this profile are “Regular” (i.e., spend less than $5,000 in the stores per year).
It’s also interesting to see that “maximum distance to a store” is a more important predictor than “minimum distance to a store” (metrics provided by R show that on a scale of zero to 100, “maximum distance of a mile or less” has importance 98.11, and “minimum distance of a mile or less” has 84.15). Discoveries like this would be hard to identify without the help of machine learning, and can raise issues more important than the business originally meant to address.
As a result of this quick exercise, based on the attributes that the model found to have the strongest predictive power (average purchase amount, average product price, and maximum distance from the customer’s home to one of the five stores), the company is now capable of flagging more potentially “high-value” customers to invite to respond the survey.
Instead of merely writing the requirements for a solution that would enable the company to survey customers already flagged as “VIP”, the BA contributed with additional value to the project by allowing the survey to be extended to more customers who, based on the identified predictors, are likely to be in the same “high value” category as the “VIP” customers that the company wants to hear from. The survey results from the two groups of customers (“true VIPs”, and “predicted VIPs”) could be kept separate and compared to check if they seem consistent before being consolidated to inform the company’s decision-making process.
If the model helps the business achieve its objectives, it can continue to “learn” as more customer data becomes available, with benefits in terms of accuracy and identification of new customer behavior that may start to develop over time.
# # #
As seen in the case study above, machine learning can exploit fundamental correlations between a variable of interest at a certain future time and other correlated metrics at a current or historical time, and predict the future state of the variable with some accuracy. There are myriad situations when predictive analytics can help business analysts solve business problems or improve the quality of their solution. From analyzing data from a CRM system to understand the drivers of customer churn to using web crawling data to predict which mobile device doctors would feel more comfortable using before making a large purchase decision, the applications are endless. And with the proliferation of free learning resources and open source applications, there is no excuse for ignoring these powerful tools and techniques that can dramatically improve the impact of business analysis in your organization.
# # #
Ready to jump into the predictive analytics bandwagon?
Here are some good resources to help you get started:
How to Run Your First Classifier in Weka
Practical Data Science with R