Is it possible for a BA to concentrate on both Business Analysis and Business Analytics side by side?

A reader asks:

Is it possible for a BA to concentrate on both Business Analysis and Business Analytics side by side?

In order to have a productive conversation about a topic without getting derailed by misconceptions, first we need to eliminate any confusion about how the terms are being used. So here’s how I’m using the following terms throughout this article:

Analytics: The science of applying a structured method to solve a business problem using data and analysis to drive impact. [1]

Business analytics: The use of simpler analytics methodologies on past data. [1]

Advanced analytics: Everything else, including predictive analytics. [1]

Business analysis: The activities associated with defining and understanding business problems and determining the right solution. Such activities include problem identification, requirements discovery, analysis, specification, validation, approval, as well as knowledge transfer about desired outcomes, requirements, assumptions, and risks to the delivery team.

Just based on these definitions, it’s pretty clear that business analytics (as well as advanced analytics) can have a huge impact on the quality of business analysis. Analytics can help you understand the specifics of the problem you’re trying to solve, and add precision to decisions about priorities, requirements and design.

Here are some examples of how business analysts can use analytics to improve the quality of their solutions:

  • A BA working on the prioritization of change requests submitted by users of an internal application can use analytics to determine how many people would benefit from each of the proposed improvements and make prioritization recommendations accordingly. (E.g., “500 users need to reply to a message in our platform each month, and on average each of these users replies to 10 messages a month. 30% of the time users have to change the ‘From:’ field before replying so the message is sent from a shared account. The ‘reply-from’ enhancement meant to facilitate changing the ‘From:’ field will affect 500 x 30% = 150 users and 500 x 30% x 10 = 1,500 replies per month. We recommend bumping this improvement to the top so it’s implemented before this other planned enhancement that will only affect 12 users performing an average of 3 actions per month.”)
  • A BA working for a credit card issuer and responsible for a system that manages the credit card approval process examines historical data and notices that the company is showing signs of increased losses and incorrect actions for customers. She uses descriptive analytics to divide customers into multiple segments and recommends that the company move from the existing single model for predicting risk for all customers to distinct risk models that each handle only a subset of customers. The change improves the accuracy of risk modeling and creates millions in loss savings.
  • A BA working for a retail business and responsible for specifying a solution to enable the company to implement a survey to collect the preferences of its “high value customers” uses historical data to expand the reach of the customer survey.
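The arithmetic in the first example above is simple enough to script, which makes it easy to rerun whenever the usage figures change. Here is a minimal Python sketch. The class and field names are hypothetical, the numbers are the ones quoted in the example, and the second request is assumed to touch every action its 12 users perform.

```python
from dataclasses import dataclass


@dataclass
class ChangeRequest:
    """Usage figures behind one proposed enhancement (names are illustrative)."""
    name: str
    users: int                # people who perform the action at all
    actions_per_user: float   # average actions per user per month
    share_affected: float     # fraction of those actions the enhancement touches

    def affected_actions(self) -> float:
        return self.users * self.actions_per_user * self.share_affected


requests = [
    # Assumption: the other enhancement affects all 12 x 3 actions.
    ChangeRequest("other planned enhancement", users=12, actions_per_user=3, share_affected=1.0),
    ChangeRequest("reply-from", users=500, actions_per_user=10, share_affected=0.30),
]

# Rank the requests by how many monthly actions each one would improve.
ranked = sorted(requests, key=lambda r: r.affected_actions(), reverse=True)
for r in ranked:
    print(f"{r.name}: about {r.affected_actions():,.0f} actions per month")
```

Ranking by affected actions is only one possible criterion; effort, risk and strategic fit would normally be weighed alongside it.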

As you can see from these examples, analytics helps BAs find better solutions to the problems they’re trying to solve. Business analytics and advanced analytics are extremely useful problem-solving tools. Used well, they can make for better software products, happier customers, higher profitability, more productive employees, improved accuracy in risk modeling, quicker work turnaround, and all sorts of positive outcomes.

So to answer the reader’s question: not only is it possible for a business analyst to focus on both business analysis and business analytics in the work they do, it can also help you dramatically expand the value you deliver to your organization.

How to get there?

You can’t be great at business analytics without getting out in the real world and spending time with business data. If you don’t have an opportunity within your workplace, find a nonprofit to help, and get to work analyzing their survey results or Google Analytics data. You don’t need a PhD in statistics, but being familiar with basic statistics will greatly enhance your career as a business analyst with analytics expertise, so find some online resources to help you get a solid grounding in this topic. Statistics training will help you enhance your analytical thinking and look at business problems differently.

For tips on getting started with predictive analytics, check out the resources listed at the end of the article “It’s time for business analysts to add machine learning to their ‘bag of tricks’.”

[1] Behind every Good Decision: How Anyone Can Use Business Analytics to Turn Data into Profitable Insight


Photo: Neerav Bhatt (CC)

It’s time for business analysts to add machine learning to their “bag of tricks”


Are you tired of hearing about machine learning and “the artificial intelligence revolution”? Or perhaps just dismissive of the topic, thinking it’s not relevant outside of the realm of Amazon and Google or a team of data scientists working for a Fortune 500 company? Well, it’s time to change your mindset and recognize that machine learning is another powerful tool that business analysts can add to their “bag of tricks” as they look for new ways to deliver value to their organization.

With the proliferation of open source tools, as well as commercially available applications that hide most of the complexity of building a prediction model, it’s getting easier and easier to use machine learning to help solve all sorts of business problems. The applications are endless. To preserve the confidentiality of proprietary information, I can’t use any real-life example from my consulting practice. Instead, I’ll illustrate the possibilities using a simulated project based on a real dataset released as a companion to the paper “Explaining the Product Range Effect in Purchase Data” by Pennacchioli, D., Coscia, M., Rinzivillo, S., Pedreschi, D. and Giannotti, F. (BigData, 2013). (If you’d like to play with the same data, you can find it here.)

The dataset contains purchase data aggregated by customer from January 2007 to December 2011 for five supermarkets owned by Coop, one of the largest Italian retail distribution companies. It includes variables such as the distance between the customer’s house and the closest and farthest store locations (which may carry different items), the average unit prices of the products purchased by the customer, the total number of items purchased, and the price paid for each purchase during the time span covered.

Simulated Case Study

A multi-unit enterprise is interested in expanding its line of products to attract more “high value customers” in a specific geographic area where it has five stores. The company uses a loyalty card to track the consumption patterns of each registered customer and classifies as “high value” the ones who spend $5,000 or more per year across the five stores, assigning them a “VIP” status.
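The labeling rule is worth writing down precisely, because everything that follows depends on it. A minimal Python sketch, assuming yearly spend is available per store (the function name and store keys are hypothetical):

```python
VIP_THRESHOLD = 5_000  # dollars per year, from the rule described above


def label_customer(yearly_spend_by_store: dict[str, float]) -> str:
    """Return "VIP" when spend summed across all stores reaches the threshold."""
    total = sum(yearly_spend_by_store.values())
    return "VIP" if total >= VIP_THRESHOLD else "Regular"


# "$5,000 or more" means the boundary itself counts as VIP.
print(label_customer({"store_1": 3_200.0, "store_2": 1_800.0}))
print(label_customer({"store_1": 4_999.99}))
```

A rule like this needs a full year of purchase history, which is why recent loyalty card holders can end up without a label.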

A business analyst on the IT team is assigned to write the requirements for a solution that will allow the company to create and execute a survey to collect the preferences of the “VIP” customers of the five stores. These preferences will inform the decision of which new items to carry in the stores to increase customer loyalty and share of wallet among those top customers. As the BA gathers information from the stakeholders and internal databases to inform the requirements for the project, she learns that the company has 60,365 registered customers with a label of “VIP” or “Regular” assigned to them. Out of these 60,365 labeled customers, 18,278 have “VIP” status. An additional set of 8,103 customers are still unlabeled, as their use of the loyalty card is more recent. This set includes both brand new customers and existing customers whom a recent marketing campaign convinced to register for the loyalty program; the company doesn’t yet have enough purchase history for them to assign a label under the existing rule.

The BA decides to see whether any reliable patterns can identify likely “VIPs” among these 8,103 customers with a shorter purchase history. Doing so would expand the number of people invited to take the customer survey, so that more data is available to generate insights and the preferences of top customers who have only just joined the loyalty program can be taken into account. After extracting the records of the 60,365 labeled customers from the loyalty system, she uses RStudio (an open source tool that makes it easier to use the open source statistical language R) to make sense of the data.

She does some variable selection and transformation, and splits the resulting dataset into two sets, one for “training” the model (42,255 records, 70% of the available data), and another for testing the model’s accuracy with “unseen data” (18,109 records). She then uses a supervised learning algorithm to analyze the training data and produce an inferred function that can be used to classify new customers as “VIP” or “Regular”.
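The BA in this case study worked in R. As an illustration only, here is what a 70/30 split looks like in plain Python. The record count of 60,364 is inferred from the two set sizes quoted above (42,255 + 18,109) rather than stated anywhere in the case study, and integer IDs stand in for the customer records.

```python
import random


def train_test_split(records, train_fraction=0.70, seed=42):
    """Shuffle a copy of the records and cut it into training and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Stand-in for customer records: plain integer IDs (assumption for illustration).
customer_ids = range(60_364)
train, test = train_test_split(customer_ids)
print(len(train), len(test))
```

Shuffling before cutting matters: if the records were sorted by signup date or by spend, an unshuffled split would train and test on systematically different customers.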

To find out if she trained a good model, the BA looks at statistics provided by R including Accuracy (90%) and Kappa (77%), and checks the model against the training and test datasets. The following table shows how well the model performed with new observations (the test set with labeled customers not used to train the model). The correct predictions are the ones in the diagonal line from top-left to bottom-right. Our test set has 12,619 “Regular” customers, with 11,729 (93%) being correctly labeled by the model. The test set also has 5,490 “VIP” customers, with 4,624 (84%) classified as such. The error (misclassification rate) is 9.7%, consistent with the performance achieved by the model with the training data.
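The statistics in the previous paragraph can all be recomputed from the four cells of the test-set table. The Python sketch below does that. The two off-diagonal counts are not quoted directly in the text; they are derived by subtraction, and the variable names are illustrative.

```python
# Test-set counts quoted in the paragraph above; the off-diagonal cells are
# derived by subtraction (12,619 - 11,729 and 5,490 - 4,624).
confusion = {
    ("Regular", "Regular"): 11_729,  # keys are (actual, predicted)
    ("Regular", "VIP"): 890,
    ("VIP", "Regular"): 866,
    ("VIP", "VIP"): 4_624,
}
classes = ["Regular", "VIP"]
total = sum(confusion.values())

accuracy = sum(confusion[(c, c)] for c in classes) / total
error_rate = 1 - accuracy

# Share of each actual class that the model labeled correctly.
recall = {
    c: confusion[(c, c)] / sum(confusion[(c, p)] for p in classes)
    for c in classes
}

# Cohen's kappa: agreement beyond what the class proportions alone would give.
expected = sum(
    sum(confusion[(c, p)] for p in classes)      # actual count of class c
    * sum(confusion[(a, c)] for a in classes)    # predicted count of class c
    for c in classes
) / total**2
kappa = (accuracy - expected) / (1 - expected)

print(f"accuracy={accuracy:.3f} error={error_rate:.3f} kappa={kappa:.3f}")
print({c: round(r, 3) for c, r in recall.items()})
```

Kappa is the more telling number here. Roughly 70% of labeled customers are “Regular”, so a model that always answered “Regular” would already score about 70% accuracy; a kappa near 0.77 indicates the model is doing much better than that baseline.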

Additional steps could have been taken to improve the model: test a different algorithm (extremely easy to do using the R package caret), find a better combination of predictors to feed into the model, and so forth. But since the business is not dealing with a problem in which false positives or false negatives would cause any harm, the BA decides that the model is good enough for the intended purpose. She can now apply the model to predict the status of customers in the unlabeled set using predictors like average purchase cost, and flag the ones assigned the “VIP” class to participate in the survey.

The figure below shows a partial depiction of the resulting decision tree used to classify customers as “Regular” or “VIP”. The numbers within parentheses in the end nodes represent # of observations / # of observations in the wrong class in the training set of 42,255 observations. For example, the end node with a red border at the center of the diagram has 7,590 customers assigned to the class “VIP”, with 546 of those incorrectly assigned to that class.
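To read an end node quickly, it helps to convert the pair of counts into a rate. A small Python sketch using the node cited above (the helper name is hypothetical):

```python
def node_summary(observations: int, misclassified: int) -> dict:
    """Turn an end node's "(observations / wrong class)" pair into rates."""
    error = misclassified / observations
    return {
        "correct": observations - misclassified,
        "error_rate": error,
        "purity": 1 - error,
    }


# The end node cited above: 7,590 customers assigned to "VIP", 546 of them wrongly.
leaf = node_summary(7_590, 546)
print(f"{leaf['correct']} correct, purity {leaf['purity']:.1%}")
```

An end node with high purity and many observations is one whose rule can be trusted more when it is applied to the unlabeled customers.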

Note how not all predictors of “VIP” (high spending) status are obvious. Even though we’d expect customers with a lower average purchase (less than $7.91, top left node of the tree) to be “Regular” (lower spending), some of the behavior of “Regular” customers is far from intuitive. For example, one might expect that customers with an average purchase greater than $7.91 and average product price above $7.13 would be higher spenders (“VIP” class), but in the training set, 1,204 of those are “Regular” customers, and only 33 are “VIP” (see far left end node with red border). In the full dataset of labeled customers, 89% of customers with this profile are “Regular” (i.e., spend less than $5,000 in the stores per year).

It’s also interesting to see that “maximum distance to a store” is a more important predictor than “minimum distance to a store” (metrics provided by R show that on a scale of zero to 100, “maximum distance of a mile or less” has importance 98.11, and “minimum distance of a mile or less” has 84.15). Discoveries like this would be hard to identify without the help of machine learning, and can raise issues more important than the business originally meant to address.

As a result of this quick exercise, based on the attributes that the model found to have the strongest predictive power (average purchase amount, average product price, and maximum distance from the customer’s home to one of the five stores), the company is now capable of flagging more potentially “high-value” customers to invite to respond to the survey.

Instead of merely writing the requirements for a solution that would enable the company to survey customers already flagged as “VIP”, the BA contributed additional value to the project by allowing the survey to be extended to more customers who, based on the identified predictors, are likely to be in the same “high value” category as the “VIP” customers that the company wants to hear from. The survey results from the two groups of customers (“true VIPs” and “predicted VIPs”) could be kept separate and compared to check if they seem consistent before being consolidated to inform the company’s decision-making process.

If the model helps the business achieve its objectives, it can continue to “learn” as more customer data becomes available, with benefits in terms of accuracy and identification of new customer behavior that may start to develop over time.

# # #

As seen in the case study above, machine learning can exploit fundamental correlations between a variable of interest at a certain future time and other correlated metrics at a current or historical time, and predict the future state of the variable with some accuracy. There are myriad situations when predictive analytics can help business analysts solve business problems or improve the quality of their solution. From analyzing data from a CRM system to understand the drivers of customer churn to using web crawling data to predict which mobile device doctors would feel more comfortable using before making a large purchase decision, the applications are endless. And with the proliferation of free learning resources and open source applications, there is no excuse for ignoring these powerful tools and techniques that can dramatically improve the impact of business analysis in your organization.

 # # #

Ready to jump on the predictive analytics bandwagon?

Here are some good resources to help you get started:

How to Run Your First Classifier in Weka

Practical Data Science with R

Need help getting your team ready to use predictive analytics methods that are appropriate to your issues without feeling intimidated or producing misleading insights that don’t add value?

Get in touch and we’ll be happy to help you accelerate your journey to becoming a data-driven or data-inspired organization.

* * *

 Photo credit: O’Reilly Conferences (Creative Commons)