Correlation is not Causation! … And so what?

You’ve probably heard it a thousand times: “Correlation is not causation.” You also probably have an intuitive sense of what this sentence means: “Just because X and Y move together doesn’t mean X causes Y”. But do you really understand what the distinction between correlation and causation implies for your business? Let’s dive in!

A good friend of mine teaches marketing to MBA students. Here’s a question from her exam:

“A bank observes that customers who buy a life insurance policy are less likely to churn (i.e., they stay with the bank longer) than customers who don’t. The account manager then suggests pushing life insurance policies on all customers to increase the retention rate. What is the flaw in the manager’s reasoning?”

While a good majority of students could guess that there was something wrong with this argument, fewer than 25% gave the correct answer: The manager’s reasoning treats correlation as causation. Of course, this isn’t the case here: It is very unlikely that buying life insurance magically causes customers to stay longer with the bank. Instead, it is much more likely that people only buy life insurance from banks they already intend to stay with.

Business Analytics

In today’s world, analytics are everywhere. Companies have realized that they are sitting on piles of data waiting to be used, and they are hiring smart people to uncover statistical relationships in this data. After uncovering these relationships, companies use them to inform their business practices.

Let’s consider a practical example. After conducting a careful review of employee performance, an organization realizes that top performers tend to have one thing in common: They were assigned to a senior manager in their first few years at the company. The analysts try other specifications, and the relationship remains strong and robust. So far, so good.

From this data, they conclude that working with a more senior manager has significant benefits for employees. This is a reasonable conclusion to make: After all, senior managers have more experience, they may have mastered a wider range of skills, they might be better mentors… They thus decide to assign more of their junior employees to existing senior managers, and to hire more senior managers.

But did you see what happened here? It’s the confusion between correlation and causation rearing its ugly head again! While the pattern in the data is sound, the business decision that followed from it is not: It assumes that correlation is, in fact, causation. It assumes that assigning more junior employees to senior managers will turn those employees into top performers.

Alternative Explanations

Wait a minute, you’ll say: What else could explain the strong relationship between being assigned to a senior manager and being a top performer? Well, other hypotheses are equally reasonable:

What if senior managers are better at detecting future top performers, and claim them for their team soon after they join the company?

What if future top-performers are more career-driven, and thus more likely to insist on being assigned to senior managers because they perceive that it will help their career?

What if, finally, the relationship is entirely spurious? The company might have changed its recruitment process (which allowed it to recruit more talented workers) and its management practices (which encouraged pairing junior employees with senior managers) around the same time.

This is the key problem with correlational data: If you observe that X is correlated with Y, you have absolutely no guarantee that changing X will lead to changes in Y… and thus you cannot know which business decisions to draw from that correlation.
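To make this concrete, here is a minimal simulation sketch of the “career-driven employee” scenario. Everything in it is made up for illustration: the hidden `driven` trait, the probabilities, and the `simulate` function are assumptions, not data from any real company. In this toy world, being assigned to a senior manager has no causal effect whatsoever on performance, yet the observational data still shows a large gap.

```python
import random

def simulate(n=20000, randomize=False, seed=0):
    """Return the top-performer rate with and without a senior manager.

    Assumption (hypothetical): a hidden trait, "driven", raises the chance of
    becoming a top performer. The senior manager has NO causal effect at all.
    """
    rng = random.Random(seed)
    # senior manager? -> [number of employees, number of top performers]
    counts = {True: [0, 0], False: [0, 0]}
    for _ in range(n):
        driven = rng.random() < 0.3
        if randomize:
            # Experiment: a coin flip decides who gets a senior manager.
            senior = rng.random() < 0.5
        else:
            # Observation: driven employees lobby for a senior manager.
            senior = rng.random() < (0.8 if driven else 0.2)
        # Performance depends only on the hidden trait, never on `senior`.
        top = rng.random() < (0.6 if driven else 0.1)
        counts[senior][0] += 1
        counts[senior][1] += top
    return {senior: tops / total for senior, (total, tops) in counts.items()}

observed = simulate(randomize=False)
experiment = simulate(randomize=True)

gap_observed = observed[True] - observed[False]
gap_experiment = experiment[True] - experiment[False]

print(f"Gap in observational data: {gap_observed:.3f}")
print(f"Gap under random assignment: {gap_experiment:.3f}")
```

With these made-up parameters, the observational gap is around 27 percentage points in expectation, while the gap under random assignment hovers near zero. A company acting on the first number would hire more senior managers and see no improvement.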

What should you do instead?

First, recognize that there is a big difference between correlational and causal evidence. When reviewing the scientific literature, you need to assign a much stronger weight to causal designs! In medical research, hundreds of correlational studies had concluded that vitamin D supplements had significant health benefits… until randomized controlled trials (a type of research design that can capture causal effects) revealed that this relationship was spurious!

Second, treat correlational evidence as a way of generating hypotheses. If X is associated with Y, do not blindly assume that changing X will change Y. Instead, think of all the possible scenarios, like we just did: Could Y cause X instead? Could something else cause both X and Y?

Finally, if you really want a causal answer to a question, don’t be afraid to run an A/B test! Yes, it is often expensive and impractical to run one… but you need to weigh these costs against the risks of making decisions based on incomplete, and potentially biased, information. In fact, there is evidence that companies that run more experiments are also more successful… and yes, this evidence is correlational 😊.
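For readers who want to see what analyzing such a test can look like, here is a rough sketch applied to the bank example. Suppose the bank randomly picks half of a group of customers to receive the life insurance push, then compares retention a year later. The numbers below are invented, and the helper `two_proportion_z_test` is a hypothetical name of my own: it implements a standard pooled two-proportion z-test, which is one common way (not the only one) to analyze an A/B test.

```python
import math

def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Pooled two-proportion z-test; returns (difference, z, two-sided p-value)."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Under the null hypothesis both groups share one underlying rate.
    pooled = (success_a + success_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_b - p_a, z, p_value

# Hypothetical results: 1,000 customers per group, assigned at random.
# Control (no push): 820 retained. Treatment (push): 880 retained.
diff, z, p_value = two_proportion_z_test(820, 1000, 880, 1000)

print(f"Retention difference: {diff:.3f}")
print(f"z = {z:.2f}, p = {p_value:.5f}")
```

Because a coin flip, and not the customers themselves, decided who received the push, a difference this large would be hard to attribute to self-selection. That is what makes the evidence causal and not merely correlational.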


To go further…

Thomke, S. Building a Culture of Experimentation. Harvard Business Review. 2020.

Bohnet, I. How to Take the Bias out of Interviews. Harvard Business Review. 2016.


Zoé Ziani

PhD in Organizational Behavior