Correlation Does Not Imply Causation: Don’t Be Misled by Spurious Relationships!



In our daily lives, when we notice patterns across different sets of data, we often infer that one change causes another. To interpret such patterns correctly, however, we must distinguish a genuine causal relationship from a mere correlation. Otherwise we risk falling into the trap of a spurious relationship, where an apparent statistical connection is actually produced by a third factor or by coincidence.


What is correlation? What is causation?

Correlation refers to a statistical relationship between two variables. When we say two variables are highly correlated, we mean that when one changes, the other tends to change at the same time (in the same or the opposite direction). However, this does not necessarily imply a causal relationship.
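
One common way to put a number on "tending to change together" is the Pearson correlation coefficient, which measures the strength of a linear relationship on a scale from -1 to +1. The Python sketch below computes it from scratch; the function name `pearson_r` and the two sample series are hypothetical, made up only for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences.

    Close to +1: the series tend to rise together.
    Close to -1: one tends to fall as the other rises.
    Close to 0: no linear association.
    The value says nothing about which variable (if either) causes the other.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations (covariance numerator) and the two spread terms
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)

# Two hypothetical series, made up purely for illustration
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
r = pearson_r(x, y)
print(f"r = {r:.3f}")
```

A value of r near +1 here only tells us the two series move together; on its own it cannot tell us why.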


Charts about Correlation


Causation refers to a cause-and-effect relationship between two events, where one event (the cause) brings about or contributes to another event (the effect). Such a relationship needs to be supported by repeated experiments or rigorous observational evidence.

A well-known example of a spurious relationship is the simultaneous rise in sunscreen and ice cream sales during summer. The two are clearly correlated, yet common sense tells us that neither one causes the other. In reality, both are driven by a third factor, “sunny weather”, which produces the correlation (bright sunshine → more demand for sunscreen; rising temperatures → a stronger desire for ice cream).
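
The sunscreen and ice cream pattern can be reproduced with a small simulation. In this sketch, which uses made-up numbers and a hypothetical daily "sunshine" index, each sales series depends only on sunshine plus its own independent noise, so there is no direct link between them. The raw correlation still comes out strong; but once the linear effect of sunshine is removed from each series (a partial correlation computed from regression residuals), the remaining correlation falls close to zero.

```python
import math
import random

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def residuals(ys, zs):
    """Residuals of ys after a least-squares straight-line fit on zs.

    Subtracting the fitted line removes the part of ys that zs explains linearly.
    """
    n = len(ys)
    mz, my = sum(zs) / n, sum(ys) / n
    slope = (sum((z - mz) * (y - my) for z, y in zip(zs, ys))
             / sum((z - mz) ** 2 for z in zs))
    intercept = my - slope * mz
    return [y - (intercept + slope * z) for y, z in zip(ys, zs)]

def simulate(days=365, seed=1):
    rng = random.Random(seed)
    # Hypothetical "sunny weather" index: the hidden third factor
    sunshine = [rng.uniform(0, 10) for _ in range(days)]
    # Each series depends on sunshine plus independent noise;
    # neither series appears in the other's formula.
    sunscreen = [20 + 5 * s + rng.gauss(0, 5) for s in sunshine]
    ice_cream = [100 + 8 * s + rng.gauss(0, 8) for s in sunshine]
    return sunshine, sunscreen, ice_cream

sunshine, sunscreen, ice_cream = simulate()
raw = pearson_r(sunscreen, ice_cream)
partial = pearson_r(residuals(sunscreen, sunshine), residuals(ice_cream, sunshine))
print(f"raw correlation: {raw:.2f}")
print(f"after controlling for sunshine: {partial:.2f}")
```

Controlling for a suspected third factor in this way only works if that factor has been measured; unmeasured factors can still produce spurious correlations.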

This example is simple and easy to understand, but many practical situations are far less obvious.


Examples of Correlation and Causation


A real data example: synchronised fluctuations in the retail trade and food services sectors

According to statistics released by the Census and Statistics Department on the retail trade and food services sectors, the value index of retail sales and the value index of restaurant receipts showed broadly similar trends from September 2024 to December 2025.


Value index of retail sales and value index of restaurant receipts


Looking at the data patterns alone, some might infer that dining out more often creates more opportunities for retail shopping, while others might argue the reverse: that shopping trips lead to more dining out. But is there truly a causal relationship between the two? On closer reasoning, the apparent link most likely stems from a third factor, “overall consumer sentiment and the economic environment”. When consumer sentiment strengthens or the economy improves, more people dine out with friends and relatives (boosting restaurant receipts), and at the same time they spend more on various goods (driving up retail sales).


From “data coincidence” to “scientific evidence”

Since synchronised fluctuations in data may simply reflect a spurious relationship driven by a third factor, how can we identify true causation? The key is to use rigorous controlled experiments that eliminate the influence of other factors, so that we can determine how one event (the cause) affects another (the effect).
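
A small simulation can illustrate why assigning the "cause" at random, as in a controlled experiment, removes the influence of a third factor. In this sketch (hypothetical numbers throughout), a hidden factor both raises the outcome and makes units more likely to receive a treatment that, by construction, has no effect. A simple comparison of treated and untreated units in the observational data suggests a sizeable effect. Deciding treatment by coin flip breaks the link with the hidden factor, and the difference in means falls close to the true value of zero.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def outcome(treated, confounder, rng, true_effect):
    # Outcome depends strongly on the hidden factor, plus the treatment's true effect
    return 10 * confounder + true_effect * treated + rng.gauss(0, 1)

def difference_in_means(units):
    treated = [y for t, y in units if t]
    control = [y for t, y in units if not t]
    return mean(treated) - mean(control)

def run(n=20000, true_effect=0.0, seed=7):
    rng = random.Random(seed)
    observational, randomised = [], []
    for _ in range(n):
        c = rng.random()  # hypothetical third factor, uniform on [0, 1)
        # Observational data: units with a higher c take the treatment more often
        t_obs = rng.random() < c
        observational.append((t_obs, outcome(t_obs, c, rng, true_effect)))
        # Controlled experiment: a fair coin decides treatment, independently of c
        t_rand = rng.random() < 0.5
        randomised.append((t_rand, outcome(t_rand, c, rng, true_effect)))
    return difference_in_means(observational), difference_in_means(randomised)

obs_diff, rct_diff = run()
print(f"observational difference: {obs_diff:.2f}")
print(f"randomised difference:    {rct_diff:.2f}")
```

Calling `run(true_effect=2.0)` should give a randomised difference near 2, whereas the observational comparison mixes the true effect with the hidden factor's influence.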

A classic example is how the medical community turned a simple statistical observation into credible scientific evidence. Scientists observed a higher incidence of lung cancer among smokers. They examined this relationship carefully while controlling for other potentially contributing factors, and conducted extensive medical research and experiments showing that harmful substances in cigarettes damage lung tissue. This led to the conclusion that “smoking is one of the main causes of lung cancer”. This is a causal relationship supported by scientific evidence, verified through repeated experiments, with the influence of third factors ruled out.


Smoke and Cancer


Understanding the difference between correlation and causation is essential for sound data analysis. By building our statistical knowledge and applying appropriate statistical methods, we can interpret relationships more accurately and make wiser decisions in our daily lives.


LAW Mei-shan, Sophia
Statistician
16 April 2026