When NOT to Trust Data to Inform Your Decisions
Sometimes I look at a marketing report and I'm amazed at how much a certain metric has improved. Part of me just wants to believe it and feel good about it. But another part of me wants to look for an explanation other than the implied improvement in performance.
What if the data is skewed? What if I didn't interpret the numbers correctly? There are many occasions when you should take a second look at the data before jumping to any conclusions. Here are a few of them to start with.
1. External factors are affecting the data
Let's start with the most obvious and common issue: external factors affecting your results. Some of these factors may be under your influence, like colleagues running another experiment or campaign at the same time, but others are completely out of your control, like general trends and events.
In some organizations, especially larger ones, it's hard to stay on top of everything that's going on and to communicate all of your actions to other departments. You must put the right mechanisms in place to track what may impact your results. For example, using specific UTM tracking codes for all of your online campaigns and dedicated discount codes for your offline promotions will help you partially isolate the impact of other initiatives.
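To make that tracking concrete, here is a minimal sketch of how UTM parameters can be appended to campaign links. The tag_url helper and the example URL are hypothetical, but utm_source, utm_medium, and utm_campaign are the standard parameters most analytics tools recognize.

```python
from urllib.parse import urlencode, urlparse, urlunparse, parse_qsl

def tag_url(url, source, medium, campaign):
    """Append standard UTM parameters to a campaign URL so its
    traffic can be attributed separately in analytics reports."""
    parts = urlparse(url)
    query = dict(parse_qsl(parts.query))
    query.update({
        "utm_source": source,
        "utm_medium": medium,
        "utm_campaign": campaign,
    })
    return urlunparse(parts._replace(query=urlencode(query)))

# Each initiative gets its own campaign name, so its clicks stay separable
print(tag_url("https://example.com/landing", "newsletter", "email", "spring_promo"))
# https://example.com/landing?utm_source=newsletter&utm_medium=email&utm_campaign=spring_promo
```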
Factors outside of your organization's control may be harder to identify and exclude from your results. Often the best you can do is be aware of them and their scope, and add a footnote to your report. Keeping up with your industry's benchmarks and using tools like Google Trends and Google Alerts will help you stay informed.
2. The sample size is too small
Whether you are analyzing trends for a customer segment or assessing the results of your A/B tests, sample sizes do matter and are often overlooked. The smaller the increase or decrease in a metric, the bigger the sample has to be for the change to have any meaning.
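To get a feel for how quickly the required volume grows, here is a rough sketch of a standard two-proportion sample-size calculation. The 3% baseline conversion rate and the sample_size_per_variation helper are illustrative assumptions, not a substitute for your A/B testing tool's own calculator.

```python
from statistics import NormalDist

def sample_size_per_variation(p1, lift, alpha=0.05, power=0.8):
    """Rough minimum sample size per variation for a two-proportion test,
    given a baseline conversion rate p1 and the relative lift to detect."""
    p2 = p1 * (1 + lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * (2 * p_bar * (1 - p_bar)) ** 0.5
                 + z_beta * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2
    return int(numerator / (p2 - p1) ** 2) + 1

# The smaller the lift you want to detect, the larger the sample you need
# (detecting a 5% lift takes roughly 16x the traffic of a 20% lift here):
for lift in (0.20, 0.10, 0.05):
    print(f"{lift:.0%} lift -> ~{sample_size_per_variation(0.03, lift):,} visitors per variation")
```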
Sometimes it's just impossible to reach the required volume given the size of your audience, and that's OK, as long as you're aware of it and don't make decisions solely on an unconfirmed trend.
3. There are too many manual steps
The more manual steps involved in a process, the more chances there are for something to go wrong. Think about all the data cleansing, re-sorting, rearranging, and transformation steps you went through before the final output. Something can break at every one of them.
At almost every step of the process, you should validate the integrity of the data. Are there any records missing? Are all the values still attached to the right record?
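As a sketch of what those checks might look like, assume (hypothetically) a pandas workflow where campaign results are joined back to a contact list. The column names and values are made up, but the idea of verifying row counts and key integrity after each transformation applies to any toolchain.

```python
import pandas as pd

# Hypothetical data: one row per contact, and campaign results that
# should also contain exactly one row per contact.
contacts = pd.DataFrame({"contact_id": [1, 2, 3, 4]})
results = pd.DataFrame({"contact_id": [1, 2, 4], "conversion": [0, 1, 0]})

# Are there any records missing?
if len(results) != len(contacts):
    print(f"Expected {len(contacts)} result rows, got {len(results)}")

# Are all the values still attached to the right record?
merged = contacts.merge(results, on="contact_id", how="left", validate="one_to_one")
orphaned = merged["conversion"].isna().sum()
if orphaned:
    print(f"{orphaned} contact(s) have no result attached")
```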
4. Data points are missing
When you're only looking at the big picture, it's easy to miss important details at a more granular level. For example, was the data collected on a daily basis as planned? Did tracking start after the beginning of the reporting period?
To answer those questions, you have to slice and dice the data down to its smallest subsets. Missing data isn't the end of the world, though: you can still use interpolation to fill the holes. It's better to report on approximate data than to use an incomplete dataset.
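For instance, a pandas sketch along these lines can expose the gaps and then fill them. The dates and visit counts are invented, and linear interpolation is only one of several reasonable choices.

```python
import pandas as pd

# Hypothetical daily metric with two days missing from the export
visits = pd.Series(
    [120, 132, 128, 140],
    index=pd.to_datetime(["2024-03-01", "2024-03-02", "2024-03-05", "2024-03-06"]),
)

# Reindex to a full daily range so the gaps become explicit...
daily = visits.asfreq("D")
print(daily.isna().sum(), "days are missing")

# ...then fill them with a simple time-based interpolation
filled = daily.interpolate(method="time")
print(filled)
```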
5. Poor choice of graph
Sometimes the data is accurate, but the way it's presented is wrong. A graph is supposed to communicate the data; if it doesn't convey the right message at first sight, it may be the wrong type of graph. A common example is using a line chart when bars would be more appropriate because the data points have no relation to one another.
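As a hypothetical illustration with matplotlib, plotting leads by channel both ways makes the difference obvious: the line chart implies a trend between unrelated categories, while bars let each channel stand on its own.

```python
import matplotlib.pyplot as plt

# Hypothetical data: categories with no natural order, so connecting
# them with a line suggests a relationship that isn't there.
channels = ["Email", "Paid search", "Social", "Referral"]
leads = [420, 310, 180, 95]

fig, (ax_bad, ax_good) = plt.subplots(1, 2, figsize=(8, 3))

ax_bad.plot(channels, leads, marker="o")   # misleading: reads as a trend
ax_bad.set_title("Line (misleading)")

ax_good.bar(channels, leads)               # each category stands on its own
ax_good.set_title("Bars (appropriate)")

plt.tight_layout()
plt.show()
```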
There are obviously a lot more ways to get tricked by data. Have any personal experiences to share?