Optimizing For Average = Correlation

It's been a while since I last posted an 'SEO' article on my blog, which is down to my swearing off it earlier this year. That is still largely true, but I always wanted to make exceptions for thought pieces such as this one. Sharing content that isn't specifically about tactics and strategies is something not many SEOs are willing to do, but screw it, I'm not in a position where I have to worry what certain people in the industry think of me. So here I am, and before I start I will preface this with a disclaimer... If this upsets anyone, then good, because it should.


I have a problem with the set of tools built on the 'by the averages' approach... And so should you!

These tools focus on content analysis which is achieved via cross-comparison of any number of competing pages. Once the data is gathered from each competitor it is then compared to your own content.

You'll probably have heard of at least one of them. You all know which tools they are: the ones that make 'increase' and 'decrease' recommendations. These recommendations are based on, among other things, the average number of instances used by your competitors... The tools have slight differences of course, and some of their features, I will admit, are better than others.
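To make the mechanism concrete, here is a minimal sketch of how an 'increase' / 'decrease' recommendation could be derived from competitor averages. It is not any real tool's implementation: the function name `average_recommendations`, the idea of counting term instances, and the sample numbers are all assumptions made up for illustration.

```python
from statistics import mean


def average_recommendations(own_counts, competitor_counts):
    """Compare our per-term counts with the mean count across competitor pages.

    Hypothetical helper: returns 'increase', 'decrease' or 'ok' for each term.
    """
    # Collect every term seen on our page or on any competitor page.
    terms = set(own_counts)
    for page in competitor_counts:
        terms.update(page)

    hints = {}
    for term in sorted(terms):
        # The "target" is simply the competitor average for this term.
        target = mean(page.get(term, 0) for page in competitor_counts)
        own = own_counts.get(term, 0)
        if own < target:
            hints[term] = "increase"
        elif own > target:
            hints[term] = "decrease"
        else:
            hints[term] = "ok"
    return hints


# Made-up counts for three competing pages and our own page.
competitors = [
    {"seo": 10, "tools": 4},
    {"seo": 2, "tools": 6},
    {"seo": 3, "tools": 5},
]
mine = {"seo": 4, "tools": 5}

recommendations = average_recommendations(mine, competitors)
print(recommendations)
```

In this made-up data the page already uses "seo" more often than two of the three competitors, yet the average (pulled up by a single page) still produces an 'increase' recommendation.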

It's the core idea, however, that is a bit skewy, which is this... "Optimized content means matching the averages of your top competitors for as many factors as possible."


Truthfully, this works at times. After all, even a blind person throwing darts is going to hit the bullseye sometimes.

This 'core idea' though... It just doesn't make sense, and it's not correct...

The reason for that is quite simple: correlation is not causation... Most high-schoolers have heard that before, so I personally don't believe that most of the search engine optimization industry that hypes these tools hasn't heard it too. I also think it's unlikely that many of those who have heard it are unable to understand it. Maybe I'm wrong and they are dimwits, I'll leave that for you to decide, but I personally see another, more plausible explanation...

Correlation and Dependence: In statistics, dependence or association is any statistical relationship, whether causal or not, between two random variables or bivariate data. [1]

It's not so much that people don't understand the problem with correlation; it is more that they can't always identify it in the first place... In fact, correlation has always been a trickster, and it has always haunted this industry.


John changes 5 things on a page at once, and it gets a good result. John thinks that thing #2 made all the difference.

I'm not going to insult your intelligence by telling you what is wrong with that, but it happens all the time and it's how most unfounded / plain wrong ideas in our industry started.


A good result, wrongly credited because of correlation, can quickly give way to cognitive biases such as confirmation bias. In practice it is less a case of 'can' and more a case of 'does'. This is also exactly why I have always maintained that one of the best ways to improve at SEO is to level-up your thinking.

Unfortunately, I can't help but think that the people who created these tools did it for one of two reasons, or possibly both.

  1. They had some good results and, via correlation, wrongly attributed them to the 'by averages' approach they recommend.
  2. They know it's a flawed idea, but they just don't care because $$$

Either way, they're not someone I'd put my faith in to help direct my campaigns.

My views on this do not change the fact that, amazingly, these tools have insidiously seduced a large portion of the industry, who now believe them to be scientific marvels.


The fact is that if you optimize for average, you will get average... The idea that you can get more is even etymologically incorrect - average has never meant optimal.

It's also worth noting that for some people average results actually constitute an overall improvement.

Take our example again: John the SEO, who made 5 changes at once, can't reasonably say what did or did not help... These tools have you doing too much at once because correlation has given the false belief that it's best.

The net result may be positive. Say you change 15 things at once: 8 are positive improvements, and 7 are negative or neutral. The net result is an improvement in rankings.
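The arithmetic behind that scenario can be sketched in a few lines. The per-change effects below are invented purely for illustration (positions gained or lost); in reality you never get to see them individually, only the total.

```python
# Hypothetical effect of each of the 15 changes, in ranking positions
# gained (+) or lost (-). These numbers are made up for illustration.
effects = [
    3, 2, 2, 1, 1, 1, 1, 1,      # 8 changes that helped
    -2, -1, -1, -1, 0, 0, 0,     # 7 changes that hurt or did nothing
]

# What you actually observe after changing everything at once.
net = sum(effects)

# What you would have gained by making only the helpful changes.
helpful_only = sum(e for e in effects if e > 0)

# Improvement lost to the harmful changes hidden inside the net result.
left_on_table = helpful_only - net

print(f"observed net gain:       {net}")
print(f"gain from helpful only:  {helpful_only}")
print(f"hidden loss:             {left_on_table}")
```

The only number visible from the outside is the net gain, which says nothing about which of the 15 changes earned it or which ones cost you.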

In real-world testing situations, it is normal to have results of the following type: True positive, False positive, False negative, and True negative. In SEO it is common to change too much at once and get a False positive.

It's easy to see how this can give the wrong impression.

Explained like this, it's also kinda obvious that using averages might hit the bullseye a few times for certain factors. But it might also lead to under- or over-optimization of other factors at the same time. The net result creates an illusion of total improvement, which then leads to the false, correlation-based belief that optimizing for averages is a sure way to improve rankings.

To me this is not the same though. Optimizing for average is something I feel to be a deeply flawed strategy.


The one thing that I have not yet mentioned is the growing number of users who get inconsistent results with these tools, because it's always hit and miss when optimizing for average.

I personally prefer to optimize until I see a diminishing return for the time invested (the law of diminishing returns), not until a tool tells me that I've achieved the perfect average, whatever that means. Doing it the way I do is true optimization.
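As a rough illustration of that kind of stopping rule, here is a minimal sketch. It is only one possible reading, not a prescribed method: the function `rounds_worth_doing`, the `min_gain_per_hour` threshold, and the sample figures are all hypothetical.

```python
def rounds_worth_doing(rounds, min_gain_per_hour):
    """Keep optimization rounds until the return on time drops too low.

    rounds: list of (hours_spent, measured_gain) tuples, in the order
    the work was done. Hypothetical helper for illustration only.
    """
    kept = []
    for hours, gain in rounds:
        # Stop at the first round whose gain per hour falls below the bar.
        if gain / hours < min_gain_per_hour:
            break
        kept.append((hours, gain))
    return kept


# Made-up history: each round took 2 hours and gained less than the last.
history = [(2, 6), (2, 3), (2, 1), (2, 0.5)]

kept = rounds_worth_doing(history, min_gain_per_hour=1.0)
print(f"rounds worth the time: {len(kept)} of {len(history)}")
```

The point of the sketch is that the stopping condition comes from your own measured results and time, not from how close the page is to a competitor average.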

Hopefully you can see that too, and I will also add that I hope it matters to you... Science, from its Latin roots, means knowledge. Science is rigorous to help improve our knowledge, and how much do we really learn from these tools?

For all their warts, including the strong possibility that the creators failed scientifically to identify correlation when building the tools, they still have scores of useful raw data that can be used by someone who knows how.

So I'm not saying you should stop using these tools. Instead, I am saying you should use them in a responsible and informed manner, which is not what I have been seeing on the whole.

Correlation is not causation, and average is not optimal. These tools aren't as scientific as they should be. I also want to say that, despite what you may have been told, we can do better, but that starts with awareness.


