Thin Content is a misunderstood concept that should be important to anybody who is in the business of managing an effective SEO campaign.
What Is Thin Content?
Some people mistake the word thin for the word small.
This has given rise to the false idea that shorter content is always considered thin...
In fact, content of any length can be thin if it meets this one criterion:
- Low value
If your content is low value it is considered to be thin.
How Thin Content Works In 2020
When Google first released Panda in 2011, it introduced some of the methods still used today for handling pages and even sections of low value content.
Penguin, which was released later, actually started acting at the page level, whereas Panda collects information from the page level and acts at the domain level.
Since then, the way that thin content is treated has changed a lot...
You may still occasionally receive a thin content penalty or ‘manual action’ in Google Search Console.
Now, however, it is more likely that you will be filtered, which means you won’t receive a warning even when your site is being affected by thin content.
Google used to sometimes omit certain content from its index, filtering out single paragraphs or sentences from your content.
"Everyone's reaction when low quality and spammy content is not indexed anymore" pic.twitter.com/QU8S2nCupb (Gary "鯨理／경리" Illyes, @methode, January 22, 2020)
This no longer appears to be the case, and the tweet above from Gary Illyes at Google seems to further confirm this suspicion.
Instead, if your page is considered “too thin”, it may not be indexed at all.
A Note On Soft 404 Errors
When pages return a status code of 404 it means that the page is Not Found.
A Soft 404 Error is applied to pages on your site when a URL returns a status of 200 (OK) but Google thinks the page offers no value, for one of two reasons:
- Pages with little or no content.
- Pages which are thin due to reasons we will explore in this guide.
Other than the first reason it is increasingly rare to get a Soft 404 Error in Google Search Console.
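As a rough illustration of the first reason, here is a minimal sketch that flags pages returning 200 (OK) with very little visible text. The function name `looks_like_soft_404` and the `min_words` threshold of 50 are assumptions made for this example; Google does not publish the rules it uses.

```python
import re

def looks_like_soft_404(status_code, html, min_words=50):
    """Flag a page that returns 200 (OK) but has little or no visible text.

    min_words is a hypothetical threshold chosen for illustration only.
    """
    if status_code != 200:
        return False  # a real 404 (or other error) is not a *soft* 404
    # Drop script/style blocks first, then strip the remaining tags
    text = re.sub(r"(?is)<(script|style)\b.*?</\1>", " ", html)
    text = re.sub(r"(?s)<[^>]+>", " ", text)
    return len(text.split()) < min_words
```

Running this over a crawl of your own URLs gives you a shortlist to check by hand. It is not a prediction of what Search Console will report.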
Avoid Thin Content
Any amount of thin content could potentially harm your site's overall quality score, which is something that should be avoided.
Even if your site isn’t showing signs of obvious actions such as the ones above, you should still take measures to eliminate as much thin content as possible.
What Causes Thin Content?
There are several possible causes of thin content. Here are some of the most common...
How to Diagnose and Fix Thin Content:
Diagnosing thin content isn't as easy as it used to be. You need to know what to look out for and become your own detective in discovering these issues.
Here are some common scenarios, with a diagnosis and a solution for each.
Overly promotional affiliate content
If you suddenly stuff dozens of affiliate links into a basic 1,000-word article, you might notice the rankings suffer.
It’s easy for search engines to see a high number of OBLs (Outbound Links) to affiliate pages, even when the links are cloaked.
Diagnose: Do a quick manual check of your pages to count the number of affiliate links.
Solution: In general, use one or two links per product, depending on the length of the article.
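If the manual count gets tedious, it can be roughly automated. The sketch below counts links in a page's HTML that point to affiliate hosts or to cloaked redirect paths. The host list and the `/go/` and `/recommends/` prefixes are example values assumed for illustration, so swap in whatever your own site uses.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Example values only (assumptions): replace with your own networks and cloaking paths.
AFFILIATE_HOSTS = {"amzn.to", "shareasale.com"}
CLOAK_PREFIXES = ("/go/", "/recommends/")

class AffiliateLinkCounter(HTMLParser):
    """Counts <a> tags whose href looks like an affiliate link."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        parsed = urlparse(href)
        host = parsed.netloc.lower()
        if host.startswith("www."):
            host = host[4:]
        # Match either a known affiliate host or a cloaked internal redirect
        if host in AFFILIATE_HOSTS or parsed.path.startswith(CLOAK_PREFIXES):
            self.count += 1

def count_affiliate_links(html):
    parser = AffiliateLinkCounter()
    parser.feed(html)
    return parser.count
```

Compare the count against the number of products the article covers to spot pages that are well over one or two links per product.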
Computer generated content
Computer generated content, such as scraped content, spun content and fully generated articles, can in some cases index and rank.
However, this is becoming less common because search engines are getting better at identifying when this content is low quality.
As such, it often doesn’t index at all. Many webmasters also get fooled by writers who are actually spinning content to a readable level.
Diagnose: If you’re doing this I’d certainly hope you’d know about it. Either way, use tools like Copyscape to check your writers' content from time to time.
Solution: Don’t use computer generated content unless it can't be helped, and when you have to, always avoid obviously spammy techniques.
Low quality content
Low quality content is content that offers little value because it doesn’t cover a topic in a satisfactory way.
Content vectoring allows search engines to understand which terms co-occur and as such it’s becoming easier to understand when your content is “thin”.
Diagnose: Use tools such as SERPStat, SurferSEO or Word2Vec via Python to find missing terms.
Solution: Make sure to do topic research for your content, using the above tools as well as regular methods… If you find yourself in this situation you might want to expand on your content or combine it with another existing piece.
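To show the idea of finding missing terms, here is a minimal sketch. It is not Word2Vec and not how the tools above work. It is a much simpler stand-in that lists terms used by at least half of a set of competing pages but absent from yours. The function names, the tiny stopword list and the `min_share` threshold are all assumptions made for this example.

```python
import re
from collections import Counter

# A deliberately tiny stopword list, for illustration only
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it", "for", "on", "with"}

def terms(text):
    """Lowercased set of words longer than two letters, minus stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    return {w for w in words if len(w) > 2 and w not in STOPWORDS}

def missing_terms(my_text, competitor_texts, min_share=0.5):
    """Terms used by at least min_share of competitors but absent from my_text."""
    doc_counts = Counter()
    for text in competitor_texts:
        doc_counts.update(terms(text))  # each page counts a term at most once
    needed = len(competitor_texts) * min_share
    mine = terms(my_text)
    return sorted(t for t, n in doc_counts.items() if n >= needed and t not in mine)
```

Feed it the text of your page and the text of the pages that currently rank for your target query. A real vector model would also catch related terms that never literally co-occur, which this sketch cannot do.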
Along with content vectoring to ensure a topic is covered properly, it’s also common for people to try and inflate the number of words in their articles.
Using the shotgun approach for content creation opens the door to relevance dilution and cannibalization.
Diagnose: Learn to check for cannibalization with Google Search Console and re-check content with topicality audits.
Solution: Always keep your content as concise as possible. If you find yourself in this situation, you need to prune the content, although a content rewrite will be your best option.
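One way to approach the cannibalization check is to look for queries where more than one of your pages appears. The sketch below assumes you have exported Search Console performance data broken down by query and page, and loaded it as a list of dicts with `query` and `page` keys. That row shape and the function name are assumptions for this example, not a Search Console format.

```python
from collections import defaultdict

def find_cannibalized_queries(rows, min_pages=2):
    """Return queries for which at least min_pages distinct pages appear.

    rows: iterable of dicts with "query" and "page" keys (hypothetical shape).
    """
    pages_by_query = defaultdict(set)
    for row in rows:
        pages_by_query[row["query"]].add(row["page"])
    return {
        query: sorted(pages)
        for query, pages in pages_by_query.items()
        if len(pages) >= min_pages
    }
```

Two pages appearing for one query is not always a problem, so treat the output as a list of candidates to review rather than a verdict.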
Duplicate content
Duplicate content, both internal and external, can trigger the algorithm to flag you for thin content.
Internally, you should try to avoid indexing pages with lots of duplicate content, such as low quality category pages.
In other cases you should be using a canonical link, and we’ll talk about those in more detail in a sec.
External duplicate content can arise from computer generated content, from writers who copy from other sites, or from negative SEO attacks.
Diagnose: For internal content check SiteLiner and for external use Copyscape.
Solution: Use NoIndex and Canonical rules correctly and re-write content when necessary.
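For a rough feel of how much two pieces of text overlap, here is a minimal sketch using word shingles and Jaccard similarity. This is a common textbook technique, not a description of how SiteLiner, Copyscape or Google measure duplication. The shingle size of 5 words is an assumption chosen for the example.

```python
import re

def shingles(text, size=5):
    """Set of overlapping word sequences ("shingles") of the given size."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def jaccard_similarity(a, b, size=5):
    """Shared shingles divided by total distinct shingles, from 0.0 to 1.0."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)
```

A score near 1.0 means the two texts are almost identical, and a score near 0.0 means they share almost no phrasing. Where you draw the line for "too similar" is a judgment call.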
Parameter pages
Parameter pages are commonly caused by faceted navigation, search areas and more.
When a self-referring canonical is not set on the main page, this can cause duplicate content issues, which in turn result in thin content issues.
Note: Some sites prefer to disallow these areas entirely via robots.txt. However, I have found the results unpredictable and prefer a less aggressive route.
Diagnose: View the page source, search for "canonical" and see if you have one.
Solution: Always set a self-referring canonical on pages that use parameters.
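The view-source check can be sketched in code as well. The example below looks for a `<link rel="canonical">` tag in a page's HTML and compares it with the URL stripped of its query string. Treating the parameter-free URL as the expected canonical is an assumption made for this example, and the names `CanonicalFinder` and `canonical_status` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit, urlunsplit

class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel and self.canonical is None:
            self.canonical = attr.get("href")

def canonical_status(url, html):
    """Return "missing", "ok" or "mismatch" for the page at url."""
    finder = CanonicalFinder()
    finder.feed(html)
    if finder.canonical is None:
        return "missing"
    parts = urlsplit(url)
    # Assumption: the expected canonical is the same URL without parameters
    clean = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    return "ok" if finder.canonical == clean else "mismatch"
```

A "mismatch" is not automatically an error, since some pages canonicalize elsewhere on purpose, but a "missing" result on a parameter URL is worth fixing.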
Pagination issues are less common. However, on some sites, such as ecommerce stores, it is common to attach buyers' guides to category pages.
When that buyers' guide starts appearing on page 2, 3, 4 and so on, it can cause issues.
Diagnose: A manual check should suffice.
Solution: Use a self-referring canonical. If this isn’t applicable, set the paginated pages to NoIndex with an X-Robots-Tag.
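As a minimal sketch of the X-Robots-Tag approach, the function below decides which response header to send for a given URL. It assumes pagination is expressed with a `page` query parameter, which is a hypothetical convention, so adjust it to match your own URL structure. How you attach the header to the response depends on your server or framework.

```python
from urllib.parse import urlsplit, parse_qs

def robots_headers(url, page_param="page"):
    """Return an X-Robots-Tag: noindex header for page 2 and beyond.

    page_param is a hypothetical parameter name used for illustration.
    """
    query = parse_qs(urlsplit(url).query)
    page = query.get(page_param, ["1"])[0]
    if page.isdigit() and int(page) > 1:
        return {"X-Robots-Tag": "noindex"}
    return {}  # first page: send no extra header, so it stays indexable
```

For example, `robots_headers("https://example.com/shoes?page=3")` returns the noindex header, while the first page of the category returns an empty dict.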
Doorway pages
Google has said that doorway pages include pages that are generated to funnel visitors to the actual valuable content on your site.
Not many sites do this outside of sitemaps, which shouldn’t be indexed, but it’s possible that you do this with category pages or custom category pages.
Diagnose: Manual checks should do fine.
Solution: NoIndex thin pages that are valuable to users but not to search engines.
While there are many obvious ways to tell if you’re being affected by thin content, there are also a lot of sites being held back by thin content without knowing about it.
The good news is that they don’t need to be, because the checks above are easy to perform.