Thin Content is a term which describes web pages with low value, low quality, or duplicate content.
TL;DR: If your site has thin content, you should fix it. Even a single thin page, or a thin section of a page, can harm your site’s rankings by lowering its quality scores. There are not always obvious signs that thin content is affecting your site, so it is especially important to improve as much content as possible.
What Is Thin Content?
It can be easier to think about thin content in terms of what it is not. If a page does not offer value, it is thin. If a page does not offer enough value, it can be somewhat thin. If a section of content is of low quality, that section can be thin.
A common misconception is that thin content means pages with low word counts. This is incorrect: a page of any length is thin if it is of low value to the user. Some pages with low word counts offer a lot of value and fully satisfy the needs of the user. Long-form content that fails to address its subject in a comprehensive way is still thin overall.
Panda, Penguin, & Passage-Based Indexing
Google first released the Panda algorithm in 2011. It introduced methods for handling low-value content which are still used today. Penguin, which Google released in 2012, focused on manipulative links and other webspam rather than on content quality itself. Both algorithms introduced ways of measuring quality not seen before. Panda was generally understood to act at the site level, while later versions of Penguin worked at a more granular level. Today, it is common to see thin content affect the ranking abilities of the site as a whole.
In late 2020, Google announced passage-based indexing, which it later clarified is a ranking change: pages are still indexed in full, but individual passages can rank for relevant queries. Algorithms such as Panda and Penguin may still affect how passages perform. It is also worth noting that thin sections of content could become more important due to this change.
What Causes Thin Content?
Thin content is often caused by poor research, lack of expertise, bad writing, unintended errors, or other issues such as those listed below.
1. Computer generated content
There are several ways to generate content with a computer. Old methods such as word spinning often create low-quality content which will not index. Not all computer-generated content is low quality, though; it depends on the method and on who is using it. Some methods, in the hands of a skilled individual, can create high-quality content.
Transformer models such as GPT-3 are an exciting advance in the field. GPT-3 has received a lot of press attention for its ability to create quality content, in the right hands.
If you do not know what you are doing, you should not use computer generated or “AI” content.
Vectoring, which represents text as numerical vectors, gives search engines many ways to identify information about your content. It can help identify topic coverage, relevance, and more. This can be useful for measuring expertise, which is growing in importance due to E-A-T.
- Make sure your content deals with the topic and avoids getting too much into related topics that should be dealt with in their own article.
- Create introductions and conclusions that mention the key topic and any important entities.
- Use concise Page Titles so that Google can determine the page is about what you say it’s about.
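To make the idea of vectoring more concrete, here is a minimal sketch of one common technique: turning text into bag-of-words vectors and comparing them with cosine similarity. This is an illustration only, not a description of how Google measures relevance. The function names and sample texts are assumptions made up for the example.

```python
import math
import re
from collections import Counter


def term_vector(text: str) -> Counter:
    """Turn text into a bag-of-words vector (term -> count)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term vectors (0.0 to 1.0)."""
    # Counter returns 0 for missing terms, so only shared terms contribute.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical sample texts, used only for illustration.
topic = term_vector("thin content low value pages")
on_topic = term_vector("Thin content describes pages that offer low value to users.")
off_topic = term_vector("Our bakery sells fresh bread every morning.")

# The on-topic text shares terms with the topic, so it scores higher.
print(cosine_similarity(topic, on_topic) > cosine_similarity(topic, off_topic))
```

Real systems are likely to use far richer representations than raw word counts, but the principle is the same: content that stays on topic sits closer to the topic in vector space.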
A lack of comprehensive coverage is another way that Google can identify thin content. Using algorithms designed to detect entities and topics, Google can cross-reference your content against other results to help determine how comprehensive it is. If your content isn’t comprehensive enough, this could suggest that it’s thin.
4. Information gain
There is a concept known as first-mover advantage which is also important in SEO. The first website to cover a search query will inevitably become part of the champion list, and subsequent entrants to the SERP need to provide something better. After all, why should Google rank something new that doesn’t provide anything better or different? If you add nothing new, your information could be seen as less valuable and, potentially, “thin.” So always attempt to provide new facts, insights, or a unique opinion on the topic.
5. Not answering the question
If you don’t answer the question, or don’t answer it sufficiently, Google could determine that your content is thin because it doesn’t provide value to the user. It’s important to satisfy the intent of a search query by providing an answer that’s comprehensive but to the point.
6. Language usage
NLP (Natural Language Processing) may only be one part of a vector pipeline, but it is powerful in its own right. For example, it can identify the use of passive voice. Passive voice can suggest a lack of confidence in what you are saying. It is one more way to check expertise and trust in a piece of content.
Utilizing NLP can be beneficial for you as someone who publishes content on the web. Even someone who is a true authority on their subject can be too verbose for a search engine. In these cases, content may seem thin even if it isn’t.
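As a rough illustration of the kind of check described above, here is a minimal sketch that estimates how many sentences in a text use passive voice. It relies on a crude regular-expression heuristic (a form of "to be" followed by a word that looks like a past participle), which is an assumption made for this example. It will misfire on some phrases, such as "is often", and a proper NLP library with part-of-speech tagging would do better.

```python
import re

# A form of "to be", an optional adverb ending in -ly, then a likely past participle.
# The short list of irregular participles is illustrative, not exhaustive.
PASSIVE_PATTERN = re.compile(
    r"\b(?:am|is|are|was|were|be|been|being)\s+(?:\w+ly\s+)?"
    r"(?:\w+(?:ed|en)|made|done|known|built|sold)\b",
    re.IGNORECASE,
)


def passive_ratio(text: str) -> float:
    """Return the share of sentences that appear to use passive voice."""
    # Split on whitespace that follows sentence-ending punctuation.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if not sentences:
        return 0.0
    passive = sum(1 for s in sentences if PASSIVE_PATTERN.search(s))
    return passive / len(sentences)


sample = "The report was written by the team. The team wrote the report."
print(passive_ratio(sample))
```

A high ratio is only a prompt to reread the copy. Passive voice is sometimes the right choice, so treat the number as a hint rather than a verdict.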
7. Duplicate content
Duplicate content can sometimes be the reason for a page or sections of a page to be thin.
Internal and external duplicate content are different matters. External duplicate content is not looked on kindly, as it can be a sign of plagiarism. Many tools that check for plagiarism are overzealous, though. Because some passages and phrases are common, small amounts of external duplicate content are normal. Treat matches on less common phrases or whole sentences as more serious.
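One simple way to compare two pieces of text for overlap is to break each into overlapping word sequences ("shingles") and measure the Jaccard similarity of the two sets. The sketch below assumes three-word shingles and uses made-up sample sentences. It is not how any particular plagiarism tool or search engine works, only an illustration of the general technique.

```python
import re


def shingles(text: str, size: int = 3) -> set:
    """Return the set of overlapping word sequences of the given size."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Text shorter than one shingle: treat the whole text as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical sample texts, used only for illustration.
page_a = "the quick brown fox jumps"
page_b = "the quick brown fox sleeps"
print(jaccard(shingles(page_a), shingles(page_b)))
```

Longer shingles make the check stricter, which helps ignore the common short phrases that overzealous tools tend to flag.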
Internal duplicate content is often managed by the use of rel=canonical tags. Canonical tags denote the ‘source’ page of the content, helping avoid problems. This is common with parameter pages and paginated URLs especially. Sometimes you cannot use a canonical tag in areas where phrases will recur. This is fine in cases where the content is supplemental to the main content on the page.
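To check which canonical URL a page declares, you can parse its HTML and look for a link element whose rel attribute contains "canonical". The following is a minimal sketch using only Python’s standard library. The class name, function name, and example URL are assumptions for illustration, and the sketch only reads the first canonical tag it finds.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        # rel can hold several space-separated values, so split before checking.
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attributes.get("href")


def find_canonical(html: str) -> Optional[str]:
    """Return the canonical URL declared in the HTML, or None if absent."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Hypothetical parameter page pointing at its 'source' page.
page = '<html><head><link rel="canonical" href="https://example.com/shoes"></head></html>'
print(find_canonical(page))
```

Running a check like this across parameter pages and paginated URLs can confirm that each one points to the page you intend to be the source.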
No matter how much duplicate content is acceptable, the focus should be on original content: high-quality content which adds value rather than restating what already exists. Content quality can be a subjective matter, so opt for what satisfies the user.
8. Doorway pages
Doorway pages funnel visitors to valuable content on your site. In the past, people used these pages to manipulate rankings. These pages also often offer little original content. For these reasons Google does not like doorway pages. Sometimes pages with a lot of links in proximity can be mistaken for doorway pages.
9. Page structure
Page structure, or page design, can also cause a page to be thin. If your content does not meet the intent of the user in an expedient way, you may accrue bad UX signals, leading machine-learning algorithms to mark the page as low quality. This will often cause you to lose rankings. Page structure can also be important for the proximity of important terms. Thus page structure can be an important marker for relevance in more ways than one.
Identifying thin content
There are some telltale signs of thin content. When pages do not index or fall out of the index, it can sometimes be due to thin content issues, some of which are listed in the section above. In other cases all you might have is a suspicion, for example if many of your pages rank well but a particular page doesn’t when it should.
Another method for identifying thin content is with Google Search Console. Search Console will sometimes give some details if your site has thin content. That said, you should not rely on it.
Note: In the coverage report some useful messages come under errors, while others come under excluded.
Search Console messages to pay attention to:
- Soft 404
- Crawled – currently not indexed
- Discovered – currently not indexed
Your site can also receive manual actions, also known as manual penalties. Manual actions to pay attention to:
- User-generated spam
- Thin content with little or no added value
- Hidden text and/or keyword stuffing
You may not know if you have thin content, and there are more potential causes than ever. The best way to avoid thin content is to pay attention to what matters and, in so doing, create something of actual value.