You’ve probably heard that Google doesn’t like to index site search pages. If you ask an SEO about their opinion, 90% will say you should stop indexing and crawling of the site search pages, and the source would often be a blog post Search results in search results by Matt Cutts from 2007.
But, it’s not 2007. anymore, and many of Google’s guidelines have changed since. One in particular, related to this subject, has vanished from Webmaster Guidelines - “Use robots.txt to prevent crawling of search results pages or other auto-generated pages that don’t add much value for users coming from search engine.”
The rationalization why Google had that guideline back then was the poor quality of site search pages, like random and unrelated products showing up. Today that’s not the case anymore. The site search solutions have evolved - technologies like natural language processing considerably improved the quality of the site search pages.
Do not block the site search pages in your robots.txt file!
By blocking the search pages in the robots.txt file you are not blocking the site search page indexing, rather, you are just blocking the crawlers from reading those pages. These pages can still be indexed as shown in the example below.
What’s even worse with this setup is that the links on the search pages will not be followed, so you are throwing away all external link equity from your website that could have been passed to all internal links on those pages.
Besides the bad side on the SEO front, blocking search pages in robots.txt is bad for PPC too. GoogleAds bot is also following all the rules you set, and if it is blocked via robots.txt, it can not analyze the landing page experience. And PPC experts often tend to use the site search pages for their landing pages. The consequence is lower Quality Scores and higher cost per click.
You can test if you are blocking your search pages on your website with Google’s Robots.txt Tester tool (you will need to be logged in to your Google Search Console account).
- Go to your website and search for anything using your site search engine
- Copy everything in your URL after your domain name
- Paste the copied URL segment in Google’s Robots.txt Tester and hit the test button.
- If you are blocking your search pages in your robots.txt file you will see which directive is doing so and what you need to remove.
Meta Robot Directives
There are two types of Meta robots directives, the HTTP Header X-Robots Tag (which are sent via web server as HTTP header) and the Meta Robots Tags (a piece of HTML code on the page). They give crawlers instructions about how to crawl and index information on your pages.
HTTP Header X-Robots Tags are a rarely used method to control bots and it’s mostly used for making sitewide crawling and indexing strategies. For example, if you want a certain type of document to be blocked for bots, (let’s say you don’t want your publicly posted pdfs to be indexed). Still, you’ll want to check it out, just in case…
If you are using Chrome, you can open Developer Tools (Hit F12), open the Network tab, and reload the page (Hit F5). In the network name list, find and click the request URL you are on, on the right hand select the Headers, and go through your Response Headers - if you are using X-Robots-Tag it will show up here.
What is widely used to control a crawler's behavior on the website are the Robots Meta tags. Robots Meta tags can be found in the source of the document - to check it out you can right-click on the website page and click on the View page source from the dropdown menu.
In the source document, you’ll want to find the <meta name=”robots”... code where you’ll be able to see how your parameters are set to control the crawler's behavior on the site search pages.
Meta Robot Parameters
There are a dozen different indexation controlling parameters you can use in your Meta robots directives, like NOIMAGEINDEX, NOARCHIVE, NOSNIPPET, etc., but we'll cover only the four important ones:
- The first two parameters are straightforward, you can set your pages to be INDEX or NOINDEX, so, either you want crawlers to index your page and show on SERP, or you don’t.
- The second two parameters, FOLLOW or NOFOLLOW, are often misunderstood by many people. When you set your Robots Meta tag to FOLLOW, you are telling the crawler that it should go through all the links on the page and that it should pass the link equity to the linked pages. If you set your Robots Meta tag to NOFOLLOW, you are telling crawlers not to crawl the links on this page and not to pass the link equity to them.
- If there is no Robots Meta tag on the page then the default parameters imply, which are INDEX and FOLLOW.
Meta Robots Parameters on the site search pages
You should always set your Robots Meta tag to FOLLOW on your site search pages because the site search pages are often linked to by users on social networks, websites, blogs, forums, etc… You don’t want to waste that free and organic link-building potential that is happening on your ecommerce website. If you set the parameter to NOFOLLOW, you are throwing all the link equity from the site search pages away.
What about the crawl budget?
That’s not something you should worry about at all. Even though there are infinite possibilities for the number of pages using the site search, crawlers will not go through your search function and start writing the queries by themselves. Crawlers will just go through the links they can see on your (or other) websites, and read them. If the link doesn’t exist somewhere, crawlers will not be able to reach the page.
And what about the site search indexing? Should they be indexed or not?
Some of the biggest ecommerce sites in the world are indexing their site search results - for example, Amazon and Wayfair. If you have a good site search solution that has natural language processing technology, with error tolerance, stop words, and inflections, and if you have an SEO who will do some control for potential keyword cannibalization and bad SERP control, you should go for the INDEX option. The benefits you could see with SEO automatization through the site search are far greater than the potential risks it contains.