Noindex vs. Nofollow vs. Disallow: What Are the Differences?

These days, most website owners are keenly aware of the important role that high quality content plays in getting noticed by Google. To that end, businesses and digital marketers are spending increasingly large amounts of time and resources to ensure that websites are spotted by search engine robots and therefore found by their target audiences.

But while every website owner wants high search engine rankings and the corresponding increases in traffic, there are certain areas of a site that are best hidden from the search engine crawlers completely.

Why hide parts of your website from search engines?

You might wonder why it’s good to keep crawlers from indexing parts of your website. In short, it can actually help your overall rankings. If you’ve spent lots of time, money and energy crafting high quality content for your audience, you need to make sure that search engine crawlers understand that your blog posts and main pages are much more important than the more “functional” areas of your website.

Here are a few examples of web pages that you might want the search robots to ignore:

  • Landing pages: Obviously, landing pages are super important for generating leads and even selling products directly. However, you might have certain landing pages that contain seasonal offers or are designed for specific (paid) advertising campaigns.
  • Thank you pages: Once your digital marketing expands, your website will probably contain multiple “thank you“ pages where visitors are redirected after downloading a lead magnet or signing up to a mailing list. You’ll almost certainly want to keep these pages away from search robots, as they can appear thin in content and be interpreted as “spammy.”
  • PDF downloads: Following on from the above example, you’ll also want to ensure that any giveaway or download pages and their file attachments are hidden from your audience, as you certainly wouldn’t want them to be easily accessible without collecting an email address first.
  • Membership login pages: If your site has a member’s forum or client area, you’ll probably want to hide those pages from search engines too.

As you can see, there are plenty of instances where you should be actively dissuading search engines from listing certain areas of your site. Hiding these pages helps to ensure that your homepage and cornerstone content gets the attention it deserves.

How to hide parts of your website from search engines?

So how can you instruct search engine robots to turn a blind eye to certain pages of your website? The answer lies in noindex, nofollow and disallow. These instructions allow you define exactly how you want your website to be crawled by search engines.

Let’s dive right in and find out how they work.

The noindex instruction

As you can probably imagine, adding a noindex instruction to a web page tells a search engine to “not index” that particular area of your site. The web page will still be visible if a user clicks a link to the page or types its URL directly into a browser, but it will never appear in a Google search, even if it contains keywords that users are searching for.

The noindex instruction is typically placed in the <head> section of the page’s HTML code as a meta tag:

<meta name="robots" content="noindex">

It’s also possible to change the meta tag so that only specific search engines ignore the page. For example, if you only want to hide the page from Google, allowing Bing and other search engines to list the page, you’d alter the code in the following way:

<meta name="googlebot" content="noindex">

A bit more difficult to configure and therefore less often used is delivering the noindex instruction as part of the server’s HTTP response headers:

HTTP/2.0 200 OK
…
X-Robots-Tag: noindex

These days, most people build sites using a content management system like WordPress, which means you won’t have to fiddle around with complicated HTML code to add a noindex instruction to a page. The easiest way to add noindex is by downloading an SEO plugin such as All in One SEO or the ever-popular alternative from Yoast. These plugins allow you to apply noindex to a page by simply ticking a checkbox.

The nofollow instruction

Adding a nofollow instruction to a web page doesn’t stop search engines from indexing it, but it tells them that you don’t want to endorse anything linked from that page. For example, if you are the owner of a large, high authority website and you add the nofollow instruction to a page containing a list of recommended products, the companies you have linked to won’t gain any authority (or rank increase) from being listed on your site.

Even if you’re the owner of a smaller website, nofollow can still be useful:

  • If you’re a creative agency, you might have other companies’ logos embedded on your case study pages that could be confusing in image searches.
  • If you’re a blogger, your comments section might contain links that you don’t want to support.

Even if your pages only contain internal links to other areas of your website, it can be useful to include a nofollow instruction to help search engines understand the importance and hierarchy of the pages within your site. For example, every page of your site might contain a link to your “Contact” page. While that page is super important and you’d like Google to index it, you might not want the search engine to place more weight on that page than other areas of your site, just because so many of your other pages link to it.

Adding a nofollow instruction works in exactly the same way as adding the noindex instruction introduced earlier, and can be done by altering the page’s HTML <head> section:

<meta name="robots" content="nofollow">

If you only want certain links on a page to be tagged as nofollow, you can add rel="nofollow" attributes to the links’ HTML tags:

<a href="https://www.example.com/" rel="nofollow">example link</a>

WordPress website owners can also use the aforementioned All in One SEO or Yoast plugins to mark the links on a page as nofollow.

The disallow instruction

The last of the instructions we are discussing in this blog post is “disallow.” You might be thinking that this sounds a lot like noindex, but while the two are very similar, there are slight differences:

  • Noindex: Search robots will look at the page and any links it contains, but won’t add the page to search results.
  • Nofollow: Search robots will add the page to results, but will ignore the links within the page for ranking purposes.
  • Disallow: Search robots won’t look at the page at all.

As you can see, disallowing a page means that you’re telling the search engine robots not to crawl it all, which signifies that it has no use at all for SEO. Disallow is best used for the pages on your site that are completely irrelevant to most search users, such as client login areas or “thank you” pages.

Unlike noindex and nofollow, the disallow instruction isn’t included into a page’s HTML code or HTTP response, but instead is included in a separate file named “robots.txt.”

A robots.txt file is a simple plain text file that can be created with any basic text editor and sits at the root of your site (www.example.com/robots.txt). Your site doesn’t need a robots.txt for search engines to crawl it, but you will need one if you want to use the disallow directive to block access to certain pages. To do that, you’ll simply list the relevant parts of your site on the robots.txt file like this:

User-agent: *
Disallow: /path/to/your/page.html

WordPress website owners can use the All in One SEO plugin to quickly generate their own robots.txt file without the need to access the content management system’s underlying file structure directly.

How to scan your site for noindex, nofollow, and disallow instructions

Do you know for certain which parts of your website are marked as noindex and nofollow or are excluded from being indexed by a disallow rule? If you are not sure, you might consider taking an inventory and reviewing your past decisions.

One way to do such an inventory is to go to www.drlinkcheck.com, enter the URL of your site’s homepage, and hit the Start Check button.

Dr. Link Check’s primary function is to reveal broken links, but the service also provides detailed information on working links.

Report of noindex links

After the crawl of your site is complete, switch to the All Links report and create a filter to only show page links tagged as noindex:

  1. Double-click on “Filter” at the top of the report in order to turn the filter bar into text mode.
  2. Enter NoIndex = true into the text field.
  3. Press Enter to apply the filter.

Add noindex filter

You now have a custom report that shows you the pages that contain a noindex tag or have a noindex X-Robots-Tag HTTP header.

Report of nofollow links

If you want see all links that are marked as nofollow, switch to the All links report, click on Add… in the filter bar, and select Nofollow/Dofollow from the drop-down menu.

Nofollow filter

Report of disallowed links

By default, the Dr. Link Check crawler ignores all links disallowed by the rules found in the site’s robots.txt file. You can change that in the project settings:

  1. Open the Account menu at the top-right corner and select Project Settings.
  2. Click on Advanced Settings.
  3. Tick the checkbox next to Ignore robots.txt.
  4. Click Update Project to save the settings.

Ignore robots.txt setting

Now switch the Overview report and hit the Rerun check button to start a new crawl with the updated settings.

Restart crawl

After the crawl has finished, open the All Links, click on Add… in the filter section, and select robots.txt status to limit the list to links disallowed by your website’s robots.txt file.

Disallowed by robots.txt filter

Summing up

While the vast majority of website owners are far more interested in getting the search engines to notice the pages of their websites, the noindex, nofollow and disallow instructions are powerful tools to help crawlers better understand a site’s content, and they indicate which sections of the site should be hidden from search engine users.


Älterer Post