Robots.txt: Telling Crawlers Where to Go

The robots.txt file sits at the root of your domain and gives polite instructions to search-engine bots about which areas they may crawl. It is a small plain-text file, but a surprisingly powerful one: a single careless line can hide your entire site from Google.

Used well, it keeps crawlers focused on the pages that matter and away from the ones that do not, conserving the crawl budget that larger sites depend on.

What It Can and Cannot Do

It is important to understand the file's limits, because it is often misused as a security or privacy tool when it is neither.

  • It can ask bots not to crawl certain folders, such as admin or internal search pages.
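  • It cannot reliably keep a page out of the index. A blocked URL can still be listed if other pages link to it, so use a noindex tag instead, and leave that page crawlable so search engines can actually see the tag.
  • It does not protect private data. It is only a request, not a lock on the door, and the file itself is public, so anyone can read the paths you list in it.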
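
To see how a crawler reads these rules, here is a minimal sketch using Python's standard-library urllib.robotparser. The robots.txt content, the /admin/ and /search paths and the www.example.com domain are illustrative assumptions, not settings from a real site. Bear in mind that this parser uses simple prefix matching and does not implement every Google extension (such as * and $ wildcards), so treat it as a quick sanity check rather than an exact model of Googlebot.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt (assumed content): ask every bot to skip
# the admin area and internal search results.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# can_fetch() only reports what a polite crawler *should* do;
# nothing here stops a badly behaved bot from requesting the URL.
for path in ["/", "/services/", "/admin/settings", "/search?q=seo"]:
    allowed = parser.can_fetch("Googlebot", "https://www.example.com" + path)
    print(f"{path:<20} {'crawl' if allowed else 'skip'}")
```

With these rules, / and /services/ come back as crawlable, while /admin/settings and /search?q=seo are marked to skip. The check is advisory in exactly the way the list above describes: it answers "may I?", it does not lock anything.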

Common Mistakes

The most damaging error we see is a leftover Disallow: / rule carried over from a staging site, which blocks crawlers from the entire domain after launch. We always check for it at go-live and again during audits, because it can silently wipe out a site's visibility within days.
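
A go-live check for this mistake can be automated. The sketch below is one hypothetical way to do it with Python's urllib.robotparser: blocks_homepage is an illustrative helper name, and the staging and live file contents and the example.com homepage are assumptions. In a real check you would point RobotFileParser at the live file with set_url() and read() instead of passing text in directly.

```python
from urllib.robotparser import RobotFileParser


def blocks_homepage(robots_txt: str,
                    homepage: str = "https://www.example.com/",
                    user_agent: str = "Googlebot") -> bool:
    """Hypothetical go-live check: True if the rules stop user_agent crawling the homepage."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, homepage)


# A typical staging file (assumed): every path disallowed for every bot.
STAGING_ROBOTS = "User-agent: *\nDisallow: /\n"

# A typical live file (assumed): only a specific folder is blocked.
LIVE_ROBOTS = "User-agent: *\nDisallow: /admin/\n"

for name, rules in [("staging", STAGING_ROBOTS), ("live", LIVE_ROBOTS)]:
    if blocks_homepage(rules):
        print(f"{name}: WARNING - homepage is blocked for Googlebot")
    else:
        print(f"{name}: OK")
```

Testing the homepage is a deliberately blunt signal: if even the front page is disallowed, the leftover staging rule is almost certainly still in place.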

Linking to Your Sitemap

A small but valuable habit is to add a Sitemap: line to your robots.txt file with the full (absolute) URL of your XML sitemap. This gives every crawler an immediate route to your full list of important pages, reinforcing discovery alongside your Search Console submission.
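
Here is a minimal sketch of a robots.txt containing a Sitemap line, read back with Python's urllib.robotparser to confirm a crawler can find it. The sitemap URL and domain are placeholders, and site_maps() needs Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt (assumed content). The Sitemap URL must be absolute,
# and the directive applies to all crawlers wherever it appears in the file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns the listed sitemap URLs, or None if the file has none.
print(parser.site_maps())
```

This prints ['https://www.example.com/sitemap.xml'], showing the sitemap is discoverable from robots.txt alone, independent of the Disallow rules above it.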

If you need a hand with any of this, your Progressive Robot delivery team is ready to help. Raise a ticket from the Support area of your client portal or speak to your account manager and we will guide you through the next steps.