Crawler access tester

Robots.txt Tester

Fetch and parse robots.txt, identify syntax concerns, discover sitemap directives, and test whether a path is allowed for a crawler.

Free to use · Protected URL fetching · Mobile friendly

Tool guide

What is the Robots.txt Tester?

The Robots.txt Tester fetches a website’s robots.txt file, parses crawler groups, lists sitemap directives, and checks whether a selected path appears allowed for a chosen user-agent. Robots.txt controls crawling behavior; it does not reliably remove an already known URL from search results.

A small syntax mistake or an overly broad Disallow rule can prevent search engines from crawling important sections. The tester helps reveal obvious access conflicts, but crawler implementations can differ, so critical rules should also be verified in the relevant search engine’s testing and indexing tools.
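As a rough illustration of this kind of check, the sketch below uses Python's standard-library `urllib.robotparser` on an inline sample file. It is not this tool's own parser. The sample rules, the `is_allowed` and `list_sitemaps` helper names, and the example.com URLs are assumptions made for the example, and `site_maps()` needs Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content used only for this illustration.
SAMPLE = """\
User-agent: *
Disallow: /private/

User-agent: Googlebot
Disallow: /drafts/

Sitemap: https://example.com/sitemap.xml
"""


def _parse(robots_text: str) -> RobotFileParser:
    """Parse robots.txt text that has already been fetched."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def is_allowed(robots_text: str, user_agent: str, url: str) -> bool:
    """Return True if the parsed rules do not block user_agent from url."""
    return _parse(robots_text).can_fetch(user_agent, url)


def list_sitemaps(robots_text: str) -> list:
    """Return the Sitemap URLs declared in the file (empty list if none)."""
    return _parse(robots_text).site_maps() or []


print(is_allowed(SAMPLE, "Googlebot", "https://example.com/drafts/post"))
print(is_allowed(SAMPLE, "OtherBot", "https://example.com/private/x"))
print(list_sitemaps(SAMPLE))
```

In this sample, Googlebot has its own group, so the wildcard group's `/private/` rule does not apply to it. The standard-library parser also applies the first matching rule within a group rather than the longest one, which is one concrete way implementations differ.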

Audit coverage

What this SEO tool checks

  • Whether robots.txt can be fetched from the site root
  • User-agent, Allow, Disallow, Crawl-delay, and Sitemap directives
  • Matching rules for the selected crawler and path
  • Broad blocks such as Disallow: /
  • Malformed or suspicious directive formatting
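A minimal sketch of how checks like these could be implemented is shown below. It is an assumption about the general approach, not the tool's actual code: the `scan_directives` function, the report keys, and the warning messages are all hypothetical.

```python
# Directives this sketch recognizes; anything else is reported as a warning.
KNOWN_FIELDS = {"user-agent", "allow", "disallow", "crawl-delay", "sitemap"}


def scan_directives(robots_text: str) -> dict:
    """Collect sitemaps, site-wide blocks, and formatting warnings."""
    report = {"sitemaps": [], "broad_blocks": [], "warnings": []}
    agents = []       # user-agents of the group currently being read
    in_rules = False  # True once the current group has a rule line

    for number, raw in enumerate(robots_text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if ":" not in line:
            report["warnings"].append((number, "missing ':' separator"))
            continue

        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()

        if field not in KNOWN_FIELDS:
            report["warnings"].append(
                (number, f"unrecognized directive '{field}'"))
        elif field == "sitemap":
            report["sitemaps"].append(value)
        elif field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                agents, in_rules = [], False
            agents.append(value)
        else:  # allow, disallow, or crawl-delay
            in_rules = True
            if not agents:
                report["warnings"].append(
                    (number, f"'{field}' appears before any User-agent line"))
            if field == "disallow" and value == "/":
                report["broad_blocks"].append((number, list(agents)))
    return report


sample = "User-agent: *\nDisalow: /tmp/\nDisallow: /\nSitemap: https://example.com/sitemap.xml\n"
print(scan_directives(sample))
```

The misspelled `Disalow` line in the sample is reported as unrecognized, and the `Disallow: /` line is reported as a broad block for the wildcard group.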

Step-by-step

How to use the Robots.txt Tester

  1. Enter the website or robots URL

     A normal website URL is converted to the root robots.txt location.

  2. Choose a crawler and path

     Test Googlebot, another named bot, or the wildcard group against a specific path.

  3. Run the parser

     Review the discovered groups, sitemap locations, and path decision.

  4. Confirm important pages separately

     Check the final page status, robots meta tag, canonical, and Search Console coverage as well.
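The conversion in step 1 can be sketched as follows. This is an assumption about how such a conversion might work, not the tool's actual logic: the `robots_url_for` name is hypothetical, and defaulting to HTTPS when no scheme is given is a choice made for the example.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url_for(site_url: str) -> str:
    """Map any URL on a site to the robots.txt at that host's root."""
    if "://" not in site_url:
        # Assumption: treat a bare hostname as HTTPS.
        site_url = "https://" + site_url
    parts = urlsplit(site_url)
    # Keep scheme, host, and port; drop path, query, and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url_for("https://example.com/blog/post?page=2"))
print(robots_url_for("example.com"))
```

Because the scheme, hostname, and port are preserved, `http://www.example.com:8080/a` maps to a different robots.txt than `https://example.com/a`.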

Interpretation

How to understand the results

  • Allowed means the parsed rules do not block the selected crawler from that path.
  • Blocked means a matching Disallow rule appears to prevent crawling, but it does not automatically mean the URL cannot be indexed.
  • Syntax warnings should be reviewed because unsupported or misspelled directives may be ignored.

Practical advice

SEO best practices

  • Keep robots.txt at the root of each protocol and hostname you want to control.
  • Use noindex or access control, not robots.txt alone, when content must stay out of search results. A noindex directive is only seen if the page remains crawlable.
  • Do not block CSS or JavaScript required for search engines to render important pages.
  • Add absolute Sitemap directives when a sitemap exists.
  • Test staging rules carefully before deploying the same file to production.
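To check the noindex advice above on a page you have already downloaded, a sketch like the following can look for a robots meta tag. It uses Python's standard `html.parser`. The `RobotsMetaParser` and `has_noindex` names are hypothetical, and the sketch ignores the `X-Robots-Tag` HTTP header, crawler-specific meta names such as `googlebot`, and the `none` value, all of which a fuller check would need to consider.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() != "robots":
            return
        for token in attributes.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def has_noindex(html: str) -> bool:
    """Return True if the page's robots meta tag contains noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(has_noindex(page))
```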

Before you act

Limitations of this automated check

The parser applies common matching logic but cannot perfectly reproduce every crawler’s implementation. A robots.txt file is publicly visible and is not a security mechanism. Temporary server errors, redirects, CDN behavior, or different files on www and non-www hosts can also affect the result.
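One widely used form of matching logic is the longest-match rule described in RFC 9309: the most specific matching pattern wins, and Allow wins a tie. The sketch below illustrates that rule with `*` and `$` support. It is not necessarily what this tool implements. The `pattern_to_regex` and `decide` names are hypothetical, and percent-encoding normalization is left out.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern ('*' and trailing '$') to a regex."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    # '*' matches any run of characters; everything else is literal.
    regex = ".*".join(re.escape(piece) for piece in body.split("*"))
    return re.compile(regex + (r"\Z" if anchored else ""))


def decide(rules, path: str) -> bool:
    """Return True if path is allowed.

    rules is a list of ("allow" | "disallow", pattern) pairs taken from
    the group that applies to the crawler being tested.
    """
    best = None  # (pattern length, is_allow) of the strongest match so far
    for kind, pattern in rules:
        if not pattern:  # an empty pattern matches nothing
            continue
        if pattern_to_regex(pattern).match(path):
            candidate = (len(pattern), kind == "allow")
            # Longer patterns win; on equal length, allow (True) wins.
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


rules = [("disallow", "/shop/"), ("allow", "/shop/public/")]
print(decide(rules, "/shop/public/item"))
print(decide(rules, "/shop/cart"))
```

A crawler that instead applies the first matching rule in file order could reach a different decision for the same file, which is why critical rules are worth verifying in each search engine's own tools.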

Common questions

Robots.txt Tester FAQs

Can robots.txt remove a page from Google?

Not reliably. It blocks crawling, while removal usually requires noindex on a crawlable page, deletion, authentication, or a search-engine removal process.

Where must robots.txt be located?

At the root of the host, such as https://example.com/robots.txt. A file inside a subfolder does not control the whole site.

Does Crawl-delay work for Googlebot?

Google does not support the Crawl-delay directive, so Googlebot ignores it. Other crawlers may honor it, and they can interpret the value differently.
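To see what delay a file declares for a given crawler, Python's standard-library `urllib.robotparser` exposes `crawl_delay()` (Python 3.6 and later). The sketch below only reports the declared value and says nothing about whether a crawler honors it. The sample file, the `ExampleBot` name, and the `declared_crawl_delay` helper are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content used only for this illustration.
SAMPLE_WITH_DELAY = """\
User-agent: ExampleBot
Crawl-delay: 10
Disallow: /search

User-agent: *
Disallow: /tmp/
"""


def declared_crawl_delay(robots_text: str, user_agent: str):
    """Return the Crawl-delay declared for user_agent, or None if absent."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.crawl_delay(user_agent)


print(declared_crawl_delay(SAMPLE_WITH_DELAY, "ExampleBot"))
print(declared_crawl_delay(SAMPLE_WITH_DELAY, "OtherBot"))
```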

Why is a blocked URL still indexed?

Search engines can discover the URL from links and index the address without crawling its content.
