Crawler access tester
Robots.txt Tester
Fetch and parse robots.txt, identify syntax concerns, discover sitemap directives, and test whether a path is allowed for a crawler.
Tool guide
What is the Robots.txt Tester?
The Robots.txt Tester fetches a website’s robots.txt file, parses crawler groups, lists sitemap directives, and checks whether a selected path appears allowed for a chosen user-agent. Robots.txt controls crawling behavior; it does not reliably remove an already known URL from search results.
A small syntax mistake or an overly broad Disallow rule can prevent search engines from crawling important sections. The tester helps reveal obvious access conflicts, but crawler implementations can differ, so critical rules should also be verified in the relevant search engine’s testing and indexing tools.
Audit coverage
What this SEO tool checks
Whether robots.txt can be fetched from the site root
User-agent, Allow, Disallow, Crawl-delay, and Sitemap directives
Matching rules for the selected crawler and path
Broad blocks such as Disallow: /
Malformed or suspicious directive formatting
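The checks above can be sketched as a small line-by-line parser. This is a minimal illustration, not the tool's actual implementation: the `Group` class, the `parse_robots` function, and the wording of the warnings are hypothetical names chosen for this example. It assumes that consecutive User-agent lines share one group and that `#` starts a comment.

```python
from dataclasses import dataclass, field

# Directives this sketch recognizes (the ones listed above).
KNOWN = {"user-agent", "allow", "disallow", "crawl-delay", "sitemap"}


@dataclass
class Group:
    agents: list = field(default_factory=list)  # lowercased user-agent tokens
    rules: list = field(default_factory=list)   # (directive, value) pairs


def parse_robots(text):
    """Return (groups, sitemaps, warnings) for a robots.txt body."""
    groups, sitemaps, warnings = [], [], []
    current = None
    last_was_agent = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if ":" not in line:
            warnings.append(f"line {number}: missing ':' separator")
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key not in KNOWN:
            warnings.append(f"line {number}: unknown directive '{key}'")
            continue
        if key == "sitemap":
            sitemaps.append(value)  # sitemaps are global, not per group
            continue
        if key == "user-agent":
            if not last_was_agent:  # a new run of User-agent lines opens a group
                current = Group()
                groups.append(current)
            current.agents.append(value.lower())
            last_was_agent = True
            continue
        last_was_agent = False
        if current is None:
            warnings.append(f"line {number}: '{key}' appears before any User-agent")
            continue
        if key == "disallow" and value == "/":
            warnings.append(f"line {number}: broad block 'Disallow: /'")
        current.rules.append((key, value))
    return groups, sitemaps, warnings
```

A misspelled directive such as `Disalow` ends up in `warnings` rather than in a group's rules, which mirrors how a crawler would silently ignore it.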
Step-by-step
How to use the Robots.txt Tester
- 1. Enter the website or robots URL
A normal website URL is converted to the root robots.txt location.
- 2. Choose a crawler and path
Test Googlebot, another named bot, or the wildcard group against a specific path.
- 3. Run the parser
Review the discovered groups, sitemap locations, and path decision.
- 4. Confirm important pages separately
Check the final page status, robots meta tag, canonical, and Search Console coverage as well.
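Steps 1 to 3 can be approximated offline with Python's standard library. The sketch below is an assumption about how such a check might look, not the tool's own code: `robots_url` and `is_allowed` are hypothetical helper names, and defaulting to HTTPS for scheme-less input is a choice made for this example. `urllib.robotparser.RobotFileParser` is a real stdlib class, but its rule precedence may differ from a given search engine's, so treat its answer as a first pass.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser


def robots_url(site_url):
    """Map any page URL to the robots.txt location at the root of its host."""
    if "://" not in site_url:
        site_url = "https://" + site_url  # assumption: default to HTTPS
    parts = urlsplit(site_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def is_allowed(robots_text, user_agent, url):
    """Parse robots.txt text (no network access) and test one URL for one crawler."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(user_agent, url)


SAMPLE = """\
User-agent: *
Disallow: /private/
"""

print(robots_url("https://example.com/blog/post?id=7"))
print(is_allowed(SAMPLE, "Googlebot", "https://example.com/private/report"))
print(is_allowed(SAMPLE, "Googlebot", "https://example.com/blog/post"))
```

To test a live site, the text passed to `is_allowed` would first be downloaded from the address returned by `robots_url`; the download is left out here so the example runs without a network connection.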
Interpretation
How to understand the results
- Allowed means the parsed rules do not block the selected crawler from that path.
- Blocked means a matching Disallow rule appears to prevent crawling, but it does not automatically mean the URL cannot be indexed.
- Syntax warnings should be reviewed because unsupported or misspelled directives may be ignored.
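One way to picture how an Allowed or Blocked verdict is reached is the longest-match precedence described in RFC 9309: among all rules that match the path, the longest pattern wins, and Allow wins a tie. The sketch below assumes that precedence along with the common `*` and `$` wildcards; `pattern_to_regex` and `decide` are hypothetical names, and the tool's own matcher may behave differently.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")  # trailing '$' pins the end of the path
    if anchored:
        pattern = pattern[:-1]
    # '*' matches any run of characters; everything else is literal.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def decide(rules, path):
    """Return (verdict, winning_pattern) for a list of (directive, pattern) rules."""
    best = None  # (pattern length, is_allow, pattern)
    for directive, pattern in rules:
        if not pattern:  # an empty Disallow value matches nothing
            continue
        if pattern_to_regex(pattern).match(path):
            candidate = (len(pattern), directive == "allow", pattern)
            # Longer pattern wins; on equal length, Allow (True) beats Disallow.
            if best is None or candidate[:2] > best[:2]:
                best = candidate
    if best is None:
        return "allowed", None  # no matching rule means crawling is permitted
    return ("allowed" if best[1] else "blocked"), best[2]


rules = [("disallow", "/"), ("allow", "/public/"), ("disallow", "/*.pdf$")]
print(decide(rules, "/public/page"))
print(decide(rules, "/docs/file.pdf"))
```

Returning the winning pattern alongside the verdict makes it easier to see which line of the file is responsible when a path is unexpectedly blocked.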
Practical advice
SEO best practices
- Keep robots.txt at the root of each protocol and hostname you want to control.
- Use noindex or access control, not robots.txt alone, when content must stay out of search results.
- Do not block CSS or JavaScript required for search engines to render important pages.
- Add absolute Sitemap directives when a sitemap exists.
- Test staging rules carefully before deploying the same file to production.
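The advice about absolute Sitemap directives can be checked mechanically. The helper below is a small sketch under the assumption that "absolute" means an http or https URL with a hostname; `non_absolute_sitemaps` is a hypothetical name used only for this example.

```python
from urllib.parse import urlsplit


def non_absolute_sitemaps(sitemap_values):
    """Return the Sitemap values that are not absolute http(s) URLs."""
    flagged = []
    for value in sitemap_values:
        parts = urlsplit(value)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            flagged.append(value)
    return flagged


print(non_absolute_sitemaps([
    "https://example.com/sitemap.xml",
    "/sitemap.xml",
    "example.com/sitemap.xml",
]))
```

Any value the helper returns is worth rewriting with its full scheme and hostname before the file is deployed.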
Before you act
Limitations of this automated check
The parser applies common matching logic but cannot perfectly reproduce every crawler’s implementation. A robots.txt file is publicly visible and is not a security mechanism. Temporary server errors, redirects, CDN behavior, or different files on www and non-www hosts can also affect the result.
Common questions
Robots.txt Tester FAQs
Can robots.txt remove a page from Google?
Not reliably. It blocks crawling, while removal usually requires noindex on a crawlable page, deletion, authentication, or a search-engine removal process.
Where must robots.txt be located?
At the root of the host, such as https://example.com/robots.txt. A file inside a subfolder does not control the whole site.
Does Crawl-delay work for Googlebot?
No. Googlebot ignores the Crawl-delay directive. Some other crawlers honor it, though they may interpret the value differently.
Why is a blocked URL still indexed?
Search engines can discover the URL from links and index the address without crawling its content.