robots.txt is a file at the root of your website that tells search engine crawlers which pages or files they can or cannot request. It follows the Robots Exclusion Protocol.

How does robots.txt path matching work?

The most specific (longest) matching rule wins. If an Allow rule and a Disallow rule match with equal length, Allow takes precedence. The wildcard * matches any sequence of characters, including none, and a trailing $ anchors the pattern to the end of the URL.
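To make these precedence rules concrete, here is a minimal Python sketch of a matcher for a single user-agent group. The names `pattern_to_regex` and `is_allowed` are hypothetical. The sketch assumes rules arrive as (directive, pattern) pairs and that "longest" means the character length of the rule's pattern.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any sequence of characters (including none); a trailing '$'
    anchors the match to the end of the path. Everything else is literal.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces, then rejoin them with '.*' where '*' stood.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """Decide whether `path` is crawlable under one group's rules.

    `rules` is a list of (directive, pattern) pairs, e.g. ("disallow", "/admin/").
    The longest matching pattern wins; on a tie, Allow wins.
    If no rule matches, the path is allowed.
    """
    best_len = -1
    best_allow = True
    for directive, pattern in rules:
        if not pattern:  # an empty value such as "Disallow:" matches nothing
            continue
        # re.match anchors at the start, so rules behave as prefix matches.
        if pattern_to_regex(pattern).match(path):
            allow = directive.lower() == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and allow):
                best_len, best_allow = len(pattern), allow
    return best_allow


rules = [("disallow", "/private/"), ("allow", "/private/press/")]
print(is_allowed(rules, "/private/report.html"))   # False
print(is_allowed(rules, "/private/press/kit.zip"))  # True
```

Here /private/press/kit.zip is allowed because the Allow pattern (15 characters) is longer than the Disallow pattern (9 characters). /private/report.html matches only the Disallow rule, so it stays blocked.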

What is the difference between Allow and Disallow?

Disallow blocks a crawler from accessing any URL whose path starts with the given value. Allow explicitly permits access, which is useful for carving an exception out of a broader Disallow. An empty Disallow (Disallow:) means allow everything.
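A tester has to turn the Allow and Disallow lines of a group into rules before it can match anything. The sketch below uses a hypothetical `parse_rules` helper. It assumes the group's lines have already been separated from other groups, and it drops empty values because an empty Disallow blocks nothing.

```python
def parse_rules(lines: list[str]) -> list[tuple[str, str]]:
    """Collect (directive, path) rules from the lines of one user-agent group.

    Directive names are compared case-insensitively, '#' comments are
    stripped, and rules with an empty value are skipped.
    """
    rules = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a "name: value" line
        key, value = key.strip().lower(), value.strip()
        if key not in ("allow", "disallow"):
            continue  # User-agent, Sitemap, etc. are handled elsewhere
        if not value:
            continue  # "Disallow:" with no value blocks nothing
        rules.append((key, value))
    return rules


group = [
    "Disallow: /shop/",
    "Allow: /shop/sale/  # keep sale pages crawlable",
    "Disallow:",
]
print(parse_rules(group))  # [('disallow', '/shop/'), ('allow', '/shop/sale/')]
```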

GPTBot is OpenAI's web crawler used to train AI models. You can block it with:

User-agent: GPTBot
Disallow: /
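A crawler first picks which group of rules applies to it. The sketch below uses a hypothetical `select_group` function and stores groups as a dict keyed by user-agent name. It follows the usual rule: a group that names the crawler wins over the catch-all * group and replaces it rather than adding to it. It does not merge repeated groups for the same agent, which a full parser should do.

```python
def select_group(
    groups: dict[str, list[tuple[str, str]]], agent: str
) -> list[tuple[str, str]]:
    """Pick the rules that apply to a crawler's product token (e.g. "GPTBot").

    A group naming the crawler (compared case-insensitively) takes priority
    over the catch-all "*" group. If neither exists, no rules apply.
    """
    by_name = {name.lower(): rules for name, rules in groups.items()}
    if agent.lower() in by_name:
        return by_name[agent.lower()]
    return by_name.get("*", [])


groups = {
    "*": [("disallow", "/admin/")],
    "GPTBot": [("disallow", "/")],
}
print(select_group(groups, "GPTBot"))     # [('disallow', '/')]
print(select_group(groups, "Googlebot"))  # [('disallow', '/admin/')]
```

With these groups, GPTBot is blocked everywhere but does not inherit the /admin/ rule from *, and Googlebot falls back to the * group.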

Robots.txt Tester – Test & Validate robots.txt Free

Frequently Asked Questions

What is robots.txt and why does it matter?

robots.txt is a text file at the root of your website that tells search engine crawlers which pages they can or cannot visit. It follows the Robots Exclusion Protocol. A misconfigured robots.txt can accidentally block Googlebot from crawling your entire site (a single "Disallow: /" under "User-agent: *" does exactly that), which is why testing is critical before deploying changes.

How does path matching work with wildcards?

The asterisk (*) matches any sequence of characters, including none. For example, Disallow: /*.pdf$ blocks all URLs ending in .pdf. The dollar sign ($) anchors the match to the end of the URL. The most specific (longest) matching rule wins. If an Allow rule and a Disallow rule are of equal length, Allow takes precedence.
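The same wildcard behavior can be checked without regular expressions. This hypothetical `wildcard_match` function splits the pattern on *, finds each literal piece in order starting from the beginning of the path, and requires the last piece to sit at the very end when the pattern ends in $. It assumes the path passed in includes any query string.

```python
def wildcard_match(pattern: str, path: str) -> bool:
    """Return True if a robots.txt pattern matches `path` from its start."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    pieces = pattern.split("*")
    head, tail = pieces[0], pieces[-1]
    # Rules always match from the start of the path.
    if not path.startswith(head):
        return False
    if len(pieces) == 1:
        # No wildcard: a plain prefix match, or an exact match when anchored.
        return path == head if anchored else True
    pos = len(head)
    # Middle pieces: take the earliest occurrence, which leaves the most room.
    for piece in pieces[1:-1]:
        idx = path.find(piece, pos)
        if idx == -1:
            return False
        pos = idx + len(piece)
    if anchored:
        # The last piece must end exactly at the end of the path.
        return path.endswith(tail) and len(path) - len(tail) >= pos
    return path.find(tail, pos) != -1


print(wildcard_match("/*.pdf$", "/files/report.pdf"))             # True
print(wildcard_match("/*.pdf$", "/files/report.pdf?download=1"))  # False
print(wildcard_match("/*.pdf", "/files/report.pdf?download=1"))   # True
```

Because rules without * are prefix matches, Disallow: /private also blocks /private-notes. Use /private/ or /private$ depending on which URLs you actually mean.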

What does an empty Disallow mean?

"Disallow:" with no value means "allow everything". It is effectively the same as having no rule. It is often used as a shorthand to explicitly permit all crawling:

User-agent: *
Disallow:

What is GPTBot and should I block it?

GPTBot is OpenAI's crawler that collects data used to train AI models, such as the models behind ChatGPT. You can block it with:

User-agent: GPTBot
Disallow: /

CCBot (Common Crawl) and other AI crawlers that honor robots.txt can be blocked the same way. These rules apply only to the crawlers you name, so they do not affect Googlebot or your Google rankings.
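If you block several AI crawlers, generating the file helps avoid copy-and-paste slips. The following is a minimal sketch with a hypothetical `build_robots_txt` helper. It gives each blocked crawler its own group and ends with a catch-all group that leaves every other crawler unrestricted. Only crawlers that honor robots.txt will respect the result.

```python
def build_robots_txt(blocked_agents: list[str]) -> str:
    """Generate robots.txt text that blocks the given crawlers site-wide
    and leaves every other crawler unrestricted."""
    lines = []
    for agent in blocked_agents:
        # One group per blocked crawler, separated by a blank line.
        lines += [f"User-agent: {agent}", "Disallow: /", ""]
    # Catch-all group: an empty Disallow permits everything.
    lines += ["User-agent: *", "Disallow:"]
    return "\n".join(lines) + "\n"


print(build_robots_txt(["GPTBot", "CCBot"]))
```

The Robots Exclusion Protocol also allows several User-agent lines to share one group. Separate groups are used here only because they are easier to read and edit.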

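Does robots.txt prevent pages from appearing in Google search?

Blocking a page in robots.txt prevents Googlebot from crawling it, but Google can still index the URL if other pages link to it. In that case, Google simply won't see the page's content. To keep a page out of search results, use a noindex meta tag and leave the page crawlable, because Googlebot can only see the noindex if it is allowed to fetch the page. For urgent cases, the Removals tool in Google Search Console hides a URL from results temporarily.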
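Because noindex only works on pages Googlebot is allowed to fetch, a check has to consider both robots.txt and the page itself. This sketch is illustrative only: `has_noindex` and `noindex_effective` are hypothetical names. It reads `<meta name="robots">` and `<meta name="googlebot">` tags plus the X-Robots-Tag response header, and it ignores user-agent-prefixed header values such as "googlebot: noindex".

```python
from html.parser import HTMLParser


class _RobotsMetaFinder(HTMLParser):
    """Collect directives from <meta name="robots"|"googlebot"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            self.directives += [d.strip().lower() for d in attr.get("content", "").split(",")]


def has_noindex(html: str, headers: dict[str, str]) -> bool:
    """True if the page or its X-Robots-Tag header asks not to be indexed."""
    finder = _RobotsMetaFinder()
    finder.feed(html)
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            finder.directives += [d.strip().lower() for d in value.split(",")]
    # "none" is shorthand for "noindex, nofollow".
    return "noindex" in finder.directives or "none" in finder.directives


def noindex_effective(crawl_allowed: bool, html: str, headers: dict[str, str]) -> bool:
    # If robots.txt blocks the URL, Googlebot never fetches the page,
    # so it never sees the noindex.
    return crawl_allowed and has_noindex(html, headers)


page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(noindex_effective(True, page, {}))   # True
print(noindex_effective(False, page, {}))  # False: blocked, so noindex is invisible
```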

How to Test Your Robots.txt

Paste your robots.txt content or fetch it directly from your website.
Enter a URL path to test whether it's allowed or blocked for specific user agents.
View the matching rule that determines whether the URL is crawlable.
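The first step, fetching the file, can be scripted with the Python standard library. In this minimal sketch, `robots_url` and `fetch_robots_txt` are hypothetical names and defaulting to HTTPS is an assumption. The 4xx handling follows RFC 9309, which lets crawlers treat an unavailable robots.txt as having no rules.

```python
from urllib.error import HTTPError
from urllib.parse import urlsplit
from urllib.request import Request, urlopen


def robots_url(site: str) -> str:
    """Build the robots.txt URL for a domain or for any URL on that site."""
    if "://" not in site:
        site = "https://" + site  # assumption: HTTPS when no scheme is given
    parts = urlsplit(site)
    # Keep scheme, host and port; drop any path, query or fragment.
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


def fetch_robots_txt(site: str, timeout: float = 10.0) -> str:
    """Download robots.txt. A 4xx response is treated as an empty file (no rules)."""
    request = Request(robots_url(site), headers={"User-Agent": "robots-txt-tester-sketch"})
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except HTTPError as err:
        if 400 <= err.code < 500:
            return ""  # unavailable robots.txt: no restrictions
        raise  # server errors: a real crawler should assume full disallow


print(robots_url("example.com"))                       # https://example.com/robots.txt
print(robots_url("http://example.com/blog/post?id=1"))  # http://example.com/robots.txt
```

A robots.txt file applies only to the scheme, host, and port it was served from. That is why `robots_url` keeps the port and discards the path.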

When You Need This

Verifying that important pages aren't accidentally blocked from search engines.
Checking that private pages (admin panels, staging URLs) are properly disallowed.
Debugging crawl issues reported in Google Search Console.

Good to Know

A common robots.txt mistake is accidentally blocking CSS and JavaScript files, which prevents Google from rendering your pages properly. Never use robots.txt to hide sensitive information. The file is publicly accessible, and its rules are requests that well-behaved crawlers follow, not a security measure. Use a noindex meta tag to keep pages out of search results, and put genuinely private content behind authentication.
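To catch the blocked CSS and JavaScript mistake, you can run your asset URLs through Python's built-in `urllib.robotparser`. The `blocked_assets` helper and the example URLs are hypothetical. `RobotFileParser` only does plain prefix matching and applies the first matching rule in file order. It does not support * or $ wildcards or longest-match precedence, so treat it as a quick check for simple files.

```python
from urllib.robotparser import RobotFileParser


def blocked_assets(robots_txt: str, asset_urls: list[str], agent: str = "Googlebot") -> list[str]:
    """Return the asset URLs that robots.txt blocks for `agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse text directly, no network fetch
    return [url for url in asset_urls if not parser.can_fetch(agent, url)]


robots = "User-agent: *\nDisallow: /assets/\nDisallow: /admin/\n"
assets = [
    "https://example.com/assets/site.css",
    "https://example.com/assets/app.js",
    "https://example.com/images/logo.png",
]
print(blocked_assets(robots, assets))
# ['https://example.com/assets/site.css', 'https://example.com/assets/app.js']
```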
