Robots.txt Generator
Define crawler access by user agent, add allow/disallow paths, and include your sitemap URL in a standards-friendly robots.txt file. Use our free robots.txt generator to create, validate, and download your file in seconds.
What Is a Robots.txt File?
A robots.txt file is a plain text file that lives at the root of your website. It uses the Robots Exclusion Protocol to tell search engine crawlers like Googlebot, Bingbot, and others which parts of your site they can and cannot access.
Major search engines request robots.txt before crawling a site. Without one, crawlers will attempt to access every URL they discover. With a properly configured robots.txt, you control how your crawl budget is spent and keep crawlers away from low-value pages. Note that robots.txt controls crawling, not indexing: a disallowed URL can still be indexed if other pages link to it.
Robots.txt Syntax Guide
A robots.txt file uses four main directives:
- User-agent: Specifies which crawler the rules apply to. Use `*` for all crawlers, or target specific bots like `Googlebot`, `Bingbot`, or `GPTBot`.
- Disallow: Tells the specified crawler not to access a path. Example: `Disallow: /admin/` blocks the entire admin directory.
- Allow: Overrides a Disallow directive for a more specific path. Example: `Allow: /admin/public/` makes that subdirectory accessible even if `/admin/` is blocked.
- Sitemap: Points crawlers to your XML sitemap for faster URL discovery. Example: `Sitemap: https://example.com/sitemap.xml`.
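The four directives above are enough to assemble a file programmatically. The following is a minimal Python sketch of that idea, not the code behind this generator: the `Group` type and the `build_robots_txt` function are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Group:
    """One block of rules for one or more crawlers (hypothetical helper type)."""
    user_agents: list[str]
    disallow: list[str] = field(default_factory=list)
    allow: list[str] = field(default_factory=list)


def build_robots_txt(groups: list[Group], sitemaps: Optional[list[str]] = None) -> str:
    """Render rule groups and sitemap URLs as robots.txt text."""
    lines: list[str] = []
    for group in groups:
        for agent in group.user_agents:
            lines.append(f"User-agent: {agent}")
        for path in group.disallow:
            lines.append(f"Disallow: {path}")
        for path in group.allow:
            lines.append(f"Allow: {path}")
        lines.append("")  # a blank line separates groups
    for url in sitemaps or []:
        lines.append(f"Sitemap: {url}")
    # End the file with exactly one trailing newline.
    return "\n".join(lines).rstrip("\n") + "\n"


example = build_robots_txt(
    [Group(user_agents=["*"], disallow=["/admin/"], allow=["/admin/public/"])],
    sitemaps=["https://example.com/sitemap.xml"],
)
print(example)
```

Running the sketch prints a group for all crawlers that blocks `/admin/`, re-allows `/admin/public/`, and ends with the Sitemap line.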
Common Robots.txt Mistakes to Avoid
- Blocking CSS and JavaScript: Google needs access to these files to render your pages. Blocking them can hurt your rankings.
- Using Disallow instead of Noindex: Disallow prevents crawling but does not prevent indexing. If other sites link to a disallowed URL, Google may still show it in search results without a snippet.
- Forgetting the trailing slash: `Disallow: /blog` blocks URLs starting with /blog (including /blog-archive). `Disallow: /blog/` only blocks the /blog/ directory and its children.
- Wrong file location: Robots.txt must be at the domain root (example.com/robots.txt). Placing it in a subdirectory has no effect.
- Blocking your entire site: `Disallow: /` blocks all crawling. This is useful during development but catastrophic in production.
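The trailing-slash mistake is easier to see when the matching is spelled out. The sketch below assumes the simplest case, a Disallow rule with no wildcards, where a rule matches any path that starts with it. The function name `is_blocked` is a hypothetical helper, and real crawlers also handle `*` and `$`, which this sketch ignores.

```python
def is_blocked(url_path: str, disallow_rule: str) -> bool:
    """Simplified check: a wildcard-free Disallow rule matches by path prefix."""
    return url_path.startswith(disallow_rule)


paths = ("/blog", "/blog/post-1", "/blog-archive")

for rule in ("/blog", "/blog/"):
    for path in paths:
        verdict = "blocked" if is_blocked(path, rule) else "allowed"
        print(f"Disallow: {rule:7} {path:15} {verdict}")
```

With `Disallow: /blog`, all three paths are blocked. With `Disallow: /blog/`, only `/blog/post-1` is blocked, and `/blog` without the slash stays crawlable.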
Robots.txt Examples by Platform
WordPress robots.txt:
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-includes/
Disallow: /trackback/
Disallow: /xmlrpc.php
Sitemap: https://example.com/sitemap_index.xml
Shopify robots.txt:
User-agent: *
Disallow: /admin
Disallow: /cart
Disallow: /orders
Disallow: /checkouts/
Disallow: /checkout
Disallow: /*/checkouts
Disallow: /carts
Disallow: /account
Sitemap: https://yourstore.myshopify.com/sitemap.xml
Next.js / React robots.txt:
User-agent: *
Allow: /
Disallow: /api/
# Do not disallow /_next/: it serves the CSS and JavaScript bundles crawlers need to render pages
Sitemap: https://example.com/sitemap.xml
Robots.txt vs Noindex vs Nofollow
| Method | Prevents Crawling | Prevents Indexing | Best For |
|---|---|---|---|
| Robots.txt Disallow | ✅ Yes | ❌ No | Saving crawl budget, blocking admin paths |
| Meta Noindex | ❌ No | ✅ Yes | Removing pages from search results |
| Meta Nofollow | ❌ No | ❌ No | Preventing PageRank from passing through links |
How to Block AI Crawlers
Many website owners now want to block AI training crawlers while keeping search engine access. Here are the key user-agent strings:
# Block OpenAI's crawler
User-agent: GPTBot
Disallow: /
# Block Google's AI training (but keep search indexing)
User-agent: Google-Extended
Disallow: /
# Block Common Crawl (used by many AI models)
User-agent: CCBot
Disallow: /
# Block Anthropic's crawler (ClaudeBot is the current token; anthropic-ai is an older one)
User-agent: ClaudeBot
User-agent: anthropic-ai
Disallow: /
How to Test Your Robots.txt File
After generating your robots.txt file with the free robots.txt generator above, validate it using these methods:
- Google Search Console: Navigate to Settings → robots.txt to see the file Google last fetched and any errors or warnings. To check whether a specific URL is blocked, use the URL Inspection tool.
- Browser check: Visit `yourdomain.com/robots.txt` directly to verify the file is accessible and correctly formatted.
- Google Rich Results Test: Enter a URL to see if Googlebot can access the page or if robots.txt is blocking it.
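You can also test rules locally before uploading. The sketch below uses Python's standard `urllib.robotparser` module to check a sample file that blocks two AI crawlers while leaving other crawlers mostly unrestricted. The sample rules and the `check_access` helper are assumptions made for illustration. One caveat: Python's parser applies rules in file order, while Google picks the most specific matching rule, so results can differ for files that mix Allow and Disallow on overlapping paths.

```python
from urllib.robotparser import RobotFileParser

# Sample rules for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Disallow: /admin/
"""


def check_access(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


for agent in ("GPTBot", "CCBot", "Googlebot"):
    for url in ("https://example.com/blog/", "https://example.com/admin/login"):
        verdict = "allowed" if check_access(SAMPLE_ROBOTS_TXT, agent, url) else "blocked"
        print(f"{agent:10} {url:34} {verdict}")
```

In this sample, GPTBot and CCBot are blocked everywhere, while Googlebot falls back to the `*` group and is blocked only under `/admin/`.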
Use this free robots.txt generator tool whenever you update your site structure, launch new sections, or need to block specific crawlers. A well-maintained robots.txt file is one of the simplest ways to improve your technical SEO.
Got questions?
What is a robots.txt file and what does it do?
A robots.txt file is a plain text file placed at the root of your website (e.g., example.com/robots.txt) that tells search engine crawlers which pages or directories they are allowed or not allowed to access. It follows the Robots Exclusion Protocol standard and is the first file crawlers check before scanning your site.
Can robots.txt block a page from appearing in Google?
Not reliably. Disallow in robots.txt prevents crawling, but Google may still index a URL if other pages link to it. The URL can appear in search results with no snippet. To fully prevent indexing, use the noindex meta tag or X-Robots-Tag HTTP header on the page itself.
Should I add my sitemap URL to robots.txt?
Yes. Adding a Sitemap directive (e.g., Sitemap: https://example.com/sitemap.xml) to your robots.txt helps search engines discover all your important URLs faster. The directive is not tied to any User-agent group and can appear anywhere in the file; placing it at the bottom is a common convention. This is especially useful for new sites or sites with deep page hierarchies.
What is the User-agent directive in robots.txt?
The User-agent line specifies which crawler the following rules apply to. Use * (asterisk) to target all crawlers, or specify individual bots like Googlebot, Bingbot, or GPTBot. You can have multiple User-agent blocks with different rules for different crawlers.
What is crawl delay and should I use it?
Crawl-delay asks crawlers to wait a specified number of seconds between requests to reduce server load. Google ignores this directive, while Bing and some other crawlers honor it. Only use it if your server struggles with crawl traffic.
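If you write your own crawler or want to confirm what a file declares, Python's standard `urllib.robotparser` can read the Crawl-delay value for a given user agent. This is a small sketch; the sample rules and the `declared_crawl_delay` helper are assumptions for illustration.

```python
from typing import Optional
from urllib.robotparser import RobotFileParser

# Sample rules for illustration only.
DELAY_ROBOTS_TXT = """\
User-agent: bingbot
Crawl-delay: 10

User-agent: *
Disallow: /tmp/
"""


def declared_crawl_delay(robots_txt: str, user_agent: str) -> Optional[int]:
    """Return the Crawl-delay that applies to user_agent, or None if none is set."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


print(declared_crawl_delay(DELAY_ROBOTS_TXT, "bingbot"))
print(declared_crawl_delay(DELAY_ROBOTS_TXT, "Googlebot"))
```

The first call returns the delay set for bingbot. The second returns `None` because the `*` group declares no delay.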
How do I test if my robots.txt file works correctly?
Open the robots.txt report in Google Search Console (under Settings > robots.txt) to see the file Google fetched and any errors or warnings, and use the URL Inspection tool to check whether a specific URL is blocked by your rules. You can also access your live file at yourdomain.com/robots.txt in any browser to visually verify the content.
What is the difference between Allow and Disallow?
Disallow tells crawlers not to access a specific path. Allow overrides a Disallow rule for a more specific path. For example, you might Disallow /private/ but Allow /private/public-page. Allow is most useful when you need exceptions within blocked directories.
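How an Allow "overrides" a Disallow depends on rule precedence. Google and the Robots Exclusion Protocol standard (RFC 9309) use the most specific match, meaning the longest matching path wins, with Allow winning a tie. The sketch below illustrates that precedence for wildcard-free rules only. The `is_allowed` function and its rule format are hypothetical, and some parsers (including Python's built-in `urllib.robotparser`) use file order instead.

```python
def is_allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    """Longest-match precedence for wildcard-free rules.

    rules is a list of ("allow" | "disallow", path_prefix) pairs.
    """
    best_len = -1
    allowed = True  # a path that matches no rule is allowed
    for kind, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # empty or non-matching rules are ignored
        is_allow = kind == "allow"
        # A longer prefix is more specific; on equal length, Allow wins.
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed


rules = [("disallow", "/private/"), ("allow", "/private/public-page")]

for path in ("/private/public-page", "/private/secret", "/about"):
    print(path, "allowed" if is_allowed(path, rules) else "blocked")
```

Because precedence depends on specificity rather than position, swapping the order of the two rules gives the same result in this sketch.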
Should I block CSS and JavaScript files in robots.txt?
No. Google recommends allowing access to CSS, JavaScript, and image files. Blocking these resources prevents Googlebot from rendering your page correctly, which can hurt your search rankings. Only block admin scripts or sensitive server-side resources.
Can I use robots.txt to block AI crawlers like GPTBot?
Yes. Many AI companies respect robots.txt. To block OpenAI's crawler, add User-agent: GPTBot followed by Disallow: /. Similarly, use User-agent: CCBot to block Common Crawl, or User-agent: Google-Extended to opt out of Google's AI training while still allowing regular Google search indexing.
Where should I place my robots.txt file?
Your robots.txt file must be placed at the root directory of your domain. It must be accessible at https://yourdomain.com/robots.txt. If it is in a subdirectory or returns a 404 error, search engines will assume no crawling restrictions exist and crawl everything.
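Since the file always lives at the root of the host, its location can be derived from any page URL. A minimal sketch using Python's standard `urllib.parse` follows; the function name `robots_txt_url` is a hypothetical helper.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL for the host that serves page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("https://example.com/blog/post?x=1"))
print(robots_txt_url("https://shop.example.com/cart"))
```

Each subdomain is a separate host, so `shop.example.com` needs its own robots.txt; the file on `example.com` does not cover it.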