Robots.txt Generator

Build crawler rules safely with copy-ready robots.txt output.

Define crawler access by user agent, add allow/disallow paths, and include your sitemap URL in a standards-friendly robots.txt file. Use our free robots.txt generator to create, validate, and download your file in seconds.

What Is a Robots.txt File?

A robots.txt file is a plain text file that lives at the root of your website. It uses the Robots Exclusion Protocol to tell search engine crawlers like Googlebot, Bingbot, and others which parts of your site they can and cannot access.

Major search engines check for robots.txt before crawling a site. Without one, crawlers will attempt to access every URL they discover. With a properly configured robots.txt, you control how your crawl budget is spent and keep crawlers away from low-value pages. Note that robots.txt controls crawling, not indexing.
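
To make that check concrete, here is a minimal sketch using Python's standard-library urllib.robotparser. The rules, the is_allowed helper, and the example.com URLs are illustrative assumptions. Real crawlers such as Googlebot use their own parsers, so treat the result as an approximation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse text directly, no network call
    return parser.can_fetch(user_agent, url)


print(is_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/admin/settings"))  # False
print(is_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/blog/post"))       # True
```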

Robots.txt Syntax Guide

A robots.txt file uses four main directives:

  • User-agent: Specifies which crawler the rules apply to. Use * for all crawlers, or target specific bots like Googlebot, Bingbot, or GPTBot.
  • Disallow: Tells the specified crawler not to access a path. Example: Disallow: /admin/ blocks the entire admin directory.
  • Allow: Overrides a Disallow directive for a more specific path. Example: Allow: /admin/public/ makes that subdirectory accessible even if /admin/ is blocked.
  • Sitemap: Points crawlers to your XML sitemap for faster URL discovery. Example: Sitemap: https://example.com/sitemap.xml.
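
The four directives above can be assembled programmatically. The following is a rough sketch of a generator, not the code behind this tool: the Group class and render_robots_txt function are hypothetical names. Allow lines are written before Disallow lines because some parsers (Python's urllib.robotparser, for example) apply the first matching rule rather than the most specific one.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class Group:
    """One rule group: the user agents it targets and their path rules."""
    user_agents: List[str]
    allow: List[str] = field(default_factory=list)
    disallow: List[str] = field(default_factory=list)


def render_robots_txt(groups: Iterable[Group], sitemaps: Iterable[str] = ()) -> str:
    """Render rule groups and sitemap URLs as robots.txt text."""
    lines: List[str] = []
    for group in groups:
        lines += [f"User-agent: {agent}" for agent in group.user_agents]
        # Allow first, so order-sensitive parsers see the exception before the block.
        lines += [f"Allow: {path}" for path in group.allow]
        lines += [f"Disallow: {path}" for path in group.disallow]
        lines.append("")  # blank line separates groups
    lines += [f"Sitemap: {url}" for url in sitemaps]
    return "\n".join(lines).rstrip("\n") + "\n"


example = render_robots_txt(
    [Group(["*"], allow=["/admin/public/"], disallow=["/admin/"])],
    sitemaps=["https://example.com/sitemap.xml"],
)
print(example)
```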

Common Robots.txt Mistakes to Avoid

  • Blocking CSS and JavaScript: Google needs access to these files to render your pages. Blocking them can hurt your rankings.
  • Using Disallow instead of Noindex: Disallow prevents crawling but does not prevent indexing. If other sites link to a disallowed URL, Google may still show it in search results without a snippet.
  • Forgetting the trailing slash: Disallow: /blog blocks URLs starting with /blog (including /blog-archive). Disallow: /blog/ only blocks the /blog/ directory and its children.
  • Wrong file location: Robots.txt must be at the domain root (example.com/robots.txt). Placing it in a subdirectory has no effect.
  • Blocking your entire site: Disallow: / blocks all crawling. This is useful during development but catastrophic in production.
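
The trailing-slash mistake is easy to see in code, because Disallow paths are matched as prefixes. This sketch uses Python's standard-library urllib.robotparser. The blocked_paths helper and the sample paths are made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def blocked_paths(rule: str, paths: list) -> list:
    """Return the paths that a single 'Disallow: <rule>' line blocks for all crawlers."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *", f"Disallow: {rule}"])
    return [path for path in paths if not parser.can_fetch("*", path)]


paths = ["/blog", "/blog/", "/blog/post-1", "/blog-archive", "/about"]

print(blocked_paths("/blog", paths))   # ['/blog', '/blog/', '/blog/post-1', '/blog-archive']
print(blocked_paths("/blog/", paths))  # ['/blog/', '/blog/post-1']
```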

Robots.txt Examples by Platform

WordPress robots.txt:

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /trackback/
Disallow: /xmlrpc.php

Sitemap: https://example.com/sitemap_index.xml

Shopify robots.txt:

User-agent: *
Disallow: /admin
Disallow: /cart
Disallow: /orders
Disallow: /checkouts/
Disallow: /checkout
Disallow: /*/checkouts
Disallow: /carts
Disallow: /account

Sitemap: https://yourstore.myshopify.com/sitemap.xml

Next.js / React robots.txt:

User-agent: *
Allow: /
Disallow: /api/

Sitemap: https://example.com/sitemap.xml

Robots.txt vs Noindex vs Nofollow

Method | Prevents Crawling | Prevents Indexing | Best For
Robots.txt Disallow | ✅ Yes | ❌ No | Saving crawl budget, blocking admin paths
Meta Noindex | ❌ No | ✅ Yes | Removing pages from search results
Meta Nofollow | ❌ No | ❌ No | Preventing PageRank from passing through links

How to Block AI Crawlers

Many website owners now want to block AI training crawlers while keeping search engine access. Here are the key user-agent strings:

# Block OpenAI's crawler
User-agent: GPTBot
Disallow: /

# Opt out of Google's AI training (Google-Extended is a control token, not a separate crawler; search indexing is unaffected)
User-agent: Google-Extended
Disallow: /

# Block Common Crawl (used by many AI models)
User-agent: CCBot
Disallow: /

# Block Anthropic's crawler (ClaudeBot is the current token; anthropic-ai is an older one)
User-agent: ClaudeBot
User-agent: anthropic-ai
Disallow: /
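
As a quick sanity check, the sketch below builds blocking rules like the ones above and confirms that regular search crawlers are unaffected. The AI_TOKENS list and the block_crawlers helper are assumptions for illustration; extend the list with whichever tokens you want to block. It relies on Python's standard-library urllib.robotparser, which only approximates how each crawler reads the file.

```python
from urllib.robotparser import RobotFileParser

# Illustrative subset of tokens; adjust to your own policy.
AI_TOKENS = ["GPTBot", "Google-Extended", "CCBot"]


def block_crawlers(tokens: list) -> str:
    """Build one 'Disallow: /' group per user-agent token."""
    groups = [f"User-agent: {token}\nDisallow: /" for token in tokens]
    return "\n\n".join(groups) + "\n"


robots_txt = block_crawlers(AI_TOKENS)

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

url = "https://example.com/any-page"
print(parser.can_fetch("GPTBot", url))     # False
print(parser.can_fetch("CCBot", url))      # False
print(parser.can_fetch("Googlebot", url))  # True
print(parser.can_fetch("Bingbot", url))    # True
```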

How to Test Your Robots.txt File

After generating your robots.txt file with the generator above, validate it using these methods:

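  1. Google Search Console: Navigate to Settings → robots.txt to open the robots.txt report, which shows the files Google fetched and any warnings or errors. To check a specific URL against your rules, use the URL Inspection tool.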
  2. Browser check: Visit yourdomain.com/robots.txt directly to verify the file is accessible and correctly formatted.
  3. Google Rich Results Test: Enter a URL to see if Googlebot can access the page or if robots.txt is blocking it.
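
You can also script a check to run whenever the file changes. This is a sketch under stated assumptions: find_regressions is a hypothetical helper, the rules and URLs are invented, and Python's urllib.robotparser stands in for a real crawler's parser.

```python
from urllib.robotparser import RobotFileParser


def find_regressions(robots_txt: str, user_agent: str, expectations: dict) -> list:
    """Return URLs whose crawlability differs from the expected True/False value."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [
        url
        for url, expected in expectations.items()
        if parser.can_fetch(user_agent, url) != expected
    ]


# What we expect a search crawler to be able to reach.
EXPECTED = {
    "https://example.com/": True,
    "https://example.com/api/users": False,
    "https://example.com/blog/": True,
}

good_rules = "User-agent: *\nDisallow: /api/\n"
bad_rules = "User-agent: *\nDisallow: /\n"  # accidentally blocks the whole site

print(find_regressions(good_rules, "Googlebot", EXPECTED))  # []
print(find_regressions(bad_rules, "Googlebot", EXPECTED))
# ['https://example.com/', 'https://example.com/blog/']
```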

Use this free robots.txt generator tool whenever you update your site structure, launch new sections, or need to block specific crawlers. A well-maintained robots.txt file is one of the simplest ways to improve your technical SEO.

Got questions?

What is a robots.txt file and what does it do?

A robots.txt file is a plain text file placed at the root of your website (e.g., example.com/robots.txt) that tells search engine crawlers which pages or directories they are allowed or not allowed to access. It follows the Robots Exclusion Protocol standard and is the first file crawlers check before scanning your site.

Can robots.txt block a page from appearing in Google?

Not reliably. Disallow in robots.txt prevents crawling, but Google may still index a URL if other pages link to it. The URL can appear in search results with no snippet. To fully prevent indexing, use the noindex meta tag or X-Robots-Tag HTTP header on the page itself.

Should I add my sitemap URL to robots.txt?

Yes. Adding a Sitemap directive (e.g., Sitemap: https://example.com/sitemap.xml) to your robots.txt helps search engines discover your important URLs faster. The directive is independent of user-agent groups, so it can go anywhere in the file, though the bottom is a common convention. This is especially useful for new sites or sites with deep page hierarchies.
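
For a sense of how a client reads this directive, here is a small sketch using site_maps() from Python's standard-library urllib.robotparser (available in Python 3.8 and later). The file contents are an assumed example.

```python
from urllib.robotparser import RobotFileParser

# Assumed example file for illustration.
robots_txt = """\
User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# site_maps() returns a list of sitemap URLs, or None if the file declares none.
print(parser.site_maps())  # ['https://example.com/sitemap.xml']
```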

What is the User-agent directive in robots.txt?

The User-agent line specifies which crawler the following rules apply to. Use * (asterisk) to target all crawlers, or specify individual bots like Googlebot, Bingbot, or GPTBot. You can have multiple User-agent blocks with different rules for different crawlers.

What is crawl delay and should I use it?

Crawl-delay asks crawlers to wait a specified number of seconds between requests to reduce server load. Google ignores this directive and adjusts Googlebot's crawl rate automatically, but Bing and some other crawlers respect it. Only use it if your server struggles with crawl traffic.
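
A polite crawler might read the value as in this sketch, which uses crawl_delay() from Python's standard-library urllib.robotparser. The 10-second value and the rules are assumptions chosen for illustration.

```python
from urllib.robotparser import RobotFileParser

# Assumed example: only Bingbot is asked to slow down.
robots_txt = """\
User-agent: Bingbot
Crawl-delay: 10

User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

print(parser.crawl_delay("Bingbot"))    # 10
print(parser.crawl_delay("Googlebot"))  # None (no delay declared for this agent)
```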

How do I test if my robots.txt file works correctly?

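Use the robots.txt report in Google Search Console (under Settings > robots.txt) to see which robots.txt files Google found, when they were last crawled, and any warnings or errors. To test a specific URL, use the URL Inspection tool, which reports when a page is blocked by robots.txt. You can also open your live file at yourdomain.com/robots.txt in any browser to verify the content.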

What is the difference between Allow and Disallow?

Disallow tells crawlers not to access a specific path. Allow overrides a Disallow rule for a more specific path. For example, you might Disallow /private/ but Allow /private/public-page. Allow is most useful when you need exceptions within blocked directories.
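
The "more specific path wins" behavior can be sketched as a longest-match lookup, which is how the Robots Exclusion Protocol standard (RFC 9309) describes precedence. This simplified version is an illustration only: the is_allowed function and its rule tuples are hypothetical, and it handles plain path prefixes without the * and $ wildcards.

```python
def is_allowed(path: str, rules: list) -> bool:
    """Apply the longest matching rule; on a tie, Allow wins. No match means allowed."""
    best_length = -1
    allowed = True
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # an empty pattern or a non-matching prefix does not apply
        is_allow = directive.lower() == "allow"
        longer = len(prefix) > best_length
        tie_favoring_allow = len(prefix) == best_length and is_allow
        if longer or tie_favoring_allow:
            best_length = len(prefix)
            allowed = is_allow
    return allowed


rules = [
    ("Disallow", "/private/"),
    ("Allow", "/private/public-page"),
]

print(is_allowed("/private/public-page", rules))  # True
print(is_allowed("/private/secret", rules))       # False
print(is_allowed("/about", rules))                # True
```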

Should I block CSS and JavaScript files in robots.txt?

No. Google recommends allowing access to CSS, JavaScript, and image files. Blocking these resources prevents Googlebot from rendering your page correctly, which can hurt your search rankings. Only block admin scripts or sensitive server-side resources.

Can I use robots.txt to block AI crawlers like GPTBot?

Yes. Many AI companies say their crawlers respect robots.txt. To block OpenAI's crawler, add User-agent: GPTBot followed by Disallow: /. Similarly, use User-agent: CCBot to block Common Crawl, or User-agent: Google-Extended to opt out of Google's AI training while still allowing regular Google search indexing. Google-Extended is a control token rather than a separate crawler.

Where should I place my robots.txt file?

Your robots.txt file must be placed at the root directory of your domain. It must be accessible at https://yourdomain.com/robots.txt. If it is in a subdirectory or returns a 404 error, search engines will assume no crawling restrictions exist and crawl everything.
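
Because the location is fixed, the robots.txt URL for any page can be derived from its scheme and host. A minimal sketch with Python's standard-library urllib.parse follows; robots_url is a hypothetical helper name. Note that each subdomain has its own file.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL that governs the given page."""
    parts = urlsplit(page_url)
    # Keep scheme and host (including any port); replace path, query, and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://example.com/blog/post?id=1"))  # https://example.com/robots.txt
print(robots_url("https://shop.example.com/cart"))       # https://shop.example.com/robots.txt
```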
