Robots.txt Generator

Build, test, and import your robots.txt — with AI crawler controls

Generate a robots.txt from scratch or import your live one, block AI training and live-browse crawlers (GPTBot, ClaudeBot, Google-Extended, PerplexityBot…), test any URL against your rules, and start from ready-made templates for WordPress, Shopify, Next.js, Webflow, Ghost, and 7 more platforms.

Start from an existing robots.txt

Edit your live file instead of rebuilding from scratch — fetch by URL, or paste it in.

We fetch {your-domain}/robots.txt server-side, so CORS won't block it.

— or —

AI crawler policy

Control which AI crawlers can access your site — GPTBot (OpenAI), ClaudeBot (Anthropic), Google-Extended (Gemini), PerplexityBot, and more. These directives are separate from regular SEO rules.

Block AI training crawlers
0/11 blocked

These bots scrape pages to train models. Blocking them asks compliant crawlers to keep your content out of future training runs, and it does NOT affect SEO ranking.

Block AI live-browse & search crawlers
0/10 blocked

These bots fetch pages on demand when a user asks an AI assistant a question. Blocking them means you won’t appear in AI answers or citations — usually a bad trade.

Block AI image-training crawlers
0/1 blocked

Specifically for image-focused training bots. Block if your visuals are your IP.
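As a rough illustration of what blocking a set of AI crawlers amounts to in the file itself, the sketch below builds one site-wide Disallow group per bot. The function name block_crawlers is hypothetical, and treating GPTBot, ClaudeBot, and Google-Extended as the training group is an assumption for the example, not this tool's actual list.

```python
def block_crawlers(user_agents):
    """Return robots.txt text that blocks each listed crawler site-wide."""
    groups = [f"User-agent: {ua}\nDisallow: /" for ua in user_agents]
    return "\n\n".join(groups) + "\n"


# Assumed training-crawler list; adjust it to the bots you actually want to block.
TRAINING_BOTS = ["GPTBot", "ClaudeBot", "Google-Extended"]

print(block_crawlers(TRAINING_BOTS))
```

Each group names a single crawler, so these rules sit alongside the regular User-agent: * group without changing it.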

User-agent rules

Rule group 1

Start typing to pick from known bots, or enter a custom user-agent. Use * for all crawlers.

No exception paths.

No blocked paths — this group has no restrictions.

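Crawl-delay is honoured by Bing, Yandex, and others. Googlebot ignores it, and Google retired the Search Console crawl rate limiter in early 2024, so to slow Googlebot in an emergency, temporarily return 503 or 429 responses instead.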

Additional directives

Full URL to your sitemap file. Helps every search engine discover your content.

Generated robots.txt

User-agent: *
Disallow:

Test a URL against this robots.txt

See whether a given path is allowed or blocked for a specific crawler. Uses the same matching rules as Google.

Allowed · matched in * group

No matching Disallow rule in the * group — crawling is allowed.

Tip: the official Google robots.txt Tester was retired in late 2023. This tester implements the same Google matching rules (longest-pattern wins, Allow beats Disallow on ties).
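The precedence rule mentioned in the tip can be sketched in a few lines. This is a simplified illustration, not this tester's actual code: the function name is_allowed is hypothetical, rules are treated as plain path prefixes, and the * and $ wildcards are deliberately left out to keep the precedence logic visible.

```python
def is_allowed(rules, path):
    """rules: list of (directive, path_prefix) pairs from one user-agent group."""
    best_len = -1   # length of the most specific matching rule seen so far
    allowed = True  # with no matching rule, the path is allowed
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # an empty value (e.g. "Disallow:") matches nothing
        is_allow = directive.lower() == "allow"
        # Longer pattern wins; on equal length, Allow beats Disallow.
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len, allowed = len(prefix), is_allow
    return allowed


wordpress = [("Disallow", "/wp-admin/"), ("Allow", "/wp-admin/admin-ajax.php")]
print(is_allowed(wordpress, "/wp-admin/admin-ajax.php"))  # True
print(is_allowed(wordpress, "/wp-admin/options.php"))     # False
```

Because the winner is chosen by pattern length rather than position, the order of Allow and Disallow lines within a group does not change the verdict.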

Plain-English summary

A human-readable description of what your robots.txt actually does — generated live as you edit.

  • All crawlers are explicitly allowed to crawl the entire site.

Validation

Ready-Made Robots.txt Templates by Platform

Copy-paste starter templates for the most common content management systems and frameworks. Each is built from the directives that platform's default install exposes — replace example.com with your domain.

Robots.txt for WordPress

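Blocks admin, includes, plugins, the REST API, and the XML-RPC endpoint. Note that /wp-includes/ and /wp-content/plugins/ also hold CSS and JavaScript, so remove those two lines if Googlebot needs those assets to render your pages.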

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /wp-json/
Disallow: /xmlrpc.php
Disallow: /?s=
Disallow: /search/

Sitemap: https://example.com/sitemap_index.xml

Tip: Allow /wp-admin/admin-ajax.php so plugins that build pages via AJAX (e.g. Elementor, WPBakery) can still render correctly for crawlers.

Robots.txt for Shopify

Shopify auto-generates a robots.txt at /robots.txt. Since 2021 you can override it with robots.txt.liquid in your theme — paste this template there.

User-agent: *
Disallow: /admin
Disallow: /cart
Disallow: /orders
Disallow: /checkout
Disallow: /account
Disallow: /search
Disallow: /carts/
Disallow: /checkouts/
Disallow: /policies/
Disallow: /*?*sort_by*
Disallow: /*?*filter.*

Sitemap: https://example.com/sitemap.xml

Tip: Shopify's default robots.txt already blocks cart, checkout, account, and search. Keep the products and collections paths crawlable, and never block /apps/ if you use embedded apps.

Robots.txt for Next.js

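With the App Router, put this in public/robots.txt or generate it dynamically via app/robots.ts. With the Pages Router, use public/robots.txt; an API route such as pages/api/robots.ts is served under /api/, so it needs a rewrite to be reachable at /robots.txt.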

User-agent: *
Disallow: /api/
Disallow: /_next/data/
Disallow: /_next/image
Disallow: /preview/
Disallow: /draft/
Allow: /_next/static/

Sitemap: https://example.com/sitemap.xml

Tip: Block /api/, /_next/data/, and the framework preview routes. Do NOT block /_next/static — Google needs the JS/CSS to render React pages.

Robots.txt for Webflow

Set this in your Webflow project under Project Settings → SEO → Indexing. Webflow handles canonical and sitemap automatically.

User-agent: *
Disallow: /404
Disallow: /401
Disallow: /search
Disallow: /preview/
Disallow: /*?preview=

Sitemap: https://example.com/sitemap.xml

Tip: Webflow auto-generates a sitemap at /sitemap.xml. Disable indexing entirely for staging via the "Disable subdomain indexing" toggle, not robots.txt.

Robots.txt for Ghost

Ghost ships a sensible default robots.txt at /robots.txt. To customize, edit your theme’s robots.txt (Ghost themes can include one).

User-agent: *
Disallow: /ghost/
Disallow: /p/
Disallow: /email/
Disallow: /r/
Disallow: /private/

Sitemap: https://example.com/sitemap.xml

Tip: Ghost exposes /ghost/ (the admin) and /p/ (preview URLs). Both should be blocked. The sitemap lives at /sitemap.xml automatically.

Robots.txt for Hugo

Enable robots.txt generation in config.toml with enableRobotsTXT = true, then place a layouts/robots.txt template. Or drop a static file in static/robots.txt.

User-agent: *
Disallow: /admin/
Disallow: /drafts/
Disallow: /private/
Disallow: /tags/
Disallow: /categories/

Sitemap: https://example.com/sitemap.xml

Tip: Hugo’s built-in sitemap is at /sitemap.xml. Block taxonomy archive pages if they add thin content to your site (e.g. low-volume tag pages).

Robots.txt for Jekyll

Add a robots.txt file at the root of your Jekyll site. Jekyll won’t process it unless you give it front-matter — for static robots.txt, omit the front-matter and Jekyll passes it through unchanged.

User-agent: *
Disallow: /admin/
Disallow: /assets/cache/
Disallow: /_site/
Disallow: /drafts/

Sitemap: https://example.com/sitemap.xml

Tip: If you use the jekyll-sitemap plugin (recommended), it auto-generates /sitemap.xml. Block /assets/cache/ and any _drafts content.

Robots.txt for Wix

Wix auto-generates robots.txt. To customize, go to SEO Tools → Robots.txt Editor in your Wix dashboard and replace the default with this template.

User-agent: *
Disallow: /_partials/
Disallow: /pro-gallery-webapp/
Disallow: /account/
Disallow: /cart
Disallow: /checkout
Disallow: /thank-you
Disallow: /forum/main/comment/

Sitemap: https://example.com/sitemap.xml

Tip: Wix exposes member pages at /account/, dynamic preview URLs at /_partials/, and editor preview routes you should always block.

Robots.txt for Squarespace

Squarespace doesn’t expose direct robots.txt editing. Use built-in SEO settings to no-index pages, and submit your sitemap (always at /sitemap.xml) to Search Console.

User-agent: *
Disallow: /api/
Disallow: /static/
Disallow: /commerce/
Disallow: /checkout
Disallow: /account
Disallow: /search
Disallow: /config

Sitemap: https://example.com/sitemap.xml

Tip: Squarespace’s default robots.txt already blocks /api/, /commerce/, /checkout, /account, /search, and config URLs. This template mirrors those defaults if you ever migrate to a host where you control robots.txt.

Robots.txt for Gatsby

Gatsby has no built-in robots.txt — install gatsby-plugin-robots-txt and configure it in gatsby-config.js, or drop a static file in static/robots.txt.

User-agent: *
Disallow: /404
Disallow: /404.html
Disallow: /preview/
Disallow: /drafts/
Allow: /static/

Sitemap: https://example.com/sitemap-index.xml

Tip: Pair with gatsby-plugin-sitemap (auto-generates /sitemap-index.xml). Make sure you don’t block /static/ — Gatsby ships hashed assets there.

Robots.txt for Astro

Place robots.txt in the public/ directory and Astro will serve it as-is. Pair with @astrojs/sitemap for automatic sitemap generation.

User-agent: *
Disallow: /api/
Disallow: /admin/
Disallow: /drafts/
Disallow: /preview/

Sitemap: https://example.com/sitemap-index.xml

Tip: Block /api/ if you use Astro endpoints for internal data. The default sitemap path is /sitemap-index.xml.

Robots.txt for Magento / Adobe Commerce

Magento generates a default robots.txt via Stores → Configuration → Catalog → Search Engine Robots. The template below covers the directives Magento doesn’t add by default.

User-agent: *
Disallow: /admin/
Disallow: /customer/
Disallow: /checkout/
Disallow: /catalog/product_compare/
Disallow: /catalog/category/view/
Disallow: /catalog/product/view/
Disallow: /sendfriend/
Disallow: /review/
Disallow: /wishlist/
Disallow: /*?SID=
Disallow: /*?dir=
Disallow: /*?limit=
Disallow: /*?order=
Disallow: /*?p=
Disallow: /*?price=

Sitemap: https://example.com/sitemap.xml

Tip: Block faceted-navigation parameters aggressively — Magento creates thousands of crawlable filter URLs that dilute crawl budget.
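Wildcard rules like the ones above are easy to misread, so it can help to check them against sample URLs. The sketch below translates a robots.txt pattern into a regular expression, assuming the common wildcard semantics (* matches any run of characters, a trailing $ anchors the end of the URL). The function name rule_matches and the sample URLs are hypothetical.

```python
import re


def rule_matches(pattern, url_path):
    """Return True if a robots.txt path pattern matches the URL path + query."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal pieces and let each * match any run of characters.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        return re.fullmatch(regex, url_path) is not None
    return re.match(regex, url_path) is not None  # unanchored: prefix match


print(rule_matches("/*?p=", "/shoes?p=2"))            # True
print(rule_matches("/*?p=", "/shoes?color=red&p=2"))  # False
```

The second result shows that a pattern such as /*?p= only catches the parameter when it comes first in the query string, which is worth keeping in mind when blocking filter parameters.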

About Robots.txt

What is robots.txt?

A robots.txt file is a text file that website owners create to instruct web robots (typically search engine robots) how to crawl pages on their website. The robots.txt file is part of the Robots Exclusion Protocol (REP), a group of web standards that regulate how robots crawl the web, access and index content, and serve that content up to users.

Why use robots.txt?

There are several reasons why you might want to use a robots.txt file:

  • Keep crawlers out of private or duplicate content (a blocked URL can still be indexed if other pages link to it, so use noindex to keep a page out of results)
  • Control which parts of your site search engines can access
  • Specify the location of your sitemap
  • Manage crawler traffic to your server
  • Keep crawlers away from certain files (like images or PDFs)

Important Robots.txt Directives

User-agent:

Specifies which web crawler the rules apply to. Use * to target all web crawlers.

Allow:

Specifies paths that may be crawled. This is useful to create exceptions for rules in a Disallow directive.

Disallow:

Specifies paths that should not be accessed by the crawler.

Crawl-delay:

Specifies the number of seconds between crawler requests to reduce server load.

Sitemap:

Specifies the location of your sitemap file for easier discovery by search engines.
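To see these directives read back by a parser, here is a small sketch using Python's standard-library urllib.robotparser. The sample file and the example.com URLs are made up for illustration, and site_maps() assumes Python 3.8 or newer.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file with two groups, a crawl delay, and a sitemap.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: bingbot
Crawl-delay: 10
Disallow: /private/

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.crawl_delay("bingbot"))    # 10
print(parser.crawl_delay("Googlebot"))  # None (no delay in the * group)
print(parser.site_maps())               # ['https://example.com/sitemap.xml']
print(parser.can_fetch("bingbot", "https://example.com/private/page"))  # False
```

One caveat: this parser applies the first matching rule in file order rather than the longest match, so for files that mix Allow and Disallow its verdicts can differ from Google's.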

Best Practices

  • Place the robots.txt file in the root directory of your website
  • Keep your robots.txt file simple and focused on important directives
  • Test your robots.txt file with search engine webmaster tools
  • Use robots.txt for crawling control, not for security (sensitive data should be protected by other means)
  • Remember that some web crawlers may ignore your robots.txt file

Common Use Cases

WordPress Sites

Block access to admin areas, plugins, and system files that shouldn’t be crawled:

  • /wp-admin/ - WordPress admin dashboard
  • /wp-includes/ - WordPress core files
  • /wp-content/plugins/ - Plugin directories
  • /wp-json/ - REST API endpoints
  • /xmlrpc.php - XML-RPC file

E-commerce Sites

Prevent crawling of user-specific and transactional pages:

  • /cart/ - Shopping cart pages
  • /checkout/ - Checkout process
  • /my-account/ - User account pages
  • /search - Search result pages
  • /*?s= - Search query parameters

Blog and Content Sites

Control access to draft content and admin areas:

  • /wp-admin/ - Admin dashboard
  • /private/ - Private content areas
  • /?s=* - Search results
  • /tag/ - Tag archive pages (if not needed)

Testing Your Robots.txt

After creating your robots.txt file, it’s crucial to test it to ensure it works as expected:

Google Search Console

The standalone robots.txt Tester was retired in late 2023. Use the robots.txt report in Google Search Console to confirm that Google can fetch and parse your file.

Manual Testing

Access your robots.txt file directly at yourdomain.com/robots.txt to ensure it’s accessible and formatted correctly.
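The manual check can also be scripted. The sketch below is one possible set of sanity checks, not an exhaustive validator: the function names check_robots and fetch_and_check are hypothetical, and the 500 KiB ceiling reflects the parsing limit Google documents for robots.txt files.

```python
from urllib.error import HTTPError
from urllib.request import urlopen

MAX_BYTES = 500 * 1024  # Google documents a 500 KiB parsing limit


def check_robots(status, content_type, body):
    """Return a list of problems found in a fetched robots.txt response."""
    problems = []
    if status != 200:
        problems.append(f"unexpected HTTP status {status}")
    if not content_type.lower().startswith("text/plain"):
        problems.append(f"content type is {content_type!r}, expected text/plain")
    if len(body) > MAX_BYTES:
        problems.append("file is larger than 500 KiB")
    lines = body.decode("utf-8", errors="replace").splitlines()
    if not any(line.strip().lower().startswith("user-agent:") for line in lines):
        problems.append("no User-agent line found")
    return problems


def fetch_and_check(url):
    """Fetch a robots.txt URL and run the checks above (needs network access)."""
    try:
        with urlopen(url, timeout=10) as resp:
            content_type = resp.headers.get("Content-Type", "")
            return check_robots(resp.status, content_type, resp.read())
    except HTTPError as err:
        return [f"unexpected HTTP status {err.code}"]


print(check_robots(200, "text/plain; charset=utf-8", b"User-agent: *\nDisallow:\n"))  # []
```

Calling fetch_and_check with your own robots.txt URL returns an empty list when none of these checks fail.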

Third-party Tools

Use online robots.txt validators to check for syntax errors and rule conflicts.

Robots.txt FAQs

Common questions about robots.txt files, crawling rules, and search engine directives.