Robots.txt Generator
Control Search Engine Crawling with Robots.txt
Create a custom robots.txt file to guide search engine crawlers. Specify which pages to allow or block, set crawl delays, and include your sitemap URL for better SEO control.
User-Agent Rules
Use * for all bots or specify a bot name like Googlebot
Time in seconds between crawler requests (optional)
Additional Directives
Full URL to your sitemap file
Preferred domain for indexing (mainly for Yandex)
Generated robots.txt
User-agent: *
Robots.txt Validation
After creating your robots.txt file, we recommend testing it with Google Search Console's robots.txt Tester, a direct check at yourdomain.com/robots.txt, or a third-party validator (see Testing Your Robots.txt below).
Example Robots.txt Patterns
Block all robots from the entire website
Prevents all search engines from crawling any part of your site
Robots.txt Content:
User-agent: *
Disallow: /
Allow all robots complete access
Allows all search engines to crawl your entire website
Robots.txt Content:
User-agent: *
Allow: /
Block a specific folder
Prevents crawling of a specific directory while allowing access to the rest
Robots.txt Content:
User-agent: *
Disallow: /private-folder/
WordPress Common Rules
Standard robots.txt for WordPress sites to block admin and system files
Robots.txt Content:
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-json/
Disallow: /xmlrpc.php
Sitemap: https://example.com/sitemap.xml
E-commerce Site Rules
Typical rules for online stores to prevent indexing of cart and checkout pages
Robots.txt Content:
User-agent: *
Disallow: /cart/
Disallow: /checkout/
Disallow: /my-account/
Disallow: /search*
Disallow: *?s=*
Sitemap: https://example.com/sitemap.xml
Different Rules for Different Bots
Specific rules for Googlebot and general rules for all other bots
Robots.txt Content:
User-agent: Googlebot
Crawl-delay: 10
Disallow: /private/

User-agent: *
Disallow: /admin/
Disallow: /private/

Sitemap: https://example.com/sitemap.xml
Block Specific File Types
Prevent indexing of specific file types like PDFs and images
Robots.txt Content:
User-agent: *
Disallow: /*.pdf$
Disallow: /*.jpg$
Disallow: /*.png$
Disallow: /*.gif$
Sitemap: https://example.com/sitemap.xml
Allow Specific Paths with Exceptions
Block a directory but allow access to specific subdirectories
Robots.txt Content:
User-agent: *
Disallow: /private/
Allow: /private/public-content/
Sitemap: https://example.com/sitemap.xml
About Robots.txt
What is robots.txt?
A robots.txt file is a text file that website owners create to instruct web robots (typically search engine robots) how to crawl pages on their website. The robots.txt file is part of the Robots Exclusion Protocol (REP), a group of web standards that regulate how robots crawl the web, access and index content, and serve that content up to users.
Why use robots.txt?
There are several reasons why you might want to use a robots.txt file:
- Prevent search engines from crawling private or duplicate content
- Control which parts of your site search engines can access
- Specify the location of your sitemap
- Manage crawler traffic to your server
- Prevent search engines from crawling certain files (like images or PDFs)
Important Robots.txt Directives
User-agent:
Specifies which web crawler the rules apply to. Use * to target all web crawlers.
Allow:
Specifies paths that may be crawled. This is useful for creating exceptions to broader Disallow rules.
Disallow:
Specifies paths that should not be accessed by the crawler.
Crawl-delay:
Specifies the number of seconds a crawler should wait between requests to reduce server load. Note that not all crawlers honor this directive; Googlebot, for example, ignores it.
Sitemap:
Specifies the location of your sitemap file for easier discovery by search engines.
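As a combined illustration, the sketch below uses all five directives in a single file; example.com and the paths shown are placeholders, so substitute your own domain and directories:
User-agent: *
Crawl-delay: 5
Disallow: /admin/
Allow: /admin/public/
Sitemap: https://example.com/sitemap.xml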
Best Practices
- Place the robots.txt file in the root directory of your website
- Keep your robots.txt file simple and focused on important directives
- Test your robots.txt file with search engine webmaster tools
- Use robots.txt for crawling control, not for security (sensitive data should be protected by other means)
- Remember that some web crawlers may ignore your robots.txt file
Common Use Cases
WordPress Sites
Block access to admin areas, plugins, and system files that shouldn't be indexed:
- /wp-admin/ - WordPress admin dashboard
- /wp-includes/ - WordPress core files
- /wp-content/plugins/ - Plugin directories
- /wp-json/ - REST API endpoints
- /xmlrpc.php - XML-RPC file
E-commerce Sites
Prevent indexing of user-specific and transactional pages:
- /cart/ - Shopping cart pages
- /checkout/ - Checkout process
- /my-account/ - User account pages
- /search* - Search result pages
- *?s=* - Search query parameters
Blog and Content Sites
Control access to draft content and admin areas:
- /wp-admin/ - Admin dashboard
- /private/ - Private content areas
- /?s=* - Search results
- /tag/ - Tag archive pages (if not needed)
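Combined, a blog robots.txt covering the items above might look like the following sketch; the paths and sitemap URL are placeholders to adapt to your own site:
User-agent: *
Disallow: /wp-admin/
Disallow: /private/
Disallow: /?s=*
Disallow: /tag/
Sitemap: https://example.com/sitemap.xml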
Testing Your Robots.txt
After creating your robots.txt file, it's crucial to test it to ensure it works as expected:
Google Search Console
Use the robots.txt Tester tool in Google Search Console to verify your file works correctly with Googlebot.
Manual Testing
Access your robots.txt file directly at yourdomain.com/robots.txt to ensure it's accessible and formatted correctly.
Third-party Tools
Use online robots.txt validators to check for syntax errors and rule conflicts.
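If you prefer a programmatic check, Python's standard-library urllib.robotparser module can fetch your live robots.txt and evaluate it for specific user agents and URLs. The sketch below assumes the site lives at example.com; substitute your own domain and test paths:

from urllib.robotparser import RobotFileParser

# Point the parser at the live robots.txt file (example.com is a placeholder domain).
parser = RobotFileParser()
parser.set_url("https://example.com/robots.txt")
parser.read()  # fetches and parses the file

# Check whether specific URLs may be crawled by a given user agent.
print(parser.can_fetch("*", "https://example.com/private/page.html"))
print(parser.can_fetch("Googlebot", "https://example.com/blog/post"))

# Crawl-delay, if declared for the agent, is also exposed (None otherwise).
print(parser.crawl_delay("Googlebot"))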