Robots.txt Generator
Control Search Engine Crawling with Robots.txt
Create a custom robots.txt file to guide search engine crawlers. Specify which pages to allow or block, set crawl delays, and include your sitemap URL for better SEO control.
User-Agent Rules
Use * for all bots or specify a bot name like Googlebot
Time in seconds between crawler requests (optional)
Additional Directives
Full URL to your sitemap file
Preferred domain for indexing (mainly for Yandex)
Generated robots.txt
User-agent: *
Robots.txt Validation
After creating your robots.txt file, we recommend testing it with Google Search Console's robots.txt Tester, a direct check at yourdomain.com/robots.txt, or a third-party validator (see Testing Your Robots.txt below).
Example Robots.txt Patterns
Block all robots from the entire website
Prevents all search engines from crawling any part of your site
Robots.txt Content:
User-agent: *
Disallow: /
Allow all robots complete access
Allows all search engines to crawl your entire website
Robots.txt Content:
User-agent: *
Allow: /
Block a specific folder
Prevents crawling of a specific directory while allowing access to the rest
Robots.txt Content:
User-agent: *
Disallow: /private-folder/
WordPress Common Rules
Standard robots.txt for WordPress sites to block admin and system files
Robots.txt Content:
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-json/
Disallow: /xmlrpc.php
Sitemap: https://example.com/sitemap.xml
E-commerce Site Rules
Typical rules for online stores to prevent indexing of cart and checkout pages
Robots.txt Content:
User-agent: *
Disallow: /cart/
Disallow: /checkout/
Disallow: /my-account/
Disallow: /search*
Disallow: *?s=*
Sitemap: https://example.com/sitemap.xml
Different Rules for Different Bots
Specific rules for Googlebot and general rules for all other bots
Robots.txt Content:
User-agent: Googlebot
Crawl-delay: 10
Disallow: /private/

User-agent: *
Disallow: /admin/
Disallow: /private/

Sitemap: https://example.com/sitemap.xml
Block Specific File Types
Prevent indexing of specific file types like PDFs and images
Robots.txt Content:
User-agent: *
Disallow: /*.pdf$
Disallow: /*.jpg$
Disallow: /*.png$
Disallow: /*.gif$
Sitemap: https://example.com/sitemap.xml
Allow Specific Paths with Exceptions
Block a directory but allow access to specific subdirectories
Robots.txt Content:
User-agent: *
Disallow: /private/
Allow: /private/public-content/
Sitemap: https://example.com/sitemap.xml
About Robots.txt
What is robots.txt?
A robots.txt file is a text file that website owners create to instruct web robots (typically search engine robots) how to crawl pages on their website. The robots.txt file is part of the Robots Exclusion Protocol (REP), a group of web standards that regulate how robots crawl the web, access and index content, and serve that content up to users.
Why use robots.txt?
There are several reasons why you might want to use a robots.txt file:
- Prevent search engines from crawling private or duplicate content
- Control which parts of your site search engines can access
- Specify the location of your sitemap
- Manage crawler traffic to your server
- Prevent search engines from crawling certain files (like images or PDFs)
Important Robots.txt Directives
User-agent:
Specifies which web crawler the rules apply to. Use * to target all web crawlers.
Allow:
Specifies paths that may be crawled. This is useful for creating exceptions to broader Disallow rules.
Disallow:
Specifies paths that should not be accessed by the crawler.
Crawl-delay:
Specifies the number of seconds a crawler should wait between requests to reduce server load. Note that not all crawlers honor this directive; Googlebot, for example, ignores it.
Sitemap:
Specifies the location of your sitemap file for easier discovery by search engines.
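As a combined illustration, the sketch below uses all five directives in a single file; example.com and the paths shown are placeholders, so substitute your own domain and directories:
User-agent: *
Crawl-delay: 5
Disallow: /admin/
Allow: /admin/public/
Sitemap: https://example.com/sitemap.xml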
Best Practices
- Place the robots.txt file in the root directory of your website
- Keep your robots.txt file simple and focused on important directives
- Test your robots.txt file with search engine webmaster tools
- Use robots.txt for crawling control, not for security (sensitive data should be protected by other means)
- Remember that some web crawlers may ignore your robots.txt file
Common Use Cases
WordPress Sites
Block access to admin areas, plugins, and system files that shouldn't be indexed:
- /wp-admin/ - WordPress admin dashboard
- /wp-includes/ - WordPress core files
- /wp-content/plugins/ - Plugin directories
- /wp-json/ - REST API endpoints
- /xmlrpc.php - XML-RPC file
E-commerce Sites
Prevent indexing of user-specific and transactional pages:
- /cart/ - Shopping cart pages
- /checkout/ - Checkout process
- /my-account/ - User account pages
- /search* - Search result pages
- *?s=* - Search query parameters
Blog and Content Sites
Control access to draft content and admin areas:
- /wp-admin/ - Admin dashboard
- /private/ - Private content areas
- /?s=* - Search results
- /tag/ - Tag archive pages (if not needed)
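Combined, a blog robots.txt covering the items above might look like the following sketch; the paths and sitemap URL are placeholders to adapt to your own site:
User-agent: *
Disallow: /wp-admin/
Disallow: /private/
Disallow: /?s=*
Disallow: /tag/
Sitemap: https://example.com/sitemap.xml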
Testing Your Robots.txt
After creating your robots.txt file, it's crucial to test it to ensure it works as expected:
Google Search Console
Use the robots.txt Tester tool in Google Search Console to verify your file works correctly with Googlebot.
Manual Testing
Access your robots.txt file directly at yourdomain.com/robots.txt to ensure it's accessible and formatted correctly.
Third-party Tools
Use online robots.txt validators to check for syntax errors and rule conflicts.
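If you prefer a programmatic check, Python's standard-library urllib.robotparser module can fetch your live robots.txt and evaluate it for specific user agents and URLs. The sketch below assumes the site lives at example.com; substitute your own domain and test paths:

from urllib.robotparser import RobotFileParser

# Point the parser at the live robots.txt file (example.com is a placeholder domain).
parser = RobotFileParser()
parser.set_url("https://example.com/robots.txt")
parser.read()  # fetches and parses the file

# Check whether specific URLs may be crawled by a given user agent.
print(parser.can_fetch("*", "https://example.com/private/page.html"))
print(parser.can_fetch("Googlebot", "https://example.com/blog/post"))

# Crawl-delay, if declared for the agent, is also exposed (None otherwise).
print(parser.crawl_delay("Googlebot"))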