Robots.txt: The robots.txt file tells search engines which pages or sections of your site they should or should not crawl.
Technical SEO includes optimizing this file to ensure search engines can crawl and index all of your site’s important content.
How can I write a proper robots.txt?
Below is the proper format for a robots.txt file, along with an explanation of each part and its implementation:
- User-agent directive: This specifies which search engine robots (crawlers) the rules that follow apply to. An asterisk (*) applies the rules to all robots.
- Example: User-agent: *
- Disallow directive: This specifies which URLs should not be crawled by search engine robots. This is where you place the folders or directories that you don’t want search engines and bots to crawl. Please note that some bots don’t abide by this directive, but most do.
- Example: Disallow: /admin/
This would disallow search engine robots from crawling any URL whose path begins with “/admin/”.
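As a quick sanity check, Python’s standard-library urllib.robotparser shows how a Disallow rule like the one above is interpreted (the example.com URLs here are placeholders):

```python
from urllib.robotparser import RobotFileParser

# Parse a minimal robots.txt containing only the Disallow rule from above
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /admin/",
])

# URLs whose path begins with /admin/ are blocked; everything else is allowed
print(rp.can_fetch("*", "https://example.com/admin/settings"))  # False
print(rp.can_fetch("*", "https://example.com/blog/post"))       # True
```

Note that the rule matches by path prefix: /admin/settings is blocked, but a path like /blog/admin-tips would not be.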
- Allow directive: This specifies URLs that search engine robots may crawl, even inside an otherwise disallowed section. This directive is optional; by default, robots may crawl anything that is not disallowed.
- Example: Allow: /images/
This is where you list the folders or directories that you want crawled by search engines and bots.
This would allow search engine robots to crawl URLs whose path begins with “/images/”.
- Crawl-delay directive: This specifies the number of seconds that search engine robots should wait between requests. This directive is optional, and if not used, search engine robots crawl at their default rate. Note that some major crawlers, such as Googlebot, ignore this directive.
- Example: Crawl-delay: 10
This would instruct search engine robots to wait 10 seconds between requests.
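Python’s urllib.robotparser also exposes this value, which a polite crawler can honor between requests (a sketch; the “MyBot” user agent string is hypothetical):

```python
from urllib.robotparser import RobotFileParser

# Parse a minimal robots.txt containing a Crawl-delay rule
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Crawl-delay: 10",
])

# crawl_delay() returns the delay in seconds for the given agent,
# or None if no delay is set; "MyBot" is a hypothetical user agent
# that falls back to the User-agent: * rules
print(rp.crawl_delay("MyBot"))  # 10
```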
- Sitemap directive: This specifies the location of the XML sitemap for the website. This directive is optional; if it is omitted, many search engine robots fall back to looking for the sitemap in the default location (/sitemap.xml).
- Example: Sitemap: https://example.com/sitemap.xml
This would tell search engine robots where to find the website’s XML sitemap.
A sample robots.txt file might look like this:
```
User-agent: *
Disallow: /admin/
Allow: /images/
Crawl-delay: 10
Sitemap: https://example.com/sitemap.xml
```
This robots.txt file would apply to all search engine robots, disallow crawling of URLs with “/admin/” in the path, allow crawling of URLs with “/images/” in the path, instruct search engine robots to wait 10 seconds between requests, and specify the location of the XML sitemap.
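To check the full sample file end to end, it can be fed to Python’s urllib.robotparser (the example.com URLs are placeholders; site_maps() requires Python 3.8 or later):

```python
from urllib.robotparser import RobotFileParser

# The sample robots.txt from above
sample = """\
User-agent: *
Disallow: /admin/
Allow: /images/
Crawl-delay: 10
Sitemap: https://example.com/sitemap.xml
"""

rp = RobotFileParser()
rp.parse(sample.splitlines())

print(rp.can_fetch("*", "https://example.com/admin/users"))      # False
print(rp.can_fetch("*", "https://example.com/images/logo.png"))  # True
print(rp.crawl_delay("*"))                                       # 10
print(rp.site_maps())  # ['https://example.com/sitemap.xml']
```

This confirms each behavior the paragraph describes: /admin/ paths are blocked, /images/ paths are allowed, the crawl delay is 10 seconds, and the sitemap location is exposed.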