Configuring robots.txt for Better Indexation and SEO Score

Complete guide to robots.txt

14 Feb 2024

The robots.txt file, placed in the root directory of a website, instructs search engine robots about which pages should and should not be crawled. By preventing web crawlers from crawling certain parts of a website, we can have more control over the content visibility and improve the site’s SEO score.

Understanding the Purpose of robots.txt

The primary purpose of the robots.txt file is to keep web crawlers out of specific parts of a website that are not meant to appear in search results. By excluding such routes, we also keep crawlers from spending time on irrelevant or repetitive content. This selective approach focuses crawling on the valuable content, which supports the site's overall SEO score.

For example, we may want to prevent search engines from crawling personalized pages, admin pages, or content that is still under development. By using the robots.txt file, we can pass these instructions to web crawlers.

However, it's important to note that while the robots.txt file can ask crawlers not to crawl certain pages, it does not guarantee that search engines won't find them. The file only provides instructions to web crawlers; it cannot actively block access to pages.

Importance of robots.txt

Configuring the robots.txt file correctly can have several benefits. Let's explore some of the key advantages:

1. Enhancing Crawl Budget Efficiency

A properly configured robots.txt file helps conserve the crawl budget, which is the number of pages a search engine bot is willing to crawl on a website during a given time period. By conserving the crawl budget, we can ensure that it is spent on the most relevant and valuable content, improving overall crawl efficiency.

2. Preventing Duplicate Content Issues

Duplicate content can harm a website's search engine rankings. By disallowing search engine bots from crawling repetitive or similar content, we can prevent confusion and maintain the quality and credibility of the content.

3. Securing Sensitive Information

Website security and user privacy are crucial, especially for sites with user accounts or confidential information. The robots.txt file lets us keep well-behaved search engine bots out of sensitive or private sections of the website by disallowing them. It is not an access control mechanism, though: the file itself is publicly readable, and in some situations URLs that are disallowed in robots.txt may still be indexed, even if they haven't been crawled.

4. Providing a Clear Sitemap Reference

Another feature of robots.txt is referencing a website's XML sitemap. The XML sitemap helps search engine bots discover and follow the website's structure, leading to a more efficient and thorough crawling process. By including a reference to the sitemap in the robots.txt file, we can ensure that search engine bots can easily find and navigate the sitemap.

5. Directing Crawler Behavior for Multilingual Websites

For websites with multilingual content, the robots.txt file can help ensure that search engine bots spend their time on the intended language or regional versions of the content rather than on duplicate variants. This supports geo-targeting and relevance in search results, ultimately enhancing the overall user experience.

Syntax Used in robots.txt File

1. User-agent

The "User-agent" directive identifies the specific bot or crawler to which the rules that follow apply. For example, User-agent: Googlebot would target Google's web crawler. To target all crawlers, use the wildcard: User-agent: *.

2. Disallow

The "Disallow" directive tells bots not to crawl specific pages or sections of a website. For example, Disallow: /settings/ would block crawlers from accessing the routes in the "settings" folder. A path must start with the "/" character, and if it refers to a folder, it should end with "/" as well. Rules are matched as prefixes, so Disallow: /settings without the trailing slash would also match /settings.html.

3. Allow

The "Allow" directive grants bots permission to crawl specific pages or sections of a website, even if they fall under a Disallow rule. For example, Allow: /settings/public-page.html would allow bots to access the "public-page.html" file, even if it is located in a disallowed folder. Major crawlers such as Googlebot resolve such conflicts by applying the most specific (longest) matching rule, not by the order in which the rules appear.

4. Sitemap

The "Sitemap" directive provides the location of a website's XML sitemap, helping search engine bots find pages more efficiently. Including the sitemap in the robots.txt file is considered one of the best practices for SEO. For example, Sitemap: https://www.example.com/sitemap.xml directs crawlers to the website's sitemap file. The sitemap does not even have to be on the same host as the robots.txt file. It is also possible to reference multiple XML sitemaps in robots.txt, which is useful, for example, if a site has one static sitemap and one dynamic sitemap.

5. Crawl-delay

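The "Crawl-delay" directive asks crawlers to pause between requests to avoid overloading the server. For example, Crawl-delay: 10 would request that bots wait 10 seconds between requests to the website. Support varies: the directive is not part of the robots.txt standard, and Googlebot ignores it.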
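How Allow and Disallow rules interact can be sketched in TypeScript. This is a minimal illustration rather than any crawler's actual implementation: it assumes the longest-match precedence used by Googlebot, supports the * and $ wildcards, and uses hypothetical names (Rule, patternToRegExp, isAllowed) that do not come from any library.

```typescript
// Hypothetical types and helpers, for illustration only.
type Rule = { type: 'allow' | 'disallow'; path: string };

// Convert a robots.txt path pattern into a RegExp.
// `*` matches any sequence of characters; a trailing `$` anchors the end of the path.
function patternToRegExp(pattern: string): RegExp {
  const anchored = pattern.endsWith('$');
  const body = anchored ? pattern.slice(0, -1) : pattern;
  const escaped = body
    .split('*')
    .map((part) => part.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'))
    .join('.*');
  return new RegExp('^' + escaped + (anchored ? '$' : ''));
}

// Decide whether a path may be crawled: the longest matching pattern wins,
// and Allow wins a tie. A path that matches no rule is allowed.
function isAllowed(rules: Rule[], path: string): boolean {
  let best: Rule | undefined;
  for (const rule of rules) {
    if (rule.path === '') continue; // an empty pattern matches nothing
    if (!patternToRegExp(rule.path).test(path)) continue;
    if (
      best === undefined ||
      rule.path.length > best.path.length ||
      (rule.path.length === best.path.length && rule.type === 'allow')
    ) {
      best = rule;
    }
  }
  return best === undefined || best.type === 'allow';
}

const rules: Rule[] = [
  { type: 'disallow', path: '/settings/' },
  { type: 'allow', path: '/settings/public-page.html' },
];
```

With these rules, isAllowed(rules, '/settings/profile') is false, while isAllowed(rules, '/settings/public-page.html') is true because the Allow pattern is the longer match.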

Example of robots.txt

User-agent: *
Allow: /

Disallow: /_next/*.js$

Disallow: /login
Disallow: /signup
Disallow: /forgot-password

Disallow: /admin/
Allow: /admin/exception

Disallow: /api/admin/mutations/

Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/dynamic-sitemap.xml
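To see how a crawler might read such a file, here is a rough TypeScript sketch of a parser that groups rules by User-agent and collects the Sitemap lines. It is a simplification under stated assumptions: the sample text includes an explicit User-agent: * line so the rules belong to a group, only Allow, Disallow and Sitemap are handled, and the names Group, ParsedRobots and parseRobotsTxt are hypothetical.

```typescript
// Hypothetical shapes for illustration; not taken from any library.
type Group = { userAgents: string[]; allow: string[]; disallow: string[] };
type ParsedRobots = { groups: Group[]; sitemaps: string[] };

function parseRobotsTxt(text: string): ParsedRobots {
  const result: ParsedRobots = { groups: [], sitemaps: [] };
  let current: Group | undefined;
  let lastWasUserAgent = false;

  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, '').trim(); // strip comments and whitespace
    const colon = line.indexOf(':');
    if (colon === -1) continue; // blank or malformed line
    const key = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();

    if (key === 'user-agent') {
      // Consecutive User-agent lines share one group; otherwise start a new one.
      if (current === undefined || !lastWasUserAgent) {
        current = { userAgents: [], allow: [], disallow: [] };
        result.groups.push(current);
      }
      current.userAgents.push(value);
      lastWasUserAgent = true;
    } else if (key === 'sitemap') {
      result.sitemaps.push(value); // Sitemap lines are not tied to any group
    } else if (key === 'allow' || key === 'disallow') {
      if (current !== undefined) current[key].push(value); // rules before any User-agent are ignored
      lastWasUserAgent = false;
    }
  }
  return result;
}

const sample = [
  'User-agent: *',
  'Allow: /',
  'Disallow: /_next/*.js$',
  'Disallow: /login',
  'Disallow: /signup',
  'Disallow: /forgot-password',
  'Disallow: /admin/',
  'Allow: /admin/exception',
  'Disallow: /api/admin/mutations/',
  'Sitemap: https://example.com/sitemap.xml',
  'Sitemap: https://example.com/dynamic-sitemap.xml',
].join('\n');

const parsed = parseRobotsTxt(sample);
```

Here parsed contains a single group for the * user agent with two Allow paths and six Disallow paths, plus both sitemap URLs.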

Next.js and robots.txt

In Next.js, we can add a static robots.txt file to the root of our app directory as usual. Since version 13.3, Next.js can also generate the robots.txt file dynamically from a robots.ts file that returns a Robots object. This approach is particularly useful for generating different rules based on certain conditions (e.g. environment variables). Let's take a look at an example of generating a Robots object in Next.js:

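import type { MetadataRoute } from 'next'

export default function robots(): MetadataRoute.Robots {
    // ENVIRONMENT is a custom variable; block all crawling in the development environment
    if (process.env.ENVIRONMENT === 'development') {
        return {
            rules: {
                userAgent: '*',
                disallow: '/', // "/" matches every path
            },
        }
    }

    return {
        rules: [
            {
                userAgent: '*',
                allow: '/',
                disallow: '/private/',
                crawlDelay: 5, // seconds; not every crawler honors this
            },
        ],
        sitemap: ['https://example.com/sitemap.xml'],
    }
}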

In the example above, we define a robots function. For the development environment, we disallow all paths for every crawler. For other environments, we return a Robots object whose rules property (a single object or an array of objects) contains the specific crawling directives. Additionally, we specify the location of the sitemap.xml file using the sitemap property (a string, or an array for multiple sitemaps).

Next.js will then automatically generate the robots.txt file from the function we provided.
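What that generation step amounts to can be sketched as a plain function that turns a rules object into robots.txt text. This is not Next.js's actual implementation: the local types RobotsRule and RobotsConfig and the functions toArray and renderRobotsTxt are hypothetical and only mirror the shape of the object returned in the example.

```typescript
// Hypothetical local types mirroring the shape of the object returned above.
type RobotsRule = {
  userAgent: string | string[];
  allow?: string | string[];
  disallow?: string | string[];
  crawlDelay?: number;
};
type RobotsConfig = {
  rules: RobotsRule | RobotsRule[];
  sitemap?: string | string[];
};

// Normalize "one value or an array of values" into an array.
function toArray<T>(value: T | T[] | undefined): T[] {
  if (value === undefined) return [];
  return Array.isArray(value) ? value : [value];
}

// Render one group per rule, followed by the Sitemap lines.
function renderRobotsTxt(config: RobotsConfig): string {
  const lines: string[] = [];
  for (const rule of toArray(config.rules)) {
    for (const agent of toArray(rule.userAgent)) lines.push(`User-agent: ${agent}`);
    for (const path of toArray(rule.allow)) lines.push(`Allow: ${path}`);
    for (const path of toArray(rule.disallow)) lines.push(`Disallow: ${path}`);
    if (rule.crawlDelay !== undefined) lines.push(`Crawl-delay: ${rule.crawlDelay}`);
    lines.push(''); // blank line between groups
  }
  for (const url of toArray(config.sitemap)) lines.push(`Sitemap: ${url}`);
  return lines.join('\n') + '\n';
}

const output = renderRobotsTxt({
  rules: [{ userAgent: '*', allow: '/', disallow: '/private/', crawlDelay: 5 }],
  sitemap: ['https://example.com/sitemap.xml'],
});
```

For the non-development configuration, output holds a User-agent: * group with the Allow, Disallow and Crawl-delay lines, followed by a blank line and the Sitemap line.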

Conclusion

Configuring the robots.txt file is one of the best practices for website management and SEO. The robots.txt file provides instructions to search engine bots, guiding their crawl process.

The robots.txt file helps to keep sensitive sections out of crawls, enhance crawl efficiency, prevent duplicate content issues, provide a clear sitemap reference, and direct crawler behavior for multilingual or multiregional websites, all of which contribute to the overall SEO score. Next.js adds flexibility by letting us generate the robots.txt file based on certain conditions.

Remember to regularly update the robots.txt file to keep up with the changing needs of the website. With proper configuration and regular maintenance, the robots.txt file can be a powerful tool to optimize search engine rankings.

Oleg Komissarov

Senior Engineer at FocusReactive
