Creating Your Next.js Sitemap & robots.txt

This simple tutorial demonstrates how to create a static sitemap for Next.js applications.

We generally follow Lee Robinson's excellent post on the topic. Thanks Lee!

So what is a sitemap?

A sitemap is a file that gives search engines instructions for (hopefully) indexing your site accurately. It functions like a map of your pages, in a format Google and the other search engines have agreed their crawlers/indexers will follow.

Check out the deeper dive on Wikipedia for details on the why and the what.
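
For a concrete picture, here is roughly what a minimal sitemap looks like (the URL is just a placeholder):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://yoursitehere.tld/about</loc>
  </url>
</urlset>

Each <url> entry points crawlers at one canonical page; the script below generates one of these per route.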

Let's go

So let's install a dependency, add a scripts/ directory with a generateSitemap.js sitemap builder, and update the next.config.js configuration file to wire everything together.

Install globby development dependency

Let's install a simple dependency for easy globbing of our routes list:

npm install -D globby # -D is shorthand for --save-dev
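
The generator script below also uses prettier to format the XML output, so if it isn't already a dev dependency in your project, add it as well. (Also note that newer globby releases are ESM-only, so the require() call below may need an older, CommonJS-compatible version.)

npm install -D prettier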

Create generateSitemap.js

Now, create a scripts/ directory in the root of your project and create your sitemap generator file:

mkdir scripts && touch scripts/generateSitemap.js

Now it's time to pop into an editor of your choice:

/* generateSitemap.js - mostly the work of Lee Robinson */

const fs = require('fs');

const globby = require('globby');
const prettier = require('prettier');

(async () => {
  const prettierConfig = await prettier.resolveConfig('./.prettierrc.js');

  // Ignore Next.js specific files (e.g., _app.js) and API routes.
  const pages = await globby([
    'pages/**/*{.js,.mdx,.ts,.tsx}',
    '!pages/_*{.js,.mdx,.ts,.tsx}',
    '!pages/api'
  ]);
  const sitemap = `
        <?xml version="1.0" encoding="UTF-8"?>
        <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
            ${pages
              .map((page) => {
                const path = page
                  .replace('pages', '')
                  .replace('.js', '')
                  .replace('.mdx', '')
                  // strip '.tsx' before '.ts' so .tsx pages resolve correctly
                  .replace('.tsx', '')
                  .replace('.ts', '');
                const route = path === '/index' ? '' : path;

                return `
                        <url>
                            <loc>${`https://<yoursitehere.tld>${route}`}</loc>
                        </url>
                    `;
              })
              .join('')}
        </urlset>
    `;

  // If you're not using Prettier, you can remove this.
  const formatted = prettier.format(sitemap, {
    ...prettierConfig,
    parser: 'html'
  });

  fs.writeFileSync('public/sitemap.xml', formatted);
})();

Notice the <loc>${`https://<yoursitehere.tld>${route}`}</loc> line: you need to remove the angle brackets around yoursitehere.tld and replace it with your own domain, e.g. <loc>${`https://musicfox.io${route}`}</loc>!
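
Because the script is a self-invoking async function, you can sanity-check it immediately from the project root (assuming a public/ directory exists, which Next.js projects have by default):

node scripts/generateSitemap.js
cat public/sitemap.xml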

Override or append to your next.config.js

Now create or add to your next.config.js so that webpack runs your sitemap generation script, generateSitemap.js, during the server-side build.

/* next.config.js
 * Note: the `sassOptions` below are superfluous if you do not use Sass!
 */
const path = require('path')

module.exports = {
  sassOptions: {
    includePaths: [path.join(__dirname, 'styles')],
  },
  webpack: (config, { isServer }) => { // add for your sitemap generation
    if (isServer) {
      require('./scripts/generateSitemap');
    }

    return config;
  }
};
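
With the webpack hook in place, every server-side build regenerates public/sitemap.xml. Assuming the standard build script in your package.json:

npm run build # or: next build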

See Lee Robinson's article, linked above, for dynamic routing instructions. Other than that, you're done! Google, Bing, and everyone else will be able to properly crawl your site.

And let's be real, we know why you did this: your Lighthouse score is about to improve, forthwith. ;-)

Oh wait, nothing happened in Lighthouse!

Let me guess: you don't have a robots.txt (or similar) in your project. Without one, your deployment platform may automatically set the x-robots-tag header to noindex, telling crawlers that respect such rules to avoid your pages.

That's probably not what you want, so let's create a robots.txt in your public/ directory:

# robots.txt
User-agent: *
Allow: /

Sitemap: https://musicfox.io/sitemap.xml

Note: If you deploy multiple builds, e.g. production, canary, feature, etc., on a modern platform, you will likely still see an x-robots-tag: noindex header on some of them. That is your platform preventing your preview/canary/feature deployments from being indexed.
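
A quick way to see which header a given deployment actually serves (the URL is a placeholder for one of your deployment URLs):

curl -sI https://<your-deployment-url>/ | grep -i x-robots-tag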
