DatoCMS Site Search > Configuration

Configuration

A bit of context about Build Triggers

Configuring Site Search relies on the concept of build triggers. A build trigger represents the connection between a DatoCMS project and a specific frontend, hosted on a particular platform (Netlify, Vercel, etc.).

Since the content of a DatoCMS project can be read and used on multiple frontends, multiple build triggers can be created in a single project.

Once a build trigger is configured, it is possible to:

  1. Trigger a rebuild of the frontend directly from the DatoCMS interface;

  2. Activate Site Search, so that each time the frontend is rebuilt, the site is crawled and its pages are re-indexed.

How you configure a build trigger depends on the hosting solution you choose for the frontend, so please refer to the dedicated guides in our Marketplace.

Activating Site Search for a specific Build Trigger

Once you have created and properly configured a build trigger, you can activate Site Search:

  • Go to the Project Settings > Build triggers section of your project and select a build trigger;

  • Check the Site search option and specify your Website frontend URL: that's the address from which crawling will begin;

  • Press the Publish changes button. This will start a rebuild of the frontend and, once it completes, a spidering of the website.

Respider a website without triggering a rebuild

At any time, you can trigger a respidering of your frontend through a dedicated CMA endpoint, without rebuilding the site.
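As a minimal sketch of such a call (the endpoint path, the `reindex` action name, and the header set below are assumptions based on common CMA conventions, not the authoritative route — check the CMA reference for the exact call; the token and trigger ID are placeholders):

```python
import urllib.request

API_TOKEN = "YOUR_API_TOKEN"      # placeholder: your CMA API token
BUILD_TRIGGER_ID = "4567"         # placeholder: your build trigger ID

# NOTE: this path is an assumption for illustration; consult the CMA
# documentation for the authoritative respidering endpoint.
url = f"https://site-api.datocms.com/build-triggers/{BUILD_TRIGGER_ID}/reindex"

request = urllib.request.Request(
    url,
    method="POST",
    headers={
        "Authorization": f"Bearer {API_TOKEN}",
        "Accept": "application/json",
    },
)

# urllib.request.urlopen(request) would actually send the request;
# it is left out so the sketch stays side-effect free.
print(request.get_method(), request.full_url)
```

Sending the request would kick off a new spidering of the configured frontend without publishing anything.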

Inspecting crawling results

Once the website has been published, the Project Settings > Deployment > Activity log section will show that DatoCMS has started spidering your website. When the spidering ends (it may take a while, depending on the size of your website), a Site spidering completed with success event will appear in your log.

Clicking on the Show details link will show the complete list of spidered pages.

How spidering works

  • The spidering starts from the URL you configure as Website frontend URL in your build trigger settings, and recursively follows all the hyperlinks pointing to your domain. If your website has a Sitemap file (sitemap.xml under the root of your domain), we'll use it as well. Sitemap Index files are also supported.

    • By default, our spider will look for sitemap.xml under the root of your domain. If you're using a sitemap index, it should also start with that filename, but the other sitemaps the index links to can be named anything you like.

    • Alternatively, you can use a robots.txt file to specify the location of your sitemap(s), using directives like Sitemap: https://example.com/sitemap.xml or Sitemap: https://example.com/sitemap-index.xml, each on its own line.

  • We detect the language of every spidered page through the page's HTML global lang attribute (or language-detection heuristics, if it's missing), so that indexing happens with proper stemming. That is, if a visitor searches for "cats", we'll also return results for "cat", "catlike", "catty", etc.

  • The crawler does not execute JavaScript on the spidered pages; it only parses plain HTML. If your website is a Single Page App, you'll need to set up pre-rendering to make it readable by our bot. The User-Agent used by our crawler is DatoCmsSearchBot.

  • The time needed to finish the spidering operation depends on the number of pages in your website and the performance of your hosting, but it normally proceeds at roughly 20 indexed pages per second.
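To illustrate the robots.txt directives mentioned above (the URLs are placeholders for your own domain), a robots.txt advertising both a plain sitemap and a sitemap index could look like:

```
Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap-index.xml
```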
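Since the crawler only parses plain HTML, a Single Page App can detect the bot by its User-Agent and serve a pre-rendered page instead. A minimal sketch of that check (the function name and the substring heuristic are illustrative; DatoCmsSearchBot is the User-Agent reported above):

```python
def wants_prerendered_page(user_agent: str) -> bool:
    """Return True when the request comes from the DatoCMS crawler,
    which does not execute JavaScript and needs plain HTML."""
    # "DatoCmsSearchBot" is the User-Agent used by the crawler;
    # a substring check is a deliberately simple heuristic.
    return "DatoCmsSearchBot" in user_agent

# A hypothetical request handler would branch on this check:
print(wants_prerendered_page("Mozilla/5.0 (compatible; DatoCmsSearchBot)"))  # True
print(wants_prerendered_page("Mozilla/5.0 (Windows NT 10.0)"))               # False
```

In a real setup this branching is usually handled by a pre-rendering service or an edge rule rather than application code, but the principle is the same: plain HTML for the bot, the full app for everyone else.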