Astro, GraphQL and DatoCMS Cache Tags: how we built the killer combo

This is the second episode of our exciting saga regarding the complete rewrite of our site in Astro. If you haven't had the chance to read the first episode, we highly recommend you do so, as it lays the groundwork and context for what we're diving into today!

Chapter 1: Why we switched to Astro (and why it might interest you)
Chapter 2: Astro, GraphQL and DatoCMS Cache Tags: how we built the killer combo
Chapter 3: Astro, Sitemaps, SEO, and Best Practices
Chapter 4: Cooking...

In this episode, we'll talk about one of the most fundamental topics for a DatoCMS-powered Astro website: how to work effectively with GraphQL.

A simple pattern to obtain the data

The vast majority of our site's pages primarily execute GraphQL queries to DatoCMS and convert them into Astro pages. Due to the frequency of this process, it was essential to establish the simplest possible pattern — ideally, a single function call. This is what we came up with (source code):

---
import { executeQuery } from '~/lib/datocms/executeQuery';
import { notFoundResponse } from '~/lib/notFoundResponse';
import { query } from './_graphql';

const variables = { slug: Astro.params.slug! };
const { productUpdate } = await executeQuery(Astro, query, { variables });

if (!productUpdate) {
  return notFoundResponse();
}
---

<Layout>
  {JSON.stringify(productUpdate)}
</Layout>

The executeQuery function does more than just execute the query and return the result. It also:

Checks if the domain of the request is www or www-draft, and sets the X-Include-Drafts header accordingly to return draft content or not;
Sets the X-Cache-Tags header to true to obtain the cache tags associated with the GraphQL response;
Copies the cache tags obtained from DatoCMS into the Surrogate-Key header of the Astro response (source code). An important consideration is that each Astro component involved in rendering a page may trigger a different GraphQL query, meaning the Surrogate-Key header must be a union of all cache tags returned by individual executeQuery invocations.

The global Astro carries all the necessary context for these purposes (i.e. the request and response objects), so it is the only additional argument to pass, besides the GraphQL query and its variables. Fantastic!
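To make the "union of all cache tags" idea more concrete, here is a minimal sketch of the bookkeeping involved. It is not our actual executeQuery source: the helper names mergeSurrogateKeys and shouldIncludeDrafts are hypothetical, and the sketch assumes tags are kept as a space-separated list, which is the format Fastly expects in Surrogate-Key.

```typescript
// Hypothetical helper: merges newly returned cache tags into the current
// value of the Surrogate-Key header (space-separated), without duplicates.
function mergeSurrogateKeys(existing: string | null, newTags: string[]): string {
  const current = existing ? existing.split(/\s+/).filter(Boolean) : [];
  return Array.from(new Set([...current, ...newTags])).join(' ');
}

// Hypothetical helper: draft content is requested only on the www-draft domain.
function shouldIncludeDrafts(hostname: string): boolean {
  return hostname.startsWith('www-draft.');
}

// Two components on the same page run two different queries:
let surrogateKey: string | null = null;
surrogateKey = mergeSurrogateKeys(surrogateKey, ['tagA', 'tagB']);
surrogateKey = mergeSurrogateKeys(surrogateKey, ['tagB', 'tagC']);
// surrogateKey is now 'tagA tagB tagC'
```

Because every invocation merges into the same header instead of overwriting it, the order in which components render does not matter.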

Strict Mode for the DatoCMS CDA

When working with TypeScript, remember to always enable Strict Mode when sending GraphQL requests to DatoCMS.

By adding the X-Exclude-Invalid header, you filter out invalid records from GraphQL responses and, as a result, get more precise and reliable TypeScript types that guarantee non-null, correctly validated fields across your models.

(In our case, the header is set by default by our executeQuery() function)
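As a rough illustration of what "set by default" can look like, the sketch below builds the request headers for a Content Delivery API call. The function name buildCdaHeaders and its options are hypothetical; the header names are the ones discussed in this article, plus the usual bearer-token Authorization header.

```typescript
interface CdaHeaderOptions {
  token: string; // DatoCMS API token
  includeDrafts: boolean; // true on the www-draft domain
}

// Hypothetical helper: Strict Mode and cache tags are always on,
// drafts are opt-in.
function buildCdaHeaders({ token, includeDrafts }: CdaHeaderOptions): Record<string, string> {
  const headers: Record<string, string> = {
    Authorization: `Bearer ${token}`,
    'Content-Type': 'application/json',
    'X-Exclude-Invalid': 'true', // Strict Mode: invalid records are filtered out
    'X-Cache-Tags': 'true', // ask DatoCMS to return the cache tags
  };
  if (includeDrafts) {
    headers['X-Include-Drafts'] = 'true';
  }
  return headers;
}

const productionHeaders = buildCdaHeaders({ token: 'example-token', includeDrafts: false });
const draftHeaders = buildCdaHeaders({ token: 'example-token', includeDrafts: true });
```

Keeping Strict Mode in the shared helper means no individual page can forget to enable it.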

Caching on Fastly

Let's analyze the headers that our executeQuery() sets on Astro's responses. If the domain of the request is www, the headers will be:

Surrogate-Key: <LIST OF CACHE TAGS>
Surrogate-Control: max-age=31536000, stale-while-revalidate=60, stale-if-error=86400

The Surrogate-Control header instructs Fastly to cache the response for one year. In addition to the classic max-age, we also use the stale-while-revalidate and stale-if-error directives, which are very powerful:

stale-while-revalidate=60 means that for up to one minute after the cached response expires, Fastly can immediately serve the stale content while asynchronously fetching a fresh version in the background. Visitors do not pay the price of a slower response, even when the cache has been invalidated.
stale-if-error=86400 allows Fastly to continue serving the cached content for up to 24 hours if the origin server is unavailable or returns an error, ensuring better availability and user experience during potential server issues.
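One way to reason about how the three directives interact is the simplified decision model below. It is only a mental model written for this article, not Fastly's actual implementation, and the names SurrogatePolicy and decide are hypothetical.

```typescript
interface SurrogatePolicy {
  maxAge: number;
  staleWhileRevalidate: number;
  staleIfError: number;
}

const surrogatePolicy: SurrogatePolicy = {
  maxAge: 365 * 24 * 60 * 60, // one year = 31536000 seconds
  staleWhileRevalidate: 60, // one minute
  staleIfError: 24 * 60 * 60, // 24 hours = 86400 seconds
};

type Decision =
  | 'serve-fresh'
  | 'serve-stale-and-revalidate'
  | 'serve-stale-on-error'
  | 'fetch-from-origin';

// Simplified model: what the cache does with an object of a given age.
function decide(ageSeconds: number, originHealthy: boolean, p: SurrogatePolicy): Decision {
  const staleFor = ageSeconds - p.maxAge; // seconds since the object expired
  if (staleFor <= 0) return 'serve-fresh';
  if (staleFor <= p.staleWhileRevalidate) return 'serve-stale-and-revalidate';
  if (!originHealthy && staleFor <= p.staleIfError) return 'serve-stale-on-error';
  return 'fetch-from-origin';
}
```

In this model the visitor only waits for the origin in the last branch, which is exactly the case the two stale-* directives are meant to make rare.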

On the www-draft domain, we do not have Fastly in front, so we are simply concerned with exposing the cache tags for debugging purposes, and instructing browsers to never cache the server responses — we always want the latest data:

Debug-Surrogate-Key: <LIST OF CACHE TAGS>
Cache-Control: private
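Put together, choosing between the two sets of headers boils down to a small branch on the hostname. The sketch below is an illustration under that assumption; the function name cachingHeaders is hypothetical, while the header values are the ones listed in this section.

```typescript
// Hypothetical helper: returns the caching-related response headers
// for the given request hostname and list of cache tags.
function cachingHeaders(hostname: string, cacheTags: string[]): Record<string, string> {
  const tags = cacheTags.join(' ');

  if (hostname.startsWith('www-draft.')) {
    // No Fastly in front: expose tags for debugging only.
    return {
      'Debug-Surrogate-Key': tags,
      'Cache-Control': 'private',
    };
  }

  // Production: let Fastly cache for a year, keyed by cache tags.
  return {
    'Surrogate-Key': tags,
    'Surrogate-Control': 'max-age=31536000, stale-while-revalidate=60, stale-if-error=86400',
  };
}

const liveHeaders = cachingHeaders('www.example.com', ['tagA', 'tagB']);
const previewHeaders = cachingHeaders('www-draft.example.com', ['tagA', 'tagB']);
```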

Invalidating Fastly's cache

Since our pages will be cached by Fastly for a year, it’s time to focus on invalidating them!

I absolutely love this part every time I work with DatoCMS: setting up a webhook from the interface, writing just 20 lines of code, and accomplishing what previously would have taken weeks of optimization and debugging to invalidate pages correctly! 😅

import type { APIRoute } from 'astro';
import { json } from '../_utils';
import { FASTLY_KEY, FASTLY_SERVICE_ID } from 'astro:env/server';
import ky from 'ky';

export const POST: APIRoute = async ({ request }) => {
  const data = await request.json();

  // DatoCMS sends us the tags to be invalidated via webhook
  const cacheTags = data.entity.attributes.tags;

  const response = await ky.post(`https://api.fastly.com/service/${FASTLY_SERVICE_ID}/purge`, {
    headers: {
      'fastly-key': FASTLY_KEY,
      // Required for stale-while-revalidate to work!
      'fastly-soft-purge': '1',
      'content-type': 'application/json',
    },
    // Purge every cached page associated with the invalidated tags
    json: { surrogate_keys: cacheTags },
  }).json();

  return json({ cacheTags, response });
};

Man, I'm so proud of our Cache Tags. 🥰
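If you want the endpoint to fail loudly on an unexpected payload instead of forwarding garbage to Fastly, a small guard can help. This is an optional hardening sketch, not part of our route: the function name extractCacheTags is hypothetical, and the only thing it assumes about the webhook body is the entity.attributes.tags path used above.

```typescript
// Hypothetical helper: safely pulls the list of cache tags out of the
// webhook payload, throwing if the shape is not the expected one.
function extractCacheTags(payload: unknown): string[] {
  const tags = (payload as { entity?: { attributes?: { tags?: unknown } } } | null)
    ?.entity?.attributes?.tags;

  if (!Array.isArray(tags) || !tags.every((tag) => typeof tag === 'string')) {
    throw new Error('Unexpected webhook payload: entity.attributes.tags must be an array of strings');
  }

  return tags;
}

const exampleTags = extractCacheTags({
  entity: { attributes: { tags: ['tagA', 'tagB'] } },
});
```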

GraphQL and TypeScript

One of the objectives of the rewrite was to achieve complete TypeScript coverage. For this purpose, we chose to use gql.tada, an incredible library capable of deriving the types for your GraphQL queries on the fly.

We chose to declare each route's GraphQL queries in a _graphql.ts file placed next to the page itself:

Files with the _ prefix won’t be recognized by the Astro router

The following is an example of a query. Once passed as an argument to our executeQuery(), the result will be fully typed!

import { ProductUpdateFragment } from '~/components/product-updates/ProductUpdate/graphql';
import { TagFragment } from '~/lib/datocms/commonFragments';
import { graphql } from '~/lib/datocms/graphql';

export const query = graphql(
  /* GraphQL */ `
    query ProductUpdate($slug: String!) {
      productUpdate: changelogEntry(filter: { slug: { eq: $slug } }) {
        _seoMetaTags {
          ...TagFragment
        }
        ...ProductUpdateFragment
      }
    }
  `,
  [TagFragment, ProductUpdateFragment],
);
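If you're wondering how a query can make a function's result "fully typed", the idea is that the query value carries its result and variables types along with it. The sketch below reproduces that idea by hand, without gql.tada: the TypedQuery interface and both functions are hypothetical stand-ins, and the title field is an assumed example. With gql.tada, the types are derived from the schema for you instead of being written manually.

```typescript
// Hand-written stand-in for a typed GraphQL document: the query source
// plus "phantom" result and variables types.
interface TypedQuery<Result, Variables> {
  readonly source: string;
  // Never set at runtime; it only exists to carry the two types around.
  readonly __types?: { result: Result; variables: Variables };
}

// Variables are checked against the query at compile time.
function buildRequestBody<Result, Variables>(
  query: TypedQuery<Result, Variables>,
  variables: Variables,
): string {
  return JSON.stringify({ query: query.source, variables });
}

// The return type is derived from the query, so callers get typed data.
function parseResponseBody<Result, Variables>(
  query: TypedQuery<Result, Variables>,
  responseBody: string,
): Result {
  const payload = JSON.parse(responseBody) as { data?: Result };
  if (payload.data === undefined) {
    throw new Error(`No data returned for query: ${query.source}`);
  }
  return payload.data;
}

// Hypothetical result shape: `title` is an assumed field.
const productUpdateQuery: TypedQuery<
  { productUpdate: { title: string } | null },
  { slug: string }
> = {
  source:
    'query ProductUpdate($slug: String!) { productUpdate: changelogEntry(filter: { slug: { eq: $slug } }) { title } }',
};

const requestBody = buildRequestBody(productUpdateQuery, { slug: 'hello-world' });
const parsedData = parseResponseBody(
  productUpdateQuery,
  '{"data":{"productUpdate":{"title":"Hello"}}}',
);
// parsedData.productUpdate is typed as { title: string } | null
```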

Fragment composition

One of the less talked about strengths of GraphQL, when working in "componentized frameworks" like Astro, React, Vue or Svelte, lies in fragment composition and hierarchical schema design. The gql.tada documentation does a great job of explaining this pattern and its benefits.

Basically, each component defines its own data requirements using a GraphQL fragment stored in a graphql.ts file next to the component itself:

The directory structure of every component in the project

When a parent component includes child components, it aggregates their GraphQL fragments. Here's how the process unfolds for a <Parent /> component:

The parent imports both the child component (e.g., <QuestionAnswer />) and its corresponding GraphQL fragment (e.g. QuestionAnswerFragment).
It then creates its own fragment by declaring its specific data requirements and incorporating the fragments of its child components.

This fragment composition continues up the component tree until it reaches the top-level Astro page, which will then execute a single, comprehensive combined macro-query.
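To visualize how the pieces end up in one document, here is a small, library-free model of the assembly step. It is only an illustration: gql.tada handles this (and the types) for you, the helper names defineFragment, collectFragments and buildDocument are hypothetical, and so are the record types and fields used in the fragments.

```typescript
interface Fragment {
  name: string;
  source: string;
  deps: Fragment[]; // fragments of the child components
}

function defineFragment(name: string, source: string, deps: Fragment[] = []): Fragment {
  return { name, source, deps };
}

// Walks the fragment tree depth-first, keeping each fragment only once.
function collectFragments(roots: Fragment[], seen: Map<string, Fragment> = new Map()): Fragment[] {
  for (const fragment of roots) {
    if (seen.has(fragment.name)) continue;
    seen.set(fragment.name, fragment);
    collectFragments(fragment.deps, seen);
  }
  return Array.from(seen.values());
}

// The page-level query plus every fragment it (transitively) depends on.
function buildDocument(query: string, fragments: Fragment[]): string {
  return [query, ...collectFragments(fragments).map((f) => f.source)].join('\n\n');
}

// Child component: declares only what it needs.
const QuestionAnswerFragment = defineFragment(
  'QuestionAnswerFragment',
  'fragment QuestionAnswerFragment on QuestionAnswerRecord { question answer }',
);

// Parent component: its own fields plus the child's fragment.
const ParentFragment = defineFragment(
  'ParentFragment',
  'fragment ParentFragment on ParentRecord { title items { ...QuestionAnswerFragment } }',
  [QuestionAnswerFragment],
);

// Top-level page: a single combined query.
const pageDocument = buildDocument('query Page { parent { ...ParentFragment } }', [ParentFragment]);
```

The page never mentions QuestionAnswerFragment directly: it arrives through the parent, which is what keeps each component's data requirements in one place.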

It's an incredibly powerful pattern: it creates a modular, clean, maintainable way of managing component data requirements in a type-safe, composable manner, enforcing the Single Responsibility Principle and Separation of Concerns. Regardless of how many pages use your component, future changes will require modifications in only one place in your code.

If there's only one takeaway from this article, it's this: use fragment composition!

Simplifying GraphQL Pagination

Sometimes website sections require displaying a large volume of content. A perfect example is our Wall of Love ♥️ , where the goal is to create a wow effect by showcasing numerous quotes, without necessarily expecting visitors to read each one carefully. In such cases, pagination becomes crucial for retrieving entire collections of records.

GraphQL pagination is notoriously challenging: it's rarely as straightforward as developers would like, and its logic is much harder to extract and reuse across an application than it is with REST APIs.

However, it's not impossible! We recently added a method called executeQueryWithAutoPagination to our GraphQL client, @datocms/cda-client, that simplifies the entire process:

import { executeQueryWithAutoPagination } from "@datocms/cda-client";

const { allQuotes } = await executeQueryWithAutoPagination(`
  query WallOfLove {
    allQuotes(first: 5000) { author quote }
  }
`);

What should jump out at you from this piece of code is that we're fetching 5,000 records in a single call, which seems impossible given that DatoCMS currently limits each page to 100 records!

Well, executeQueryWithAutoPagination automatically analyzes the query and dynamically rewrites it behind the scenes, like this:

query WallOfLove {
  splitted_0_allQuotes: allQuotes(first: 100, skip: 0) { author quote }
  splitted_1_allQuotes: allQuotes(first: 100, skip: 100) { author quote }
  splitted_2_allQuotes: allQuotes(first: 100, skip: 200) { author quote }
  # ... and so on
}

Once executed, the results are seamlessly collected and recomposed, making the entire pagination process transparent to the developer, and without requiring multiple calls. It's like magic! ✨
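For the curious, the split-and-recompose idea can be modeled in a few lines. This is a deliberately simplified sketch, not the library's implementation (which works on the parsed query rather than on strings): the function names splitSelection and recompose are hypothetical, and the alias format simply mirrors the rewritten query shown above.

```typescript
// Splits one oversized selection into aliased pages of at most `perPage` records.
function splitSelection(field: string, selection: string, total: number, perPage = 100): string[] {
  const lines: string[] = [];
  for (let skip = 0, index = 0; skip < total; skip += perPage, index++) {
    const first = Math.min(perPage, total - skip);
    lines.push(`splitted_${index}_${field}: ${field}(first: ${first}, skip: ${skip}) ${selection}`);
  }
  return lines;
}

// Stitches the aliased pages back into a single list, in order.
function recompose<T>(field: string, data: Record<string, T[]>): T[] {
  const merged: T[] = [];
  for (let index = 0; ; index++) {
    const page = data[`splitted_${index}_${field}`];
    if (!page) break;
    merged.push(...page);
  }
  return merged;
}

const aliasedSelections = splitSelection('allQuotes', '{ author quote }', 250);
const mergedQuotes = recompose('allQuotes', {
  splitted_0_allQuotes: ['first quote', 'second quote'],
  splitted_1_allQuotes: ['third quote'],
});
```

Here, 250 requested records become three aliased selections (100, 100 and 50 records), all sent in the same request.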

Wrap-up

We hope that sharing our site's data fetching strategies gives you some inspiration for a modern, efficient approach to web development, even if you're using different frameworks than Astro:

we've implemented robust content delivery with Fastly caching, and at the same time reduced each route's data fetching to a single function call, so that no one can make unintentional errors;
we've built a modular and maintainable GraphQL design with component-level fragments and bottom-up fragment composition;
everything is supported by TypeScript, so it becomes immediately clear if any GraphQL query is incorrect (or becomes incorrect in the future due to a schema change).

Stay tuned for the next episode of our saga, where we'll dive deep into other areas of our website! 👋🏼

The DatoCMS Blog