Importing data from external sources

    As a DatoCMS developer, you will often need to import data from an external source: for example, when doing a one-time migration from another CMS to DatoCMS, when cleaning up messy data from an external API or RESTful web service, or when you want to run powerful queries on that data.

    In this guide we will cover how to do a one-time import from an external data source using Node.js.

    You should have basic familiarity with Node.js and async/await.

    What are some common external sources? An external data source can come in many different formats and be delivered over many different transport layers. Here are a few examples:

    • The REST API of your old CMS
    • A text file with comma separated values (CSV)
    • A SQL database
    • A JSON file or newline delimited JSON (NDJSON) file
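
    To illustrate the last format in the list, here is a minimal sketch of turning NDJSON text into an array of objects. The helper name parseNdjson is hypothetical, and the sketch assumes every non-empty line holds one complete JSON document.

```javascript
// Hypothetical helper: turn newline-delimited JSON text into an array of objects.
// Assumes every non-empty line is one complete JSON document.
function parseNdjson(text) {
  return text
    .split(/\r?\n/)                      // one candidate record per line
    .map((line) => line.trim())
    .filter((line) => line.length > 0)   // skip blank lines
    .map((line) => JSON.parse(line));    // throws on a malformed line
}

const sample =
  '{"id": 1, "breed": "Alaskan Husky"}\n{"id": 2, "breed": "Alaskan Malamute"}\n';
const records = parseNdjson(sample);
console.log(records.map((record) => record.breed));
```

    In a real import the text would typically come from a file or an HTTP response rather than an inline string.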

    The anatomy of an external data import

    No matter what kind of source you are reading from, an external import can be split into three discrete steps:

    1. Read data from external source
    2. Transform to DatoCMS record(s) matching your data model
    3. Save the records to your DatoCMS project

    We will cover each of these in order.
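
    The three steps can be sketched as a small skeleton. This is only an outline under assumptions: runImport and the read, transform and save callbacks are illustrative names, and the save callback here is a stand-in rather than a real API call.

```javascript
// Skeleton of a one-time import. The three steps are passed in as functions,
// so the same outline works for any source. All names are illustrative.
async function runImport({ read, transform, save }) {
  const sourceItems = await read();                              // 1. read from the external source
  const records = sourceItems.map((item) => transform(item));    // 2. reshape to match the data model
  const saved = [];
  for (const record of records) {
    saved.push(await save(record));                              // 3. save one record at a time
  }
  return saved;
}

// Usage with stand-in steps (no network access involved)
runImport({
  read: async () => [{ breed: 'Alaskan Husky' }],
  transform: (item) => ({ name: item.breed }),
  save: async (record) => ({ id: 'fake-id', ...record }),
}).then((saved) => console.log(saved));
```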

    Step 1: Read data from external source

    Let's start with a simple example where the external data source is an API endpoint containing an array of breeds of dogs that we want to import into a DatoCMS project.

    [
      {
        "id": 1,
        "breed": "Alapaha Blue Blood Bulldog",
        "bred_for": "Guarding",
        "category": "Mixed",
        "description": "The Alapaha Blue Blood Bulldog is a well-developed, exaggerated bulldog with a broad head and...",
        "life_span": "12 - 13 years",
        "image_url": "https://cdn2.thedogapi.com/images/kuvpGHCzm.jpg"
      },
      {
        "id": 2,
        "breed": "Alaskan Husky",
        "bred_for": "Sled pulling",
        "category": "Mixed",
        "life_span": "10 - 13 years",
        "image_url": "https://cdn2.thedogapi.com/images/uEPB98jBS.jpg"
      },
      {
        "id": 3,
        "breed": "Alaskan Malamute",
        "bred_for": "Hauling heavy freight, Sled pulling",
        "category": "Working",
        "life_span": "12 - 15 years",
        "image_url": "https://cdn2.thedogapi.com/images/aREFAmi5H.jpg"
      },
      ...
    ]

    The quickest way to read from this API in Node.js is to install the node-fetch package, which gives you a window.fetch-like API for fetching the data:

    const fetch = require('node-fetch');

    async function importDogBreeds() {
      const response = await fetch('https://something.now.sh/dog-breeds');
      const dogBreeds = await response.json();
      // we now have an array of dogBreeds from the external API
    }

    importDogBreeds();
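
    Because fetch only rejects on network failures, an HTTP error such as a 404 or 500 would otherwise go unnoticed until parsing. The following is a sketch of a small wrapper that checks the status first. The fetchJson name and its fetchImpl parameter are assumptions made for illustration; fetchImpl can be node-fetch or the global fetch built into Node.js 18 and later.

```javascript
// Hypothetical helper: fetch JSON and fail loudly on HTTP errors.
// `fetchImpl` can be node-fetch or the global fetch available in Node.js 18+.
async function fetchJson(url, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(url);
  if (!response.ok) {
    // fetch resolves even for 4xx/5xx responses, so check the status explicitly
    throw new Error(`Request to ${url} failed with status ${response.status}`);
  }
  return response.json();
}
```

    With this in place, the body of importDogBreeds could call fetchJson with the same URL instead of calling fetch and response.json() separately.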

    Step 2: Transform to DatoCMS record(s) matching your data model

    Now, let's say the following is the DatoCMS schema we want our imported data to adhere to:

    Model "Category"
    • ID: 552
    • API key: category
    • Fields:
      • Name (API key: name): string
    Model "Dog breed"
    • ID: 730
    • API key: dog_breed
    • Fields:
      • Name (API key: name): string
      • Category (API key: category): link to model category
      • Breed for (API key: breed_for): string
      • Description (API key: description): text
      • Image (API key: image): file

    If you look carefully, you'll see that the source data doesn't map 1:1 to the schema. There are a few differences to note here:

    1. The breed field is called name in our DatoCMS model, and the bred_for field is called breed_for;
    2. Instead of importing category directly as text inside the breed, we want to create a separate record for each category, and have the category field reference that record;
    3. The life_span field from the external API isn't relevant to us, and we don't want to import it at all.

    This can roughly be codified as the following transform function:

    function transformDogBreed(externalData) {
      return {
        itemType: '730', // <- that's the ID of our dog_breed model
        name: externalData.breed,
        category: null, // ??? needs the ID of a category record
        breed_for: externalData.bred_for, // the source field is named bred_for
        description: externalData.description,
        image: null, // ??? needs an uploaded image
      };
    }

    As you might have guessed, itemType means "model" in API-land, and you have to fill it in with the ID of your model (in this case, "730").

    The category field requires a category record ID, which we don't have yet. This means we first have to import the breed categories, and only then can we import the dog breeds.

    To do that, we collect the categories of all the dog breeds, and then remove any duplicates:

    const uniq = require('lodash.uniq');
    const fetch = require('node-fetch');

    async function importDogBreeds() {
      const response = await fetch('https://something.now.sh/dog-breeds');
      const dogBreeds = await response.json();
      // collect every category name, then drop the repeats
      const categories = dogBreeds.map(dogBreed => dogBreed.category);
      const uniqueCategories = uniq(categories);
    }
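
    As an alternative sketch, the same deduplication can be done with the built-in Set, without the lodash.uniq dependency. The helper name uniqueCategoryNames is hypothetical, and skipping blank or missing categories is an assumption about how you want to treat incomplete source data.

```javascript
// Deduplicate category names with a Set instead of lodash.uniq.
// Blank or missing categories are skipped (an assumption about the source data).
function uniqueCategoryNames(dogBreeds) {
  const names = dogBreeds
    .map((dogBreed) => dogBreed.category)
    .filter((name) => typeof name === 'string' && name.trim() !== '');
  return [...new Set(names)]; // a Set keeps insertion order and drops repeats
}

console.log(uniqueCategoryNames([
  { category: 'Mixed' },
  { category: 'Mixed' },
  { category: 'Working' },
])); // → [ 'Mixed', 'Working' ]
```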

    Step 3: Importing to DatoCMS

    In the previous steps, all we did was fetch and prepare the data to be imported into your DatoCMS project. Now it's time to actually turn it into DatoCMS records.

    First we need to configure our DatoCMS client with our project's API token. We will need to add datocms-client as a dependency to our project and create a client instance:

    const { SiteClient } = require('datocms-client');
    const client = new SiteClient('<YOUR-TOKEN-WITH-WRITE-ACCESS>');

    In order to give this client write access, we need to generate an access token. You can generate an access token under the "API token" section of your project's settings.
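
    Since a token with write access is a secret, you may prefer not to hard-code it in the script. Here is a minimal sketch that reads it from an environment variable instead. Both the getApiToken helper and the DATOCMS_API_TOKEN variable name are illustrative assumptions, not something DatoCMS mandates.

```javascript
// Hypothetical helper: read the API token from an environment variable.
// DATOCMS_API_TOKEN is an illustrative name chosen for this sketch.
function getApiToken(env = process.env) {
  const token = env.DATOCMS_API_TOKEN;
  if (!token) {
    throw new Error('Set DATOCMS_API_TOKEN before running the import');
  }
  return token;
}
```

    The result of getApiToken() can then be passed to the SiteClient constructor in place of the inline token string.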

    Now that we have our client configured, the next step is to create our records, using the client.items.create method:

    const categoryNameToRecord = {};

    for (const categoryName of uniqueCategories) {
      // store each created record under its category name for later lookup
      categoryNameToRecord[categoryName] = await client.items.create({
        itemType: '552', // <- that's the ID of our category model
        name: categoryName,
      });
    }

    As you can see, we save the created records in a categoryNameToRecord object so that they are easy to look up while creating the dog breeds, which is the next thing we need to do in our script:

    for (const dogBreed of dogBreeds) {
      await client.items.create({
        itemType: '730', // <- that's the ID of our dog_breed model
        name: dogBreed.breed,
        category: categoryNameToRecord[dogBreed.category].id, // <- we pick the ID of our category record
        breed_for: dogBreed.bred_for, // the source field is named bred_for
        description: dogBreed.description,
        image: null, // ??? set once the image has been uploaded
      });
    }
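
    The field mapping used in this loop can also be pulled out into a pure function, which makes it possible to check the mapping without calling the API. This is a sketch under assumptions: buildDogBreedAttributes is a hypothetical helper, and throwing on an unknown category is one possible way to handle incomplete data.

```javascript
// Hypothetical helper: build the attributes for one dog_breed record.
// `categoryNameToRecord` maps a category name to its already created record.
function buildDogBreedAttributes(dogBreed, categoryNameToRecord, dogBreedModelId) {
  const categoryRecord = categoryNameToRecord[dogBreed.category];
  if (!categoryRecord) {
    throw new Error(`No category record for "${dogBreed.category}"`);
  }
  return {
    itemType: dogBreedModelId,
    name: dogBreed.breed,
    category: categoryRecord.id,
    breed_for: dogBreed.bred_for,       // the source field is named bred_for
    description: dogBreed.description,  // undefined when the source has no description
  };
}

console.log(buildDogBreedAttributes(
  { breed: 'Alaskan Husky', bred_for: 'Sled pulling', category: 'Mixed' },
  { Mixed: { id: '1001' } },  // made-up record ID, for illustration only
  '730'
));
```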

    The last step is uploading the images. To do that, we can simply use the client.uploadFile method, passing down additional data such as the default alternate text we want for each image:

    for (const dogBreed of dogBreeds) {
      // upload the image first, so the record can reference it
      const image = await client.uploadFile(
        dogBreed.image_url,
        {
          defaultFieldMetadata: {
            en: {
              alt: `${dogBreed.breed} dog`,
            },
          },
          notes: `Imported from external source`,
        }
      );

      await client.items.create({
        // ...
        image,
      });
    }
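
    In a loop like this, a single failed upload or create call rejects and aborts the rest of the import. One possible way to keep going is sketched below: a generic helper that processes items one at a time and collects failures for later inspection. The importAll name and the importOne callback are illustrative and not part of the DatoCMS client.

```javascript
// Run an async `importOne` function over every item, one at a time,
// collecting failures instead of stopping at the first error.
async function importAll(items, importOne) {
  const succeeded = [];
  const failed = [];
  for (const item of items) {
    try {
      succeeded.push(await importOne(item));
    } catch (error) {
      failed.push({ item, message: error.message });
    }
  }
  return { succeeded, failed };
}
```

    Here importOne would wrap the upload and create calls for a single dog breed, and the failed list can be logged or retried once the run has finished.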

    And voilà! You've just successfully imported your external data into DatoCMS! Here's the complete script for reference:

    const uniq = require('lodash.uniq');
    const { SiteClient } = require('datocms-client');

    const client = new SiteClient('<YOUR-TOKEN-WITH-WRITE-ACCESS>');

    // Sample source data, inlined here instead of being fetched from the API
    const data = [
      {
        "id": 1,
        "breed": "Alapaha Blue Blood Bulldog",
        "bred_for": "Guarding",
        "category": "Mixed",
        "description": "The Alapaha Blue Blood Bulldog is a well-developed, exaggerated bulldog with a broad head and...",
        "life_span": "12 - 13 years",
        "image_url": "https://cdn2.thedogapi.com/images/kuvpGHCzm.jpg"
      },
      {
        "id": 2,
        "breed": "Alaskan Husky",
        "bred_for": "Sled pulling",
        "category": "Mixed",
        "life_span": "10 - 13 years",
        "image_url": "https://cdn2.thedogapi.com/images/uEPB98jBS.jpg"
      },
      {
        "id": 3,
        "breed": "Alaskan Malamute",
        "bred_for": "Hauling heavy freight, Sled pulling",
        "category": "Working",
        "life_span": "12 - 15 years",
        "image_url": "https://cdn2.thedogapi.com/images/aREFAmi5H.jpg"
      }
    ];

    async function importDogBreeds() {
      // Create one category record per distinct category name
      const categories = data.map(dogBreed => dogBreed.category);
      const uniqueCategories = uniq(categories);
      const categoryNameToRecord = {};

      for (const categoryName of uniqueCategories) {
        categoryNameToRecord[categoryName] = await client.items.create({
          itemType: '<CATEGORY-MODEL-ID>',
          name: categoryName,
        });
      }

      // Upload each image, then create the dog breed record that references it
      for (const dogBreed of data) {
        const image = await client.uploadFile(
          dogBreed.image_url,
          {
            defaultFieldMetadata: {
              en: {
                alt: `${dogBreed.breed} dog`,
              },
            },
            notes: `Imported from external source`,
          }
        );

        await client.items.create({
          itemType: '<DOG-BREED-MODEL-ID>',
          name: dogBreed.breed,
          category: categoryNameToRecord[dogBreed.category].id,
          breed_for: dogBreed.bred_for, // the source field is named bred_for
          description: dogBreed.description,
          image,
        });
      }
    }

    importDogBreeds();
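
    Before running a script like this against a real project, it can help to try the logic with a stand-in client that records calls instead of sending them. The sketch below is hypothetical: createDryRunClient is not part of datocms-client, it stubs only the two methods used in this guide, and the values it returns are placeholders rather than what the real API returns.

```javascript
// Hypothetical stand-in for the real client: it records calls instead of
// sending them. Only the two methods used in this guide are stubbed, and
// the returned values are placeholders, not real API responses.
function createDryRunClient() {
  const calls = [];
  let nextId = 1;
  return {
    calls,
    items: {
      create: async (attributes) => {
        calls.push({ method: 'items.create', attributes });
        return { id: String(nextId++), ...attributes };
      },
    },
    uploadFile: async (url, options) => {
      calls.push({ method: 'uploadFile', url, options });
      return `dry-run-upload-for:${url}`;
    },
  };
}

// Usage: create a record through the stub and inspect what would be sent
const dryRunClient = createDryRunClient();
dryRunClient.items
  .create({ itemType: '<CATEGORY-MODEL-ID>', name: 'Mixed' })
  .then((record) => console.log(record.id, dryRunClient.calls.length));
```

    Swapping the stub in for the real client lets you review the recorded calls before any data is written to your project.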