Lessons from WP to Statamic imports

Published in WordPress, Statamic, Craft, on Aug 18, 2025

There is a quote: "There are two hard problems in computer science: cache invalidation, naming things, and off-by-one errors." But I'd like to add one to that list: imports. Imports are hard, and getting them right is even harder. To help my brain, and possibly yours, I am documenting everything I went through to migrate thousands of posts, pages, and more to Statamic.

Step one: what kind of CMS (if any) are we migrating from? I will break the export portion into CMS-specific parts, and then step through the import process using WordPress as the source.

I will note from the outset that much of the following is entirely unnecessary if you can easily get a CSV/XML export of your content and the content structure is relatively simple. If those conditions apply to your use case, give the official importer addon a try: https://statamic.com/addons/statamic/importer It offers hooks for customization and an easy UI for mapping fields. That being said, I like doing things manually to have the most control over the data flow.

The Export

WordPress

To get data out of WordPress you could go down the route of creating a custom export with all the data you want from the custom post types, ACF fields, etc. But that is slow work and requires extra effort to repeat if something goes wrong. Instead I prefer to enable the WordPress REST API and then add a custom REST field with all the data I need for the sync.

If you are exporting/importing a custom post type, you first need to enable the REST API for that type. Find where you are calling the register_post_type function and add the show_in_rest attribute set to true:

<?php
register_post_type(
    'property',
    [
        //...
        'show_in_rest' => true,
        //...
    ],
);

Then you can add a custom REST field with all the sync data:

<?php
add_action('rest_api_init', function () {
    register_rest_field(
        'post',
        'sync',
        ['get_callback' => 'rest_field_post_sync']
    );
});

function rest_field_post_sync($object, $field_name, $request) {
    $data = get_post($object['id']);
    $tags = wp_get_post_terms($data->ID, 'post_tag');
    $categories = wp_get_post_terms($data->ID, 'category');

    return [
        'id' => $data->ID,
        'date' => $data->post_date,
        'status' => $data->post_status,
        'title' => $data->post_title,
        'slug' => $data->post_name,
        'content' => $data->post_content,
        'excerpt' => $data->post_excerpt,
        // Reduce the term objects down to simple slug => name maps
        'tags' => array_reduce($tags, function ($carry, $tag) {
            $carry[$tag->slug] = $tag->name;
            return $carry;
        }, []),
        'categories' => array_reduce($categories, function ($carry, $category) {
            $carry[$category->slug] = $category->name;
            return $carry;
        }, []),
        'featured_image' => get_the_post_thumbnail_url($data->ID, 'original'),
    ];
}

This is pretty straightforward code. The register_rest_field function takes a content type (ie. user, post, or a custom post type), a field name - in this case "sync" - and an array of options, of which we only pass the get_callback property the name of the function we want to call to generate that field. From there, we need to do some WordPress data gymnastics to massage the data into the format we want; I highly recommend doing more work on the WordPress side to make this export extra clean, as it will make the import 10x easier.
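
The same trick works for the custom post type from earlier - register_rest_field accepts a post type name (or an array of them) as its first argument. Here is a minimal sketch for the "property" type, assuming ACF is installed (get_fields() is ACF's helper for pulling every custom field on a post); the callback name is just one I made up to mirror the one above:

<?php
add_action('rest_api_init', function () {
    register_rest_field(
        'property',
        'sync',
        ['get_callback' => 'rest_field_property_sync']
    );
});

function rest_field_property_sync($object, $field_name, $request) {
    $data = get_post($object['id']);

    return [
        'id' => $data->ID,
        'title' => $data->post_title,
        'slug' => $data->post_name,
        // get_fields() returns all ACF field values for the post (or false if there are none)
        'acf' => function_exists('get_fields') ? (get_fields($data->ID) ?: []) : [],
    ];
}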

Craft

Here I am migrating from Craft 2.7.5, but the concepts are adaptable to any Craft version. Craft does not have an easy way to create a JSON endpoint like WP out of the box, so instead we will install the excellent Element API plugin. This plugin allows us to define a new config file under the path craft/config/elementapi.php. The contents of this file will look something like this once you are done:

<?php

namespace Craft;

return [
    'endpoints' => [
        'directory.json' => [
            'elementsPerPage' => 100,
            'elementType' => ElementType::Entry,
            'serializer' => 'dataArray',
            'criteria' => [
                'section' => ['directory'],
                'type' => ['directory'],
            ],
            'transformer' => function (EntryModel $entryModel) {
                $divisions = $entryModel->division->find();
                return [
                    'type' => 'directory',
                    'slug' => $entryModel->slug,
                    'first_name' => $entryModel->firstName,
                    'mi' => $entryModel->middleInitital,
                    'last_name' => $entryModel->lastName,
                    'bio' => $entryModel->bio ? $entryModel->bio->getParsedContent() : '',
                    'email' => $entryModel->emailAddress,
                    'divisions' => array_reduce($divisions, function ($carry, $division) {
                        return array_merge($carry, [
                            $division->slug => $division->title,
                        ]);
                    }, []),
                ];
            },
        ],
    ],
];

There is not much detail I can go into here about how to massage your data into a usable format, but hopefully this example and the docs for the Element API will get you well on your way. Again, do as much work up front - it makes the Statamic side much easier. This example will create an endpoint at https://example.com/directory.json which we can then paginate through using the "page" query parameter.
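
On the Laravel side, paging through that endpoint can be as simple as looping until a page comes back empty. A rough sketch - importDirectoryEntry() is a hypothetical per-entry method, and the exact "data"/"meta" shape depends on the serializer, so check one page in the browser first and adjust:

<?php
use Illuminate\Support\Facades\Http;
//...
    $page = 0;

    do {
        $page++;

        // The Element API endpoint defined above, paged via the "page" query parameter
        $response = Http::get('https://example.com/directory.json', ['page' => $page])->json();

        foreach ($response['data'] ?? [] as $entry) {
            $this->importDirectoryEntry($entry); // hypothetical per-entry import method
        }
    } while (!empty($response['data']));
//...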

Other

The simplest way to handle other CMS data is to dump the live database and import it into your dev workspace. From there you can easily fetch all the data using Laravel's database query builder like so:

<?php
DB::connection('mysql_old')
    ->table('posts')
    ->orderByDesc('createdon')
    ->when($this->id, fn($query) => $query->where('id', $this->id))
    ->lazyById($this->perChunk)
    ->each($this->importPost(...));

A few things to note: I created a second MySQL connection in the config/database.php file so I could easily connect to the old database while still having access to the new one (in case it was needed for asset metadata). Also, I am using the lazyById method to create a lazy collection so we can loop over all the rows without loading them into memory first - essential when you are importing thousands of rows. Finally, I separated the actual import logic out into a separate method on the command.
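
For reference, that second connection is just another entry in config/database.php. A minimal sketch - the connection name matches the code above, but the env keys are made up, so use whatever naming fits your project:

<?php
// config/database.php
'connections' => [
    'mysql' => [
        //... the normal connection for the new site ...
    ],

    // Points at the dump of the old site's database
    'mysql_old' => [
        'driver' => 'mysql',
        'host' => env('OLD_DB_HOST', '127.0.0.1'),
        'port' => env('OLD_DB_PORT', '3306'),
        'database' => env('OLD_DB_DATABASE', 'old_site'),
        'username' => env('OLD_DB_USERNAME', 'root'),
        'password' => env('OLD_DB_PASSWORD', ''),
        'charset' => 'utf8mb4',
        'collation' => 'utf8mb4_unicode_ci',
    ],
],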

TIP: The three dots in ->importPost(...) are the first-class callable syntax, which is prettier shorthand for ->each(fn ($post) => $this->importPost($post)) or the even longer version ->each(function ($post) { $this->importPost($post); }). Modern PHP is wonderful, isn't it?

The Import

Now that we have access to the existing content, we can work on the hard part: parsing the content and importing it into Entries, Terms, and Assets. My personal preference is to create a new Laravel command:

php artisan make:command ImportPosts

From there we can customize the command structure to look something like:

<?php
class ImportPosts extends Command
{
    protected $signature = "import:posts {slug?}";
    protected $description = 'Imports posts (news) content from the live site.';

    public function handle()
    {
        //...
    }
}

While adding the slug (or ID) argument is optional, it is very helpful when diagnosing a single Entry that is not importing properly. If any of the entries fail or come through wrong, you can easily re-run the import for a single post using:

php artisan import:posts slug-of-post-that-wont-import-properly

Now let's look at what the handle method might look like for a site coming from WordPress. To make my life easier, I like to provide some nice prompts for how many posts to import at a given time. This way I can run the import over and over for just the first 10 posts until I nail down the details.

<?php
use function Laravel\Prompts\text;
//...
    public function handle()
    {
        if ($this->argument('slug')) {
            $number_to_skip = 0;
            $number_to_import = 1;
        } else {
            $number_to_skip = intval(text(
                label: 'Number of posts to skip',
                default: 0,
                validate: fn (string $value) => match (true) {
                    !is_numeric($value) => 'Value must be a whole number',
                    default => null,
                },
            ));

            $number_to_import = intval(text(
                label: 'Number of posts to import',
                required: true,
                validate: fn (string $value) => match (true) {
                    !is_numeric($value) => 'Value must be a whole number',
                    default => null,
                },
            ));
        }
        //...
    }
//...

Now through the power of Laravel Prompts we have a pretty UI to decide how many posts to import and how many to skip (if we want to import a specific range). Next we need to run a loop to import those posts. Note: there is probably a better way to do this, but this approach made sense to me.

<?php
use function Laravel\Prompts\spin;
use function Laravel\Prompts\progress;
use function Laravel\Prompts\info;
//...
    public function handle()
    {
        //...
        $current_page = 0;
        $per_page = 100;
        $count = 0;

        while ($number_to_import > 0) {
            $current_page++;

            if ($number_to_skip >= $per_page) {
                $number_to_skip -= $per_page;
                continue;
            }

            $response = spin(
                fn () => Http::get('https://example.com/wp-json/wp/v2/posts/', [
                    'page' => $current_page,
                    'per_page' => $per_page,
                    '_fields' => 'sync',
                ])->json(),
                'Fetching posts...',
            );

            if (
                ($response['code'] ?? '') === 'rest_post_invalid_page_number' ||
                count($response) === 0
            ) {
                break;
            }

            $start = $per_page * ($current_page - 1) + $number_to_skip + 1;
            $end = $number_to_import > $per_page ? $start + $per_page - 1 : $start + $number_to_import - 1;
            $progress = progress(
                label: 'Importing post' . ($start === $end ? " {$start}" : "s {$start}-{$end}"),
                steps: min(count($response) - $number_to_skip, $number_to_import),
            );
            $progress->start();

            foreach ($response as $post) {
                if ($number_to_import === 0) {
                    break;
                } elseif ($number_to_skip > 0) {
                    $number_to_skip--;
                    continue;
                }

                if ($this->argument('slug') && $post['slug'] !== $this->argument('slug')) {
                    continue;
                }

                $progress->hint("Importing post: {$post['slug']}");
                $this->post($post['sync']);

                $progress->advance();
                $count++;
                $number_to_import--;
            }

            $progress->finish();
        }

        info("Successfully imported {$count} posts! 🚀");
        return Command::SUCCESS;
    }
//...

All of that is just to set up a consistent way to run the command with however many items you want to import, starting from any offset. The real work kicks in when we call the $this->post($post['sync']) method:

<?php
//...
    private function post($post)
    {
        $slug = $post['slug'];
        $entry = Entry::query()->where('collection', 'posts')->where('slug', $slug)->first();
        $entry ??= Entry::make()->collection('posts')->slug($slug);

        $saved = $entry
            ->date(Carbon::parse($post['date']))
            ->published($post['status'] === 'publish')
            ->merge([
                'title' => $post['title'],
                'content' => $this->importService->parseBardContent($post['content']),
                'post_tags' => $this->tags($post),
                'post_categories' => $this->categories($post),
                'image' => $this->featuredImage($post),
            ])
            ->save();

        if (!$saved) {
            throw new Exception("Error saving post {$slug}");
        }
    }
//...

Note that I am trying to do this in the most idempotent way possible; this command could be run whether or not the post has already been created. If the import has run before, the post will already exist with the same slug, and I want to update it rather than create a duplicate. This is why I have both an Entry::query() and an Entry::make() at the beginning.

TIP: The ??= is the null coalescing assignment operator, which is shorthand for $entry = $entry ?? Entry::make(). The null coalescing operator ?? checks whether a value is both set and not null. So technically the ??= operator is shorthand for $entry = isset($entry) && !is_null($entry) ? $entry : Entry::make() - much shorter to just use the operator!

This can be even shorter in the future if/when this PR gets merged: https://github.com/statamic/cms/pull/9815

Then notice I am using ->merge() to set the data on the entry. Normally we would use ->data() when creating an entry, but that replaces the entry's data with the passed array. Merge combines the current data (which is empty when creating an entry) with the passed array. This means we can use a single data array, and it will both instantiate a new entry and update an existing one properly.
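
To make the difference concrete, here is a tiny hypothetical illustration of the two calls against an entry that already has a couple of fields:

<?php
// Existing entry data: ['title' => 'Old title', 'image' => 'header.jpg']

$entry->data(['title' => 'New title']);
// data is now ['title' => 'New title'] - the image field is gone

$entry->merge(['title' => 'New title']);
// data is now ['title' => 'New title', 'image' => 'header.jpg']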

One more gotcha: if you want the slug to take effect when creating or updating an entry, you must explicitly call the ->slug($slug) method. It does things behind the scenes that won't happen if you pass the slug in with the other data (ie. next to title). The same goes for ->date() and ->published().

I won't get into the ->parseBardContent() method as that is a whole other blog post. The ->tags() and ->categories() methods both follow the same find-or-make pattern we used for the post itself, but with tags and categories:

<?php
//...
    private function categories($post): array
    {
        // The export sends categories as an array of slug => name pairs
        $categories_raw = $post['categories'] ?: [];
        $categories = [];
        foreach ($categories_raw as $slug => $name) {
            $term = Term::query()
                ->where('taxonomy', 'post_categories')
                ->where('slug', $slug)
                ->first();
            $term ??= Term::make($slug)->taxonomy('post_categories');

            $term->merge([
                'title' => $name,
            ])->save();
            $categories[] = $term->slug();
        }

        return $categories;
    }
//...

The important things here are that we pass the category slug into Term::make() when creating a new term, and that we return an array of term slugs back up to the post data.

Finally, the ->featuredImage() method takes the URL of the image, uploads it to the asset container's filesystem, creates the asset, and returns its ID:

<?php
use Illuminate\Support\Facades\Storage;
use Statamic\Facades\Asset;
//...
    private function featuredImage($post): string
    {
        $url = $post['featured_image'] ?? '';
        if (empty($url)) {
            return '';
        }

        // Assumption: the "assets" container is backed by a filesystem disk named
        // "assets", and the destination path is derived from the image URL's filename.
        // Adjust both to match your own container setup.
        $storage = Storage::disk('assets');
        $new_path = 'imports/' . basename(parse_url($url, PHP_URL_PATH));

        // Stream the remote file instead of loading it all into memory
        $file = fopen($url, 'r');

        if (
            !$storage->exists($new_path) &&
            !$storage->put($new_path, $file)
        ) {
            throw new Exception("Error uploading asset {$new_path}");
        }

        $asset = Asset::query()->where('container', 'assets')->where('path', $new_path)->first();
        $asset ??= Asset::make()->container('assets')->path($new_path);
        $asset->save();

        fclose($file);

        return $asset->id();
    }
//...

A few things to note: I check that the file does not already exist before attempting to upload it, and the destination path here is simply derived from the image URL's filename (adjust that to fit your own container structure). When uploading, I pass a stream from fopen() rather than the file contents to avoid using unnecessary memory - this helps with large files.

There you have it. You should now be able to run your import command, fill out the prompts, and watch the progress bar creep forward - or, like me, go get some coffee, come back, and find you still have bugs in your importer. 🙃

---

EDIT 08/18/2025 - A few grammar corrections and clarifications