Or how we improved the WordPress database migration speed by 40x with Go.
At VIP we move WordPress databases around many times per day and as you can probably imagine, our customers tend to have of a lot of data. When we move data between environments, one of the bottlenecks has historically been search and replace of the domain names. The two main use cases are importing data containing development domain names and moving production data to a hosted development environment.
The best option in most situations is WP-CLI’ssearch-replace command. WP-CLI is fairly ubiquitous at this point and it’s easy to use. The problem that we tend to have is that it’s not quite fast enough on really huge datasets.
You may be wondering how a simple search and replace could be the bottleneck. WordPress stores lots of PHP serialized strings in the database. Since object lengths are encoded in serialized strings, you can’t simply search and replace domain names in the database unless they happen to be the same length.
WP-CLI deals with this by pulling the objects out of the database, unserialzing them, running recursive search and replace on the resulting object, reserializing them, and then updating the database. While this only takes a few seconds on most WordPress sites, it can take many minutes or even hours on sites with millions of post objects.
Since our migration process is based on .sql files, we thought it might be faster to run the search and replace outside WordPress as long as we could reliably fix the encoded lengths. So, I wrote Automattic/go-search-replace to do that.
There are two main things that happen:
The search and replace. We simply replace one domain with another.
Fix encoded string lengths as necessary.
It turns out that dealing with the string lengths is not as hard as we originally thought. Modifying nested objects is not a concern because we’re not changing the number of nested items. We only need to look at the string length immediately preceding the string we’re replacing.
Another problem that could have been hard to deal with is maintaining the integrity of the .sql file. It would be easy to replace the characters that are used by the MySQL importer to delineate records, like ),\n(. Our solution is to limit the search domain roughly to the characters that make up valid domain names.
Using this new command line tool, we were able to improve the search and replace process by about 25x. I wondered if we could make it even faster using concurrency.
One challenge is that we need to ensure every line in the resulting file is in the same place. For that, we use a buffered channel of channels. For each line, we write a new string channel to the channel and asynchronously write to that channel. Effectively, we put a placeholder on the channel synchronously, and write to it asynchronously. Here’s that code on GitHub.
We’ve been using this in production for over a year now. In some tests we’ve seen the search and replace performance improved by up to 40x. Since we use STDIN and STDOUT, we can use go-search-replace as part of an import pipeline. In some cases, the search and replace runs at line rate and has no effect on the total import time.
I’ve worked from home for a long time now. Forever, really — Automattic was my first real job after college. In a lot of ways, I work similarly to the way I did in college except that I go to a coffee shop instead of a library.
When I started working from home, I noticed people were very concerned that it would be too challenging. I’ve gotten used to the questions, and over the years, I’ve refined my answer to a handful of main points.
Get out of the house most days. Whatever that looks like for you will depend on how you work best and how much you like to interact with other people. It could be going for a walk around the block, spending some time in a coffee shop, or spending the whole day in a co-working space.
Try to follow a routine. Going back to the first point, it helps me to take a walk to a coffee shop every morning. The ten minute walk to the coffee shop is sort of like my commute and defines the start of my day. When you work from home, it’s easy to forget when it’s time to “go home” for the day. Having a routine to start and end the day helps with that.
Take advantage of the flexibility. Like anything, working from home has pros and cons. One of the advantages is the flexibility you get, so use it! I try to get out of the house most days, but I also have the option to stay in if it’s particularly cold or there’s a blizzard and I don’t feel like walking through the snow.
Set up a nice home office. I think a nice chair is a minimum requirement. Don’t cheap out here. It will pay for itself in chiropractor bills. I also like a desk that can convert to a standing desk. There are a lot of fancy automatic options, but I personally recommend saving some money and getting one with a manual crank. I need a nice monitor and noise canceling headphones. Again, do what works for you.
Meet people outside work. Joining a co-working space can be a nice way to do this, but you could join a basketball league or organize a local Internet of Things meetup — whatever interests you. Joining a new company can be a big transition. Especially if you’ve just graduated and many of your friends are moving across the country.
I also like to make a distinction between distributed companies and remote workers. Automattic is a distributed company. Everyone works from home — or wherever they feel most productive. Sometimes people work from home, but in a company where most people are colocated in one or several offices.
The difference might not seem obvious at first, but I think it’s important because it can be a defining factor in how the rest of the company communicates with remote workers. Since we’re all distributed at Automattic, there’s no way to overlook someone who is remote. If most of your company is colocated, it helps to understand how the rest of the company thinks about communication.
Working from home is great. My colleagues sometimes say that they’ve broken me — remote work is all I know. I’m not sure if I could work in an office every day, but maybe someone else can write that blog post. 🙃
Almost a year ago now, I started a Docker image for WordPress so I could guarantee there would be new builds available immediately after new WordPress releases.
Since I was working on an image anyway, I decided to build in support for Memcached. Object cache plugins like Memcached Object Cache are pretty common on high traffic WordPress sites.
I also set up wp-config.php to load some of the configuration options from environment variables. To make the image a little more extensible, I also have it automatically loading PHP files in the /var/www/html/wp-config directory. You can mount a volume with all your extra config files in the container to automatically load them.
After the initial setup, maintenance was super easy. Docker Hub automatically builds every time I push to GitHub. I’ve also got a script that checks for new WordPress and WP-CLI releases and automatically pushes updates to GitHub.
There are currently PHP 7.3 builds for WordPress core (PHP-FPM) and WP-CLI available.