Highly Available Node

At VIP, we run a highly available Node service that powers much of our platform. One of the biggest challenges we see teams face is the question of how to scale a highly available API.

That’s a broad problem to solve, but let’s assume we already have adequate test coverage and everything in front of the API taken care of for us. We only care about things we can change about the Node app itself.

Our typical answer looks something like this:

  1. Use Node’s cluster module to fully take advantage of multiple CPUs
  2. Gracefully reload worker processes for deploys and uncaught exceptions

Node Cluster

Node’s cluster module uses child_process.fork() to create each worker process, and the main process and the worker communicate over an IPC channel.

The net module’s server.listen() function hands off most of the work to the main process, allowing worker processes to act as if they’re all listening on the same port.

HTTP Server Example

Let’s take a simple http server as an example. Here we have a server that listens on port 3000 by default and returns Hello World!. It also throws an uncaught exception 0.001% of the time to simulate a bug we haven’t accounted for.

    /**
     * External dependencies
     */
    const { createServer } = require( 'http' )

    module.exports = createServer( ( req, res ) => {
        if ( Math.random() > 0.99999 ) {
            // Randomly throws an uncaught error 0.001% of the time
            throw Error( '0.001% error' )
        }

        res.end( 'Hello World!\n' )
    } ).listen( process.env.port || 3000 )

Obviously a real server would be much more complex, but this toy example is enough for our purposes. We can run it with node server.js and we’ll have an HTTP server listening on port 3000.

The first thing we’ll do is use Node’s cluster module to start one copy of the server per CPU, which will automatically load balance between them.

    #!/usr/bin/env node

    /**
     * External dependencies
     */
    const cluster = require( 'cluster' )

    const WORKERS = process.env.WORKERS || require( 'os' ).cpus().length

    if ( cluster.isMaster ) {
        for ( let i = 0; i < WORKERS; i++ ) {
            cluster.fork()
        }

        cluster.on( 'listening', ( worker, address ) => {
            console.log( 'Worker %d (pid %d) listening on http://%s:%d', worker.id, worker.process.pid, address.address || '', address.port )
        } );
    } else {
        const server = require( './server' )
    }

This will start one copy of the server for each CPU in our system. The operating system will take care of scheduling these processes across the CPUs.
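
One detail worth noting: environment variables are always strings, so WORKERS is a string whenever it is set. The loop above still works because the comparison coerces it to a number, but a non-numeric value would silently start no workers. Here is a minimal sketch of a stricter approach. The resolveWorkerCount helper is hypothetical and not part of the original example.

```javascript
const os = require( 'os' )

// Hypothetical helper (not part of the original example): turn the WORKERS
// environment variable into a usable worker count, falling back to the
// number of CPUs when the value is missing or invalid.
const resolveWorkerCount = ( value, cpuCount = os.cpus().length ) => {
    const parsed = Number.parseInt( value, 10 )

    // Ignore missing, non-numeric, zero, or negative values
    if ( Number.isNaN( parsed ) || parsed < 1 ) {
        return cpuCount
    }

    return parsed
}

const WORKERS = resolveWorkerCount( process.env.WORKERS )
console.log( 'Starting %d workers', WORKERS )
```

Passing the CPU count as a second argument keeps the helper easy to exercise on any machine.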

Graceful Reload

Now that we have multiple processes, we can gracefully reload these in case of errors and for deploys.


In case of errors, we terminate the worker process and spawn a new one. This is important because an uncaught exception means the process is now in an inconsistent state. In other words, an exception occurred that was not accounted for and we’re not sure what side effects that will have.

First, we’ll ensure that worker processes are restarted if any exit unexpectedly. In the isMaster branch:

    cluster.on( 'exit', ( worker, code, signal ) => {
        if ( ! worker.exitedAfterDisconnect ) {
            console.log( 'Worker %d (pid %d) died with code %d and signal %s, restarting', worker.id, worker.process.pid, code, signal )
            cluster.fork()
        }
    } )

Here worker.exitedAfterDisconnect would be true if we call worker.disconnect() or worker.kill(), but false if the worker itself calls process.exit(). That becomes important in this next step, where we automatically terminate the worker process in the case of an uncaught exception.

    const SHUTDOWN_TIMEOUT = process.env.SHUTDOWN_TIMEOUT || 5000

    process.on( 'uncaughtException', error => {
        console.log( error.stack )

        // Stop accepting connections and exit
        server.close( () => process.exit( 1 ) )

        // Force shutdown after timeout
        setTimeout( () => {
            process.exit( 1 )
        }, SHUTDOWN_TIMEOUT )
    } )

We stop accepting new connections with server.close() and terminate the process with process.exit( 1 ) when all existing connections are closed. Since we want to ensure this worker is stopped within a reasonable timeframe, we force it to exit after a timeout, which defaults to 5 seconds.
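
The same close-then-force-exit pattern can be pulled into a small function, which makes it easier to reuse and to exercise without actually killing a process. This is only a sketch. The closeThenExit name and the injectable exit argument are assumptions for illustration, not part of the original example.

```javascript
// Hypothetical helper wrapping the pattern above. `server` is anything with
// a close( callback ) method. `exit` defaults to process.exit but can be
// swapped out, for example in tests.
const closeThenExit = ( server, timeout = 5000, exit = code => process.exit( code ) ) => {
    let exited = false

    // Make sure we only exit once, whichever path gets there first
    const exitOnce = code => {
        if ( exited ) {
            return
        }

        exited = true
        clearTimeout( timer )
        exit( code )
    }

    // Force shutdown if connections are still open after the timeout
    const timer = setTimeout( () => exitOnce( 1 ), timeout )

    // Stop accepting connections and exit once existing ones have closed
    server.close( () => exitOnce( 1 ) )
}

// Possible usage in the worker, assuming `server` and SHUTDOWN_TIMEOUT exist:
// process.on( 'uncaughtException', error => {
//     console.log( error.stack )
//     closeThenExit( server, SHUTDOWN_TIMEOUT )
// } )
```

Clearing the timer when the server closes cleanly is optional here, since the process exits anyway, but it keeps the helper well behaved when exit is replaced with a stub.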


For deploys, we gracefully replace all the worker processes, shutting down each old worker only after its replacement is ready, to avoid any downtime in the process.

In the worker, we look for the main process to send a message that simply says “shutdown”. This again calls server.close() to stop accepting new connections and terminates the process when all active connections have closed.

    const server = require( './server' )

    process.on( 'message', message => {
        switch( message ) {
            case 'shutdown':
                server.close( () => process.exit( 0 ) )
                return
        }
    } )

Upon SIGHUP, we create one new worker for each active worker and gracefully shut down the old worker when the new one is ready to accept connections.

    process.on( 'SIGHUP', () => {
        console.log( 'Caught SIGHUP, reloading workers' )

        for ( const id in cluster.workers ) {
            cluster.fork().on( 'listening', () => {
                gracefulShutdown( cluster.workers[ id ] )
            } )
        }
    } )

Gracefully shutting down a worker involves a few steps.

First, we send the shutdown message that the worker is listening for and disconnect. As mentioned before, when all the connections are closed, the worker process will terminate itself. Again, since we want to ensure this worker is stopped within a reasonable timeframe, we force it to close with worker.process.kill() after a timeout, which defaults to 5 seconds.

    const SHUTDOWN_TIMEOUT = process.env.SHUTDOWN_TIMEOUT || 5000

    const gracefulShutdown = worker => {
        worker.send( 'shutdown' )
        worker.disconnect()

        const shutdown = setTimeout( () => {
            worker.process.kill()
        }, SHUTDOWN_TIMEOUT )

        worker.on( 'exit', () => clearTimeout( shutdown ) )
    }
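
The SIGHUP handler above starts all replacement workers at once. If you would rather have only one replacement in flight at a time, something like the following might work. It is a sketch under assumptions: reloadSequentially is a hypothetical helper, and fork and shutdown are passed in so it does not depend on cluster directly.

```javascript
// Hypothetical helper: replace workers strictly one at a time. `fork` must
// return an emitter that fires 'listening' (as cluster.fork() does) and
// `shutdown` retires an old worker (for example gracefulShutdown above).
const reloadSequentially = ( oldWorkers, fork, shutdown, done = () => {} ) => {
    // Copy the list so workers forked during the reload are not included
    const queue = [ ...oldWorkers ]

    const next = () => {
        const oldWorker = queue.shift()

        if ( ! oldWorker ) {
            return done()
        }

        // Only retire the old worker once its replacement accepts connections,
        // then move on to the next one
        fork().once( 'listening', () => {
            shutdown( oldWorker )
            next()
        } )
    }

    next()
}

// Possible usage in the main process:
// process.on( 'SIGHUP', () => {
//     reloadSequentially( Object.values( cluster.workers ), () => cluster.fork(), gracefulShutdown )
// } )
```

Note that this waits for each replacement to start listening, not for each old worker to exit, so an old worker may still be draining connections while the next replacement starts.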

Upon SIGINT or ^C, we’ll perform a similar graceful shutdown routine. The only difference is that we don’t need to restart each worker this time.

    process.on( 'SIGINT', () => {
        console.log( 'Caught SIGINT, initiating graceful shutdown' )

        for ( const id in cluster.workers ) {
            gracefulShutdown( cluster.workers[ id ] )
        }
    } )

To prevent the initial SIGINT from propagating to worker processes and immediately terminating them, we’ll handle the signal separately there. The first SIGINT is ignored, but if you press ^C or otherwise send SIGINT a second time, all worker processes exit immediately, bypassing the graceful shutdown.

    process.on( 'SIGINT', () => {
        // Ignore first SIGINT from parent
        process.on( 'SIGINT', () => {
            process.exit( 1 )
        } )
    } )
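
The nested handler above can also be written with a counter, which some may find easier to follow. This is a sketch of an equivalent approach, not the original code. The ignoreFirstSigint name and its injectable target and exit arguments are assumptions made so the behavior can be checked without sending real signals.

```javascript
// Hypothetical variant using a counter instead of a nested handler. `target`
// is any emitter of 'SIGINT' events (process by default) and `exit` is
// injectable so the behaviour can be exercised without killing the process.
const ignoreFirstSigint = ( target = process, exit = code => process.exit( code ) ) => {
    let count = 0

    target.on( 'SIGINT', () => {
        count++

        // First SIGINT: leave the graceful shutdown to the main process.
        // Any later SIGINT: exit immediately.
        if ( count > 1 ) {
            exit( 1 )
        }
    } )
}

// Possible usage in the worker:
// ignoreFirstSigint()
```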

I hope this was helpful. You can see the full example on GitHub.

Wire 1.5

There have been several recent updates to Wire focused on making the app more responsive and easier to use.

The original goal of Wire was to build an RSS reader that renders content in the format of the website, instead of a stylized view of the text. The whole idea is that a website is more than just what’s in the <content:encoded> tags in an RSS feed; it’s also the CSS and JavaScript that browsers render.

There are downsides of loading the URL of an article in a web view though — namely, the overhead of downloading the article and then rendering it. On a fast connection, it’s noticeable. On a slow connection, it can be annoying.

To that end, the last couple of releases have been focused on improving that aspect of the experience. As of version 1.4, Wire downloads every article, which improves performance and makes offline viewing possible. As of version 1.5, articles are pre-rendered to make the transition from the article list to the web view as fast as if we were just displaying the text from the RSS feed.

I’ve been using this for a couple of weeks and the improved responsiveness feels like magic.

Spying TVs are getting cheaper

But the most interesting and telling reason for why TVs are now so cheap is because TV manufacturers have found a new revenue stream: advertising. If you buy a new TV today, you’re most likely buying a “smart” TV with software from either the manufacturer itself or a third-party company like Roku.

Noah Kulwin in The Outline

It is so creepy when Roku TVs show a message to “continue watching from the beginning” when you’re watching something on an Apple TV. I assume the TV is constantly sending frames of whatever is on screen to Roku servers for analysis. It seems unlikely that the TV is capable of doing this recognition on its own.

The first time this happened, I finally broke down and bought a Raspberry Pi so I could set up Pi-hole.

I can’t believe this spying is not a huge story.

One Month of AirPods Pro

The initial reviews of the AirPods Pro were incredible, if a little hard to believe.

It’s true that the noise cancellation is very good though. I work in cafes regularly and still find it sort of incredible how good they are at cancelling out background noise. If someone is having a loud conversation right next to you, you can kind of hear it if the volume is low enough. The background buzz of people talking is completely gone though.

They also, predictably, stay in my ears much more reliably than the previous AirPods. If you get them, definitely try all the tips. I used the medium ones for two weeks and they were fine, but the smaller ones fit even better.

Transparency mode is the killer feature I feel like nobody is really talking about — it’s audio AR. Paired with a future pair of AR glasses and maybe a watch, you can start to see the path to making smartphones obsolete.

The only (small) problem I have so far is that transparency mode is unusable with any kind of hat that covers your ears, which means I won’t be using it very much for the next few months.

Overall, I find the AirPods Pro very exciting.

josh.blog 2.0

I must have a problem staying away from code for too long. After being on vacation last week, I decided to redo the CSS on this site over the weekend.

The main focus this time was keeping the design as simple as possible and ensuring the text is easy to read. The craziest design element is three little dots that divide articles.

For the colors, I used the DuckDuckGo Color Picker (literally search for “color picker”).