caustik's blog

programming and music

Archive for April 6th, 2012

Node.js scalability testing with EC2


sprites server

The sprites project leverages Node.js to implement its server logic. The server is surprisingly simple JavaScript code which accepts long-poll Comet HTTP connections from its C++ clients in order to push sprite information to them, in JSON format, in real time. Sprites can be thrown from one client’s desktop to another via HTTP. The server maintains a network topology (which deserves a separate post; it turned out to be an interesting algorithm) and uses it to send each posted sprite to the appropriate neighbor.
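The server code itself isn't shown, so what follows is only a minimal sketch of the long-poll pattern described above, written against Node's built-in http module. The endpoints (GET /poll?id=ID to wait for sprites, POST /throw?id=ID to throw one) and the names SpriteHub, ringNeighbor and createSpriteServer are hypothetical. In particular, ringNeighbor is a trivial placeholder ring, not the topology algorithm the post mentions.

```javascript
const http = require("http");

// Hypothetical placeholder topology: a sprite goes to the next client in
// join order, wrapping around (a ring). This is NOT the real sprites algorithm.
function ringNeighbor(fromId, ids) {
  const i = ids.indexOf(fromId);
  return i === -1 || ids.length < 2 ? undefined : ids[(i + 1) % ids.length];
}

// Finish a long-poll response with a JSON array of sprites.
function respond(res, sprites) {
  res.writeHead(200, { "Content-Type": "application/json" });
  res.end(JSON.stringify(sprites));
}

// Per client id: the parked long-poll response (if any) plus sprites that
// arrived while no poll was open.
class SpriteHub {
  constructor(pickNeighbor) {
    this.pickNeighbor = pickNeighbor; // (fromId, allIds) -> targetId | undefined
    this.clients = new Map(); // id -> { pending: ServerResponse | null, queue: [] }
  }

  client(id) {
    if (!this.clients.has(id)) this.clients.set(id, { pending: null, queue: [] });
    return this.clients.get(id);
  }

  // A client opened a long-poll: answer now if sprites are queued, else park it.
  poll(id, res) {
    const c = this.client(id);
    if (c.pending) {
      const old = c.pending; // a newer poll supersedes the old one
      c.pending = null;
      respond(old, []);
    }
    if (c.queue.length > 0) return respond(res, c.queue.splice(0));
    c.pending = res;
    // Forget the parked response if the client disconnects.
    res.on("close", () => {
      if (c.pending === res) c.pending = null;
    });
  }

  // A client threw a sprite: push it to the chosen neighbor now, or queue it.
  throwSprite(fromId, sprite) {
    const target = this.pickNeighbor(fromId, [...this.clients.keys()]);
    if (target === undefined) return undefined;
    const c = this.client(target);
    if (c.pending) {
      const res = c.pending;
      c.pending = null;
      respond(res, [sprite]);
    } else {
      c.queue.push(sprite);
    }
    return target;
  }
}

// HTTP wiring: GET /poll?id=ID parks the request; POST /throw?id=ID with a
// JSON body throws a sprite from client ID.
function createSpriteServer(hub) {
  return http.createServer((req, res) => {
    const url = new URL(req.url, "http://localhost");
    const id = url.searchParams.get("id");
    const reply = (status) => {
      res.writeHead(status);
      res.end();
    };
    if (!id) return reply(400);
    if (req.method === "GET" && url.pathname === "/poll") return hub.poll(id, res);
    if (req.method !== "POST" || url.pathname !== "/throw") return reply(404);
    let body = "";
    req.setEncoding("utf8");
    req.on("data", (chunk) => (body += chunk));
    req.on("end", () => {
      let sprite;
      try {
        sprite = JSON.parse(body);
      } catch {
        return reply(400);
      }
      reply(hub.throwSprite(id, sprite) === undefined ? 404 : 202);
    });
  });
}
```

Something like createSpriteServer(new SpriteHub(ringNeighbor)).listen(8080) would start it. Because the placeholder ring only knows ids it has already seen, a client has to poll once before its throws can be routed. A real server would presumably also time out idle polls and drop departed clients.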

The asynchronous sockets in Node.js scale very well. Combine this with JavaScript being compiled to native code by the blazing fast V8 engine, and you have the foundation of a very simple and flexible server platform which scales mightily.

So, how well does it scale?

In order to test that, I decided to leverage Amazon EC2 to run a bunch of fake clients to simulate network usage. Each EC2 instance can make thousands of connections to the sprites server, and simulate throwing sprites around.
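The actual test client is C++ and its code isn't shown, so here is a rough Node.js stand-in that illustrates the same idea: one process holding many long-poll connections open at once. It assumes hypothetical endpoints (GET /poll?id=ID to wait for sprites, POST /throw?id=ID to throw one); startFakeClients and its options are made up for illustration.

```javascript
const http = require("http");

// Start `count` simulated clients against a long-poll server. Each client
// keeps exactly one GET /poll?id=ID open and re-polls as soon as it returns.
// The endpoints are hypothetical; the real sprites test client is C++.
function startFakeClients({ host, port, count, prefix = "fake" }) {
  // keepAlive lets a client reuse its socket for the next poll; the agent's
  // maxSockets defaults to Infinity, so every open poll gets its own socket.
  const agent = new http.Agent({ keepAlive: true });
  let stopped = false;
  let deliveries = 0;

  function poll(id) {
    if (stopped) return;
    const req = http.get({ host, port, path: `/poll?id=${id}`, agent }, (res) => {
      res.on("error", () => {}); // response aborted mid-body, e.g. on shutdown
      res.on("end", () => {
        deliveries++;
        poll(id);
      });
      res.resume(); // discard the body; a real client would parse the JSON
    });
    // On a connection error, retry after a second unless shutting down.
    req.on("error", () => {
      if (!stopped) setTimeout(() => poll(id), 1000);
    });
  }

  // Throw a sprite from client `id` (POST /throw?id=ID with a JSON body).
  function throwSprite(id, sprite) {
    const body = JSON.stringify(sprite);
    const req = http.request(
      {
        host, port, agent, method: "POST", path: `/throw?id=${id}`,
        headers: { "Content-Type": "application/json", "Content-Length": Buffer.byteLength(body) },
      },
      (res) => res.resume()
    );
    req.on("error", () => {});
    req.end(body);
  }

  for (let i = 0; i < count; i++) poll(`${prefix}${i}`);

  return {
    throwSprite,
    deliveries: () => deliveries,
    // Stop re-polling and destroy every socket the agent holds.
    stop() {
      stopped = true;
      agent.destroy();
    },
  };
}
```

A call such as startFakeClients({ host: serverHost, port: 8080, count: 1000 }), where serverHost is whatever address the sprites server has, would keep 1,000 polls open from a single instance. Holding thousands of sockets in one process also depends on the file handle limit discussed in step 1.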

To implement this stress test, I first booted up a single instance from one of the default Linux templates. From there, it only takes 5 minutes to install a basic build environment using “yum install subversion make gcc-c++”. After that, I modified /etc/rc.local to do three things:

1) ulimit -n 999999

This raises the default limit on open file handles, which also covers sockets (each connection uses one). Without it, the instance could not make more than about 1024 connections.

The same limit must also be raised on the Node.js server, which you don’t really see documented anywhere! Since ulimit only affects the shell that runs it and the processes started from that shell, it has to be set in whatever script launches node. If you don’t increase the file handle limit on your Node.js server, somewhere down the line you are going to run into clients being unable to connect, along with the server suddenly pegging the CPU at 100% and behaving strangely.

I’ve seen all sorts of posts around the net from people struggling to figure out why their Node.js server doesn’t scale as high as they think it should. If you’re not increasing your file handle limit, that would definitely be one explanation. Aside from that, it’s obviously critical to write performance-conscious JavaScript code.

2) svn update the test working directory

This way, each instance always runs the latest test code after a reboot, which greatly simplifies scalability stress testing: you just start and stop instances from the EC2 console to test the latest code at whatever scale you want. You should be able to test a million concurrent connections with this technique, though I’m still waiting on an EC2 quota increase to actually give it a try (EC2 stops you at 20 instances by default).

3) cd into the test working directory, build, and run the test

That’s it! I use a Windows development machine to write the cross-platform test code. Whenever I make a change, I commit it to Subversion and recreate/reboot all the EC2 instances. They automatically update and swarm the server. It’s a beautiful thing 🙂
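Following up on step 1: on Linux a process can read its own limits from /proc/self/limits, so a Node.js server can confirm at startup that the raised limit actually reached it. This is a minimal sketch; parseOpenFileLimit, checkOpenFileLimit and the warning text are hypothetical, and the check simply returns null on systems without /proc.

```javascript
const fs = require("fs");

// Extract the soft and hard "Max open files" limits from the text of a Linux
// /proc/<pid>/limits file. Returns null if that row is missing.
function parseOpenFileLimit(limitsText) {
  const label = "Max open files";
  const row = limitsText.split("\n").find((line) => line.startsWith(label));
  if (!row) return null;
  const [soft, hard] = row.slice(label.length).trim().split(/\s+/);
  const toNumber = (v) => (v === "unlimited" ? Infinity : Number(v));
  return { soft: toNumber(soft), hard: toNumber(hard) };
}

// At server startup, warn if this process cannot hold `wanted` open sockets.
// Returns the parsed limits, or null where /proc is unavailable (non-Linux).
function checkOpenFileLimit(wanted, limitsPath = "/proc/self/limits") {
  let text;
  try {
    text = fs.readFileSync(limitsPath, "utf8");
  } catch {
    return null;
  }
  const limit = parseOpenFileLimit(text);
  if (limit && limit.soft < wanted) {
    console.warn(
      `open file limit is ${limit.soft}, below the ${wanted} connections expected; ` +
        "raise it (e.g. ulimit -n) in the script that launches node"
    );
  }
  return limit;
}
```

Calling something like checkOpenFileLimit(100000) before the server starts listening would log a warning if node was launched from a shell that never ran ulimit -n.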

Since each EC2 instance costs $0.02 per hour (yes, 2 cents), this is actually incredibly cheap. Each test run takes well under an hour, so with 20 instances running 1000 simulated connections each, you can test 20,000 users for 40 cents! At the same density, a million clients would take 1000 instances and run you about 20 bucks. Not bad… You could be even more frugal by packing more connections into each instance, and maybe even using Spot instances for a lower hourly rate.
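The cost arithmetic above fits in a tiny helper. This is a sketch with a hypothetical name, loadTestCost, whose defaults are the post's figures: $0.02 per instance-hour and a run that fits inside one billed hour.

```javascript
// Back-of-the-envelope cost of one load-test run. Defaults follow the post:
// $0.02 per instance-hour and a run that finishes within one billed hour.
function loadTestCost({ users, connectionsPerInstance, pricePerHour = 0.02, hours = 1 }) {
  // Partially filled instances still cost a full instance.
  const instances = Math.ceil(users / connectionsPerInstance);
  const dollars = instances * hours * pricePerHour;
  // Round to whole cents to hide floating-point noise in the product.
  return { instances, dollars: Math.round(dollars * 100) / 100 };
}
```

With 1,000 connections per instance, 20,000 users comes out to 20 instances and $0.40, and a million users to 1,000 instances and $20. Packing 5,000 connections into each instance would cut the million-user run to 200 instances, about $4.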

Written by caustik

April 6th, 2012 at 3:28 am