Scaling node.js to 100k concurrent connections!

UPDATE: Broke the 250k barrier, too :]

The node.js powered sprites fun continues, with a new milestone:

That’s right, 100,004 active connections! Note the low %CPU and %MEM numbers in the picture. To be fair, the CPU usage does wander between about 5% and 40%, but it’s also not a very beefy box. This is on a $0.12/hr Rackspace 2GB cloud server.

Each connection simulates sending a single sprite every 5 seconds. The destination for each sprite is randomized with an equal distribution across all nodes. This means there is traffic of 20,000 sprites per second, which amounts to 40,000 JSON packets per second. This doesn’t even include the keep-alive pings, which occur on a 2-minute interval per connection.
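The traffic figures above can be checked with a little arithmetic. This is only a sketch: the variable names are mine, the factor of two packets per sprite is inferred from the 40,000 / 20,000 ratio rather than taken from the server code, and the keep-alive rate is derived from the 2-minute interval, not measured.

```javascript
// Numbers quoted in the post.
const connections = 100000;          // simulated clients (rounded from 100,004)
const sendIntervalSec = 5;           // each client sends one sprite every 5 seconds
const keepAliveIntervalSec = 2 * 60; // one keep-alive ping per connection every 2 minutes

// Assumption: two JSON packets per sprite, inferred from 40,000 / 20,000.
const packetsPerSprite = 2;

const spritesPerSec = connections / sendIntervalSec;        // 20,000
const jsonPacketsPerSec = spritesPerSec * packetsPerSprite; // 40,000
const keepAlivesPerSec = connections / keepAliveIntervalSec; // roughly 833

console.log({ spritesPerSec, jsonPacketsPerSec, keepAlivesPerSec });
```

By this estimate the keep-alive pings add roughly 2% on top of the sprite traffic.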

At this scale, the sprite network topology remains very responsive. With my desktop PC sitting next to my laptop, a sprite thrown off the desktop’s screen arrives at the laptop so fast that I can’t gauge any latency at all.

Here are a few key tweaks which contribute to this performance:

1) Nagle’s algorithm is disabled

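If you’re familiar at all with real-time network programming, you’ll recognize disabling this algorithm as a common socket tweak. Nagle’s algorithm holds back small writes so they can be coalesced into fewer packets; with it disabled, each response leaves the server much sooner.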

The tweak is available through the node.js API “socket.setNoDelay”, which is called on each long-poll COMET connection’s socket.
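As a rough illustration of that tweak (not the actual sprites server code), the sketch below disables Nagle’s algorithm on every incoming connection of a plain node.js HTTP server. The helper names disableNagle and createLowLatencyServer are hypothetical; only socket.setNoDelay and the server’s 'connection' event come from the node.js API.

```javascript
const http = require('http');

// Hypothetical helper: turn off Nagle's algorithm on one TCP socket so that
// small writes are sent immediately instead of being buffered and coalesced.
function disableNagle(socket) {
  socket.setNoDelay(true);
  return socket;
}

// Hypothetical factory: build an HTTP server that applies the tweak to every
// new connection before any response is written to it.
function createLowLatencyServer(onRequest) {
  const server = http.createServer(onRequest);
  server.on('connection', disableNagle); // fires once per new TCP connection
  return server;
}
```

A server built this way would be started as usual, for example with createLowLatencyServer(handler).listen(8080), where both the handler and the port are placeholders.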

2) V8’s idle garbage collection is disabled via “--nouse-idle-notification”

This was critical, as the server pre-allocates over 2 million JS Objects for the network topology. If you don’t disable idle garbage collection, you’ll see a full second of delay every few seconds, which would be an intolerable bottleneck to scalability and responsiveness. The delay appears to be caused by the garbage collector traversing this list of objects, even though none of them are actually candidates for garbage collection.
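The flag has to be passed to node ahead of the script path. One way to make sure it is never forgotten is a small launcher script, sketched below. The script name server.js and the helper names are hypothetical, and the sketch assumes a node.js/V8 build that still accepts this flag; newer releases may not recognize it.

```javascript
const { spawn } = require('child_process');

// The V8 flag named in the post. It must come before the script path,
// otherwise node hands it to the script as an ordinary argument.
const V8_FLAGS = ['--nouse-idle-notification'];

// Hypothetical helper: build the argument list for launching a server script.
function buildNodeArgs(scriptPath, scriptArgs = []) {
  return [...V8_FLAGS, scriptPath, ...scriptArgs];
}

// Hypothetical launcher: start the server as a child process with the flag
// applied, sharing this process's stdin/stdout/stderr.
function launchServer(scriptPath, scriptArgs = []) {
  return spawn(process.execPath, buildNodeArgs(scriptPath, scriptArgs), {
    stdio: 'inherit',
  });
}
```

Calling launchServer('server.js') would then be equivalent to running node with the flag followed by server.js on the command line.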

I’m eager to experiment further by scaling this up to 250k connections. The only thing keeping that test from being run is the quota on my Amazon EC2 account, which limits the number of simulated clients I can run simultaneously. They have responded to my request to increase the quota, but sadly it hasn’t taken effect yet.

The sprites source code, both client and server, is available via subversion. The repository URLs are provided on the sprites web site.

http://sprites.caustik.com/

For more information about the testing and tweaks involved in scaling the server, check my previous post Node.js scalability testing with EC2.

28 thoughts on “Scaling node.js to 100k concurrent connections!”

This is awesome stuff. I work on a lot of “traditional” stacks that often struggle with this scenario, especially if customers are flooding in requests or needing responses at a high rate. This had def piqued my interest and I look forward to your future posts on this topic.

Just looking for the “validation” for Node in our systems.
