Node.js w/250k concurrent connections!

Okay, I figured it must be possible, so it had to be done..

The sprites server has smashed through the 250k barrier with 250,001 concurrent, active connections. In addition to the stuff talked about in my previous two blog entries, it took a few more optimizations.

1) Latest tagged revision of V8, which appears to perform a little better

The 250k limit is right on the fringe of what this server can pull off without violating the 1.4GB heap limitation in V8. It’s not clear to me why this limitation hasn’t bubbled up in priority enough to be taken care of yet. Just look at those free CPU cycles and unused memory! V8 is a complete bad-ass, and it’s just this one limitation that is holding it back from some really extreme capabilities.

I had tried 250k a few times, and couldn’t get it stable until after upgrading the /deps/v8/ directory of Node.js with the latest version tagged in SVN. I was really hoping the 1.4GB limit had been removed, but alas. Instead, just had to settle with some significant improvements to garbage collection and performance in general.

2) Workers via “cluster” module

I figured the onslaught of HTTP GET requests the 100 EC2 servers were unleashing at a rate of 100,000 JSON packets per second was contributing both to CPU and memory consumption, which was agitating the garbage collector enough to resist the 250k mark.

So, to reduce the overhead of these transient requests, which are in addition to the 250k concurrent connections (which really leaves us on average at about 350k connections at any given second, 100k of which are transients), I decided to leverage the cluster module.

The master performs the exact same tasks it did before. Except, now there are workers spawned, one for each CPU on the system, and they listen on a separate port from the master process. This port is used by the clients for these transient requests, and as they arrive they are parsed and forwarded to the master using the Node.js send() function.

The only reason this was a critical adjustment is, again, the 1.4GB heap limitation in V8. Keeping all the resources associated with those requests off the master process saves just enough memory to get past the 250k milestone.

TL;DR – V8, you’re so good, but it’s time for you to support more memory!

Scaling node.js to 100k concurrent connections!

UPDATE: Broke the 250k barrier, too :]

The node.js powered sprites fun continues, with a new milestone:

That’s right, 100,004 active connections! Note the low %CPU and %MEM numbers in the picture. To be fair, the CPU usage does wander between about 5% and 40% – but it’s also not a very beefy box. This is on a $0.12/hr rackspace 2GB cloud server.

Each connection simulates sending a single sprite every 5 seconds. The destination for each sprite is randomize to an equal distribution across all nodes. This means there is traffic of 20,000 sprites per second, which amounts to 40,000 JSON packets per second. This doesn’t even include the keep-alive pings which occur on a 2-minute interval per connection.

At this scale, the sprite network topology remains very responsive. Tested using my desktop PC neighboring my laptop, throwing a sprite off the screen arrives at the laptop so fast that I can’t gauge any latency at all.

Here are a few key tweaks which contribute to this performance:

1) Nagle’s algorithm is disabled

If you’re familiar at all with real-time network programming, you’ll recognize this algorithm as a common socket tweak. This makes each response leave the server much quicker.

The tweak is available through the node.js API “socket.setNoDelay“, which is set on each long-poll COMET connection’s socket.

2) V8’s idle garbage collection is disabled via “–nouse-idle-notification”

This was critical, as the server pre-allocates over 2 million JS Objects for the network topology. If you don’t disable idle garbage collection, you’ll see a full second of delay every few seconds, which would be an intolerable bottleneck to scalability and responsiveness. The delay appears to be caused by the garbage collector traversing this list of objects, even though none of them are actually candidates for garbage collection.

I’m eager to experiment further by scaling this up to 250k connections. The only thing keeping that test from being run is the quota on my amazon EC2 account, which is limiting the number of simulated clients I can run simultaneously. They have responded to my request to increase quota, but sadly it hasn’t taken effect yet.

The sprites source code, both client and server, are available via subversion. The repository URLs are provided on the sprites web site.

For more information about the testing and tweaks involved in scaling the server, check my previous post Node.js scalability testing with EC2.

Node.js scalability testing with EC2

sprites server

The sprites project leverages Node.js to implement it’s server logic. The server is surprisingly simple JavaScript code which accepts long-poll COMET HTTP connections from it’s C++ clients in order to push JSON format sprite information in real-time. Sprites can be thrown from one client’s desktop to another via HTTP. The server maintains a network topology (which is worth a separate post in itself, it turned out to be an interesting algorithm), which is then used to send the posted sprite to the appropriate neighbor.

The asynchronous sockets in Node.js scale very well. Combine this with the JavaScript being interpreted by the blazing fast V8 engine, and you have the foundation of a very simple and flexible server platform which scales mightily.

So, how well does it scale?

In order to test that, I decided to leverage Amazon EC2 to run a bunch of fake clients to simulate network usage. Each EC2 instance can make thousands of connections to the sprites server, and simulate throwing sprites around.

To implement this stress test, I first booted up a single instance with one of the default Linux templates. From there it only takes 5 minutes to install a basic build environment using “yum install subversion make gcc-c++” — after that, I modified /etc/rc.local to do a couple things:

1) ulimit -n 999999

This is necessary to get around the default limitation in file handles, which applies to sockets, that would otherwise prevent the instance from making more than about 1024 connections.

This command must also be used on the Node.js server, which you don’t really see documented anywhere! If you don’t increase the file handle limit on your Node.js server, somewhere down the line you are going to run into clients being unable to connect, along with the server suddenly deciding to peg the CPU at 100% and behave strangely.

I’ve seen all sorts of posts around the net struggling to figure out why their Node.js server doesn’t scale quite as high as they think it should. Well, if you’re not increasing your file handle limit, that would definitely be one explanation. Aside from that, it’s obviously critical to write very performance-conscious JavaScript code.

2) svn update the test working directory

This is done so that each instance is always up to date with the latest test code after a reboot. This greatly simplifies scalability stress testing. You just start and stop instances using the EC2 console to magically test the latest code at whatever scale you want. You can literally test a million concurrent connections using this technique (though, I’m pending a quota increase on EC2 to actually give this a try, they stop you after 20 instances by default).

3) cd into the test working directory, build, and run the test

That’s it! I use a Windows development machine to create the cross-platform test code, and whenever a change is made, I commit it to subversion, and recreate/reboot all the EC2 instances. They automatically update, and swarm the server. It’s a beautiful thing 🙂

Since each instance on EC2 costs $0.02 (yes, 2 cents), this is actually incredibly cheap. Each test run takes well under an hour, so for 20 servers with 1000 simulated connections each, you can test 20,000 users for 40 cents! To test a million clients, it would run you about 20 bucks. Not bad… You could tweak this to be more frugal by adding more connections per-instance, and maybe even leverage Spot instances for a lower rate per-hour.

Sprites application, with source code

I’ve put together a new site, with a forum and source code, for my goofy sprites program:

The project has ended up being a good demonstration for using node.js to host a server for C++ clients. The sprites program uses a node.js server to automatically pair up anyone running the program, so sprites you throw off either side, top or bottom, of your desktop are thrown onto another person’s desktop. The network topology is generated by the server using a novel match-making algorithm. It’s pretty fucking cool, actually.

There have also been a great deal of Windows programming nuances to get the layered window animations working smooth as butter. The whole thing is optimized to perform well with over a hundred sprites at a time. This is done via caching all the animation data, deferring window repositioning, synchronization with DWM refresh, and other techniques.

Anyway, download it, install it, register for the forum and give some feedback IF you don’t mind. It’s a very young project and really benefits a lot from feedback and people trying it out.

If you happen to be the first person to see this blog post and install it, you have the unique opportunity to flood my desktop with a metric ton of Mario sprites. Otherwise, you still have the opportunity to flood some random other person’s desktop, but just be prepared to expect the same in return 😛


Where’s caustik?

Geesh haven’t updated this blog in a while. I’ll do this mini update at least..

Took a new job at opencandy (some really cool stuff in the pipeline there!), still pushing out DJ mixes here and there over at my soundcloud page (caustik’s sound cloud). Create a soundcloud account if you haven’t got one already and make sure to add me =)

Every once in a while I’ve been working a bit on this app that lets you create sprites that can walk around your desktop and interact with your windows and each other. Here’s a little screenshot. It’d be cool to talk with some of the guys who run sprite repository web sites and get a bunch of characters created =)

Here’s the latest build. Just drag and drop an .spr file or any image file on the executable and it should drop on your desktop. You can drag and throw them around, and you can click the sprite/image and press left/right/up/down/space to move it around. It’s fairly basic right now, but the plan is to make them able to walk around by themselves and interact more, etc.


oh, and double click to exit. if you drag and drop a bunch of them, they all open in one instance and double clicking on one will close them all. have fun =)

Fix for multiple monitors with fullscreen games

There’s a pretty annoying issue that happens in windows when you are playing a game in fullscreen on one monitor, and have a second (or third, fourth) monitor with other stuff open on it. If your game is playing at a resolution that is different than your normal desktop resolution on that monitor, everything gets all moved around on the other monitors.

To fix this, I wrote a tiny utility program which you can run before you start the game, and it fixes this behavior by calculating what the correct location should be for all your windows and moving them there. It’s useful if you like having IM windows open, notes or a web browser, etc. You can’t interact with those programs while you’re in the game, but at least you can *see* them (for example, if you have a strategy guide or map, etc open)

Of course, I’m not responsible for any damage this program does to your computer. I’ve tested it a fair amount, but it’s entirely possible somebody with an unusual setup could find a situation where you may lose a program’s window due to it being positioned in the wrong place.

Let me know if this is useful, I’ll try to improve it and turn it into a proper project.