I’ve decided to ramp up the Node.js experiments and pass the 1 million concurrent connections milestone. It worked, using a swarm of 500 Amazon EC2 test clients, each establishing ~2000 active long-poll COMET connections to a single 15GB Rackspace cloud server.

This isn’t landing the Mars rover or curing cancer; it’s just a pretty cool milestone, IMO. Hopefully it’s of some benefit to Node developers who want to handle a large number of concurrent connections and can use these settings as a starting point in their own projects.

Here’s the connection count as displayed on the Sprites page:

Here’s a sysctl dumping the number of open file handles (sockets are file handles):
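
(On Linux, the relevant counter is fs.file-nr, which prints three values – allocated file handles, free file handles, and the fs.file-max ceiling:)

sysctl fs.file-nr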

Here’s the view of “top” showing system resources in use:

I think it’s pretty reasonable for 1M connections to consume 16GB of memory, but it could probably be trimmed down quite a bit. I haven’t spent any time optimizing that. I’ll leave that for another day.

Here’s a latency test run against the COMET URL:

The new tweaks, placed in /etc/sysctl.conf (CentOS) and then reloaded with "sysctl -p":

net.core.rmem_max = 33554432
net.core.wmem_max = 33554432
net.ipv4.tcp_rmem = 4096 16384 33554432
net.ipv4.tcp_wmem = 4096 16384 33554432
net.ipv4.tcp_mem = 786432 1048576 26777216
net.ipv4.tcp_max_tw_buckets = 360000
net.core.netdev_max_backlog = 2500
vm.min_free_kbytes = 65536
vm.swappiness = 0
net.ipv4.ip_local_port_range = 1024 65535
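
One more prerequisite, covered in the previous posts and only summarized here: the per-process file descriptor limit must also be raised above 1M (each socket is a file handle), or the kernel will refuse new connections long before these TCP settings matter. For example (values illustrative):

# /etc/security/limits.conf
*    soft    nofile    1048576
*    hard    nofile    1048576

# then verify in the shell that launches node:
ulimit -n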

Other than that, the steps were identical to the steps described in my previous blog posts, except this time using Node.js version 0.8.3.

Here is the server source code, so you can get a sense of the complexity level. Each connected client is actively sending messages, for the purpose of verifying the connections are alive. I haven’t pushed that throughput yet, to see what data rate can be sustained. Since the modest 16GB of memory was already consumed, that would likely have caused swapping and meant little. I’ll give it a shot with a higher-memory server next time.
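
The full source was published alongside the original post; since it isn’t reproduced here, the sketch below only shows the general shape of such a long-poll server. This is hypothetical code, not the actual Sprites source (the per-client graph structure and two-port setup discussed in the comments are omitted):

var http = require('http');

var clients = {};   // id -> parked http.ServerResponse
var nextId = 0;

http.createServer(function (req, res) {
  var id = nextId++;
  // Park the response; with no Content-Length, Node keeps the
  // connection open using chunked transfer encoding.
  res.writeHead(200, { 'Content-Type': 'text/plain' });
  clients[id] = res;
  req.on('close', function () { delete clients[id]; });
}).listen(8080);

// Periodically write a small message to every parked connection,
// both to verify liveness and to exercise the send path.
setInterval(function () {
  for (var id in clients) {
    clients[id].write('ping\n');
  }
}, 30000);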


71 thoughts on “Node.js w/1M concurrent connections!”

  1. I’m guessing you mean the EC2 swarm was actually test clients connecting to a single Node server? It’s kinda ambiguous – you might wanna expand on that a little.


  2. Why in the world did you need 500 EC2 instances for client connections? With the right settings, you should be able to get to 1M with about 20 instances.


    1. Tim – Let me know when you’ve pulled that off using EC2 instances, with an overall cost less than 500 micro spot instances. Also, doing it this way is more realistic – why complain that -too many- unique servers were used, when it’s meant to simulate the real-world scenario of each client being a unique IP? I’m just as likely to get a comment asking why I didn’t use -more- unique instances 😉 — anyway, my point is it’s not as brain-dead a choice as you might think. I’m erring on the side of a more expensive, but more realistic, testing procedure.


  3. Well played, sir. I will look into your tweaks some more. At some point the project I’m bootstrapping with a couple partners may need your consulting services. Cheers.


  4. Just saw some limitations and wanted to find a way around them.

    Also, it seems to really upset anti-Node web zealots, trolls, and smug redditors, so that’s pretty satisfying.


  5. Great work! Now the outcome of a similar test for a more common Node.js stack (Node.js with Connect/Express and Socket.IO WebSocket connections) would be interesting.

    I’ll have a look at your setup and check if such a test can be created that allows comparing results to your simulation outcome.


  6. Adding more inbound unique IPs into the mix isn’t testing anything further, unless you believe your TCP stack to be broken in some fashion.


  7. Nice way to use 500 EC2 instances to generate such a large volume of requests in a real-world scenario. This also gets one past the I/O and bandwidth throttling that AWS imposes on a single instance. Can you tell me which load-generation (client) software you used? And did you work with AWS before the test, like filling out the penetration test form?


  8. Yes, the images are back – thank you for such a quick response.

    “Also, it seems to really upset anti-Node web zealots, trolls, and smug redditors, so that’s pretty satisfying.”

    I liked this comment, as this issue annoys me too, especially on Reddit.


  9. I am trying to reproduce your tests (which are very interesting) and I have run into a roadblock.

    My tests differ from yours because I am running the node.js server on a 16GB m1.XL at EC2, and I am trying to run m1.L instances for the clients.

    The problem appears on the server: once I reach about 150,000 connections, node/V8 keeps trying to Scavenge memory. This causes my client connections to time out. Did you have this problem at all?

    Thanks.


  10. Caustik, I did read that, and I have been using the ulimit change as well as the following command for running the script:

    node --trace-gc --expose-gc --nouse-idle-notification --max-new-space-size=2048 --max-old-space-size=14336

    Are you saying that you needed to custom-build V8 with those two changes to get up to 1M connections?


  11. Have you tried this test on any of the v0.8.x releases? I forgot to mention that earlier, but I am trying it on v0.8.8 and I have also added the V8 settings, but I still get the same behavior.

    Here is an example of the output:

    313938 ms: Scavenge 850.2 (900.0) -> 849.6 (900.0) MB, 1 ms [Runtime::PerformGC].
    316877 ms: Scavenge 850.4 (900.0) -> 849.6 (900.0) MB, 1 ms [Runtime::PerformGC].
    320518 ms: Scavenge 850.6 (900.0) -> 849.6 (900.0) MB, 0 ms [Runtime::PerformGC].
    324514 ms: Scavenge 850.6 (900.0) -> 849.6 (900.0) MB, 0 ms [Runtime::PerformGC].
    328120 ms: Scavenge 850.6 (900.0) -> 849.6 (900.0) MB, 0 ms [Runtime::PerformGC].
    331594 ms: Scavenge 850.6 (900.0) -> 849.6 (900.0) MB, 0 ms [allocation failure].
    bdceb1f0-f0a7-11e1-8b35-4936f06b29eb
    335123 ms: Scavenge 850.6 (900.0) -> 849.6 (900.0) MB, 0 ms [allocation failure].
    338720 ms: Scavenge 850.6 (900.0) -> 849.6 (900.0) MB, 0 ms [Runtime::PerformGC].
    342342 ms: Scavenge 850.6 (900.0) -> 849.6 (900.0) MB, 0 ms [allocation failure].
    345963 ms: Scavenge 850.6 (900.0) -> 849.6 (900.0) MB, 0 ms [Runtime::PerformGC].


  12. arnabc – the redditor hate is even better now that their own website couldn’t handle 5M requests per hour (not sure whether it was over 1M concurrent at any point during that time). It would be a nice test to replicate the load that took down reddit, and show it being handled by a single Node.js server. Of course there’s complexity in the reddit back-end, for all the various site features, but it’s still fundamentally a connected graph structure, like Sprites is.


  13. Amazing.
    What would the configuration options be to increase UDP performance in the same way? The current max on our servers is 15k msg/sec, which is no joke, but I am certain it can improve.


  14. Hi!
    How can I use 8 GB of memory?

    My test:

    var ph = [];
    while (true) {
      ph.push('7232985jkdjf');
    }

    node --trace-gc --max-old-space-size=8192 heaptest.js
    31 ms: Scavenge 2.1 (35.0) -> 1.8 (36.0) MB, 0 ms [allocation failure].
    33 ms: Scavenge 2.6 (36.0) -> 2.4 (36.0) MB, 0 ms [Runtime::PerformGC].
    ……..
    3108 ms: Mark-sweep 712.2 (746.7) -> 428.0 (462.4) MB, 504 ms [Runtime::PerformGC] [GC in old space requested].
    4641 ms: Mark-sweep 1067.6 (1102.0) -> 641.2 (675.6) MB, 752 ms [Runtime::PerformGC] [GC in old space requested].
    FATAL ERROR: JS Allocation failed – process out of memory

    ubuntu server x64
    node -p -e "process.arch" >> x64
    node -v >> v0.8.9


  15. Can you explain some of your server code – just the reasoning for using nodes? Is that just to re-use IDs? Any reason not to just dump all the connections in an array? And maybe a queue with redis or something for push/pop of IDs?

    Other than that, the server code seems pretty standard; the kernel tweaks are, IMO, the more valuable part.


    1. I decided to use nodes because each connection needs to efficiently find out who its neighbors are. If each node knew its neighbors only by ID, as opposed to holding a direct reference, each traversal step would require a look-up in an associative array. For finding the neighbor 2 positions to your left and 2 positions up, for example, a graph structure is just a little easier to work with IMO (e.g. pNode->pLeft->pLeft->pTop->pTop). Also, although chromium’s associative arrays may be O(1) complexity, they involve the overhead of a hash function and the potential for collisions, which, when dealing with hundreds of thousands of connections, adds up enough to be a prohibitive performance bottleneck.
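
      In code, the structure is roughly this (a simplified sketch, not the actual Sprites source):

      function GridNode(id) {
        this.id = id;
        // Direct references to neighbors, so traversal is pure
        // pointer chasing with no hash lookups:
        this.pLeft = null;
        this.pRight = null;
        this.pTop = null;
        this.pBottom = null;
      }

      // the neighbor two positions left and two up:
      // var n = node.pLeft.pLeft.pTop.pTop;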


  16. @caustik – Thanks, that makes sense. So traversal for sending responses is much faster, and any lookups of specific nodes to send targeted messages would be much faster too. Something I didn’t think of.


  17. I’m really interested in this topic. I’m trying to emulate something similar, but using SSE. I’m stress testing a server to see how many concurrent SSE streams it can hold open.

    Is there a chance to see the code for the client side?

    The server listens on port 8080 so the client can request a session ID and then open the SSE stream on port 8081, where the workers are listening. The problem is that the client doesn’t seem to be able to open the socket on that port. I’m still struggling to figure out how you manage the connections on both ports.

    As a single-threaded application, with no workers and listening on a single port, it works fine though.

    Thanks.


  18. Hello!

    I’m trying to use your server file to create my own server to deliver push notifications, but I have a problem. I can send the messages and receive them, but I don’t know how to use them in the client. I mean, if I point to server:8080, the page keeps loading and I can see it receiving the messages in Firebug, but I have no idea how to add a listener to it. Should it be long-poll AJAX?


  19. Just wondering, if you don’t mind, what the cost of this little test was for you?

    I’m looking at WebSockets, using Python at the moment (though possibly moving away from it), and would be interested in hearing more about this.


    1. I don’t remember the exact cost, but it was a few hundred USD, I think. Pricing may be different by now, and you can also reduce costs quite a bit by using Spot Instances on Amazon, for example.


  20. Good job, Caustik!

    Shahzad Bhatti has published some very interesting tests of connection handling with Node.js and Vert.x.

    He pushed up to 24,000 connected clients on Node.js / Vert.x, and Node.js became really slow at receiving messages under those conditions (while Vert.x seemed stable).

    http://weblog.plexobject.com/?p=1698

    So, when I see you are able to push the limit to 1M, I wonder whether it’s really possible for Node.js to work correctly under those conditions?

    It would be really interesting if you could run this kind of test!

    Or better, if you could compare with Vert.x 🙂


  21. I agree with the note above that “a high number of simultaneous clients has been achieved on different platforms such as Node”. Here is the difference: in many cases, test engineers will be assigned to create test code that emulates the complex interaction between client and server (to load test the high-performance server). So the test platform needs to make it easy to develop high-performance test scripts without callbacks or too much syntactic sugar (imagine having to teach someone what “public static” means).

    Sorry, this deviates a little from the main topic of this blog (thanks for it!), even though it’s related.


  22. Hi,
    I have a 32-core system with 48GB of RAM. What should my sysctl.conf settings be? If there is logic behind these values, please explain it.

    Thanks
    Sai


  23. Hi,

    I am running 24k connections, and after that my clients get disconnect events… I increased the ephemeral port range, and I don’t think it is a problem with the 3 simulation clients. I have absolutely no way of knowing what’s causing the disconnect events…

    Do you happen to know anything about this?

    Thanks!

