caustik's blog

programming and music

Node.js w/1M concurrent connections!

with 60 comments

I’ve decided to ramp up the Node.js experiments and pass the 1 million concurrent connections milestone. It worked, using a swarm of 500 Amazon EC2 test clients, each establishing ~2000 active long-poll COMET connections to a single 15GB Rackspace cloud server.

This isn’t landing the Mars rover or curing cancer; it’s just a pretty cool milestone, IMO. Hopefully it’s of some benefit to Node developers who need a large number of concurrent connections and can use these settings as a starting point in their own projects.

Here’s the connection count as displayed on the Sprites page:

Here’s the sysctl output showing the number of open file handles (sockets are file handles):

Here’s the view of “top” showing system resources in use:

I think it’s pretty reasonable for 1M connections to consume 16GB of memory (roughly 16KB per connection), but it could probably be trimmed down quite a bit. I haven’t spent any time optimizing that; I’ll leave it for another day.

Here’s a latency test run against the comet URL:

The new tweaks, placed in /etc/sysctl.conf (CentOS) and then reloaded with “sysctl -p” :

net.core.rmem_max = 33554432
net.core.wmem_max = 33554432
net.ipv4.tcp_rmem = 4096 16384 33554432
net.ipv4.tcp_wmem = 4096 16384 33554432
net.ipv4.tcp_mem = 786432 1048576 26777216
net.ipv4.tcp_max_tw_buckets = 360000
net.core.netdev_max_backlog = 2500
vm.min_free_kbytes = 65536
vm.swappiness = 0
net.ipv4.ip_local_port_range = 1024 65535
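As a quick sanity check (my own addition, not from the post), the reloaded values can be read back individually. Note that the per-process file descriptor limit also has to be raised separately, since each socket consumes a descriptor:

```shell
# Reload /etc/sysctl.conf, then spot-check a few of the values above
sysctl -p
sysctl net.core.rmem_max             # should print 33554432
sysctl net.ipv4.ip_local_port_range  # should print 1024 65535

# Sockets are file handles, so the descriptor limit must also exceed
# the target connection count (raise it e.g. in /etc/security/limits.conf)
ulimit -n
```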

Other than that, the steps were identical to the steps described in my previous blog posts, except this time using Node.js version 0.8.3.

Here is the server source code, so you can get a sense of the complexity level. Each connected client actively sends messages, for the purpose of verifying the connections are alive. I haven’t pushed the throughput yet to see what data rate can be sustained; since the modest 16GB of memory was already consumed, doing so would likely have caused swapping and meant little. I’ll give it a shot with a higher-memory server next time.


Written by caustik

August 19th, 2012 at 12:43 am

60 Responses to 'Node.js w/1M concurrent connections!'


  1. I’m guessing you mean the EC2 swarm were actually test clients connecting to a single node server? It’s kinda ambiguous – you might wanna expand on that a little.

    Michael Hart

    19 Aug 12 at 2:35 am

  2. Good point, updated the post to clarify that a bit.

    caustik

    19 Aug 12 at 2:43 am

  3. Perfect – that makes sense. Awesome job pushing it to the limit by the way!

    Michael Hart

    19 Aug 12 at 6:25 am

  4. How significant was the app that was running? Just a simple Hello World?

    Adam Duro

    19 Aug 12 at 5:03 pm

  5. Adam – it’s of moderate complexity.

    caustik

    19 Aug 12 at 5:37 pm

  6. Why in the world did you need 500 EC2 instances for client connections? With the right settings, you should be able to get to 1M with about 20 instances.

    Tim McClarren

    19 Aug 12 at 6:17 pm

  7. Well played, sir. I will look into your tweaks some more. At some point the project I’m bootstrapping with a couple partners may need your consulting services. Cheers.

    Adam Duro

    19 Aug 12 at 6:38 pm

  8. Tim – Let me know when you’ve pulled that off, using EC2 instances, and the overall cost was less than using 500 micro spot instances. Also, doing it this way is more realistic – why would you complain that -too many- unique servers were used, when it’s meant to simulate the real world scenario of each client being a unique IP? I’m just as likely to get a comment asking why I didn’t use -more- unique instances ;) — anyway, my point is it’s not really as brain dead of a choice as you might think. I’m erring on the side of a more expensive, but more realistic, testing procedure.

    caustik

    19 Aug 12 at 6:39 pm

  9. What instance type did you use for the server?

    Michael Hood

    19 Aug 12 at 8:08 pm

  10. The server was a 15GB rackspace cloud server.

    caustik

    19 Aug 12 at 8:12 pm

  11. So we meet again…. caustik!

    Is your main goal to scale out vertically with all this? Or just to see how far you can take it?

    jtarchie

    20 Aug 12 at 7:11 am

  12. Just saw some limitations and wanted to find a way around them.

    Also, it seems to really upset anti-Node web zealots, trolls, and smug redditors, so that’s pretty satisfying.

    caustik

    20 Aug 12 at 8:25 am

  13. Great work: Now the outcome of a similar test for a more common Node.js stack (node.js – connect / express – socket.io / web socket connections) would be interesting.

    I’ll have a look at your setup and check if such a test can be created that allows comparing results to your simulation outcome.

    fpp

    20 Aug 12 at 12:22 pm

  14. Adding more inbound unique IPs into the mix isn’t testing anything further, unless you believe your TCP stack to be broken in some fashion.

    Tim McClarren

    20 Aug 12 at 8:44 pm

  15. The test client references regplat.h, is there a way we can use your test client or get that library?

    John Adams

    21 Aug 12 at 9:33 pm

  16. The svn:external should work for read only access. It’s the main include file for a utility library.

    caustik

    21 Aug 12 at 10:11 pm

  17. Nice way to use 500 EC2 instances to generate such a large volume of requests and a real-world scenario. This also gets one past the I/O and bandwidth throttling that AWS imposes on an instance. Can you tell me which load generation (client) software you used? And did you work with AWS before the test, like filling out the penetration test form?

    Raghu

    23 Aug 12 at 11:13 am

  18. Awesome work!
    What did you use to test the latency shown in the picture? http://www.caustik.com/blog/1M-latency1.PNG
    Cheers!

    Daniel

    25 Aug 12 at 8:28 am

  19. I just searched for “latency test” and used one of those :]

    caustik

    25 Aug 12 at 7:22 pm

  20. I don’t see the images, they are throwing 404s

    arnabc

    26 Aug 12 at 6:59 am

  21. Thanks for letting me know, it’s fixed now. Just migrated servers today, had a conflicting apache directive!

    caustik

    26 Aug 12 at 7:35 am

  22. Yes, images are back thank you for such a quick response.

    “Also, it seems to really upset anti-Node web zealots, trolls, and smug redditors, so that’s pretty satisfying.”

    I liked this comment, as this issue annoys me too, especially on Reddit.

    arnabc

    26 Aug 12 at 9:42 am

  23. I am trying to reproduce your tests (which are very interesting) and I have run into a roadblock.

    My tests differ from yours because I am running the node.js server on a 16GB m1.XL at EC2, and I am trying to run m1.L for the clients.

    The problem appears on the server: once I reach about 150,000 connections, node/V8 keeps trying to Scavenge memory. Doing this causes my client connections to time out. Did you have this problem at all?

    Thanks.

    ccarollo

    27 Aug 12 at 9:28 pm

  24. ccarollo – I did run into the same problem; it’s discussed in my other blog post, Escape the 1.4GB V8 heap limit in Node.js. Hope that helps.

    caustik

    27 Aug 12 at 9:49 pm

  25. Caustik, I did read that and I have been using the ulimit change as well as the following command for running the script.

    node --trace-gc --expose-gc --nouse-idle-notification --max-new-space-size=2048 --max-old-space-size=14336

    Are you saying that you needed to custom build V8 with those two changes to get up to 1M connections?

    ccarollo

    27 Aug 12 at 10:31 pm

  26. Yea, the V8 source code changes are needed. It would be nice if future revisions of V8 would allow you more control over heap size / garbage collection.

    caustik

    27 Aug 12 at 11:57 pm

  27. Have you tried this test on any of the v0.8.X releases? I forgot to mention that earlier, but I am trying it on v0.8.8 and I have also added the v8 settings, but I still get the same behavior.

    Here is an example of the output….

    313938 ms: Scavenge 850.2 (900.0) -> 849.6 (900.0) MB, 1 ms [Runtime::PerformGC].
    316877 ms: Scavenge 850.4 (900.0) -> 849.6 (900.0) MB, 1 ms [Runtime::PerformGC].
    320518 ms: Scavenge 850.6 (900.0) -> 849.6 (900.0) MB, 0 ms [Runtime::PerformGC].
    324514 ms: Scavenge 850.6 (900.0) -> 849.6 (900.0) MB, 0 ms [Runtime::PerformGC].
    328120 ms: Scavenge 850.6 (900.0) -> 849.6 (900.0) MB, 0 ms [Runtime::PerformGC].
    331594 ms: Scavenge 850.6 (900.0) -> 849.6 (900.0) MB, 0 ms [allocation failure].
    bdceb1f0-f0a7-11e1-8b35-4936f06b29eb
    335123 ms: Scavenge 850.6 (900.0) -> 849.6 (900.0) MB, 0 ms [allocation failure].
    338720 ms: Scavenge 850.6 (900.0) -> 849.6 (900.0) MB, 0 ms [Runtime::PerformGC].
    342342 ms: Scavenge 850.6 (900.0) -> 849.6 (900.0) MB, 0 ms [allocation failure].
    345963 ms: Scavenge 850.6 (900.0) -> 849.6 (900.0) MB, 0 ms [Runtime::PerformGC].

    ccarollo

    28 Aug 12 at 12:33 am

  28. I think those scavenges are fine (0ms), those aren’t the disruptive GC cycles.

    Sorry, it took me a while to respond. Hope it’s working for you now.

    caustik

    31 Aug 12 at 8:07 pm

  29. arnabc – the redditor hate is even better now that their own website couldn’t handle 5M requests per hour (not sure if it was over 1M concurrent at all during that time). It would be a nice test to replicate the stats that took down reddit, and show it being handled by a single Node.js server. Of course there’s complexity to the reddit back-end, for all the various site features, but it’s still fundamentally a connected graph structure like Sprites is.

    caustik

    31 Aug 12 at 8:12 pm

  30. [...] Node.js w/1M concurrent connections! [...]

  31. How many front-end IP addresses did you use, or how did you load balance requests? Nice job by the way! ;)

    djeps

    3 Sep 12 at 6:41 pm

  32. Just 1 IP, no load balancer at all.

    caustik

    3 Sep 12 at 9:36 pm

  33. Amazing.
    What would the configuration options be to increase the UDP performance in the same way? Currently the max on our servers is 15k msg/sec, which is no joke, but I am certain it can improve.

    midletearth

    10 Sep 12 at 11:13 pm

  34. Hi!
    How to use 8 GB memory?

    My test:

    var ph = [];
    while (true) {
        ph.push('7232985jkdjf');
    }

    node --trace-gc --max-old-space-size=8192 heaptest.js
    31 ms: Scavenge 2.1 (35.0) -> 1.8 (36.0) MB, 0 ms [allocation failure].
    33 ms: Scavenge 2.6 (36.0) -> 2.4 (36.0) MB, 0 ms [Runtime::PerformGC].
    ……..
    3108 ms: Mark-sweep 712.2 (746.7) -> 428.0 (462.4) MB, 504 ms [Runtime::PerformGC] [GC in old space requested].
    4641 ms: Mark-sweep 1067.6 (1102.0) -> 641.2 (675.6) MB, 752 ms [Runtime::PerformGC] [GC in old space requested].
    FATAL ERROR: JS Allocation failed – process out of memory

    ubuntu server x64
    node -p -e "process.arch" >> x64
    node -v >> v0.8.9

    krowten

    21 Sep 12 at 12:39 am

  35. Can you explain some of your server code; just the reasoning for using nodes? Is that just to re-use IDs? Any reason not to just dump all the connections in an array? And maybe a queue with redis or something for push/pop IDs?

    Other than that, the server code seems pretty standard; the kernel tweaks are what IMO is more valuable.

    Nabeel

    27 Nov 12 at 12:20 pm

  36. I decided to use nodes because each connection needs to efficiently find out who its neighbors are. If each node knew its neighbors only by ID, as opposed to holding a direct reference, each traversal step would require a look-up into an associative array. For finding the neighbor 2 positions to your left and 2 positions up, for example, a graph structure is just a little easier to work with IMO (e.g. pNode->pLeft->pLeft->pTop->pTop). Also, although Chromium’s associative arrays may be O(1) complexity, they involve the overhead of a hash function and the potential for collisions, which, when dealing with hundreds of thousands of connections, adds up enough to be a prohibitive performance bottleneck.

    caustik

    27 Nov 12 at 2:46 pm

  37. @caustik – Thanks, makes sense. So traversal for sending responses is much faster, and any lookups of specific nodes to send targeted messages would be much faster too. Something I didn’t think of.

    Nabeel

    27 Nov 12 at 5:33 pm

  38. Awesome work caustik!!! Thank you for sharing.

    drcyrus3d

    28 Nov 12 at 7:43 am

  39. Great post!!! Very helpful to my work I’m trying to do!!!
    Would you take a look at my question on stackoverflow http://stackoverflow.com/questions/14049109/long-connections-with-node-js-how-to-reduce-memory-usage-and-prevent-memory-lea, please? Hoping for your advise, thanks.

    Aaron Wang

    1 Jan 13 at 5:42 pm

  40. I’m really interested in this topic. I’m trying to emulate something similar but using SSE. I’m trying to stress test a server to see how many concurrent SSE streams it can hold open.

    Is there a chance to see the code for the client side?

    The server listens on port 8080 so the client can request a session id and then open the SSE stream on port 8081, where the workers are listening. The problem is that the client doesn’t seem to be able to open the socket on that port. I’m still struggling to find how you manage the connections on both ports.

    As a single-threaded application, with no workers listening and on a single port, it works fine though.

    Thanks.

    apenedo

    31 Jan 13 at 9:25 am

  41. Hello!

    I’m trying to use your server file to create my own server to deliver push notifications. But I have a problem: I can send the messages and receive them, but I don’t know how to use them in the client. I mean, if I point to server:8080 the page keeps loading and I can see it receiving the messages in Firebug, but I have no idea how to add a listener to it. Should it be long-poll AJAX?

    abraham

    28 Mar 13 at 1:40 pm

  42. yea it’s long poll ajax. sprites client uses libcurl in C++

    caustik

    28 Mar 13 at 5:29 pm

  43. Do you have any personal email? I wish to have a litle chat with you about this…

    Thanks a lot!

    abraham

    3 Apr 13 at 10:06 am

  44. Just wondering, if you don’t mind, what the cost of this little test was for you?

    I’m looking at websockets, using python at the moment, possibly moving away for it though, and would be interested in hearing more about this.

    Nalum

    29 Apr 13 at 1:32 am

  45. I don’t remember the exact cost, but it was a few hundred USD, I think. It may be different pricing by now and you can also reduce costs by quite a bit by using Spot Instances on Amazon, for example.

    caustik

    3 May 13 at 8:41 am

  46. Where can we find client code? did you use ab or jmeter for 1m connection.
    Thanks for reply

    Murat

    20 Jun 13 at 12:28 am

  47. Good job Caustik !

    Shahzad Bhatti has published some very interesting tests about connections with NodeJS and Vert.x.

    He pushed up to 24,000 connected clients on NodeJS / Vert.x, and we could see NodeJS become really slow to receive messages under these conditions (but Vert.x seems stable).

    http://weblog.plexobject.com/?p=1698

    So, when I see you are capable of pushing the limit to 1M, I’m not sure it’s possible for NodeJS to really work correctly under these conditions?

    It would be really interesting if you could run this kind of test!

    Or better, if you could compare with Vert.x :)

    Lakano

    15 Jul 13 at 6:34 am

  48. […] g) One performance test result on a http server with node.js – 1M concurrent connections requests on single 15GB rackspace cloud server – source (http://blog.caustik.com/2012/08/19/node-js-w1m-concurrent-connections/) […]

  49. […] 最近微博上看到时go的能撑起100万的并发连接,node.js也能达到同样的数据, Node.js w/1M concurrent connections!有node.js的长连接数据,它占用了16G内存,但CPU还远没跑满。 […]

  50. […] A quick calculation: assuming that each thread potentially has an accompanying 2 MB of memory with it, running on a system with 8 GB of RAM puts us at a theoretical maximum of 4000 concurrent connections, plus the cost of context-switching between threads. That’s the scenario you typically deal with in traditional web-serving techniques. By avoiding all that, Node.js achieves scalability levels of over 1M concurrent connections (as a proof-of-concept). […]

  51. […] A quick calculation: assuming that each thread potentially has an accompanying 2 MB of memory with it, running on a system with 8 GB of RAM puts us at a theoretical maximum of 4000 concurrent connections, plus the cost of context-switching between threads. That’s the scenario you typically deal with in traditional web-serving techniques. By avoiding all that, Node.js achieves scalability levels of over 1M concurrent connections (as a proof-of-concept). […]

  52. […] a load of 10000 concurrent users basically chatting to one another, then Node should suffice (see http://blog.caustik.com/2012/08/19/node-js-w1m-concurrent-connections/). The thing with eJabberd is that it comes with additional modules (Pubsub, roster management, […]

  53. […] A quick calculation: assuming that each thread potentially has an accompanying 2 MB of memory with it, running on a system with 8 GB of RAM puts us at a theoretical maximum of 4000 concurrent connections, plus the cost of context-switching between threads. That’s the scenario you typically deal with in traditional web-serving techniques. By avoiding all that, Node.js achieves scalability levels of over 1M concurrent connections (as a proof-of-concept). […]

  54. Late to this exciting discussion. Just want to point out that it’s fairly easy to generate 1,000,000 concurrent connections from one instance. No need for a large number of loadGen instances, which can be a headache to manage. http://perftestingng.blogspot.com/2014/02/1-million-concurrent-connections-from.html

    perf guy

    4 Feb 14 at 10:57 pm

  55. […] techniques, a high number of simultaneous clients has been achieved on different platforms such as Node.JS, Clojure or […]

  56. Agree with the above note that “a high number of simultaneous clients has been achieved on different platforms such as Node”. Here is the difference: in many cases, test engineers will be assigned to create test code to emulate the complex interaction between client and server (to load test the high performance server). So the test platform needs to make it easy to develop high performance test scripts that don’t have call-backs or too much syntactic sugar (imagine you have to teach someone a “public static” function).

    Sorry, this deviates a little from the main topic of this blog (thanks for it!) even though it’s related.

    perf guy

    8 Mar 14 at 4:02 pm

  57. Hi,
    I have a 32 core system with 48GB, what should be my sysctl.conf settings, if there is a logic behind it, please explain.

    Thanks
    Sai

    Sai

    13 May 14 at 5:33 am

  58. […] across multiple processes. We have also made kernel tweaks and tuned TCP parameters similar to this to squeeze as much performance as possible out of […]

  59. […] A quick calculation: assuming that each thread potentially has an accompanying 2 MB of memory with it, running on a system with 8 GB of RAM puts us at a theoretical maximum of 4000 concurrent connections, plus the cost of context-switching between threads. That’s the scenario you typically deal with in traditional web-serving techniques. By avoiding all that, Node.js achieves scalability levels of over 1M concurrent connections (as a proof-of-concept). […]

  60. Is it outrageous to ask for updates on these benchmarks with Socket.io 1.0 now released?

    Jordan Coeyman

    12 Jul 14 at 8:11 am
