CoralSequencer Performance Numbers

This article presents the latency and throughput numbers of CoralSequencer. For latency, we measure the time it takes for a node to publish a message to the sequencer and receive the response in the event-stream; in other words, we measure network round-trip times. There are two independent JVMs: one running the sequencer and one running the benchmark node. The node sends a message to the sequencer, and the sequencer picks it up and publishes a response in the event-stream, where the node receives it. CoralSequencer comes with a BenchmarkNode implementation that you can use to measure throughput and latency in your own environment.
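To make the round-trip idea concrete, here is a minimal sketch of a ping-pong measurement over loopback UDP. It is not CoralSequencer's BenchmarkNode and does not use its API: the class LoopbackRoundTrip and the method measure are hypothetical names, it uses plain blocking java.net sockets, and it runs the echo side as a thread in the same JVM rather than in a second JVM. Its numbers are therefore not comparable to the ones in this article.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;

// Hypothetical illustration only; not part of CoralSequencer.
final class LoopbackRoundTrip {

    // Sends `count` messages of `size` bytes to an echo thread over loopback
    // and returns each round-trip time in nanoseconds.
    static long[] measure(int count, int size) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (DatagramSocket echo = new DatagramSocket(0, loopback);
             DatagramSocket client = new DatagramSocket(0, loopback)) {
            echo.setSoTimeout(2000);   // fail instead of hanging forever
            client.setSoTimeout(2000);

            Thread echoThread = new Thread(() -> {
                byte[] buf = new byte[size];
                DatagramPacket p = new DatagramPacket(buf, buf.length);
                try {
                    for (int i = 0; i < count; i++) {
                        p.setLength(buf.length);
                        echo.receive(p); // fills in the sender's address and port
                        echo.send(p);    // so sending it back reaches the client
                    }
                } catch (IOException e) {
                    // timeout or closed socket: stop echoing
                }
            });
            echoThread.setDaemon(true);
            echoThread.start();

            byte[] payload = new byte[size];
            byte[] in = new byte[size];
            DatagramPacket out =
                new DatagramPacket(payload, size, loopback, echo.getLocalPort());
            DatagramPacket reply = new DatagramPacket(in, in.length);
            long[] rtt = new long[count];
            for (int i = 0; i < count; i++) {
                long start = System.nanoTime();
                client.send(out);
                reply.setLength(in.length);
                client.receive(reply);   // blocks until the echo arrives
                rtt[i] = System.nanoTime() - start;
            }
            echoThread.join();
            return rtt;
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException("loopback round trip failed", e);
        }
    }
}
```

Calling LoopbackRoundTrip.measure(1_000_000, 256) would mirror the message count and size used below, although a real benchmark would also warm up the JVM before recording samples.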

The machine used for the latency benchmarks below was a fast Intel Xeon E-2288G octa-core (8 x 3.70GHz) Ubuntu box, not overclocked.

NOTE: Everyone’s network environment is different, which makes over-the-wire benchmark numbers hard to compare. To keep things simple we present loopback numbers (i.e. client and server running on the same physical machine but in different JVMs), which are easy to compare and weed out external factors, isolating the performance of the application + network code. To calculate total numbers you should add your typical over-the-wire network latency. A 256-byte packet traveling through 10-Gigabit Ethernet will take at least 382 nanoseconds to go from NIC to NIC (ignoring the switch hop). If your Ethernet is 1 Gigabit, then the latency is at least 3.82 micros on top of the CoralSequencer numbers. Another factor is the network card latency. Going from JVM to kernel to NIC can be costly, and some good network cards optimize that by offering kernel bypass (e.g. OpenOnload from Solarflare).

Latency Numbers

Message Size: 256 bytes
Messages: 1,000,000
Avg Time: 4.771 micros
Min Time: 3.64 micros
Max Time: 616.274 micros
75% = [avg: 4.563 micros, max: 4.876 micros]
90% = [avg: 4.621 micros, max: 4.95 micros]
99% = [avg: 4.666 micros, max: 5.963 micros]
99.9% = [avg: 4.68 micros, max: 6.958 micros]
99.99% = [avg: 4.736 micros, max: 279.053 micros]
99.999% = [avg: 4.766 micros, max: 485.136 micros]
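The percentile lines above can be read as a summary of the fastest fraction of the samples: sort all round-trip times, keep the best 75% (or 90%, 99%, and so on), and report the average and the maximum of that subset. That reading is an assumption based on the shape of the output, not something the article states, and the class PercentileSummary below is a hypothetical helper, not part of CoralSequencer.

```java
import java.util.Arrays;

// Hypothetical helper; assumes "p% = [avg, max]" describes the fastest p% of samples.
final class PercentileSummary {

    // Returns {average, maximum} of the fastest `fraction` (0 < fraction <= 1)
    // of the given latencies, in the same unit as the input.
    static double[] summarize(long[] latencies, double fraction) {
        if (latencies.length == 0 || fraction <= 0.0 || fraction > 1.0) {
            throw new IllegalArgumentException("need samples and 0 < fraction <= 1");
        }
        long[] sorted = latencies.clone();
        Arrays.sort(sorted);                       // ascending: fastest first
        int n = (int) Math.round(sorted.length * fraction);
        n = Math.max(1, Math.min(sorted.length, n));
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += sorted[i];
        }
        return new double[] { (double) sum / n, sorted[n - 1] };
    }
}
```

Under this reading, a handful of slow outliers only show up in the highest percentiles, which is consistent with the maximum jumping from single-digit micros at 99.9% to hundreds of micros at 99.99%.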


Throughput Numbers

The machine used for the throughput benchmarks below was the same fast Intel Xeon E-2288G octa-core (8 x 3.70GHz) Ubuntu box, not overclocked.

The throughput numbers below are measured when three nodes are pushing messages to the sequencer as fast as they can. How many messages can the sequencer receive and send out?

Message Size: 256 bytes
Messages Sent: 3,000,000
Total Time: 2.688 secs
Messages per second: 1,116,166
Average Time per message: 895 nanos

The three nodes used to stress out the sequencer had throughput numbers close to:

Message Size: 256 bytes
Messages Sent: 2,000,000
Messages per second: 470,483
Average Time per message: 2.125 micros
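The relationship between the figures in these blocks is simple arithmetic: messages per second is the message count divided by the total time, and the average time per message is the inverse of that rate. The sketch below spells it out; ThroughputMath is a hypothetical helper name, not a CoralSequencer class.

```java
// Hypothetical helper showing how the throughput figures relate to each other.
final class ThroughputMath {

    // Messages per second from a message count and a total time in seconds.
    static double messagesPerSecond(long messages, double totalSeconds) {
        return messages / totalSeconds;
    }

    // Average time per message in nanoseconds for a given rate.
    static double nanosPerMessage(double messagesPerSecond) {
        return 1_000_000_000.0 / messagesPerSecond;
    }
}
```

For example, 3,000,000 messages in 2.688 seconds works out to roughly 1,116,071 messages per second; the small difference from the reported 1,116,166 is likely because the total time shown is rounded. A rate of 1,116,166 messages per second corresponds to about 895.9 nanos per message, and a node rate of 470,483 messages per second corresponds to about 2.125 micros per message.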

NOTE ABOUT BATCHING: The throughput numbers above were calculated with the nodes batching aggressively; in other words, the nodes were sending more than one message inside the same UDP packet. If you assume that the MTU (maximum transmission unit, the largest packet the network link will carry) is around 1500 bytes, then you can fit five 256-byte messages inside the same UDP packet, increasing throughput considerably.
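The batching capacity can be estimated with a short calculation. The sketch below assumes IPv4 without options (20-byte header) plus the 8-byte UDP header, and it ignores any per-message framing that CoralSequencer itself may add, so treat the result as an upper bound. UdpBatching is a hypothetical name used only for illustration.

```java
// Hypothetical helper; assumes IPv4 without options and no per-message framing.
final class UdpBatching {

    static final int IPV4_HEADER_BYTES = 20; // minimum IPv4 header, no options
    static final int UDP_HEADER_BYTES = 8;   // fixed UDP header size

    // Upper bound on whole messages that fit in one unfragmented UDP packet.
    static int messagesPerPacket(int mtuBytes, int messageBytes) {
        int payloadBytes = mtuBytes - IPV4_HEADER_BYTES - UDP_HEADER_BYTES;
        return payloadBytes / messageBytes;
    }
}
```

With a 1500-byte MTU the UDP payload is 1472 bytes, so UdpBatching.messagesPerPacket(1500, 256) gives 5, matching the figure above, while a 64-byte message size would allow up to 23 messages per packet.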

If you decrease the message size, say to 64 bytes, then you will be able to batch even more, increasing the throughput numbers even further. Below are the sequencer throughput numbers when the nodes are aggressively batching 64-byte messages inside the same UDP packet:

Message Size: 64 bytes
Messages Sent: 3,000,000
Total Time: 1.287 secs
Messages per second: 2,330,160
Average Time per message: 429 nanos

The three nodes used to stress out the sequencer had throughput numbers close to:

Message Size: 64 bytes
Messages Sent: 2,000,000
Messages per second: 1,641,239
Average Time per message: 609 nanos

For completeness, below we present the sequencer throughput numbers when no batching at the node level is taking place, in other words, when the nodes are sending only one 256-byte message per UDP packet.

Message Size: 256 bytes
Messages Sent: 2,000,000
Total Time: 2.543 secs
Messages per second: 786,497
Average Time per message: 1.271 micros

The three nodes used to stress out the sequencer had throughput numbers close to:

Message Size: 256 bytes
Messages Sent: 2,000,000
Messages per second: 275,861
Average Time per message: 3.625 micros


Conclusion

CoralSequencer can sustain a throughput of more than 2 million messages per second when the nodes batch 64-byte messages, and about 1.1 million messages per second when they batch 256-byte messages. The round-trip latencies are close to 4.7 micros per message (256 bytes). Without batching at the node level, the sequencer throughput is around 780 thousand messages per second.