CoralSequencer Performance Numbers

In this article we present the latency and throughput numbers of CoralSequencer. We measure the time it takes for a node to publish a message to the sequencer and receive the corresponding message back in the event-stream; in other words, we are measuring network round-trip times. There are two independent JVMs: one running the sequencer and one running the benchmark node. The node sends a message to the sequencer, the sequencer picks it up and publishes a response in the event-stream, and the node receives that response. CoralSequencer comes with a BenchmarkNode implementation that you can use to measure the throughput and latency in your own environment.
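CoralSequencer's own benchmark API is not reproduced here, but the measurement methodology is easy to illustrate. Below is a minimal, self-contained sketch of the same idea: an echo thread plays the role of the sequencer and the main thread plays the role of the benchmark node, timing each send/receive round trip over loopback UDP with System.nanoTime(). All names and the port number are hypothetical, and a real benchmark would also warm up the JVM and discard the first samples.

import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;

// Illustration of the round-trip measurement methodology only; this is not
// CoralSequencer code. An echo thread stands in for the sequencer and the
// main thread stands in for the benchmark node.
public class RoundTripSketch {

    public static void main(String[] args) throws Exception {

        final int msgSize = 256;        // bytes per message
        final int messages = 100_000;   // round trips to measure

        // "Sequencer" stand-in: echoes every packet back to its sender.
        final DatagramSocket echoSocket = new DatagramSocket(45678);
        Thread echo = new Thread(() -> {
            byte[] buf = new byte[msgSize];
            DatagramPacket p = new DatagramPacket(buf, buf.length);
            try {
                while (true) {
                    echoSocket.receive(p); // p now carries the sender's address/port
                    echoSocket.send(p);    // echo it straight back
                }
            } catch (Exception e) {
                // socket closed: benchmark finished
            }
        });
        echo.setDaemon(true);
        echo.start();

        // "Benchmark node" stand-in: sends a message and waits for the echo.
        DatagramSocket nodeSocket = new DatagramSocket();
        byte[] out = new byte[msgSize];
        byte[] in = new byte[msgSize];
        DatagramPacket outPacket =
                new DatagramPacket(out, out.length, InetAddress.getLoopbackAddress(), 45678);
        DatagramPacket inPacket = new DatagramPacket(in, in.length);

        long totalNanos = 0;
        for (int i = 0; i < messages; i++) {
            long start = System.nanoTime();
            nodeSocket.send(outPacket);
            nodeSocket.receive(inPacket);              // block until the echo arrives
            totalNanos += System.nanoTime() - start;
        }

        System.out.printf("Average round-trip: %.3f micros%n",
                totalNanos / (double) messages / 1000.0);

        nodeSocket.close();
        echoSocket.close();
    }
}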


The machine used for the latency benchmarks below was a fast Intel 13th Generation Core i9-13900KS (8 x 3.20GHz Base / 6.00GHz Turbo) Ubuntu box.

NOTE: Everyone's network environment is different, and comparing over-the-wire benchmark numbers is usually hard. To make this simple we present loopback numbers (i.e. client and server running on the same physical machine but in different JVMs), which are easy to compare and weed out external factors, isolating the performance of the application plus network code. To calculate total numbers you should add your typical over-the-wire network latency. A 256-byte packet traveling over 10 Gigabit Ethernet will take at least 382 nanoseconds to go from NIC to NIC (ignoring the switch hop). If your Ethernet is 1 Gigabit then the latency is at least 3.82 micros on top of the CoralSequencer numbers. Another factor is the network card latency. Going from JVM to kernel to NIC can be costly, and some good network cards optimize that by offering kernel bypass (e.g. OpenOnload from Solarflare).

Latency Numbers

Message Size: 1024 bytes
Messages: 1,000,000
Avg Time: 3.379 micros
Min Time: 2.717 micros
Max Time: 73.174 micros
75% = [avg: 3.236 micros, max: 3.428 micros]
90% = [avg: 3.286 micros, max: 3.775 micros]
99% = [avg: 3.359 micros, max: 5.155 micros]
99.9% = [avg: 3.376 micros, max: 5.547 micros]
99.99% = [avg: 3.378 micros, max: 5.719 micros]
99.999% = [avg: 3.378 micros, max: 11.311 micros]
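The percentile rows above report, for the best X% of samples, the average and the maximum within that subset. Below is a minimal sketch of this style of reporting, assuming the round-trip samples are collected in nanoseconds and sorted; this is an assumption about how such figures are typically produced, not CoralSequencer code.

import java.util.Arrays;

// Sketch of the "X% = [avg, max]" reporting style: sort all round-trip
// samples and, for each percentile, take the average and the maximum of
// the best X% of them. Assumed methodology, not CoralSequencer code.
public class PercentileReport {

    static void report(long[] samplesInNanos, double... percentiles) {
        long[] sorted = samplesInNanos.clone();
        Arrays.sort(sorted);
        for (double p : percentiles) {
            int count = (int) (sorted.length * p / 100.0); // best p% of samples
            long sum = 0;
            for (int i = 0; i < count; i++) sum += sorted[i];
            System.out.printf("%s%% = [avg: %.3f micros, max: %.3f micros]%n",
                    p, sum / (double) count / 1000.0, sorted[count - 1] / 1000.0);
        }
    }

    public static void main(String[] args) {
        // Illustrative samples in nanoseconds; a real run would collect one per message.
        long[] samples = { 2717, 3236, 3428, 3775, 5155, 73174 };
        report(samples, 75, 90, 99);
    }
}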


Throughput Numbers

The machine used for the throughput benchmarks below was a fast Intel Xeon E-2288G octa-core (8 x 3.70GHz) Ubuntu box, not overclocked.

The throughput numbers below are measured with three nodes pushing messages to the sequencer as fast as they can: how many messages can the sequencer receive and send back out?

Message Size: 256 bytes
Messages Sent: 3,000,000
Total Time: 2.688 secs
Messages per second: 1,116,166
Average Time per message: 895 nanos

The three nodes used to stress the sequencer had throughput numbers close to:

Message Size: 256 bytes
Messages Sent: 2,000,000
Messages per second: 470,483
Average Time per message: 2.125 micros
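For reference, the derived figures in these blocks follow directly from the raw ones: messages per second is the number of messages divided by the total time, and the average time per message is the inverse of that rate. Here is a quick sanity-check sketch using the sequencer figures from the first block above; the small differences from the reported numbers come from the elapsed time being rounded to 2.688 secs.

// Sanity check of how the derived throughput figures relate to the raw ones.
public class ThroughputMath {
    public static void main(String[] args) {
        long messagesSent = 3_000_000L;  // 256-byte messages, from the run above
        double totalTimeSecs = 2.688;    // rounded elapsed time, from the run above

        double msgsPerSec = messagesSent / totalTimeSecs;                        // ~1,116,000
        double avgNanosPerMsg = totalTimeSecs * 1_000_000_000.0 / messagesSent;  // ~896 nanos

        System.out.printf("Messages per second: %,.0f%n", msgsPerSec);
        System.out.printf("Average time per message: %.0f nanos%n", avgNanosPerMsg);
    }
}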

NOTE ABOUT BATCHING: The throughput numbers above were measured with the nodes aggressively batching; in other words, the nodes were sending more than one message inside the same UDP packet whenever possible. If you assume that the MTU (maximum transmission unit) is around 1500 bytes, then you can fit five 256-byte messages inside the same UDP packet, increasing throughput considerably.
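The batching arithmetic is straightforward. Here is a small sketch, assuming the full MTU is available for payload and ignoring any per-message framing overhead the protocol may add (real overhead would lower these counts slightly):

// How many fixed-size messages fit in a single UDP packet, assuming the
// whole MTU is available for payload and ignoring per-message headers.
public class BatchingMath {
    public static void main(String[] args) {
        int mtu = 1500; // typical Ethernet MTU, in bytes
        for (int msgSize : new int[] { 256, 64, 1400 }) {
            System.out.printf("%4d-byte messages: %2d per packet%n", msgSize, mtu / msgSize);
        }
    }
}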

If you decrease the message size, say to 64 bytes, then you will be able to batch even more, increasing the throughput numbers even further. Below are the sequencer throughput numbers when the nodes are aggressively batching 64-byte messages inside the same UDP packet:

Message Size: 64 bytes
Messages Sent: 3,000,000
Total Time: 1.287 secs
Messages per second: 2,330,160
Average Time per message: 429 nanos

The three nodes used to stress the sequencer had throughput numbers close to:

Message Size: 64 bytes
Messages Sent: 2,000,000
Messages per second: 1,641,239
Average Time per message: 609 nanos

Below we present the sequencer throughput numbers when no batching at the node level is taking place; in other words, when the nodes are sending only one 256-byte message per UDP packet.

Message Size: 256 bytes
Messages Sent: 2,000,000
Total Time: 2.543 secs
Messages per second: 786,497
Average Time per message: 1.271 micros

The three nodes used to stress the sequencer had throughput numbers close to:

Message Size: 256 bytes
Messages Sent: 2,000,000
Messages per second: 275,861
Average Time per message: 3.625 micros

The worst-case scenario is a big message filling up the entire UDP packet. Note that with this message size not even the sequencer will be able to batch anything. Below are the throughput numbers when the nodes are sending 1400-byte messages.

Message Size: 1,400 bytes
Messages Sent: 2,000,000
Total Time: 6.097 secs
Messages per second: 328,025
Average Time per message: 3.048 micros

The three nodes used to stress the sequencer had throughput numbers close to:

Message Size: 1,400 bytes
Messages Sent: 2,000,000
Messages per second: 111,119
Average Time per message: 8.999 micros


Conclusion

CoralSequencer can sustain a throughput of more than 2 million messages per second (64-byte messages) when batching is used, and above 1.1 million messages per second with 256-byte messages. The average round-trip latency is around 3.4 micros per message (1024 bytes). Without batching at the node level, the sequencer throughput is around 780 thousand messages per second for a 256-byte message. For the worst-case scenario, a 1400-byte message filling the UDP packet, the sequencer throughput is around 320 thousand messages per second.