Apache Kafka is a popular messaging middleware centered around the concepts of topics and partitions. In this article we draw a parallel between CoralSequencer and Kafka to understand how these two very different systems go about solving the same classic message distribution problem.
IMPORTANT: We recognize the importance and benefits that Apache Kafka brings to a lot of customers. This article is not intended to downplay Kafka’s role in the market. For most companies, Kafka does the job, and does it very well. The goal of this article is to present a different paradigm, the sequencer architecture, to the small subset of customers who need distributed systems with ultra-low latency, zero GC and high availability through determinism. This architecture has been used successfully in the financial markets (by the largest exchanges and top market makers) for decades.
How does Apache Kafka distribute messages?
This short video is very informative about how Kafka works. Below is a screenshot from the video describing how Kafka distributes messages through topics and partitions:
The main takeaways from the Kafka architecture, in comparison with CoralSequencer, are:
- Kafka guarantees message ordering only within a partition, not between partitions. So if you have one topic split into 3 partitions and you want a single consumer to read the whole topic, determinism is lost. For example, if you run the same consumer twice, reading all the topic messages on each run, it may read the messages in a different, unpredictable order each time. CoralSequencer, on the other hand, is built on the premise that all consumers read the exact same messages in the exact same order, always.
- The application logic naturally limits the number of partitions that a topic can have, because some application messages must stay in the same partition to preserve their ordering and can’t be split across different partitions. CoralSequencer, on the other hand, lets you keep adding as many consumers as you wish to distribute the load, because the ordering of the messages is always the same.
- Kafka can only use TCP to distribute messages. TCP has two known problems for broadcasting messages:
  - Fan-out: If the server has 100 connected consumers, then it must maintain 100 different socket channels, and each message has to be written 100 times, once for each socket channel.
  - Fairness: Because the server sends the same message repeatedly, one consumer after another, the first consumer gets the message very early while the last consumer gets it very late.
- Kafka enforces the distribution of messages at message creation time, on the producer side, because the producer must assign every message to a partition, typically through a partition key. CoralSequencer, on the other hand, simply pushes all messages to all consumers, and each consumer determines whether a message is meant for it, quickly dropping/ignoring the messages it is not interested in.
- Kafka is multithreaded by design. CoralSequencer is single-threaded by design: everything runs on a pinned high-performance thread. There are no race conditions to worry about and no lock contention to introduce latency.
- Kafka sockets are blocking sockets. CoralSequencer sockets (TCP and UDP) are non-blocking by design, which minimizes latency and maximizes throughput.
- Kafka’s Java implementation is GC-intensive, producing a lot of garbage. CoralSequencer is zero-GC, producing zero garbage per message. With CoralSequencer you can send billions of messages without ever seeing any GC activity.
- Kafka has historically relied on ZooKeeper for high availability. Because CoralSequencer is a fully deterministic system, high availability is built natively into its core. With CoralSequencer it is straightforward to create perfect clusters that provide high availability, failover and load balancing. There is never a single point of failure.
- Kafka depends on a plethora of third-party libraries, such as Log4j, which recently had a very critical security flaw. At CoralBlocks, we use Java as a syntax language; in other words, our systems do not depend on any external third-party libraries. Everything is implemented from scratch, including some JDK libraries. This is done for performance, zero GC, security and total control over the critical path.
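The first point in the list above (ordering holds inside a partition but not across partitions) can be illustrated with a small simulation. This is only a sketch under stated assumptions: `partition_for` is a toy hash rather than Kafka's real partitioner, and the two read functions are hypothetical stand-ins for two runs of the same consumer whose fetches happen to interleave differently.

```python
from itertools import zip_longest

NUM_PARTITIONS = 3

def partition_for(key: str) -> int:
    # Toy hash for illustration only; Kafka's default partitioner uses a different function.
    return sum(key.encode()) % NUM_PARTITIONS

def produce(messages):
    """Producer side: append each (key, value) message to the partition chosen by its key."""
    partitions = [[] for _ in range(NUM_PARTITIONS)]
    for key, value in messages:
        partitions[partition_for(key)].append((key, value))
    return partitions

def read_round_robin(partitions):
    """Run A: the consumer happens to receive one message from each partition in turn."""
    end = object()
    out = []
    for row in zip_longest(*partitions, fillvalue=end):
        out.extend(m for m in row if m is not end)
    return out

def read_one_partition_at_a_time(partitions):
    """Run B: the consumer happens to drain each partition completely before the next."""
    return [m for partition in partitions for m in partition]

def per_key(run, key):
    """Values seen for one key, in the order the run delivered them."""
    return [value for k, value in run if k == key]

messages = [(f"account-{i % 5}", i) for i in range(20)]
partitions = produce(messages)
run_a = read_round_robin(partitions)
run_b = read_one_partition_at_a_time(partitions)
keys = {key for key, _ in messages}

same_messages = sorted(run_a) == sorted(run_b)
same_total_order = run_a == run_b
same_per_key_order = all(per_key(run_a, k) == per_key(run_b, k) for k in keys)
print(same_messages, same_total_order, same_per_key_order)
```

Both runs deliver the same messages and preserve the order of each key, but the overall order of the topic differs between them, which is the loss of determinism described above.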
In addition to TCP, CoralSequencer also supports multicast UDP for message distribution, optimizing fan-out and enforcing fairness at the hardware level (i.e., in the network switches). Moreover, for ultra-low latency, CoralSequencer also supports shared memory for message distribution on the same machine.
How does CoralSequencer distribute messages?
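The fan-out and fairness difference can be made concrete with a deliberately simplified model. The sketch below assumes a fixed, hypothetical cost per socket write (`WRITE_COST_US`) and ignores network propagation entirely; it only counts how many writes the sender performs and how far behind the first consumer the last one is.

```python
def unicast_delivery_offsets(num_consumers: int, write_cost_us: float):
    """Sequential per-socket writes: consumer i waits for the i writes ahead of it."""
    return [i * write_cost_us for i in range(num_consumers)]

def multicast_delivery_offsets(num_consumers: int):
    """One write by the sender; replication happens in the switch, so nobody queues behind anybody."""
    return [0.0] * num_consumers

WRITE_COST_US = 2.0  # hypothetical cost of one socket write, in microseconds

tcp = unicast_delivery_offsets(100, WRITE_COST_US)
udp = multicast_delivery_offsets(100)

tcp_writes = len(tcp)               # one write per connected consumer
udp_writes = 1                      # a single write to the multicast group
tcp_spread = max(tcp) - min(tcp)    # delay between first and last consumer
udp_spread = max(udp) - min(udp)
print(tcp_writes, udp_writes, tcp_spread, udp_spread)
```

In this model the unicast sender performs 100 writes and the last consumer trails the first by 198 microseconds, while the multicast sender performs one write and the spread is zero. Real numbers depend on the hardware and the network.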
All consumers get all messages from the topic in the exact same order, always.
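A minimal sketch of this idea, assuming a single sequencer that stamps every message with a monotonically increasing sequence number. The class names `Sequencer` and `Consumer` are hypothetical and do not represent the CoralSequencer API. Each consumer applies messages strictly in sequence order, so two consumers end up with the same history even when the network delivers packets in a different order.

```python
from dataclasses import dataclass
from itertools import count

@dataclass(frozen=True)
class Sequenced:
    seq: int
    payload: str

class Sequencer:
    """Stamps each incoming message with the next sequence number, defining one global order."""
    def __init__(self):
        self._next = count(1)

    def submit(self, payload: str) -> Sequenced:
        return Sequenced(next(self._next), payload)

class Consumer:
    """Applies messages strictly in sequence order, buffering any that arrive early."""
    def __init__(self):
        self.expected = 1
        self.pending = {}
        self.applied = []

    def on_message(self, msg: Sequenced) -> None:
        self.pending[msg.seq] = msg
        # Drain every message that is now contiguous with what was already applied.
        while self.expected in self.pending:
            self.applied.append(self.pending.pop(self.expected).payload)
            self.expected += 1

sequencer = Sequencer()
stream = [sequencer.submit(p) for p in ["buy 10", "sell 5", "cancel 1", "buy 3"]]

a, b = Consumer(), Consumer()
for msg in stream:            # consumer a receives the stream in order
    a.on_message(msg)
for msg in reversed(stream):  # consumer b receives it reordered by the network
    b.on_message(msg)

print(a.applied == b.applied, a.applied)
```

The sketch only buffers early arrivals; it does not model how a consumer would request a retransmission for a message that never arrives.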
Won’t a consumer waste time by consuming messages it is not interested in?
CoralSequencer is capable of pushing more than 1 million messages per second to a single consumer. The consumer will be able to receive and discard messages it is not interested in very quickly.
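As a rough illustration of consumer-side filtering, the sketch below assumes each message carries a destination tag that the consumer can check before doing any real work. The message layout and the `run_consumer` function are hypothetical, chosen only to show the receive-and-drop pattern. At 1 million messages per second, the average time available per message is 1 microsecond, so the check has to be this cheap.

```python
MESSAGES_PER_SECOND = 1_000_000
budget_us = 1_000_000 / MESSAGES_PER_SECOND  # average microseconds available per message

def run_consumer(stream, interested_in):
    """Receive every message; handle the ones addressed to us and drop the rest."""
    handled, dropped = [], 0
    for seq, destination, payload in stream:
        if destination not in interested_in:
            dropped += 1  # a single set lookup, then move on
            continue
        handled.append((seq, payload))
    return handled, dropped

# Twelve sequenced messages spread over four hypothetical destinations.
stream = [(seq, f"book-{seq % 4}", f"order-{seq}") for seq in range(1, 13)]
handled, dropped = run_consumer(stream, interested_in={"book-1"})
print(budget_us, handled, dropped)
```

The consumer still sees the handled messages in sequence order, because dropping a message does not change the order of the ones that remain.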
How do I create more topics?
You simply run another sequencer in parallel.
Can the same consumer read from two topics at the same time?