Achieving Exactly-Once Message Delivery Guarantees with Sequence Numbers

In this article we propose a solution to the difficult problem of exactly-once message delivery, consumption and processing by a non-fixed set of consumers. We start by defining very clearly the problem we are trying to solve.

The Problem

  • Let’s assume we have 1 million messages sitting on an in-memory queue inside a broker middleware.
  • This broker has a TCP server that can accept TCP connections from clients, which we call TCP consumers.
  • As TCP consumers connect, they start receiving messages from the in-memory queue.
  • The goal is for each consumer to consume and process different messages, in other words, two different consumers must never consume and process the same message.
  • Additionally, the same consumer must never consume and process the same message twice.
  • Last but not least, no message can ever be lost, in other words, all messages must be processed by a consumer.

  • To illustrate the scenario and the requirements clearly: If all consumers are simply inserting the messages into a database table without any unique index restrictions (i.e. it totally allows duplicate inserts), the end-result must be all 1 million messages from the broker inserted into the database table, without any duplicates.

    Caveat #1

    The first catch has to do with ordering. To attain such a system in a way that it becomes practical from a performance point-of-view, the order of the messages must be forsaken. The end-result will be 1 million messages in the database, but the ordering of those messages cannot be known or enforced, and will not be the same order that they were in the broker queue. This is usually not a problem if you are just trying to distributed load across consumers. It can be a problem when some messages need to be processed by the same consumer, but that’s a feature our middleware is able to provide. It is important to note that order is guaranteed over the same consumer, in other words, the same consumer will never process messages out of order. The same cannot be attained for different consumers, in other words, message 345 from the broker can be consumed by consumer A after message 346 from the broker has already being consumed by consumer B.

    Sequence Numbers for the Rescue

    The 1 million messages will be distributed in a round-robin fashion by the broker to the connected TCP consumers using a sequenced protocol. That means that each and every message forwarded to that consumer by the broker will receive an incremental and unique sequence number (see diagram below). The consumer will always know what sequences it has already processed (and committed) so that it will never process the same message twice. Messages will never be lost because the connection is over TCP and because in the event of a TCP disconnect/reconnect the consumer will inform the broker the last sequence number it has processed.

    FAQ – How do you enforce that: “The consumer will always know what sequences it has already processed (and committed) so that it will never process the same message twice.

    The consumer always maintains on its side the last committed sequence number. That’s a long variable containing the sequence number of the last message it is sure it has processed. This can be accomplished by persisting that sequence number to disk through a memory-mapped file. The flow is:

    • Read message 345
    • Process message 345
    • Commit to disk sequence 345

    Because we are using a memory-mapped file to persist the last committed sequence number we get for free two things:

    • Performance – this disk I/O operation is very fast
    • Reliability – this information is never lost (i.e. not persisted) unless the operating system crashes

    Caveat #2

    Once a message is distributed by the broker to a particular consumer it can never be reclaimed and re-assigned to a different consumer. The main reason for that is that the broker knows nothing about the messages that the consumer has already successfully consumed and processed. This is only known by the consumer itself and it is only informed to the broker when the consumer connects/reconnects. As explained above, this is accomplished through the last committed sequence number that the consumer maintains on its side. This dictates that a consumer can never go away (i.e. disappear) for good, without returning to finish the consumption of its messages. This can be accomplished by some type of failover for the consumer, in other words, if the consumer fails then another instance of itself (or itself) will have to come back (reconnect) and take over, with the correct last committed sequence number.

    Consumer Sessions

    Consumers identify themselves to the broker through consumer sessions, allowing the broker to know when a brand-new consumer is connecting for the first time and when an old consumer is coming back through a reconnect (or failover). The first time a consumer connects to the broker, it will contain a session name that the broker has never seen before. The broker will understand that the consumer is a brand-new consumer and will grant the requested session to that consumer. If the consumer connects with a session name that the broker has already seen, the broker will know that the consumer is coming back (after a crash or disconnect) and will re-assign the session to that consumer. All consumers must consume all messages from their session, until they receive an End-of-Session message. As said above, that guarantees that all messages will be processed and no message will ever be lost (i.e. left behind).

    Diagram


    Screen Shot 2021-11-18 at 7.30.50 AM

    Performance

    Our CoralSequencer messaging middleware can push more than 1 million messages per second, with individual TCP point-to-point latencies of single digit microsecond, depending on your network infra-structure.

    Conclusion

    By guaranteeing exactly-once delivery, a messaging middleware can effectively distribute a load of messages across multiple consumers, anywhere and without requiring them to deal with duplicates or lost messages. And most importantly, two different consumers will never receive the same message to consume and process.