Is CoralSequencer really deterministic? What about the clock?

In this article we explore the deterministic nature of CoralSequencer and how its totally ordered message stream is a natural enabler of high-availability clusters.

One of the most important features of CoralSequencer is the guarantee that all nodes consume the exact same set of messages in the exact same order, always. It follows from this premise that all nodes can become deterministic finite-state machines (FSM), where the same input messages always transition the node into the exact same state. In other words, when a node starts from scratch (i.e. initial blank state) and consumes the same input messages, its final state will always be the same. That allows a backup node to join a CoralSequencer session late and operate as a mirror (i.e. exact copy) of the primary node, in what is called a high-availability cluster.
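To make the replay property concrete, here is a minimal sketch of a toy state machine whose state depends only on the ordered input. The names ReplayDemo, Counter and replay are hypothetical and are not part of the CoralSequencer API; the folding formula is an arbitrary choice made only so that order matters.

```java
import java.util.List;

// Hypothetical toy example, not CoralSequencer code.
final class ReplayDemo {

    // State is a pure function of the ordered input: no clocks, no randomness.
    static final class Counter {
        private long state = 0; // initial blank state

        void apply(long sequence, String payload) {
            // Fold each message into the state in arrival order.
            state = 31 * state + sequence * 17 + payload.hashCode();
        }

        long state() {
            return state;
        }
    }

    // Starts from a blank state and consumes the whole stream in order.
    static long replay(List<String> messages) {
        Counter fsm = new Counter();
        long seq = 0;
        for (String m : messages) {
            fsm.apply(++seq, m);
        }
        return fsm.state();
    }

    public static void main(String[] args) {
        List<String> stream = List.of("BUY 100", "SELL 50", "BUY 25");
        long primary = replay(stream);
        long backup = replay(stream); // a late joiner replays from the start
        System.out.println(primary == backup); // prints "true"
    }
}
```

The backup reaches the same state as the primary only because nothing outside the message stream feeds into apply(). Any hidden input, such as a wall clock, breaks that property.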

When we say that CoralSequencer is deterministic, what we are saying is that CoralSequencer supports nodes with deterministic state based on the event-stream messages. That feature allows for the creation of clusters of nodes that can be used for high-availability and failover. But what about the non-deterministic nature of clocks?

To explore this issue, let’s code a simple matching engine that will work as an FSM for high-availability and failover:

package com.coralblocks.coralsequencer.node;

import static com.coralblocks.corallog.Log.*;

import com.coralblocks.coralbits.util.DateTimeUtils;
import com.coralblocks.coralreactor.nio.NioReactor;
import com.coralblocks.coralreactor.util.Configuration;
import com.coralblocks.coralsequencer.message.Message;
import com.coralblocks.coralsequencer.mq.Node;

public class MatchingEngineNode extends Node {
	
	private long evenMsg;
	private long previousMatchTime;
	
	private final StringBuilder sb = new StringBuilder(64);

	public MatchingEngineNode(NioReactor nio, String name, Configuration config) {
		super(nio, name, config);
	}
	
	@Override
	protected void handleOpened() {
		// initial/blank state...
		evenMsg = -1;
		previousMatchTime = -1;
	}
	
	@Override
	protected void handleRewinded() { // caught up with live event-stream...
		
		sb.setLength(0);
		if (previousMatchTime != -1) {
			DateTimeUtils.formatDateTimeInMillis(previousMatchTime, sb);
		} else {
			sb.append("-1");
		}
		
		Sysout.log(name, 
				  "State after catching up with live event-stream:",
				  "evenMsg=", evenMsg, "previousMatchTime=", sb);
	}
	
	@Override
	protected void handleMessage(boolean isMine, Message msg) {
		
		if (isMine) return; // I'm not going to match my own messages...
		
		long seq = msg.getSequence();
		
		if (seq % 2 != 0) return; // I only match even sequence numbers...
		
		if (evenMsg > 0) {
			
			long nowInMillis = System.currentTimeMillis(); // non-deterministic clock
			
			sb.setLength(0);
			sb.append("MATCHED ");
			sb.append(evenMsg).append(" => ").append(seq);
			sb.append(" @ ");
			DateTimeUtils.formatDateTimeInMillis(nowInMillis, sb);
			sb.append(" previous=");
			if (previousMatchTime != -1) {
				DateTimeUtils.formatDateTimeInMillis(previousMatchTime, sb);
			} else {
				sb.append("-1");
			}
			
			Sysout.log(name, "isRewinding=", isRewinding(), sb);
			
			sendCommand(sb);
			
			previousMatchTime = nowInMillis;
			
			evenMsg = -1;
			
		} else {
			
			evenMsg = seq;
			
		}
	}
}

The logic above matches messages that have an even sequence number, skipping the node's own messages. Below is the output of this MatchingEngineNode when it sees some live messages in the event-stream:

D7

D2
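The pairing rule in handleMessage can also be traced in isolation. The sketch below reproduces only that rule over a range of sequence numbers, under the simplifying assumption that none of the messages is the node's own (in the real node, the commands it sends come back through the event-stream and are skipped). EvenPairingDemo and match are hypothetical names used for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical trace of the pairing rule only; assumes isMine is always false.
final class EvenPairingDemo {

    static List<String> match(long fromSeq, long toSeq) {
        List<String> matches = new ArrayList<>();
        long evenMsg = -1; // no even message waiting for a match
        for (long seq = fromSeq; seq <= toSeq; seq++) {
            if (seq % 2 != 0) continue; // odd sequence numbers are ignored
            if (evenMsg > 0) {
                matches.add(evenMsg + " => " + seq); // pair with the waiting message
                evenMsg = -1;
            } else {
                evenMsg = seq; // remember it until the next even message arrives
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(match(1, 8)); // prints "[2 => 4, 6 => 8]"
    }
}
```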

Now when we go ahead and start a second node instance to form a cluster, we notice a state inconsistency that breaks determinism. And the cluster!

D3

The code uses System.currentTimeMillis() to compute the timestamp, which returns a different value when a node joining the cluster later processes the same messages. The solution is to stop using this non-deterministic clock and rely instead on CoralSequencer’s deterministic event-stream clock. Below is the single-line change to the MatchingEngineNode code that fixes the problem:

// Instead of this:
long nowInMillis = System.currentTimeMillis(); // non-deterministic clock

// We should use this:
long nowInMillis = currentSequencerTime() / 1000000L; // deterministic clock

The currentSequencerTime() method returns the time as determined by the sequencer and placed in the event-stream. Its value therefore depends only on the message the node is currently consuming from the event-stream. In other words, it will always return the same value for the same position in the event-stream, no matter when the node calls this method, now or two hours later. It returns the epoch time in nanoseconds, so we divide by 1,000,000 to get the epoch time in milliseconds, which is what we need.
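The idea of a clock driven by the event-stream can be sketched outside CoralSequencer as follows. This is only an illustration under stated assumptions: StampedMessage, StreamClock and replay are hypothetical names, and the sketch assumes each message carries the sequencer's epoch time in nanoseconds. It does not show how CoralSequencer implements currentSequencerTime() internally.

```java
import java.util.Arrays;

// Hypothetical sketch; these classes are illustrative and are not CoralSequencer types.
final class StreamClockDemo {

    // A message assumed to carry the epoch time (in nanoseconds) stamped by the sequencer.
    static final class StampedMessage {
        final long sequence;
        final long sequencerTimeNanos;

        StampedMessage(long sequence, long sequencerTimeNanos) {
            this.sequence = sequence;
            this.sequencerTimeNanos = sequencerTimeNanos;
        }
    }

    // Clock that only advances when a message is consumed; it never reads the system clock.
    static final class StreamClock {
        private long currentNanos = -1; // no message consumed yet

        void onMessage(StampedMessage msg) {
            currentNanos = msg.sequencerTimeNanos;
        }

        long millis() {
            return currentNanos / 1_000_000L; // nanoseconds to milliseconds
        }
    }

    // Replays a stream and records the millisecond time observed at each message.
    static long[] replay(StampedMessage[] stream) {
        StreamClock clock = new StreamClock();
        long[] seen = new long[stream.length];
        for (int i = 0; i < stream.length; i++) {
            clock.onMessage(stream[i]);
            seen[i] = clock.millis();
        }
        return seen;
    }

    public static void main(String[] args) {
        StampedMessage[] stream = {
            new StampedMessage(2, 1_700_000_000_123_456_789L),
            new StampedMessage(4, 1_700_000_000_987_654_321L)
        };
        long[] first = replay(stream);
        long[] later = replay(stream); // e.g. a node joining two hours later
        System.out.println(Arrays.equals(first, later)); // prints "true"
        System.out.println(first[0]); // prints "1700000000123"
    }
}
```

Because the clock is a function of the stream position alone, both replays observe identical timestamps regardless of when they run.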

With this change in the code, we now run the same experiment again, starting everything from scratch, with a new CoralSequencer session. The first node instance:

D4

The second node instance:

D5

So there you go! No matter when you start the node instance, its clock will always produce the same deterministic time from the event-stream, producing the exact same state for the node joining the cluster. State is always deterministic and consistent, and running a high-availability cluster with zero-downtime failover becomes straightforward, as you can see in the video below. CoralSequencer actually goes a step further and, in addition to the deterministic clock, gives you deterministic timers, but that’s a topic for another article.

(Maximize the video below for a better viewing experience)