
The Zookeeper Atomic Broadcast Protocol (ZAB) and the implementation of Zookeeper.

Written by Elin Vinka

In this last of three blog posts about Zookeeper and how it acts as a centralized service in Apache Kafka, we focus on the Zookeeper Atomic Broadcast Protocol (ZAB).

What is ZAB (Zookeeper Atomic Broadcast Protocol)?

The ZAB protocol ensures that Zookeeper replication happens in order; it is also responsible for electing leader nodes and restoring failed nodes. In a Zookeeper ensemble, the leader node is the heart of everything: every cluster has exactly one leader, and the remaining nodes are followers. All incoming state changes (write requests) are forwarded to the leader, which is responsible for replicating them to all of its followers (and applying them itself). Read requests, on the other hand, are served locally by whichever server the client is connected to, which spreads the read load across the whole ensemble.
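
To make that division of labor concrete, here is a minimal sketch (plain Python with hypothetical names, not Zookeeper's real API) of how a server routes client requests: writes are forwarded to the leader, while reads are answered from the local replica.

```python
class Server:
    def __init__(self, name):
        self.name = name
        self.state = {}      # local replica of the znode tree
        self.leader = self   # reassigned on followers below

    def handle_write(self, key, value):
        if self.leader is not self:
            # Followers do not apply writes themselves; they forward them.
            return self.leader.handle_write(key, value)
        # The leader applies the change and replicates it to every follower;
        # the ordered, quorum-based version of this is sketched further down.
        for server in ensemble:
            server.state[key] = value
        return "committed"

    def handle_read(self, key):
        # Reads are served from the local replica, never routed to the leader.
        return self.state.get(key)

ensemble = [Server("s1"), Server("s2"), Server("s3")]
for s in ensemble[1:]:
    s.leader = ensemble[0]

ensemble[1].handle_write("/config", "v1")   # forwarded to s1, then replicated
print(ensemble[2].handle_read("/config"))   # "v1", answered locally by s3
```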

What duties does ZAB have?

Causal order maintenance:

If a transaction X is received after a transaction Y has been committed by the sender of X, then Y must be ordered before X. Likewise, if the same client sends Z after X, then X must be ordered before Z.

Dependable delivery:

If a transaction "A" has been committed by one server, it's important that it gets committed by all the servers.

Total Order maintenance:

A change in a client's state is known as a transaction in the Zookeeper world. If a transaction X gets committed before a transaction Y by some server, then the same order must be replicated by all the servers. Replication of transactions proceeds as long as the minimum required number of nodes (a majority) is up. If a node fails and later recovers, that node must be capable of replicating all the transactions that were committed during its downtime.
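
The total order guarantee is easiest to see in a small worked example. The sketch below (illustrative names only, not Zookeeper's API) replays the same committed log on every replica and shows that changing the commit order would produce a divergent state.

```python
committed_log = [            # (zxid, operation) pairs, ordered by the leader
    (1, ("set", "/a", 1)),
    (2, ("set", "/b", 2)),
    (3, ("set", "/a", 3)),   # overwrites zxid 1; order matters
]

def replay(log):
    # Apply committed transactions to an empty replica, in the given order.
    state = {}
    for zxid, (op, path, value) in log:
        if op == "set":
            state[path] = value
    return state

# Every server replays the identical ordered log and converges to one state.
assert replay(committed_log) == {"/a": 3, "/b": 2}

# Committing zxid 3 before zxid 1 would instead leave /a = 1, a divergent
# state -- exactly what total order maintenance rules out.
assert replay(reversed(committed_log)) == {"/a": 1, "/b": 2}
```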

How is ZAB implemented?

Zookeeper has been built around a two-phase commit protocol that allows it to replicate all transactions while upholding the design principles mentioned above. Upon receiving a client state-change request, the leader node generates a transaction and assigns it a sequence number (a zxid). It then sends the transaction to all of its follower nodes and waits for their acknowledgments.

Once acknowledgments arrive from a quorum, commit calls are sent to the quorum for those transactions. A follower checks the sequence number of the issued transaction and only commits it if it has no outstanding earlier transactions in its queue, which keeps commits strictly in order.
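
Putting the two phases together, here is a minimal, single-threaded sketch of the broadcast flow. All names are hypothetical, and failures and networking are left out; it only illustrates the propose/ACK/commit shape described above.

```python
class Follower:
    def __init__(self):
        self.pending = []        # proposals ACKed but not yet committed
        self.log = []            # committed transactions, in zxid order

    def propose(self, zxid, txn):
        self.pending.append((zxid, txn))
        return True              # ACK back to the leader

    def commit(self, zxid):
        # Commit strictly in order: only the oldest outstanding proposal
        # may be committed (the "no outstanding transactions" rule above).
        assert self.pending and self.pending[0][0] == zxid
        self.log.append(self.pending.pop(0))

class Leader:
    def __init__(self, followers):
        self.followers = followers
        self.zxid = 0
        self.quorum = (len(followers) + 1) // 2 + 1   # majority incl. leader

    def broadcast(self, txn):
        # Phase 1: propose the transaction and collect acknowledgments.
        self.zxid += 1
        acks = 1                                      # the leader counts itself
        acks += sum(f.propose(self.zxid, txn) for f in self.followers)
        # Phase 2: once a quorum has ACKed, tell everyone to commit.
        if acks >= self.quorum:
            for f in self.followers:
                f.commit(self.zxid)

followers = [Follower(), Follower()]
leader = Leader(followers)
leader.broadcast(("set", "/config", "v1"))
print(followers[0].log)   # [(1, ('set', '/config', 'v1'))]
```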

A node can only be the leader node if it has a quorum of nodes as followers. In case a leader node fails, a recovery protocol is invoked, which involves the following stages (a sketch of the election step follows the list):

  • Election of a new leader
  • Discovery across the cluster
  • Synchronization of transactions
  • Broadcast of relevant data
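
As referenced above, here is a minimal sketch of the election stage. It assumes each reachable server reports its epoch and last accepted sequence number (zxid), and that the most up-to-date server wins, provided a quorum is reachable; the field names are illustrative, not Zookeeper's actual election messages.

```python
def elect_leader(servers, ensemble_size):
    # servers: list of (server_id, epoch, last_zxid) for reachable nodes.
    if len(servers) <= ensemble_size // 2:
        raise RuntimeError("no quorum: cannot elect a leader")
    # Highest epoch wins; then highest zxid; then highest server id.
    return max(servers, key=lambda s: (s[1], s[2], s[0]))

reachable = [(1, 4, 107), (2, 4, 110), (3, 4, 110)]   # (id, epoch, last zxid)
print(elect_leader(reachable, ensemble_size=5))        # (3, 4, 110)
```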

The next phase, discovery, is important because this is when all the followers contact the prospective leader to exchange information about their most recently accepted transactions. This is necessary because it is how the up-to-date sequence of transactions across the cluster is established.

The next step is synchronization, which completes the recovery part of the protocol. Using the up-to-date history presented by the leader, all the replicas in the cluster are brought into agreement.

The leader sends out the transactions present in its history to all the followers, and if a follower finds that its own history lags behind the leader's, it starts acknowledging the incoming transactions. Once acknowledgments arrive from a quorum, commit messages are dispatched; this is the point at which the prospective leader actually becomes the new leader of the cluster.
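
A minimal sketch of this synchronization step, assuming each follower's history is a prefix of the leader's (the simplest recovery case; all names are hypothetical):

```python
def synchronize(leader_log, follower_logs, ensemble_size):
    acks = 1                                   # the leader counts itself
    for log in follower_logs:
        # Diff against the leader's history: assumes the follower's log
        # is a prefix of the leader's, so the tail is what it missed.
        missing = leader_log[len(log):]
        log.extend(missing)                    # follower accepts and ACKs
        acks += 1
    quorum = ensemble_size // 2 + 1
    # True means a quorum has acknowledged the leader's history, commits
    # can be dispatched, and the prospective leader becomes the leader.
    return acks >= quorum

leader_log = [(1, "set /a"), (2, "set /b"), (3, "set /c")]
lagging = [(1, "set /a")]                      # missed zxids 2 and 3 while down
print(synchronize(leader_log, [lagging], ensemble_size=3))   # True
print(lagging)                                 # now matches the leader's history
```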