What is Zookeeper and why is it needed for Apache Kafka?

Written by Elin Vinka

Are you using Apache Kafka to build message streaming services? Then you might have run into the term Zookeeper. To us at CloudKarafka, as an Apache Kafka hosting service, it’s important that our users understand what Zookeeper is and how it integrates with Kafka, since some of you have been asking whether it’s really needed and why it’s there.

In this blog series, you will learn what Zookeeper is and why it’s important to Apache Kafka. This first article explains how Zookeeper works and why it is necessary for the use of Apache Kafka. Part two focuses on your cluster and how Zookeeper works together with it, and part three explains the Zookeeper Atomic Broadcast protocol (ZAB) and how to implement Zookeeper.

What is Zookeeper?

Zookeeper is a top-level Apache project that acts as a centralized service. It is used to maintain naming and configuration data and to provide flexible and robust synchronization within distributed systems. For Kafka, Zookeeper keeps track of the status of the cluster nodes, as well as Kafka topics, partitions, etc.

Zookeeper itself allows multiple clients to perform simultaneous reads and writes and acts as a shared configuration service within the system. The Zookeeper Atomic Broadcast (ZAB) protocol is the brains of the whole system, making it possible for Zookeeper to act as an atomic broadcast system and issue orderly updates.

* More about ZAB in part 3.

How does Zookeeper work?

The data within Zookeeper is replicated across a collection of nodes, called an ensemble, and this is how it achieves its high availability and consistency. If a node fails, Zookeeper fails over automatically; e.g. if the leader node fails, a new one is elected by a vote among the remaining nodes in the ensemble. A client connected to one node can query a different node if the first one fails to respond.
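The failover behaviour described above can be sketched in a few lines of Python. This is only an illustration, not Zookeeper client code: `quorum_size`, `query_with_failover`, `fake_query` and the `zk1`/`zk2`/`zk3` host names are all made up for the example. It relies on the fact that an ensemble needs a strict majority of its nodes to agree, and it shows a client moving on to the next node when one does not respond.

```python
from typing import Callable, Optional, Sequence


def quorum_size(ensemble_size: int) -> int:
    """Smallest number of nodes that forms a strict majority of the ensemble."""
    return ensemble_size // 2 + 1


def query_with_failover(nodes: Sequence[str], query: Callable[[str], str]) -> str:
    """Ask each node in turn and return the first successful answer."""
    last_error: Optional[Exception] = None
    for node in nodes:
        try:
            return query(node)
        except ConnectionError as exc:
            last_error = exc  # this node did not respond, try the next one
    raise ConnectionError(f"no node responded: {last_error}")


def fake_query(node: str) -> str:
    """Hypothetical query function: pretends that zk1 is down."""
    if node == "zk1:2181":
        raise ConnectionError("zk1 is down")
    return f"answer from {node}"


print(quorum_size(3))  # 2, so a three-node ensemble survives one failed node
print(query_with_failover(["zk1:2181", "zk2:2181", "zk3:2181"], fake_query))
# answer from zk2:2181
```

Real Zookeeper clients do something similar for you: they are given a list of servers and reconnect to another one when the current connection is lost.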

The image shows one Kafka cluster and one Zookeeper spread across three different servers, and how the Zookeeper nodes collaborate with each other.

Why is Zookeeper necessary for Apache Kafka?

Controller election

The controller is one of the most important broker entities in a Kafka ecosystem: it is responsible for maintaining the leader-follower relationship across all the partitions. If a node is shutting down for some reason, it’s the controller’s responsibility to tell replicas on other nodes to take over as partition leaders for the partitions that were led by the node that is about to fail. Zookeeper is used to elect the controller: if the node running the controller shuts down, a new controller is elected, and Zookeeper makes sure that at any given time there is only one controller and that all the follower nodes have agreed on it.
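To make the "only one controller at a time" guarantee more concrete, here is a minimal in-memory sketch. It assumes the usual Zookeeper-based Kafka setup, where each broker tries to create an ephemeral node at `/controller` and the first one to succeed becomes the controller. `FakeZookeeper` and `elect_controller` are hypothetical names for this example and do not talk to a real Zookeeper.

```python
from typing import Dict, Iterable, Optional


class FakeZookeeper:
    """In-memory stand-in for the small part of Zookeeper used in this sketch."""

    def __init__(self) -> None:
        self._ephemeral: Dict[str, int] = {}  # path -> id of the owning broker

    def create_ephemeral(self, path: str, owner: int) -> bool:
        """Create the node if it does not exist; only the first caller succeeds."""
        if path in self._ephemeral:
            return False
        self._ephemeral[path] = owner
        return True

    def owner_of(self, path: str) -> Optional[int]:
        return self._ephemeral.get(path)

    def session_closed(self, owner: int) -> None:
        """Remove every ephemeral node that belonged to a disconnected broker."""
        self._ephemeral = {p: o for p, o in self._ephemeral.items() if o != owner}


def elect_controller(zk: FakeZookeeper, broker_ids: Iterable[int]) -> Optional[int]:
    """Every broker tries to create /controller; whoever owns it is the controller."""
    for broker_id in broker_ids:
        zk.create_ephemeral("/controller", broker_id)
    return zk.owner_of("/controller")


zk = FakeZookeeper()
print(elect_controller(zk, [1, 2, 3]))  # 1, the first broker to create the node
zk.session_closed(1)                    # broker 1 shuts down, its node disappears
print(elect_controller(zk, [2, 3]))     # 2, a new controller takes over
```

Because the node is ephemeral, it vanishes when the controller's session ends, which is what lets the remaining brokers notice the failure and run a new election.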

Configuration of topics

Zookeeper stores the configuration of all the topics, including the list of existing topics, the number of partitions for each topic, the location of all the replicas, the list of configuration overrides for all topics, which node is the preferred leader, etc.
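As a rough sketch of what this looks like in practice: in a Zookeeper-based Kafka setup, the partition assignment of a topic is typically kept as JSON under `/brokers/topics/<topic>` and configuration overrides under `/config/topics/<topic>`, and the preferred leader of a partition is the first broker in its replica list. The helper names and the sample data below are invented for illustration, and the exact JSON fields can differ between Kafka versions.

```python
import json
from typing import Dict


def topic_znode(topic: str) -> str:
    """Path where the partition assignment of a topic is usually stored."""
    return f"/brokers/topics/{topic}"


def topic_config_znode(topic: str) -> str:
    """Path where the configuration overrides of a topic are usually stored."""
    return f"/config/topics/{topic}"


def preferred_leaders(assignment_json: str) -> Dict[int, int]:
    """Map each partition to its preferred leader (first broker in the replica list)."""
    data = json.loads(assignment_json)
    return {int(p): replicas[0] for p, replicas in data["partitions"].items()}


# Made-up assignment for a topic with two partitions and three replicas each.
sample = '{"version": 1, "partitions": {"0": [1, 2, 3], "1": [2, 3, 1]}}'

print(topic_znode("orders"))      # /brokers/topics/orders
print(preferred_leaders(sample))  # {0: 1, 1: 2}
```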

Access control lists

Access control lists (ACLs) for all the topics are also maintained within Zookeeper.

Membership of the cluster

Zookeeper also maintains a list of all the brokers that are functioning at any given moment and are a part of the cluster.
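One way to picture this: each running broker registers itself in Zookeeper (in a typical setup as an ephemeral node under `/brokers/ids`), so the list of children there is the list of live brokers. The sketch below compares two snapshots of that list to see which brokers joined and which left. `membership_change` and the broker ids are hypothetical and only serve as an example.

```python
from typing import Iterable, Set, Tuple


def membership_change(before: Iterable[int], after: Iterable[int]) -> Tuple[Set[int], Set[int]]:
    """Return (joined, left) broker ids between two snapshots of the broker list."""
    before_set, after_set = set(before), set(after)
    return after_set - before_set, before_set - after_set


# Broker 2 disappeared and broker 4 registered between the two snapshots.
joined, left = membership_change([1, 2, 3], [1, 3, 4])
print(joined, left)  # {4} {2}
```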

Please note that you can’t run Kafka services without first installing Zookeeper. However, Zookeeper is already installed and configured for your CloudKarafka cluster.

CloudKarafka and Zookeeper

Since Zookeeper is a part of CloudKarafka, most of our users never have to think about it. Zookeeper is installed and configured by default, depending on the number of nodes in your cluster, and most customers will never interact with Zookeeper directly. However, you still have the option to reach Zookeeper on CloudKarafka dedicated plans, if you wish to.

How to connect to Zookeeper

You can connect to the Zookeeper CLI using the local IP addresses on plans with VPC peering. You need to connect from a VPC that is peered with the CloudKarafka VPC. You can connect using zkCli.sh -server PRIVATE_IP:2181, where PRIVATE_IP is the IP of the Zookeeper you want to connect to.
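If you would rather check the connection from a script than from the CLI, something like the following sketch may help. It assumes that the Zookeeper four-letter command `srvr` is enabled on the server (it is normally on the default whitelist, but that depends on the server configuration) and that you run it from a machine inside the peered VPC. `four_letter_word` and `server_mode` are names made up for this example.

```python
import socket
from typing import Optional


def four_letter_word(host: str, port: int = 2181, command: str = "srvr",
                     timeout: float = 5.0) -> str:
    """Send a four-letter command to a Zookeeper node and return its text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def server_mode(srvr_reply: str) -> Optional[str]:
    """Extract the value of the 'Mode:' line (e.g. leader or follower), if present."""
    for line in srvr_reply.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


# Usage, with PRIVATE_IP replaced by the IP of the Zookeeper node:
#   print(server_mode(four_letter_word("PRIVATE_IP")))
```

If the command is not allowed on the server, the reply will not contain a `Mode:` line and `server_mode` returns `None`.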

If you have further questions or feedback, don’t hesitate to contact us at support@cloudkarafka.com