Apache Kafka Concepts

There are some concepts that are good to be familiar with before you get started with your first CloudKarafka instance.

Cluster
In Apache Kafka, a cluster consists of one or more servers, referred to as brokers. Running several brokers is what makes replication and fault tolerance possible.
Broker
A broker is an Apache Kafka server: a queue manager that can handle a large number of reads and writes per second from many clients. The message data is replicated and stored persistently on the brokers.
Messages
Sometimes also called Records or Events. A message is the information that is sent from a producer to a consumer through Kafka. Messages are byte arrays that can store any object format, with strings and JSON being the most common. A message consists of a key, a value, an optional timestamp, and headers.
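As a rough illustration of that structure, the sketch below models a message as a plain Python object. The `Record` class and the `json_record` helper are hypothetical names invented for this example; they are not part of any Kafka client library.

```python
import json
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Record:
    """Illustrative stand-in for a Kafka message (not a client library type)."""
    value: Optional[bytes]            # payload as raw bytes
    key: Optional[bytes] = None       # key as raw bytes, may be absent
    timestamp_ms: int = field(default_factory=lambda: int(time.time() * 1000))
    headers: tuple = ()               # sequence of (name, bytes) pairs


def json_record(key: str, payload: dict) -> Record:
    """Serialize a dict to JSON bytes, since Kafka itself only sees byte arrays."""
    return Record(
        key=key.encode("utf-8"),
        value=json.dumps(payload).encode("utf-8"),
        headers=(("content-type", b"application/json"),),
    )


record = json_record("user-42", {"action": "login"})
```

The consumer is responsible for turning the bytes back into an object, for example with `json.loads(record.value)`.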
Topics
A topic can be seen as a category: it is the logical grouping of partitions that makes them easier to manage and configure. A topic contains one or more partitions. There is no fixed upper limit, although every partition adds overhead on the brokers.
Producers and Consumers
Producers publish data to partitions under a certain topic. A consumer reads messages from one or more of those partitions.
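A common way for a producer to pick the partition is to hash the message key, so that all messages with the same key end up in the same partition. The sketch below shows the idea only: `choose_partition` is a hypothetical helper, and `zlib.crc32` is a stand-in for the hash function a real Kafka client would use.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition index in the range [0, num_partitions).

    Assumption: crc32 is used here purely for illustration; it is not
    the hash that Kafka clients apply.
    """
    if num_partitions < 1:
        raise ValueError("a topic needs at least one partition")
    return zlib.crc32(key) % num_partitions


# The same key always maps to the same partition.
first = choose_partition(b"user-42", 6)
second = choose_partition(b"user-42", 6)
```

Because the mapping depends on the partition count, changing the number of partitions of a topic also changes where existing keys land.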
Partitions
Under a Topic, you will find one or more Partitions. Partitions are the units in which the data is physically stored on the brokers, and Apache Kafka writes to and manages them. Each partition contains a log of all its message entries. During replication, it is the partition that is replicated across the cluster.
Records
See Messages.
Events
See Messages.
Distributed System
Apache Kafka is a distributed system in which topics are partitioned and replicated across multiple nodes. A distributed system is a software model where different parts are located on distinct computers which communicate by passing messages between one another. This type of system enables different components to work together and coordinate their actions.
Replication
Replication is the ability to copy each partition to a configurable number of servers for fault tolerance.
Replicas
A replica is a "backup" copy of a partition. It exists to prevent data loss, and clients do not normally read from or write to it. The number of replicas per partition can be configured from 1 up to the number of brokers in the cluster.
ISR (In-Sync Replicas)
In-sync replicas are the replicas that are currently synchronized with the partition leader. Kafka strives to keep the configured number of replicas in sync, while the ISR reflects the actual number. If a partition has too few in-sync replicas, Kafka can refuse writes from producers, and if no in-sync replica is available to act as leader, the partition is unavailable to both producers and consumers.
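One concrete case of "too few in-sync replicas" involves the `min.insync.replicas` setting together with the producer setting `acks=all`: the write is only accepted while enough replicas are in sync. The function below is a simplified sketch of that rule, and `accepts_write` is a hypothetical name used only for this example.

```python
def accepts_write(isr_size: int, min_insync_replicas: int, acks: str) -> bool:
    """Decide whether a produce request would be accepted (simplified).

    Assumption: with acks other than "all", only a live leader is needed,
    which is modeled here as at least one in-sync replica.
    """
    if acks == "all":
        return isr_size >= min_insync_replicas
    return isr_size >= 1


# Replication factor 3 with min.insync.replicas = 2:
healthy = accepts_write(isr_size=3, min_insync_replicas=2, acks="all")
degraded = accepts_write(isr_size=1, min_insync_replicas=2, acks="all")
leader_only = accepts_write(isr_size=1, min_insync_replicas=2, acks="1")
```

In this sketch the degraded partition rejects `acks=all` writes but still accepts writes from a producer that only waits for the leader.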
High watermark
The high watermark is the highest offset that has been replicated to all in-sync replicas. It marks the messages that are safely stored on every in-sync replica, and consumers can only read up to this point.
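Put as arithmetic, the high watermark is the minimum position reached by the replicas that are in sync. The sketch below assumes each broker reports a "log end offset" (the offset the next record would get); the function name and the dictionary layout are illustrative choices, not a Kafka API.

```python
def high_watermark(log_end_offsets: dict, isr: set) -> int:
    """Return the smallest log end offset among the in-sync replicas.

    Records with an offset below this value exist on every in-sync replica.
    """
    if not isr:
        raise ValueError("no in-sync replicas")
    return min(log_end_offsets[broker] for broker in isr)


# Broker 3 is lagging and has dropped out of the ISR, so it is ignored.
offsets = {1: 10, 2: 8, 3: 5}
hw = high_watermark(offsets, isr={1, 2})
```

Here `hw` is 8: the leader (broker 1) already holds offsets 8 and 9, but those are not yet safe to expose because broker 2 has not replicated them.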
Offset
Each message in a partition is assigned a sequential ID number called an offset, which uniquely identifies the message within that partition.
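A minimal in-memory sketch of how offsets behave is shown below. `PartitionLog` is a hypothetical class for illustration; a real partition is stored on disk by the broker.

```python
class PartitionLog:
    """Append-only list of records; the list index plays the role of the offset."""

    def __init__(self):
        self._records = []

    def append(self, value: bytes) -> int:
        """Store a record and return the offset it was assigned."""
        offset = len(self._records)
        self._records.append(value)
        return offset

    def read(self, offset: int, max_records: int = 10) -> list:
        """Return (offset, value) pairs starting at the given offset."""
        if offset < 0:
            raise ValueError("offset must be non-negative")
        end = min(offset + max_records, len(self._records))
        return [(o, self._records[o]) for o in range(offset, end)]


log = PartitionLog()
first_offset = log.append(b"created")
second_offset = log.append(b"updated")
from_second = log.read(second_offset)
```

A consumer only needs to remember the next offset it wants to read in order to resume where it left off.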
Leader
The broker responsible for all reads and writes for a given partition.
Follower
A broker that follows the leader by replicating its data is called a follower. If a leader fails, one of the in-sync followers becomes the new leader.
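The failover step can be sketched as picking a surviving replica that is still in sync. The selection rule below (first eligible broker in the replica list) is an assumption made for this example, and `elect_leader` is a hypothetical helper rather than Kafka's actual election code.

```python
from typing import Optional


def elect_leader(replicas: list, isr: set, failed: int) -> Optional[int]:
    """Pick a new leader among the in-sync replicas, skipping the failed broker.

    Returns None when no in-sync follower is left, in which case the
    partition has no leader in this simplified model.
    """
    candidates = [b for b in replicas if b in isr and b != failed]
    return candidates[0] if candidates else None


# Broker 1 was the leader; broker 2 had fallen out of sync earlier.
new_leader = elect_leader(replicas=[1, 2, 3], isr={1, 3}, failed=1)
no_leader = elect_leader(replicas=[1, 2], isr={1}, failed=1)
```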
Retention Period
A message remains in the topic until the configured retention for that topic is exceeded: either a configurable period of time has passed or a configurable size has been reached.
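The two limits can be sketched as follows. This simplified version works record by record, whereas a real broker removes data in larger chunks; the function and parameter names are illustrative, loosely modeled on the `retention.ms` and `retention.bytes` topic settings.

```python
def apply_retention(records: list, now_ms: int,
                    retention_ms: int, retention_bytes: int) -> list:
    """Drop records that are too old, then the oldest ones until the size fits.

    `records` is a list of (timestamp_ms, value) pairs, oldest first.
    """
    # Time-based retention: keep records that are still young enough.
    kept = [r for r in records if now_ms - r[0] <= retention_ms]
    # Size-based retention: remove from the oldest end until within budget.
    total = sum(len(value) for _, value in kept)
    while kept and total > retention_bytes:
        total -= len(kept[0][1])
        kept.pop(0)
    return kept


records = [(0, b"aaaa"), (500, b"bbbb"), (900, b"cccc")]
by_time = apply_retention(records, now_ms=1000, retention_ms=600, retention_bytes=100)
by_size = apply_retention(records, now_ms=1000, retention_ms=10_000, retention_bytes=8)
```

In both calls the oldest record is removed: in the first because it is older than 600 ms, in the second because the three records together exceed 8 bytes.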
ZooKeeper
The ZooKeeper software is central to keeping an Apache Kafka cluster in sync. It is responsible for leader election, maintaining access control lists, and managing cluster membership. It also acts as a shared configuration service within the system, ensuring all brokers are correctly configured.
Consumer Groups
A consumer group is a set of consumers that collaborate to consume data from topics. Each partition is assigned to exactly one consumer in the group, so each message is delivered to only one consumer in the group, which spreads the processing load.
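One way to picture how a group shares the work is a round-robin assignment of partitions to consumers. The sketch below is a simplified illustration with a hypothetical `assign_partitions` helper; real clients support several assignment strategies.

```python
def assign_partitions(partitions: list, consumers: list) -> dict:
    """Distribute partitions over the consumers of one group, round-robin.

    Every partition goes to exactly one consumer, so no message is
    processed twice within the group.
    """
    if not consumers:
        raise ValueError("a consumer group needs at least one consumer")
    ordered = sorted(consumers)
    assignment = {consumer: [] for consumer in ordered}
    for index, partition in enumerate(sorted(partitions)):
        assignment[ordered[index % len(ordered)]].append(partition)
    return assignment


balanced = assign_partitions([0, 1, 2], ["consumer-a", "consumer-b"])
idle = assign_partitions([0], ["consumer-a", "consumer-b"])
```

The second call shows a practical consequence: with more consumers than partitions, the extra consumers receive nothing to do.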
Commit Log
The commit log is an append-only data structure and the basic concept behind a partition: records can be appended, but not changed or deleted.
Log Compaction
A topic with log compaction enabled always retains at least the latest record for each unique key, allowing you to use the topic as key-value storage.
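The effect of compaction can be sketched in a few lines. The example assumes that a record with a `None` value marks its key as deleted (a "tombstone"); `compact` is a hypothetical function for illustration and skips details such as when the broker actually runs the cleanup.

```python
def compact(records: list) -> list:
    """Keep only the newest record per key, preserving original offsets.

    `records` is a list of (key, value) pairs in offset order. Keys whose
    newest value is None are treated as deleted and dropped entirely.
    """
    latest = {}
    for offset, (key, value) in enumerate(records):
        latest[key] = (offset, value)  # later offsets overwrite earlier ones
    survivors = [(off, key, val) for key, (off, val) in latest.items()
                 if val is not None]
    return sorted(survivors, key=lambda item: item[0])


history = [("a", b"1"), ("b", b"2"), ("a", b"3"), ("c", b"4"), ("b", None)]
compacted = compact(history)
```

After compaction only the record at offset 2 for key `a` and the record at offset 3 for key `c` remain, which is the current state of a key-value store built from the topic.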