
FAQ: What is the Kafka consumer offset?

Written by Fabio Pardi

When a consumer reads data from Kafka, it reads the messages of a topic partition sequentially. A marker called the 'consumer offset' records how far the consumer has read. Acting like a bookmark, it lets the consumer resume where it left off after a restart or other interruption, instead of processing the same messages again.

Consumers record their offsets in Kafka itself, in an internal topic called __consumer_offsets. An offset is an incrementing number that starts at 0 in each partition. The recorded value is the offset of the next message the consumer will read, so after reading messages 0 through 9 the consumer offset is 10.
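The bookmark idea can be sketched without a broker. The script below is a toy model only: a text file stands in for one partition and a second file stands in for the committed offset. The file names and the consume function are hypothetical and do not reflect how Kafka stores data internally.

```shell
#!/usr/bin/env bash
# Toy model: nothing here talks to Kafka.
workdir=$(mktemp -d)
log="$workdir/partition-0.log"      # hypothetical stand-in for one partition
offset_file="$workdir/committed"    # hypothetical stand-in for the committed offset

printf 'msg-%d\n' 0 1 2 3 4 > "$log"   # five messages, offsets 0..4
echo 0 > "$offset_file"                # nothing has been read yet

# Read up to $1 messages starting at the committed offset, then commit
# the offset of the next unread message.
consume() {
  max=$1
  offset=$(cat "$offset_file")
  end=$(( $(wc -l < "$log") ))
  # tail counts lines from 1, offsets count from 0
  tail -n "+$((offset + 1))" "$log" | head -n "$max"
  next=$((offset + max))
  if [ "$next" -gt "$end" ]; then
    next=$end
  fi
  echo "$next" > "$offset_file"
}

first=$(consume 3)     # reads msg-0, msg-1, msg-2 and commits offset 3
second=$(consume 3)    # a "restarted" consumer resumes at msg-3
echo "$first"
echo "$second"
echo "committed offset: $(cat "$offset_file")"
```

The second call does not see msg-0 through msg-2 again, because it starts from the committed offset rather than from the beginning of the file.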

Let's push some messages to a test topic:

kafka-verifiable-producer.sh --broker-list kafka-broker(s):9092 --topic generated_by_kafka_verifiable-producer-0 --max-messages 10

The above command will produce 10 messages in a topic called generated_by_kafka_verifiable-producer-0.

We can now read those messages:

kafka-verifiable-consumer.sh --broker-list kafka-broker(s):9092 --topic generated_by_kafka_verifiable-producer-0 --group-id kafka-verifiable-consumer --group-instance-id 1

After the messages have been read, we can query Kafka for details about the consumer group, including its offset:

kafka-consumer-groups.sh --describe --bootstrap-server kafka-broker(s):9092 --group kafka-verifiable-consumer generated_by_kafka_verifiable-producer-0

GROUP                      TOPIC                                     PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  CONSUMER-ID  HOST  CLIENT-ID
kafka-verifiable-consumer  generated_by_kafka_verifiable-producer-0  0          10              10              0    -            -     -

If you see a message like "Consumer group 'kafka-verifiable-consumer' has no active members", it means that no consumer in the group is currently connected. The recorded offsets are still shown.

We can see that all 10 messages have been consumed and that there are no more messages to read, which is what the 'LAG' column tells us. The LAG is the difference between LOG-END-OFFSET (the end of the partition) and CURRENT-OFFSET (the position recorded for the consumer group).

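With the consumer stopped, let's push 10 more messages and query the consumer state again, using the same describe command as before:

kafka-verifiable-producer.sh --broker-list kafka-broker(s):9092 --topic generated_by_kafka_verifiable-producer-0 --max-messages 10

kafka-consumer-groups.sh --describe --bootstrap-server kafka-broker(s):9092 --group kafka-verifiable-consumer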

GROUP                      TOPIC                                     PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  CONSUMER-ID  HOST  CLIENT-ID
kafka-verifiable-consumer  generated_by_kafka_verifiable-producer-0  0          10              20              10   -            -     -

The LAG is now 10 because the consumer offset is still 10, while the log end offset of the partition has advanced to 20.
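The LAG arithmetic can be reproduced from the other two columns. The sketch below assumes the column order shown in the output above (CURRENT-OFFSET as the fourth field, LOG-END-OFFSET as the fifth); compute_lag is a hypothetical helper, not part of the Kafka tooling, and the two inputs are copies of the rows from this article rather than live output.

```shell
#!/usr/bin/env bash
# Print "topic partition lag" for every data row read from stdin.
# Rows whose offset columns are not numeric (the header, or "-") are skipped.
compute_lag() {
  awk '$4 ~ /^[0-9]+$/ && $5 ~ /^[0-9]+$/ { printf "%s %s %d\n", $2, $3, $5 - $4 }'
}

# Rows copied from the two describe outputs in this article.
before='GROUP TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
kafka-verifiable-consumer generated_by_kafka_verifiable-producer-0 0 10 10 0 - - -'
after='GROUP TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
kafka-verifiable-consumer generated_by_kafka_verifiable-producer-0 0 10 20 10 - - -'

lag_before=$(printf '%s\n' "$before" | compute_lag)
lag_after=$(printf '%s\n' "$after" | compute_lag)
echo "$lag_before"   # generated_by_kafka_verifiable-producer-0 0 0
echo "$lag_after"    # generated_by_kafka_verifiable-producer-0 0 10
```

Against a real cluster, the output of the describe command could be piped into compute_lag instead of the copied rows, provided the installed Kafka version prints the columns in the same order.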

Now you know not only what a consumer offset is, but also how to inspect it!