When a consumer reads data from Kafka, it reads the messages of each topic partition in order. A marker called a 'consumer offset' records what has already been read. This marker keeps the consumer from processing the same messages again after a restart or a similar interruption. It acts like a bookmark: the consumer can resume reading from where it left off.
Consumers record their offsets in Kafka itself, in an internal topic called __consumer_offsets. An offset is an incrementing number that starts at 0.
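To make the bookmark idea concrete, here is a toy sketch in plain shell. It does not talk to Kafka at all: log_end, committed_offset and consume are hypothetical names invented for this illustration, and the "partition" is just a range of numbers.

```shell
# Toy model of the bookmark idea; it does not talk to Kafka.
# log_end and committed_offset are hypothetical stand-ins for one partition.
log_end=5            # the partition holds offsets 0..4
committed_offset=0   # the bookmark: offset of the next message to read

# consume MAX: read up to MAX messages from the bookmark, then move it.
consume() {
  end=$(( committed_offset + $1 ))
  if [ "$end" -gt "$log_end" ]; then
    end=$log_end
  fi
  i=$committed_offset
  while [ "$i" -lt "$end" ]; do
    echo "read message at offset $i"
    i=$(( i + 1 ))
  done
  committed_offset=$end
}

consume 3   # reads offsets 0, 1, 2
# Simulated restart: the bookmark survives, so reading resumes at offset 3.
consume 3   # reads offsets 3, 4 and stops at the end of the log
echo "committed offset: $committed_offset"
```

Because the bookmark stores the offset of the next message to read, it ends up equal to the number of messages consumed so far.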
Let's push some messages to a test topic:
kafka-verifiable-producer.sh --broker-list kafka-broker(s):9092 --topic generated_by_kafka_verifiable-producer-0 --max-messages 10
The above command produces 10 messages in a topic called generated_by_kafka_verifiable-producer-0.
We can now read those messages:
kafka-verifiable-consumer.sh --broker-list kafka-broker(s):9092 --topic generated_by_kafka_verifiable-producer-0 --group-id kafka-verifiable-consumer --group-instance-id 1
After the messages have been read, we can query Kafka for details about the consumer group, including its offset:
kafka-consumer-groups.sh --describe --bootstrap-server kafka-broker(s):9092 --group kafka-verifiable-consumer generated_by_kafka_verifiable-producer-0
GROUP                      TOPIC                                     PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  CONSUMER-ID  HOST  CLIENT-ID
kafka-verifiable-consumer  generated_by_kafka_verifiable-producer-0  0          10              10              0    -            -     -
If you see a message like:
"Consumer group 'kafka-verifiable-consumer' has no active members", it means that no consumer in the group is currently connected.
We can see that all 10 messages have been consumed and that there are no more messages to read, which is what the 'LAG' column tells us. The LAG is the difference between the end of the partition's log (LOG-END-OFFSET) and the consumer's position (CURRENT-OFFSET).
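The lag is a plain subtraction, and a minimal shell sketch can spell it out. The function name compute_lag is hypothetical and not part of any Kafka tool; the two numbers are the LOG-END-OFFSET and CURRENT-OFFSET values from the output above.

```shell
# compute_lag LOG_END_OFFSET CURRENT_OFFSET
# Hypothetical helper: prints how many messages are still unread.
compute_lag() {
  echo $(( $1 - $2 ))
}

compute_lag 10 10   # LOG-END-OFFSET 10, CURRENT-OFFSET 10: prints 0
```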
Let's push 10 more messages and query the consumer state again:
kafka-verifiable-producer.sh --broker-list kafka-broker(s):9092 --topic generated_by_kafka_verifiable-producer-0 --max-messages 10
GROUP                      TOPIC                                     PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  CONSUMER-ID  HOST  CLIENT-ID
kafka-verifiable-consumer  generated_by_kafka_verifiable-producer-0  0          10              20              10   -            -     -
The LAG is now 10 because the consumer's offset is still 10, while the end of the partition's log has moved to 20.
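If you want to pull these numbers out in a script, one possible approach is to pipe the describe output through awk. The sketch below assumes the column order shown in this article (CURRENT-OFFSET in column 4, LOG-END-OFFSET in column 5), which may differ between Kafka versions. The function name lag_from_describe and the sample_output variable are hypothetical; the sample simply repeats the output above so the sketch can be tried without a broker.

```shell
# lag_from_describe: reads describe-style output on stdin and prints
# "topic partition lag" per row, recomputing lag as column 5 minus column 4.
# Assumes the column layout shown in this article.
lag_from_describe() {
  awk '$1 != "GROUP" && $4 ~ /^[0-9]+$/ && $5 ~ /^[0-9]+$/ { print $2, $3, $5 - $4 }'
}

# Sample copied from the describe output above, so no broker is needed.
sample_output='GROUP TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
kafka-verifiable-consumer generated_by_kafka_verifiable-producer-0 0 10 20 10 - - -'

printf '%s\n' "$sample_output" | lag_from_describe
# prints: generated_by_kafka_verifiable-producer-0 0 10
```

Against a live cluster, the same function could read from the kafka-consumer-groups.sh --describe command used earlier instead of the sample text. Rows whose offsets are not numeric (for example "-" when nothing has been committed yet) are skipped.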
Now you know not only what a consumer offset is, but also how to inspect it!