Performance optimization for Apache Kafka - Consumers

Written by Lovisa Johansson

Kafka consumers read data from the brokers and can be seen as the executor in the Apache Kafka three-stage rocket. By refining your consumer setup, you can avoid common errors and make sure your configuration meets your expectations. This guide is divided into three parts, and this is part three. The two earlier blog posts focus on the brokers and the producers in the Apache Kafka symbiosis.

Apache Kafka Consumer

Consumers read messages from the broker's log, starting from a specific offset, and they are free to start from any offset they choose. This allows a consumer to join at any point in time and still read older messages.
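Here is a minimal Python sketch that makes offsets concrete. It models a single partition as a plain list, where a message's index is its offset. The `read_from` helper and the `partition_log` data are hypothetical and only for illustration. They are not part of any Kafka client. In the official Java client, you choose where to start reading with `KafkaConsumer.seek(...)`.

```python
from typing import List, Tuple

def read_from(log: List[str], offset: int, max_records: int = 10) -> Tuple[List[str], int]:
    """Hypothetical helper: return up to max_records messages starting at `offset`,
    plus the offset the consumer should read from next."""
    if offset < 0 or offset > len(log):
        raise ValueError(f"offset {offset} is out of range for a log of length {len(log)}")
    batch = log[offset:offset + max_records]
    return batch, offset + len(batch)

# Hypothetical partition log: the list index plays the role of the offset.
partition_log = ["m0", "m1", "m2", "m3", "m4"]

# A consumer that joins late can still start from the very beginning...
batch, next_offset = read_from(partition_log, 0, max_records=2)
print(batch, next_offset)  # ['m0', 'm1'] 2

# ...or jump straight to any offset it chooses, e.g. offset 3.
batch, next_offset = read_from(partition_log, 3)
print(batch, next_offset)  # ['m3', 'm4'] 5
```

The important detail is that the consumer, not the broker, decides where reading starts. The returned `next_offset` is what a consumer would track (or commit) to continue where it left off.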

A consumer can join a group, called a consumer group. A consumer group is the set of consumer processes subscribing to a specific topic. The consumers in the group divide the topic's partitions fairly among themselves, so that each partition is consumed by only a single consumer in the group, i.e., each consumer in the group is assigned a set of partitions to consume from. As a result, Kafka guarantees that a message is only read by a single consumer in the group.

Make sure all consumers in a consumer group have a good connection

Partitions are redistributed between consumers every time a consumer connects to or drops out of the consumer group. Consumers in the group cannot consume messages while this happens. If one consumer in a group has a bad connection, the whole group is affected and becomes unavailable during every reconnect. A redistribution of partitions takes around 2-3 seconds, sometimes longer. To keep your setup running smoothly, we strongly recommend that you make sure your consumers have a stable connection.
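The following Python sketch shows how far one dropped consumer can reach. It assigns six partitions round-robin across three consumers, removes one consumer, reassigns, and lists which partitions changed owner. `round_robin_assign` and `moved_partitions` are hypothetical helpers, and the round-robin rule is a simplification. Kafka's real assignors (for example the range, round-robin and sticky assignors) may distribute partitions differently.

```python
from typing import Dict, List

def round_robin_assign(consumers: List[str], partitions: List[int]) -> Dict[str, List[int]]:
    """Simplified round-robin assignment: the i-th partition goes to consumer i % len(consumers)."""
    if not consumers:
        raise ValueError("need at least one consumer")
    members = sorted(consumers)
    assignment: Dict[str, List[int]] = {c: [] for c in members}
    for i, p in enumerate(sorted(partitions)):
        assignment[members[i % len(members)]].append(p)
    return assignment

def moved_partitions(before: Dict[str, List[int]], after: Dict[str, List[int]]) -> List[int]:
    """Partitions whose owning consumer differs between two assignments."""
    owner_before = {p: c for c, ps in before.items() for p in ps}
    owner_after = {p: c for c, ps in after.items() for p in ps}
    return sorted(p for p in owner_after if owner_before.get(p) != owner_after[p])

partitions = list(range(6))
before = round_robin_assign(["c1", "c2", "c3"], partitions)
after = round_robin_assign(["c1", "c2"], partitions)  # consumer c3 dropped out

print(before)  # {'c1': [0, 3], 'c2': [1, 4], 'c3': [2, 5]}
print(after)   # {'c1': [0, 2, 4], 'c2': [1, 3, 5]}
print(moved_partitions(before, after))  # [2, 3, 4, 5]
```

Only one consumer left, but under this simple rule four of the six partitions changed owner. The group has to agree on a new assignment before consumption resumes, which is why a single flaky connection affects everyone.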

Number of consumers

Ideally, the number of partitions should be equal to the number of consumers.

  • Number of consumers > number of partitions: If there are more consumers than partitions, some consumers will sit idle, i.e., you will be wasting client resources.
  • Number of partitions > number of consumers: If there are more partitions than consumers, some consumers will read from multiple partitions.
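Here is a small Python sketch of the two cases in the list above. It computes how many partitions each consumer gets when partitions are split as evenly as possible across a group. The `partitions_per_consumer` helper is hypothetical and only counts partitions. It does not reproduce any specific Kafka assignor.

```python
from typing import List

def partitions_per_consumer(num_partitions: int, num_consumers: int) -> List[int]:
    """Partition count per consumer under an even split (hypothetical helper)."""
    if num_consumers < 1:
        raise ValueError("a consumer group needs at least one consumer")
    base, extra = divmod(num_partitions, num_consumers)
    # The first `extra` consumers each take one additional partition.
    return [base + 1 if i < extra else base for i in range(num_consumers)]

for consumers in (4, 6, 8):
    counts = partitions_per_consumer(6, consumers)
    idle = counts.count(0)  # consumers with no partition at all
    print(f"6 partitions, {consumers} consumers -> {counts}, idle: {idle}")

# 6 partitions, 4 consumers -> [2, 2, 1, 1], idle: 0
# 6 partitions, 6 consumers -> [1, 1, 1, 1, 1, 1], idle: 0
# 6 partitions, 8 consumers -> [1, 1, 1, 1, 1, 1, 0, 0], idle: 2
```

With fewer consumers than partitions, some consumers carry two partitions. With more consumers than partitions, the extra consumers get nothing to read. Equal numbers give exactly one partition each.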

As mentioned before, availability is affected if one consumer has a bad connection. The more consumers you have, the greater the risk that one of them drops out and halts all the others while the partitions are redistributed.
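A rough way to quantify this risk is to assume that each consumer drops out independently with the same probability p during some time window. Under that simplifying assumption, the chance that at least one of n consumers drops out is 1 - (1 - p)^n. The 1% figure below is purely illustrative, not a measured value.

```python
def prob_any_drop(p: float, n: int) -> float:
    """P(at least one of n consumers drops), assuming independent drops with probability p each."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    if n < 0:
        raise ValueError("n must be non-negative")
    return 1.0 - (1.0 - p) ** n

# Illustrative assumption: each consumer has a 1% chance of dropping in the window.
for n in (1, 5, 20):
    print(n, round(prob_any_drop(0.01, n), 4))

# 1 0.01
# 5 0.049
# 20 0.1821
```

Real failures are rarely independent, so treat these numbers as a way to see the trend, not as a prediction. Even so, the risk of a group-wide pause clearly grows with every consumer you add.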

CloudKarafka - Industry Leading Apache Kafka as a Service