
A common Apache Kafka retention and segment size mistake

Written by Fabio Pardi

Having trouble with disk space? It can be caused by your retention and segment settings, and it is vital to keep disk usage under control.

Often, when Kafka is used as a temporary queue, people simply set a retention period. Something like:

log.cleanup.policy = delete
log.retention.{minutes,hours} = the amount of time to store data
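For example, a broker configuration for such a short-lived queue might look like the following sketch (the 24-hour value is purely illustrative):

```properties
# Delete closed segments instead of compacting them.
log.cleanup.policy=delete
# Keep data for 24 hours (illustrative value, not a recommendation).
log.retention.hours=24
```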

And then they forget about the disk space problems that can arise from configuring Kafka that way.

Later, a sudden spike in data ingestion fills the disks, and the whole setup looks like a bad idea. There is a better solution.

A more conscious approach

Kafka allows users to configure retention limits on partitions. The log.retention.bytes setting defines the largest amount of data a partition can hold before old segments start to be deleted. If both log.retention.{minutes,hours} and log.retention.bytes are set, a segment is deleted as soon as either limit is exceeded.
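As a sketch, a broker configuration combining both limits could look like this (the values are illustrative, not recommendations; the topic-level equivalents are retention.ms and retention.bytes):

```properties
# Delete closed segments once they are older than 72 hours...
log.retention.hours=72
# ...or once the partition exceeds ~10GB, whichever comes first.
log.retention.bytes=10737418240
```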

But there are some other things to take into consideration. The first is the concept of a 'segment'.

What is a segment?

Kafka brokers split partitions into segments. Each segment has a maximum size, 1GB by default, which can be changed with log.segment.bytes on the brokers (or with segment.bytes at the topic level). That means a new segment is created every time the active segment reaches log.segment.bytes.

A similar setting, based on time, is called log.roll.{ms,hours}. This setting dictates a segment's maximum age. The default value is 168 hours, which is seven days.
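Both roll triggers can be tuned together. For instance, with hypothetical values:

```properties
# Close the active segment once it reaches 512MB...
log.segment.bytes=536870912
# ...or once it is 24 hours old, whichever happens first.
log.roll.hours=24
```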

Thus, by default, a Kafka broker rolls a segment every seven days if it has not reached 1GB in size. Segments come in two kinds: 'active' and 'non-active'. The active segment is the most recently created one, and it is the segment to which Kafka appends data.
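On disk, each segment appears as a .log file named after the first offset it contains (zero-padded). A partition directory might look roughly like this (the offsets here are made up for illustration):

```
mytopic-0/
  00000000000000000000.log    # closed (non-active) segment
  00000000000000000000.index
  00000000000005000000.log    # closed (non-active) segment
  00000000000005000000.index
  00000000000010000000.log    # active segment: new data is appended here
  00000000000010000000.index
```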

The producer's data always ends up in an active segment.

Active segments

Why is it important to differentiate between active and non-active segments? Because the retention rules defined above are only applied to non-active segments.

It might sound strange, but deletion operations can happen only on non-active segments.

The solution

Things are starting to come together - we have an idea now of what's possible and what rules of thumb to follow.

log.retention.{minutes,hours} > log.roll.{ms,hours}
log.retention.bytes > log.segment.bytes
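Putting the two rules of thumb together, a configuration that satisfies them could look like this sketch (example values only):

```properties
# Segments roll at 1GB or after 24 hours...
log.segment.bytes=1073741824
log.roll.hours=24
# ...and the retention limits are strictly larger, so there are
# always closed segments for the retention rules to act on.
log.retention.bytes=10737418240
log.retention.hours=168
```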

The above two rules ensure that segments are retired by size and/or age, so the retention settings always have closed segments to work on. To cap disk space usage, keep in mind that each partition can hold up to log.retention.bytes of closed segments plus one active segment, so the upper limit can be estimated as:

(total number of partitions) * (log.retention.bytes + log.segment.bytes)
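As a quick sanity check, the bound can be computed for a hypothetical topic with 12 partitions, 1GB segments and a 10GB per-partition retention limit (retention only applies to closed segments, so each partition can briefly hold one extra active segment):

```shell
#!/bin/sh
# Hypothetical sizing: 12 partitions, 1GB segments, 10GB retention each.
partitions=12
segment_bytes=$((1024 * 1024 * 1024))        # log.segment.bytes = 1GB
retention_bytes=$((10 * 1024 * 1024 * 1024)) # log.retention.bytes = 10GB

# Each partition: up to retention_bytes of closed segments
# plus one active segment of up to segment_bytes.
upper_limit=$((partitions * (retention_bytes + segment_bytes)))
echo "${upper_limit} bytes ($((upper_limit / 1024 / 1024 / 1024)) GB)"
```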

Summary

It's important to watch your storage usage with Kafka, as leaving it unchecked can become a problem. The first step is adjusting your segment and retention settings, as described in this article. Secondly, don't forget about the disk space alarm you can set up in CloudKarafka.
