This guide is divided into three parts, and this first part focuses on producers.
An acknowledgment (ACK) is a signal passed between communicating processes to confirm receipt of a sent message. The ack value (acks) is a producer configuration parameter in Apache Kafka and can be set to the following values:
- acks=0 The producer never waits for an ack from the server. There is no guarantee that the server has received the message, and the retries configuration has no effect, since the producer will never know that the message was lost. This gives the lowest cost per message sent, but you may pay for it in message durability.
- acks=1 The producer gets an ack after the leader replica has received the data. The leader writes the record to its local log but responds without waiting for acknowledgment from all followers. The message is lost only if the leader fails immediately after acknowledging the record, before the followers have replicated it. This costs more per message than acks=0 and gives high, but not maximum, durability.
- acks=all The producer gets an ack after all in-sync replicas have received the data. The leader waits for the full set of in-sync replicas to acknowledge the record. Sending a message takes longer, but this gives the strongest message durability.
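To make the three settings concrete, here is a minimal Python sketch that models how many replica acknowledgments a write needs before the producer gets its ack. It is not Kafka code. The function name required_acks and the list-of-replicas model are hypothetical. The sketch also ignores broker-side settings such as min.insync.replicas.

```python
def required_acks(acks: str, in_sync_replicas: list[str]) -> int:
    """Return how many replicas must confirm a write before the producer
    gets its ack, under a simplified model of the acks setting."""
    if acks == "0":
        # Fire and forget: the producer does not wait for any replica.
        return 0
    if acks == "1":
        # Only the leader's local write is awaited.
        return 1
    if acks in ("all", "-1"):
        # The leader waits for every replica in the in-sync set (leader included).
        return len(in_sync_replicas)
    raise ValueError(f"unsupported acks value: {acks!r}")


isr = ["broker-1", "broker-2", "broker-3"]  # hypothetical in-sync replica set
for setting in ("0", "1", "all"):
    print(setting, required_acks(setting, isr))
```

The cost and durability trade-off follows from this count. The more replicas that must confirm a write, the longer the producer waits, and the fewer failure scenarios can lose the message.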
Read more about ack-values in Kafka here.
How should you set the ack value for the producer in Apache Kafka?
For the highest throughput, set the value to 0. For the strongest protection against data loss, set it to all (or -1). For high, but not maximum, durability combined with high, but not maximum, throughput, set it to 1. An ack value of 1 can be seen as a middle ground between the other two settings.
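The guidance above can be written as a small lookup that returns a producer configuration fragment for each goal. This is a minimal sketch. The goal names and the function producer_config are hypothetical, while acks and bootstrap.servers are real producer property names. A dict like this could, for example, be passed to a librdkafka-based client such as confluent_kafka.Producer.

```python
# Hypothetical goal names mapped to the acks value recommended for each.
ACKS_BY_GOAL = {
    "throughput": "0",    # highest throughput, no delivery guarantee
    "balanced": "1",      # leader-only ack: high, but not maximum, durability
    "durability": "all",  # wait for all in-sync replicas ("-1" is equivalent)
}


def producer_config(goal: str, bootstrap_servers: str) -> dict[str, str]:
    """Build a producer configuration fragment for the given goal."""
    if goal not in ACKS_BY_GOAL:
        raise ValueError(f"unknown goal: {goal!r}")
    return {
        "bootstrap.servers": bootstrap_servers,
        "acks": ACKS_BY_GOAL[goal],
    }


# The broker address is a placeholder.
print(producer_config("durability", "broker.example.com:9092"))
```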
Batch messages in Apache Kafka
Messages can be grouped and sent together as a batch. A batch is sent as soon as one of its criteria is met: either it has reached a certain size, or a given amount of time has passed. Sending messages in batches is recommended because it increases throughput.
Always balance the time spent building up batches against the sending rate. Small batches can mean low throughput and a lot of per-request overhead, although a small batch is still better than no batching at all. A batch that is too large may take a long time to fill, leaving consumers idle. The right balance depends on the use case. For a real-time application, avoid large batches.
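The size-or-time rule can be sketched as a small accumulator that sends a batch when either limit is hit. This is a simplified model, not how any Kafka client is implemented. The class Batcher, its parameters and the injectable clock are hypothetical. In librdkafka, the corresponding settings are batch.num.messages (messages per batch) and linger.ms (how long to wait for a batch to fill). The Java client uses batch.size, measured in bytes, together with linger.ms.

```python
import time
from typing import Callable


class Batcher:
    """Collect messages and hand them off as a batch once the batch holds
    max_messages items or its oldest message has waited linger_s seconds."""

    def __init__(self, max_messages: int, linger_s: float,
                 send: Callable[[list], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_messages = max_messages
        self.linger_s = linger_s
        self.send = send      # called with each completed batch
        self.clock = clock    # injectable so the behaviour can be tested
        self.batch: list = []
        self.started = 0.0    # arrival time of the batch's first message

    def add(self, message) -> None:
        if not self.batch:
            self.started = self.clock()
        self.batch.append(message)
        if len(self.batch) >= self.max_messages:
            self.flush()  # size criterion met

    def poll(self) -> None:
        """Call periodically; sends a partial batch once linger_s has passed."""
        if self.batch and self.clock() - self.started >= self.linger_s:
            self.flush()  # time criterion met

    def flush(self) -> None:
        if self.batch:
            self.send(self.batch)
            self.batch = []


# Usage with a fake clock: the first three messages fill a batch,
# and the fourth is sent on its own once the linger time has passed.
sent = []
now = [0.0]
batcher = Batcher(max_messages=3, linger_s=0.5, send=sent.append,
                  clock=lambda: now[0])
for i in range(4):
    batcher.add(i)
now[0] = 1.0
batcher.poll()
print(sent)  # [[0, 1, 2], [3]]
```

Raising max_messages or linger_s gives larger, more efficient batches. The cost is that messages wait longer before they are sent. This is the trade-off described above.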
Compression of large messages
The producer can compress records, and the consumer decompresses them. We recommend compressing large messages to reduce both their disk footprint and their footprint on the wire. Kafka is not meant for sending large files. Put large files on shared storage instead of sending them through Kafka. Read more about compression in Apache Kafka here.
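To show why compression pays off for large, repetitive payloads, the sketch below compresses a made-up JSON message with Python's built-in gzip module and compares the sizes. It only illustrates the idea. In Kafka, the producer client does the compression when you set the producer property compression.type (for example gzip, snappy, lz4 or zstd), and the consumer client decompresses automatically.

```python
import gzip
import json

# Made-up large message: many similar records, as is common for event data.
records = [{"sensor": f"sensor-{i % 10}", "status": "ok", "value": i}
           for i in range(2000)]
payload = json.dumps(records).encode("utf-8")

compressed = gzip.compress(payload)
restored = gzip.decompress(compressed)

assert restored == payload  # compression is lossless
print(f"original: {len(payload)} bytes, gzip: {len(compressed)} bytes")
print(f"ratio: {len(payload) / len(compressed):.1f}x")

# Producer configuration fragment that would enable the same codec in Kafka.
producer_settings = {"compression.type": "gzip"}
```

The exact ratio depends on the data. Repetitive, text-based payloads such as JSON usually compress well, while already-compressed data such as images gains little.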
Use Apache Kafka clients that are up to date
The Apache Kafka protocol changes frequently, which makes it hard for clients to keep up with every change. If you need a client that stays up to date, we recommend the following clients:
All demos in the CloudKarafka documentation pages are wrappers around librdkafka.