What is a Kafka 'Topic'?
Kafka uses topics to organize messages, much as a database uses tables. Each topic is identified by a unique name, and messages are written to and read from specific topics. Whenever we want to store data in Kafka, we must specify the topic to which that data should be written.
A 'topic' is essentially the destination that holds the data. If you are familiar with message queues, think of the topic as the queue name. If you are new to all of this, just think of the topic as a box where the messages are stored.
Each topic must be given a name, and that name is effectively the only agreement Kafka producers and consumers need in order to cooperate.
The producer picks a name and writes its messages to that topic; the consumer is given the same name so it knows where to read the data from.
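To make that contract concrete, here is a minimal, purely illustrative sketch in Python. The in-memory 'topics' dictionary and the produce/consume helpers are hypothetical stand-ins for a real Kafka client, not an actual API; the only thing producer and consumer share is the topic name:

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for a Kafka cluster: each topic name
# maps to a list (the "box") holding its messages.
topics = defaultdict(list)

def produce(topic_name, message):
    """The producer only needs the topic name to know where to write."""
    topics[topic_name].append(message)

def consume(topic_name):
    """The consumer is given the same name so it knows where to read from."""
    return list(topics[topic_name])

produce("orders", "order-1")
produce("orders", "order-2")
print(consume("orders"))  # ['order-1', 'order-2']
```

The point of the sketch is the shared name: neither side needs to know anything about the other, only which topic to use.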
What is a 'Partition'?
Each topic can have one or more partitions, which are fractions of a topic. When a topic is created, the operator (or the cluster) decides how many partitions the topic is split into.
Using the previous example of a topic being a box, we can picture partitions as smaller boxes inside the topic box.
Partitions are helpful when it comes to scaling because different partitions of the same topic can be served by different Kafka brokers.
Based on the chosen 'key', each message sent to a topic ends up in one of its partitions. Within a partition, incoming messages are appended in order and read back in that same order; note that ordering is guaranteed only within a partition, not across the topic as a whole.
Whenever a topic is created with more than one partition, it is important to choose the partition key carefully so that all partitions end up holding roughly the same number of messages. If no key is provided, Kafka spreads messages across the partitions in a round-robin fashion.
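The routing rules above can be sketched as follows. This is an illustrative model, not Kafka's real partitioner (which hashes keys with murmur2): a keyed message always lands in the same partition, preserving per-key ordering, while keyless messages are spread round-robin:

```python
# Illustrative partition-routing sketch; NUM_PARTITIONS and the helpers
# below are hypothetical, not a Kafka client API.
NUM_PARTITIONS = 3
partitions = [[] for _ in range(NUM_PARTITIONS)]
_rr = 0  # round-robin counter for messages without a key

def pick_partition(key=None):
    global _rr
    if key is not None:
        # Same key -> same partition, so per-key ordering is preserved.
        return hash(key) % NUM_PARTITIONS
    p = _rr % NUM_PARTITIONS  # no key: spread round-robin
    _rr += 1
    return p

def send(message, key=None):
    partitions[pick_partition(key)].append(message)

send("payment-1", key="user-42")
send("payment-2", key="user-42")  # same partition as payment-1, in order
send("metric-a")                  # no key: round-robin
```

A skewed key (say, one customer producing most of the traffic) would make one inner list much longer than the others, which is exactly the imbalance the paragraph above warns about.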
Choosing the right number of partitions or the right partition key often requires a deep understanding of how partitioning interacts with the business logic. That knowledge is beyond the scope of this article.
Topics can be created automatically or manually.
If the broker setting 'auto.create.topics.enable' is 'true' (the default), a new topic is created whenever a consumer or producer tries to read from or write to a topic that does not yet exist on the cluster. In this case, the new topic is created using the cluster defaults for replication factor and number of partitions.
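The settings involved live in the broker's server.properties file; the values shown here are the stock defaults of a vanilla installation (a sketch, not a recommended production configuration):

```properties
# server.properties (broker configuration)
# Allow implicit topic creation (default: true)
auto.create.topics.enable=true
# Number of partitions for auto-created topics (default: 1)
num.partitions=1
# Replication factor for auto-created topics (default: 1)
default.replication.factor=1
```

Note that with these defaults an auto-created topic gets a single partition and a single replica, which is rarely what you want in production.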
Alternatively, topics can be created explicitly, either from the producer or by an administrator. In this case, the replication factor and the number of partitions must be specified along with the topic name.
The 'kafka-topics.sh' utility is widely used by Kafka administrators and is provided by default with the Kafka package. Download the full package here: https://kafka.apache.org/downloads
Inside the '/bin' folder you will find a rich set of utilities. Remember that you need to pass the location of your Kafka broker using '--bootstrap-server'.
In the following examples we assume you are running the utility from the same server the Kafka broker is running on.
Create a topic (the replication factor and number of partitions are required):
./kafka-topics.sh --bootstrap-server localhost:9092 --create --topic my_first_topic --partitions 3 --replication-factor 3
Query the cluster for existing topics:
./kafka-topics.sh --bootstrap-server localhost:9092 --list
Same as above, but with greater detail:
./kafka-topics.sh --bootstrap-server localhost:9092 --describe
Topic: my_first_topic  PartitionCount: 3  ReplicationFactor: 3
    Topic: my_first_topic  Partition: 0  Leader: 1  Replicas: 1,2,0  Isr: 1,2,0
    Topic: my_first_topic  Partition: 1  Leader: 2  Replicas: 2,0,1  Isr: 2,0,1
    Topic: my_first_topic  Partition: 2  Leader: 0  Replicas: 0,1,2  Isr: 0,1,2
For each partition, 'Leader' is the broker currently serving reads and writes, 'Replicas' lists the brokers holding a copy, and 'Isr' lists the replicas currently in sync with the leader.
Delete the topic:
./kafka-topics.sh --bootstrap-server localhost:9092 --delete --topic my_first_topic
Are you ready to explore more with your own Kafka topics and partitions? CloudKarafka can help you with a free instance. Learn how to create one at CloudKarafka.