CloudKarafka introduces Kafka Connect

Written by Magnus Landerblom

To simplify the integration of Kafka with other systems, CloudKarafka announces support for Kafka Connect.

Kafka Connect lets you integrate other systems with Kafka. You can add a data source, which lets you consume data from that source and store it in Kafka, or go the other way and have all data in a topic sent to another system for processing or storage. Kafka Connect opens up many possibilities, and it's easy to get started because many connectors are already available.

How does Kafka Connect work?

  1. Enable Kafka Connect. Start by enabling Kafka Connect for your cluster, which can easily be done from our UI.
  2. Add a Connector. Add a connector to Kafka Connect, for example, a JDBC source connector.
  3. Configure the connector. Tell the connector where the database is, how to connect to it, and which topic the data should be sent to. You can also set how many tasks the connector may run (the tasks.max setting); a larger number of tasks allows for faster consumption of the source data.
  4. Add transformations. If the data from the source connector is not in the format you need, you can add transformations. For example, you can store only part of the data in the topic, or use a specific data property as the key for the message.
  5. Add a Sink Connector. Now that you have the data in Kafka, you can add a sink connector, for example an S3 sink connector, which lets you write the messages in the topic to files in an Amazon S3 bucket.
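
Steps 2 to 4 can be sketched as a single connector definition. The Python snippet below is a minimal illustration, not CloudKarafka's own example. It assumes the Confluent JDBC source connector and the ValueToKey and ExtractField transformations that ship with Apache Kafka; the parallelism setting from step 3 is expressed as tasks.max. The database URL, credentials, table, column, and topic prefix are hypothetical placeholders.

```python
import json


def jdbc_source_config(connection_url, user, password, table, topic_prefix,
                       key_field, tasks_max=1):
    """Build the JSON body for a JDBC source connector (illustrative sketch)."""
    return {
        "name": f"jdbc-source-{table}",
        "config": {
            # Step 2: which connector plugin to run.
            "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
            # Step 3: where the database is and how to connect to it.
            "connection.url": connection_url,
            "connection.user": user,
            "connection.password": password,
            "table.whitelist": table,
            # Poll for new rows using a strictly increasing column.
            "mode": "incrementing",
            "incrementing.column.name": key_field,
            # The target topic is the prefix followed by the table name.
            "topic.prefix": topic_prefix,
            # Upper bound on parallel tasks; the connector may use fewer.
            "tasks.max": str(tasks_max),
            # Step 4: copy one field of the value into the key,
            # then unwrap it so the key is just that field.
            "transforms": "copyKey,extractKey",
            "transforms.copyKey.type":
                "org.apache.kafka.connect.transforms.ValueToKey",
            "transforms.copyKey.fields": key_field,
            "transforms.extractKey.type":
                "org.apache.kafka.connect.transforms.ExtractField$Key",
            "transforms.extractKey.field": key_field,
        },
    }


# Hypothetical values, for illustration only.
source = jdbc_source_config(
    connection_url="jdbc:postgresql://db.example.invalid:5432/shop",
    user="reader",
    password="change-me",
    table="orders",
    topic_prefix="shop-",
    key_field="id",
    tasks_max=2,
)
print(json.dumps(source, indent=2))
```

With these placeholder values, rows from the orders table would be written to a topic named shop-orders, keyed by the id column.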

Kafka Connect can run either as a standalone process, which is useful for testing and temporary jobs, or as a distributed, scalable service that is more fault tolerant. Kafka Connect on CloudKarafka runs distributed across all the nodes in your cluster.

One really powerful feature of using Kafka to integrate these systems is that you can add multiple source connectors, merge their topics, and write the merged data to Amazon S3 or to another table in your database.
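
One way to collect data from several sources in one place is to let a single sink connector subscribe to several topics. The sketch below assumes the Confluent S3 sink connector and its JSON output format; the connector name, topic names, bucket, and region are hypothetical, and how the records are organized inside the bucket depends on the connector's settings.

```python
import json


def s3_sink_config(name, topics, bucket, region, flush_size=1000):
    """Build the JSON body for an S3 sink reading several topics (sketch)."""
    return {
        "name": name,
        "config": {
            "connector.class": "io.confluent.connect.s3.S3SinkConnector",
            # A sink connector takes a comma-separated list of topics.
            "topics": ",".join(topics),
            "s3.bucket.name": bucket,
            "s3.region": region,
            "storage.class": "io.confluent.connect.s3.storage.S3Storage",
            "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
            # Number of records written per file before it is committed.
            "flush.size": str(flush_size),
            "tasks.max": "1",
        },
    }


# Hypothetical topics produced by two different source connectors.
sink = s3_sink_config(
    name="s3-archive",
    topics=["shop-orders", "crm-customers"],
    bucket="example-archive-bucket",
    region="eu-west-1",
)
print(json.dumps(sink, indent=2))
```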

The main benefits of using Kafka Connect

  1. Kafka Connect simplifies adding new systems to your data pipelines while keeping the stream both scalable and secure.
  2. Kafka Connect makes it easier for developers to write high-quality, reliable, high-performance connector plugins. It also increases reliability, because the framework can make guarantees that are difficult to achieve with other frameworks.
  3. Kafka Connect makes Kafka more flexible and scalable, because it works with both streaming and batch-oriented systems. You can run it on a single node or scale it up to an organization-wide service.
  4. You can extend connectors to fit your needs, which shortens the time to production. This lets you both reuse and expand your current connectors.
  5. Kafka Connect simplifies moving data into and out of Kafka by letting you use existing connector implementations for common data sources and sinks.
  6. Because the connectors for different applications and data systems are not part of the main Apache Kafka code base, Kafka Connect lets you use pre-built connectors without writing any code.

Start using Kafka Connect on CloudKarafka!

To set up Kafka Connect, open the console for the cluster that you want to run Kafka Connect on. Click Connectors and then Kafka Connect in the menu.

Kafka Connect runs in distributed mode on CloudKarafka, and in distributed mode you configure all connectors through the Kafka Connect REST API. All you need to do is click Start, and Kafka Connect will be running for your cluster.

Note: If you are planning to run an S3 connector, you must add the access tokens in the form on the page. If Kafka Connect is already running when you submit the form, it will be restarted in order to pick up the variables. The tokens that you enter are stored only as environment variables on each of the servers in the cluster.

Once Kafka Connect is up and running, you need access to the REST API to configure your connectors. We secure the REST API with Basic Auth, so you need to generate a new token for this API. You can do this on the Authentication page, found in the menu.

Once you have generated the token, you will see some example requests for the REST API on the Kafka Connect page to help you get started.
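
As a rough idea of what such a request looks like, the Python sketch below builds a Basic Auth request for the standard Kafka Connect REST API, where POST /connectors creates a connector. It is not taken from the CloudKarafka console: the base URL, username, and token are hypothetical placeholders, so use the values shown on your own Kafka Connect page. The request is only constructed here, and the line that would send it is commented out.

```python
import base64
import json
import urllib.request


def connect_request(base_url, username, token, path, body=None):
    """Build (but do not send) a Basic Auth request to the Connect REST API."""
    credentials = base64.b64encode(f"{username}:{token}".encode()).decode()
    headers = {
        "Authorization": f"Basic {credentials}",
        "Accept": "application/json",
    }
    data = None
    if body is not None:
        data = json.dumps(body).encode()
        headers["Content-Type"] = "application/json"
    # Requests with a body are sent as POST, the rest as GET.
    method = "POST" if body is not None else "GET"
    url = base_url.rstrip("/") + path
    return urllib.request.Request(url, data=data, headers=headers,
                                  method=method)


# Hypothetical endpoint and credentials, for illustration only.
BASE_URL = "https://connect.example.invalid"
USERNAME = "demo-user"
TOKEN = "demo-token"

# A connector definition using the file stream sink that ships with Kafka.
definition = {
    "name": "file-sink-demo",
    "config": {
        "connector.class":
            "org.apache.kafka.connect.file.FileStreamSinkConnector",
        "topics": "demo-topic",
        "file": "/tmp/demo-topic.txt",
        "tasks.max": "1",
    },
}

create = connect_request(BASE_URL, USERNAME, TOKEN, "/connectors", definition)
list_all = connect_request(BASE_URL, USERNAME, TOKEN, "/connectors")
print(create.get_method(), create.full_url)
print(list_all.get_method(), list_all.full_url)

# To actually send a request against a real cluster:
# with urllib.request.urlopen(create) as response:
#     print(json.load(response))
```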

Plugins

Kafka Connect comes with the following plugins installed:

  • Elasticsearch (sink)
  • HDFS (sink)
  • JDBC (sink)
  • JDBC (source)
  • S3 (sink)
  • File stream (sink)
  • File stream (source)

Of course, you can add other plugins of your choice.
Unfortunately, we don’t have a UI for you to do this yourself at this moment,
but you can email us the plugin and we will add it for you.
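
To check which plugins are installed on a cluster, the standard Kafka Connect REST API offers GET /connector-plugins, which returns a JSON array of plugin descriptions. The sketch below only shows how such a response could be summarized in Python; the sample response is made up for illustration and was not captured from a CloudKarafka cluster.

```python
import json


def summarize_plugins(response_text):
    """Return sorted (short class name, type) pairs from the plugin list."""
    plugins = json.loads(response_text)
    return sorted(
        (plugin["class"].rsplit(".", 1)[-1], plugin.get("type", "unknown"))
        for plugin in plugins
    )


# Illustrative response body only; versions are placeholders.
sample_response = json.dumps([
    {"class": "io.confluent.connect.s3.S3SinkConnector",
     "type": "sink", "version": "x.y.z"},
    {"class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
     "type": "source", "version": "x.y.z"},
])

for name, kind in summarize_plugins(sample_response):
    print(f"{name} ({kind})")
```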

And as always, feel free to send us any questions or feedback you might have at support@cloudkarafka.com
