Apache Kafka On-Prem Setup

Himanshu Negi
5 min read · Dec 23, 2020

Apache Kafka is a highly flexible streaming platform that supports multiple use cases in modern data architectures. One important use of Kafka is capturing database transaction records. Kafka is a distributed, horizontally scalable (via partitioning), fault-tolerant (via replication), low-latency commit log, which makes it a good choice to integrate with your data lake, data warehouse, and BI services. In a production environment you would normally run a Kafka cluster made up of more than one Kafka server, but for this quick setup guide I will be running a single server on Ubuntu in VirtualBox. Since this is just a setup guide, I suggest you go through the official Apache Kafka documentation for more details.

First, you need to copy the download link from here. I will be using tmux so that I can open multiple terminals to make my life a little easier. Move to the directory where you want to install Kafka and download the archive using wget.

After running the above command you will have a tar file that needs to be extracted using tar -xvzf.

A new Kafka folder will be created in that directory, and inside it there is a config folder. Open server.properties and put your IP address in the line advertised.listeners=PLAINTEXT://IPADDR:9092 and also in zookeeper.connect=IPADDR:2181. You can also use localhost.
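If you prefer to script the edit, here is a small sed sketch. The IP address is a hypothetical example, and to keep the snippet self-contained it first writes a two-line stand-in for server.properties — in your setup you would run the sed commands against the real config/server.properties instead:

```shell
IPADDR=192.168.1.10   # hypothetical example; replace with your machine's IP (or localhost)

# Stand-in file so this snippet runs on its own; skip this part and
# use the real config/server.properties in your Kafka folder instead.
mkdir -p config
printf '%s\n' \
  '#advertised.listeners=PLAINTEXT://your.host.name:9092' \
  'zookeeper.connect=localhost:2181' > config/server.properties

# Point both settings at your IP address (also uncomments advertised.listeners)
sed -i "s|^#\{0,1\}advertised\.listeners=.*|advertised.listeners=PLAINTEXT://${IPADDR}:9092|" config/server.properties
sed -i "s|^zookeeper\.connect=.*|zookeeper.connect=${IPADDR}:2181|" config/server.properties

# Show the two edited lines
grep -E 'advertised\.listeners|zookeeper\.connect' config/server.properties
```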

Now we will be using CMAK (Cluster Manager for Apache Kafka), a web user interface for viewing and managing Kafka clusters; it requires Java 11 installed on your machine. You can find the repository here — clone it into your directory.

This creates a new CMAK folder. Go inside the folder and run the command ./sbt clean dist to build the distribution.

After the build finishes, there will be a new folder created called target, and inside that a universal folder containing a cmak zip file that needs to be unzipped.

Go into the new folder, then into the conf folder, open the application.conf file, and edit the line cmak.zhosts="IPADDR:2181" to use your IP address.
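This edit can be scripted the same way as the earlier one. Again the IP address is a hypothetical example, and the snippet writes a one-line stand-in for application.conf so it is self-contained — in your setup, run the sed command against the real conf/application.conf inside the unzipped cmak folder:

```shell
IPADDR=192.168.1.10   # hypothetical example; replace with your machine's IP

# Stand-in file so this snippet runs on its own; use the real
# conf/application.conf in your cmak folder instead.
mkdir -p conf
echo 'cmak.zhosts="kafka-manager-zookeeper:2181"' > conf/application.conf

# Point CMAK at your ZooKeeper host
sed -i "s|^cmak\.zhosts=.*|cmak.zhosts=\"${IPADDR}:2181\"|" conf/application.conf

cat conf/application.conf
```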

We are finally done with the configuration, and it's time for the fun part: let's start everything and make sure it is all running properly.

First, start ZooKeeper: from inside the Kafka folder, run bin/zookeeper-server-start.sh config/zookeeper.properties. Then open another terminal, go to the Kafka folder, and start the Kafka server with JMX_PORT=8004 bin/kafka-server-start.sh config/server.properties (exposing JMX lets CMAK poll broker metrics). Lastly, start the Kafka manager: go inside the cmak folder you unzipped and run bin/cmak -Dconfig.file=conf/application.conf -Dhttp.port=8080.

Let's go to http://localhost:8080/ and use the interface to create the cluster and the topics that will hold all the messages.

Click on Cluster, then Add Cluster. Enter the cluster name, set Cluster Zookeeper Hosts to your IPADDR:2181, set the Kafka version to 2.6.0, select Enable JMX Polling and Poll Consumer Information, leave everything else at its defaults, and hit Save.

Now, to create a topic, click on the Topic drop-down menu and choose Create. Add the topic name, and set the partition count and replication factor based on the number of nodes in your Kafka cluster.

Congrats, you have just created a topic on a Kafka server, and producers can now send messages to that topic.

As discussed earlier, we will be sending data from a relational database, so let's get on to that.

I am using Microsoft SQL Server Express 2017 and added the AdventureWorks sample database.

I have written a Python script that reads the DimCustomer table and writes the top 100 records to the Kafka topic. Currently there are no messages on the topic; you can see the offset is 0.

And now let's check the messages in the Kafka manager. Here you can see the offset is at 100.

In this article we sent just 100 records in one batch, but if you have a requirement to apply CDC (change data capture), there are connectors provided by Confluent that you should definitely check out.
