Tuesday, September 9, 2014

Apache Kafka. Set up. Writing to. Reading from. Monitoring. Part 1

Phew! That was a mouthful... but basically what I am going to try to do here is:
  1. Show you how to set up your own Apache Kafka cluster.
  2. Write to Apache Kafka and read from it using the Kafka-supplied shell scripts and then using a Java client.
  3. Finally, quickly review KafkaOffsetMonitor - an app to monitor your Kafka consumers and their position (offset) in the queue.

First things first... What is Apache Kafka? Straight from the source: "Apache Kafka is publish-subscribe messaging rethought as a distributed commit log." There are a number of documents and guides out there that talk about the architecture and design, so I am not going to restate them here. What we want is some hands-on experience with Kafka to test it out... in my case, it is much easier to learn things if I can play with them as I read about them.

In order to set up our cluster, let's use Vagrant to help us with the setup and configuration of VMs. This way, we can always reuse it on a different system and, most importantly, we will be able to delete the VM and start from scratch if we mess up, without racking our brains over whether we removed something from the system properly.

If you have never used Vagrant before, consider walking through the Getting Started tutorial to truly appreciate it!

Anyway, let's download VirtualBox and Vagrant and install them. Please follow their installation guides...

Now, create a directory to hold your VM instance and name it so that you can tell it apart from other VMs in the future.
In this case, we are going to use a very basic Debian box since we won't need a fancy GUI, etc.
 
mkdir debianKafkaClusterNode1
cd debianKafkaClusterNode1
vagrant init debianKafkaClusterNode1 http://puppet-vagrant-boxes.puppetlabs.com/debian-70rc1-x64-vbox4210.box 
 
After this we will have a Vagrantfile that can be used to configure basic VM settings such as memory and networking. Let's bump the memory up to 2GB: find and uncomment the following lines and set memory to 2048, or however much you want to give to the box. In this tutorial we are going to set up a cluster of 3 machines, so we will need 2x3=6GB of memory for the VMs alone; don't forget about your own host system ;)

  config.vm.provider "virtualbox" do |vb|
    # Don't boot with headless mode
    # vb.gui = true

    # Use VBoxManage to customize the VM. For example to change memory:
    vb.customize ["modifyvm", :id, "--memory", "2048"]
  end
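
The memory arithmetic above can be sketched as a tiny shell helper, handy if you decide on a different node count or per-node size. The function name cluster_memory_mb is made up for this illustration.

```shell
# Total memory needed by the VMs alone: nodes x per-node megabytes.
# cluster_memory_mb is a hypothetical helper, not part of Vagrant.
cluster_memory_mb() {
    nodes=$1
    per_node_mb=$2
    echo $(( nodes * per_node_mb ))
}

# 3 nodes at 2048 MB each, as in this tutorial
cluster_memory_mb 3 2048
```

Whatever number comes out, leave headroom on top of it for the host system.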

Since we want this VM to be able to communicate with other boxes, and it is a good idea to be able to SSH to it from our host system, let's find config.vm.network in the Vagrantfile and uncomment it. Here we are going to assign it the IP 192.168.33.10:

  # Create a private network, which allows host-only access to the machine
  # using a specific IP.
  config.vm.network "private_network", ip: "192.168.33.10"

Now we are ready to start the box!
vagrant up
 
The very first time it might take some time, since Vagrant will attempt to download debian-70rc1-x64-vbox4210.box from http://puppet-vagrant-boxes.puppetlabs.com/.

Now log into the box
vagrant ssh

And install Java and a text editor (Debian 7 ships OpenJDK 7; there is no openjdk-8-jdk package in its standard repositories)
sudo apt-get update
sudo apt-get install openjdk-7-jdk
sudo apt-get install vim
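
Before moving on, it may be worth confirming that the tools actually landed on the PATH. The helper below is only a sketch; require_cmds is a name invented for this post, and it relies on nothing more than the shell's built-in command -v.

```shell
# Report whether each named command is on the PATH; fail if any is missing.
# require_cmds is a hypothetical helper name used only for illustration.
require_cmds() {
    missing=0
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "found: $cmd"
        else
            echo "missing: $cmd"
            missing=1
        fi
    done
    return "$missing"
}

# On the VM you might run:
#   require_cmds java javac vim
```

A non-zero exit status means at least one of the tools still needs installing.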
 
Now we are going to install Apache Kafka by downloading its source code and building it.
sudo su -
wget https://archive.apache.org/dist/kafka/kafka-0.8.0-beta1-src.tgz
mkdir /opt/kafka
tar -zxvf kafka-0.8.0-beta1-src.tgz
cd kafka-0.8.0-beta1-src
./sbt update
./sbt package
./sbt assembly-package-dependency
cd ../
mv kafka-0.8.0-beta1-src /opt/kafka
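
A quick way to check that the build produced something is to look for a Kafka jar. This is a rough sketch: the function name kafka_is_built is hypothetical, and the assumption that sbt places the jar somewhere under core/target is mine, so adjust the path if your build output lives elsewhere.

```shell
# Succeed if at least one kafka*.jar exists under <kafka_dir>/core/target.
# Assumption: the sbt build writes its jar below core/target.
kafka_is_built() {
    kafka_dir=$1
    [ -d "$kafka_dir/core/target" ] || return 1
    [ -n "$(find "$kafka_dir/core/target" -name 'kafka*.jar' 2>/dev/null)" ]
}

# On the VM you might run:
#   kafka_is_built /opt/kafka/kafka-0.8.0-beta1-src && echo "build looks fine"
```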

Install Zookeeper
wget http://apache.claz.org/zookeeper/zookeeper-3.4.6/zookeeper-3.4.6.tar.gz
mkdir /opt/zookeeper
tar -zxvf zookeeper-3.4.6.tar.gz --directory /opt/zookeeper
cp /opt/zookeeper/zookeeper-3.4.6/conf/zoo_sample.cfg /opt/zookeeper/zookeeper-3.4.6/conf/zoo.cfg

Configure Zookeeper by creating a directory for zookeeper data
mkdir -p /var/zookeeper/data

and specifying this directory in /opt/zookeeper/zookeeper-3.4.6/conf/zoo.cfg file
by editing dataDir property like so
dataDir=/var/zookeeper/data

In the same file, add an entry for the first Zookeeper server if it is not already there. Use the IP of our newly created machine.
server.1=192.168.33.10:2888:3888

Specify myid so that the server would be able to identify itself in the zookeeper cluster
echo "1" > /var/zookeeper/data/myid
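
If you would rather script the three Zookeeper steps above than edit files by hand, something along these lines should work. It is a minimal sketch: configure_zookeeper is a made-up function name, and it assumes zoo.cfg already contains a dataDir line (which it does when copied from zoo_sample.cfg).

```shell
# Apply the Zookeeper settings described above.
# Arguments: config file, data directory, server id, server IP.
# configure_zookeeper is a hypothetical helper, not a Zookeeper tool.
configure_zookeeper() {
    cfg=$1; data_dir=$2; id=$3; ip=$4

    mkdir -p "$data_dir"

    # Point the existing dataDir line at our directory.
    sed "s|^dataDir=.*|dataDir=$data_dir|" "$cfg" > "$cfg.tmp" && mv "$cfg.tmp" "$cfg"

    # Add the server entry only if it is not already there.
    grep -q "^server\.$id=" "$cfg" || echo "server.$id=$ip:2888:3888" >> "$cfg"

    # myid lets this server identify itself in the Zookeeper cluster.
    echo "$id" > "$data_dir/myid"
}

# On the VM you might run:
#   configure_zookeeper /opt/zookeeper/zookeeper-3.4.6/conf/zoo.cfg /var/zookeeper/data 1 192.168.33.10
```

Running it twice is harmless: the server line is only appended once.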

Configure Kafka
Edit the server.properties file in /opt/kafka/kafka-0.8.0-beta1-src/config
Set the following values by finding them in that file, making sure they are uncommented as well
broker.id=1
host.name=192.168.33.10
zookeeper.connect=192.168.33.10:2181
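
The same edits can be made without opening an editor. The set_property function below is a hypothetical helper written for this post; it replaces a key whether or not the line is commented out, and appends the key if it is missing. It is a sketch only: values containing | or & would confuse sed, and a comment line that happens to start with the key would be rewritten too.

```shell
# Set key=value in a properties file: replace the line if the key is present
# (commented out or not), otherwise append it.
# set_property is a hypothetical helper, not something Kafka ships.
set_property() {
    file=$1; key=$2; value=$3
    # Escape dots so "broker.id" matches literally in the regex.
    key_re=$(printf '%s' "$key" | sed 's/\./\\./g')
    if grep -q "^#* *$key_re=" "$file"; then
        sed "s|^#* *$key_re=.*|$key=$value|" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
    else
        echo "$key=$value" >> "$file"
    fi
}

# On the VM you might run:
#   props=/opt/kafka/kafka-0.8.0-beta1-src/config/server.properties
#   set_property "$props" broker.id 1
#   set_property "$props" host.name 192.168.33.10
#   set_property "$props" zookeeper.connect 192.168.33.10:2181
```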

It is much easier to add the Zookeeper and Kafka locations to the PATH so we don't have to type the entire path every time; plus, if you later decide to move them, you won't have to change hardcoded paths. To add them to your environment variables, create the .bash_profile file (or edit it if it already exists) like so
vim ~/.bash_profile

Add the following entries to it
export ZK_HOME=/opt/zookeeper/zookeeper-3.4.6/
export KAFKA_HOME=/opt/kafka/kafka-0.8.0-beta1-src/
export PATH=$ZK_HOME/bin:$KAFKA_HOME/bin:$PATH

Make sure to close the terminal window and open a new one (or run source ~/.bash_profile) for the changes to take effect!
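
If you script this step, it helps to make it safe to run more than once, so the profile does not collect duplicate export lines. The add_line_once helper below is a name invented for this sketch.

```shell
# Append a line to a file unless an identical line is already present.
# add_line_once is a hypothetical helper used only for illustration.
add_line_once() {
    file=$1; line=$2
    touch "$file"
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# On the VM you might run:
#   add_line_once ~/.bash_profile 'export ZK_HOME=/opt/zookeeper/zookeeper-3.4.6/'
#   add_line_once ~/.bash_profile 'export KAFKA_HOME=/opt/kafka/kafka-0.8.0-beta1-src/'
#   add_line_once ~/.bash_profile 'export PATH=$ZK_HOME/bin:$KAFKA_HOME/bin:$PATH'
```

The single quotes matter: they keep $ZK_HOME and friends from being expanded before the line is written to the file.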

Start Zookeeper
sudo $ZK_HOME/bin/zkServer.sh start

Start Kafka
sudo $KAFKA_HOME/bin/kafka-server-start.sh $KAFKA_HOME/config/server.properties &
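
Both servers take a moment to come up, so the test commands can fail if you run them too soon. A generic retry loop like the one below is one way to wait. It is only a sketch: wait_until is a made-up name, and the usage lines assume netcat (nc) is installed on the box, which this post does not set up.

```shell
# Retry a command up to <attempts> times, sleeping <delay> seconds between tries.
# wait_until is a hypothetical helper, not part of Kafka or Zookeeper.
wait_until() {
    attempts=$1; delay=$2; shift 2
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Hypothetical usage on the VM, assuming nc is available:
#   wait_until 30 1 nc -z 192.168.33.10 2181 && echo "Zookeeper is listening"
#   wait_until 30 1 nc -z 192.168.33.10 9092 && echo "Kafka is listening"
```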

Test Kafka by listing available topics. At first, it should not have any:
$KAFKA_HOME/bin/kafka-list-topic.sh --zookeeper 192.168.33.10:2181

Test Kafka by creating a new topic
$KAFKA_HOME/bin/kafka-create-topic.sh --zookeeper 192.168.33.10:2181 --replica 1 --partition 1 --topic test-topic

Re-run command to list available topics. You should see test-topic in the list

Test Kafka by producing messages to that topic from the console. Type one message per line and use ctrl+c when finished.
$KAFKA_HOME/bin/kafka-console-producer.sh --broker-list 192.168.33.10:9092 --topic test-topic
This
is
Kafka
Test

Test Kafka consumer to verify that messages are there for that topic.
$KAFKA_HOME/bin/kafka-console-consumer.sh --zookeeper 192.168.33.10:2181 --topic test-topic --from-beginning

You should see
This
is
Kafka
Test
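
The produce-and-consume check above can also be run without typing, since the console producer reads messages from standard input. The sketch below makes a few assumptions: test_messages and check_roundtrip are names invented here, the comparison only holds for a topic that contains nothing but these four messages, and the commented usage relies on the coreutils timeout command to stop the consumer.

```shell
# Print the four test messages, one per line.
test_messages() {
    printf '%s\n' This is Kafka Test
}

# Succeed only if the consumed output (file $1) matches the test messages exactly.
# Both function names are hypothetical helpers for this illustration.
check_roundtrip() {
    test_messages | diff - "$1" >/dev/null
}

# Hypothetical usage on the VM, against a fresh topic:
#   test_messages | $KAFKA_HOME/bin/kafka-console-producer.sh --broker-list 192.168.33.10:9092 --topic test-topic
#   timeout 10 $KAFKA_HOME/bin/kafka-console-consumer.sh --zookeeper 192.168.33.10:2181 --topic test-topic --from-beginning > consumed.txt
#   check_roundtrip consumed.txt && echo "round trip OK"
```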

If you followed the test steps above and everything worked as expected, it is time to package this box so that you can reuse it for the other boxes in the cluster. On your host machine execute
VBoxManage list vms

You should see something like this:
"debian-cluster-node-1_default_1409266303013_22617" {3d996de2-94e1-4d72-be8f-29f36150ac84}

Use this name to package it up into a box.
vagrant package --base debian-cluster-node-1_default_1409266303013_22617 --output debian-kafka-cluster.box
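
To avoid copying the VM name by hand, you can pull it out of the VBoxManage output. This is a small sketch: vm_name_from_line is a hypothetical helper, and the usage lines assume the VM you want is the first one listed.

```shell
# Extract the VM name (the quoted part) from one line of `VBoxManage list vms`.
# vm_name_from_line is a hypothetical helper used only for illustration.
vm_name_from_line() {
    printf '%s\n' "$1" | sed 's/^"\(.*\)" {.*}$/\1/'
}

# Hypothetical usage on the host, assuming the first listed VM is the right one:
#   line=$(VBoxManage list vms | head -n 1)
#   vagrant package --base "$(vm_name_from_line "$line")" --output debian-kafka-cluster.box
```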

Once Vagrant is done, you should see debian-kafka-cluster.box in the same directory. Go ahead and shut down the VM.
vagrant halt

In summary, we created a VM, installed all required software on it (java, zookeeper, kafka) and configured it to work within a cluster. Next we would have to duplicate our box into several machines so that we can have a Kafka cluster.
