Apache Kafka is a stream-processing platform originally developed at LinkedIn and now maintained by the Apache Software Foundation. Kafka is written in Scala and Java, and it is a publish-subscribe based messaging system.
In this tutorial, you will learn Kafka concepts such as the messaging system, terminology, workflow, cluster setup, use cases, and real-time applications.
Apache Kafka is a messaging system that lets producers and consumers interact through message topics. Kafka has become popular because of features such as easy access to data, fast recovery from node failures, and fault tolerance. These features make Apache Kafka well suited for communication between, and integration of, the components of big data systems. It is commonly integrated with Apache Spark and Apache Storm to analyze streamed data. This tutorial covers Kafka basics, advantages, disadvantages, workflow, installation, and basic operations, and it concludes with real-time applications of Kafka.
The main job of a messaging system is to transfer data from one application to another, so that applications can concentrate on the data rather than on how to share it. Messages are distributed using a message queuing system, and consumers read them by following one of two messaging patterns:
Point-to-point messaging: in this pattern, messages are kept in a queue. Any number of consumers can read from the queue, but each message is consumed by only one consumer. Once a message has been consumed, it is removed from the queue.
Publish-subscribe messaging: this pattern, popularly known as pub-sub, is the most widely used messaging pattern. Messages are kept in a topic. Publishers produce the messages and subscribers consume them, and unlike point-to-point messaging, every subscriber receives every message published to the topic. Dish TV is a good analogy: it publishes various channels, and anyone can subscribe to a set of channels and watch them at any time.
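The point-to-point semantics can be illustrated with a small in-memory Python model. This is a toy sketch of the pattern only, not Kafka's API. The class `PointToPointQueue` and its methods are hypothetical names chosen for illustration.

```python
from collections import deque


class PointToPointQueue:
    """Toy in-memory queue: each message is delivered to exactly one consumer."""

    def __init__(self):
        self._messages = deque()

    def send(self, message):
        # Producers append messages to the tail of the queue.
        self._messages.append(message)

    def receive(self):
        # A consumer takes the message at the head. Once taken, it is gone
        # from the queue, so no other consumer can receive it.
        return self._messages.popleft() if self._messages else None

    def __len__(self):
        return len(self._messages)


queue = PointToPointQueue()
for order in ["order-1", "order-2", "order-3"]:
    queue.send(order)

consumer_a = [queue.receive(), queue.receive()]
consumer_b = [queue.receive()]
print(consumer_a, consumer_b, len(queue))  # ['order-1', 'order-2'] ['order-3'] 0
```

No message is seen by both consumers, and the queue is empty once every message has been consumed.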
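For contrast, here is a toy in-memory Python model of publish-subscribe fan-out in the spirit of the channel analogy. `PubSubTopic` and its methods are hypothetical names, and this is not Kafka's API.

```python
class PubSubTopic:
    """Toy in-memory topic: every subscriber receives every published message."""

    def __init__(self, name):
        self.name = name
        self._inboxes = {}  # subscriber name -> list of received messages

    def subscribe(self, subscriber):
        # Each subscriber gets its own inbox; subscribing twice is harmless.
        self._inboxes.setdefault(subscriber, [])

    def publish(self, message):
        # Fan-out: the message is appended to every current subscriber's inbox.
        for inbox in self._inboxes.values():
            inbox.append(message)

    def messages_for(self, subscriber):
        return list(self._inboxes[subscriber])


channel = PubSubTopic("sports")
channel.subscribe("alice")
channel.subscribe("bob")
channel.publish("match highlights")
channel.publish("live score")
print(channel.messages_for("alice"))  # ['match highlights', 'live score']
print(channel.messages_for("bob"))    # ['match highlights', 'live score']
```

In this toy model, a subscriber only receives messages published after it subscribes. Kafka differs here: it retains messages in the topic for a configured period, so a new subscriber can also read older messages.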
The above diagram shows the Kafka cluster architecture. The elements of the Kafka cluster architecture can be explained in the following way:
Kafka supports both pub-sub and queue-based messaging. In both models, the producer's only job is to send messages. The consumers choose the messaging pattern, depending on their requirements. Let us see how a consumer selects a messaging pattern.
In the queue-based model, a group of consumers that share the same "group ID" subscribes to a topic, instead of a single consumer. Consumers that subscribe to a topic with the same group ID form one consumer group, and the messages of that topic are shared among them, so each message is processed by only one member of the group. Consumers with different group IDs each receive all of the topic's messages, which gives pub-sub behavior. The workflow of this system is as follows:
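The sharing of messages within a consumer group can be sketched in Python. In Kafka, a topic is divided into partitions. Within one consumer group, each partition is assigned to a single consumer, while consumers in different groups each read every partition. This is a toy model under several stated assumptions:

- The topic is a hypothetical 4-partition topic.
- Partitions are assigned with a plain round-robin scheme. Kafka's real assignors, such as range and round-robin, are configurable.
- CRC32 stands in for the murmur2 hash that Kafka's default partitioner applies to message keys.

```python
import zlib

NUM_PARTITIONS = 4  # hypothetical topic with 4 partitions


def partition_for(key: str) -> int:
    # Simplification: Kafka's default partitioner hashes keys with murmur2;
    # CRC32 is used here only because it is in the standard library.
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def assign_partitions(consumers):
    """Round-robin the topic's partitions over the members of ONE consumer group."""
    assignment = {c: [] for c in consumers}
    for p in range(NUM_PARTITIONS):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment


def deliver(messages, groups):
    """Return {group_id: {consumer: [values]}} for every consumer group."""
    result = {}
    for group_id, consumers in groups.items():
        # Map each partition to the single group member that owns it.
        owner = {}
        for consumer, partitions in assign_partitions(consumers).items():
            for p in partitions:
                owner[p] = consumer
        per_consumer = {c: [] for c in consumers}
        for key, value in messages:
            per_consumer[owner[partition_for(key)]].append(value)
        result[group_id] = per_consumer
    return result


messages = [(f"user-{i}", f"event-{i}") for i in range(8)]
groups = {"billing": ["c1", "c2"], "analytics": ["c3"]}
out = deliver(messages, groups)
print(assign_partitions(["c1", "c2"]))  # {'c1': [0, 2], 'c2': [1, 3]}
```

The "billing" group splits the eight events between c1 and c2 with no overlap, which is queue behavior. The "analytics" group, having a different group ID, still receives all eight events, which is pub-sub behavior. If a group has more consumers than partitions, the extra consumers receive nothing.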
Before installing Apache Kafka, check whether Java is installed on your system by following the step below:
Step 1: Checking Java
$ java -version
If Java is installed on your system, the installed version is displayed. If it is not, follow the steps below to install Java.
Step 1.a: Downloading JDK
The JDK can be downloaded from the URL below:
Choose the JDK package that matches your operating system and architecture.
Step 1.b: Extracting the JDK files.
After the download finishes, open your downloads folder and confirm that the JDK archive is there. Then extract the archive (a .tar.gz file) with the following commands:
$ cd /my/dir/downloads/path
$ tar -zxf jdk-13.0.2_linux-x64_bin.tar.gz
Step 1.c: Moving the JDK to a shared directory
To make Java available to every user, switch to the root user and move the extracted JDK directory to /my/jdk:
$ su
Password: (type the password of the root user)
$ mkdir /my/jdk
$ mv jdk-13.0.2 /my/jdk/
Step 1.d: Setting environment Variables and Path
To set the environment variables and path, run the following commands (add them to your ~/.bashrc to make them permanent):
export JAVA_HOME=/my/jdk/jdk-13.0.2
export PATH=$PATH:$JAVA_HOME/bin
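You can script a check of Steps 1 and 1.d with a short Python helper. This is a minimal sketch that assumes a Linux/Unix layout, where the launcher lives at $JAVA_HOME/bin/java. The function names are hypothetical helpers, not part of any Java or Kafka tooling.

```python
import os
import re
import shutil
import subprocess


def parse_java_version(output):
    """Extract the quoted version string from `java -version` output, if any."""
    match = re.search(r'version "([^"]+)"', output)
    return match.group(1) if match else None


def check_java_env(environ=None):
    """Return a list of problems with JAVA_HOME and PATH (empty list means OK).

    Assumes a Linux/Unix layout where the launcher is $JAVA_HOME/bin/java.
    """
    env = os.environ if environ is None else environ
    java_home = env.get("JAVA_HOME")
    if not java_home:
        return ["JAVA_HOME is not set"]
    problems = []
    java_bin = os.path.normpath(os.path.join(java_home, "bin"))
    if not os.path.isfile(os.path.join(java_bin, "java")):
        problems.append(f"no java executable under {java_bin}")
    # Normalize PATH entries so "/my/jdk/jdk-13.0.2/bin/" still matches.
    path_entries = [os.path.normpath(p) for p in env.get("PATH", "").split(os.pathsep) if p]
    if java_bin not in path_entries:
        problems.append(f"{java_bin} is not on PATH")
    return problems


def installed_java_version():
    """Run `java -version` if java is on PATH and return its version string."""
    if shutil.which("java") is None:
        return None
    # `java -version` prints its banner to stderr, not stdout.
    completed = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return parse_java_version(completed.stderr)


if __name__ == "__main__":
    print("Java version:", installed_java_version() or "not found")
    for problem in check_java_env():
        print("Problem:", problem)
```

An empty problem list means JAVA_HOME points at a directory containing bin/java and that directory is on PATH.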
Step 2: Installing the Zookeeper Framework
Step 2.a: Downloading Zookeeper
Download the ZooKeeper framework from the link below.
At the time of writing, the latest ZooKeeper version was 3.5.6.
Step 2.b: Tar file Extraction
Extract the tar file and create a data directory (mydir1) with the following commands:
$ cd my/
$ tar -zxf zookeeper-3.5.6.tar.gz
$ cd zookeeper-3.5.6
$ mkdir mydir1
Step 2.c: Creation of configuration file
Open the configuration file conf/zoo.cfg with vi and add the following settings:
$ vi conf/zoo.cfg
tickTime=2000
dataDir=/path/for/zookeeper/mydir1
clientPort=2181
initLimit=10
syncLimit=5
After the configuration file is saved, ZooKeeper can be started.
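Several of these settings are expressed in ticks rather than milliseconds, so the actual time limits follow from tickTime. Here is a minimal Python sketch, assuming zoo.cfg uses plain key=value lines as shown. `parse_cfg` is a hypothetical helper. Note that initLimit and syncLimit only matter when ZooKeeper runs as a replicated ensemble. The session-timeout bounds are ZooKeeper's defaults of 2 and 20 ticks.

```python
def parse_cfg(text: str) -> dict:
    """Parse simple key=value lines (as in zoo.cfg), skipping blanks and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


ZOO_CFG = """\
tickTime=2000
dataDir=/path/for/zookeeper/mydir1
clientPort=2181
initLimit=10
syncLimit=5
"""

cfg = parse_cfg(ZOO_CFG)
tick_ms = int(cfg["tickTime"])
# initLimit and syncLimit are counted in ticks, not milliseconds.
init_timeout_ms = int(cfg["initLimit"]) * tick_ms  # 10 * 2000 = 20000 ms for followers to connect and sync
sync_timeout_ms = int(cfg["syncLimit"]) * tick_ms  # 5 * 2000 = 10000 ms a follower may lag the leader
# By default, client session timeouts are bounded to 2..20 ticks.
min_session_ms, max_session_ms = 2 * tick_ms, 20 * tick_ms  # 4000 ms .. 40000 ms
print(init_timeout_ms, sync_timeout_ms, min_session_ms, max_session_ms)
```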
Step 2.d: Starting Zookeeper server
$ bin/zkServer.sh start
When you run this command, the system displays the following response:
ZooKeeper JMX enabled by default
Using config: /user1/../zookeeper-3.5.6/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
Step 2.e: Connecting to the Zookeeper server.
To connect to the ZooKeeper server, run the ZooKeeper command-line client:
$ bin/zkCli.sh
After running the command, the client connects to your ZooKeeper server.
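Before opening the client, you can check from a script whether anything is accepting TCP connections on ZooKeeper's clientPort (2181 in the zoo.cfg above). This minimal Python sketch assumes ZooKeeper runs on localhost. A successful connection only shows that some process is listening on the port, not that it is ZooKeeper. `is_listening` is a hypothetical helper.

```python
import socket


def is_listening(host: str = "localhost", port: int = 2181, timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("port 2181 is open" if is_listening() else "nothing listening on localhost:2181")
```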
Step 2.f: Stopping the ZooKeeper server
When you have finished your work, stop the ZooKeeper server with the following command:
$ bin/zkServer.sh stop
We have now installed Java and ZooKeeper. Next, let us install Apache Kafka.
Step 3: Installing Apache Kafka
To install Apache Kafka on your system, follow these steps:
Step 3.a: Download Apache Kafka
Download Apache Kafka from the following URL:
Step 3.b: Extracting Tar file
Extract the tar file with the following commands, replacing <version> with the version shown in the name of the archive you downloaded:
$ cd my/
$ tar -zxf kafka_<version>.tgz
$ cd kafka_<version>
You have now downloaded and extracted Apache Kafka on your system.
Step 3.c: Starting Apache Kafka server
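The Kafka server needs a running ZooKeeper server, so if you stopped ZooKeeper in Step 2.f, start it again with bin/zkServer.sh start. Then start the Apache Kafka server with the following command: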
$ bin/kafka-server-start.sh config/server.properties
When you run this command, Apache Kafka starts and logs its server parameters, such as timeouts, the protocol version, and log roll hours.
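A common startup problem is a broker whose zookeeper.connect setting in config/server.properties does not match ZooKeeper's clientPort. Here is a minimal Python sketch of that check. The properties text is a hypothetical excerpt using keys found in the default server.properties of ZooKeeper-based Kafka releases. The parser handles only simple key=value lines, not the full Java .properties syntax. The sketch also assumes every host in zookeeper.connect has an explicit port.

```python
def parse_properties(text: str) -> dict:
    """Minimal key=value parser; real Java .properties syntax has more rules."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith(("#", "!")):
            key, _, value = line.partition("=")
            props[key.strip()] = value.strip()
    return props


def zookeeper_ports(connect: str) -> list:
    """Ports from a zookeeper.connect value like 'zk1:2181,zk2:2181/kafka'."""
    hosts = connect.split("/", 1)[0]  # drop the optional chroot path
    return [int(hp.rsplit(":", 1)[1]) for hp in hosts.split(",")]


SERVER_PROPERTIES = """\
broker.id=0
log.dirs=/tmp/kafka-logs
zookeeper.connect=localhost:2181
"""

props = parse_properties(SERVER_PROPERTIES)
client_port = 2181  # clientPort from conf/zoo.cfg
ports = zookeeper_ports(props["zookeeper.connect"])
print("ports match" if client_port in ports else f"mismatch: {ports} vs {client_port}")
```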
Step 3.d: Stopping the Apache Kafka server.
Stop the Apache Kafka server with the following command:
$ bin/kafka-server-stop.sh
1. Messaging
2. Tracking Website Activity
3. Metrics
4. Log Aggregation
5. Stream Processing
6. Event Sourcing
7. Commit Log
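The last two use cases, event sourcing and commit log, rest on the same idea. Each Kafka partition is an append-only log in which every record has an offset, and readers choose where to read from. Here is a toy in-memory Python model of that idea, not Kafka's storage engine. `CommitLog` and `replay_balance` are hypothetical names, and the account events are invented sample data.

```python
class CommitLog:
    """Toy append-only log: records are only appended, never modified."""

    def __init__(self):
        self._records = []

    def append(self, record) -> int:
        # The offset is simply the record's position in the log.
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset: int, max_records: int = 10) -> list:
        # Readers pull from any offset they choose; reading does not remove data.
        return self._records[offset:offset + max_records]


def replay_balance(events) -> int:
    """Event sourcing: rebuild current state by replaying events from the start."""
    balance = 0
    for event in events:
        kind, _, amount = event.partition(":")
        if kind == "deposit":
            balance += int(amount)
        elif kind == "withdraw":
            balance -= int(amount)
    return balance


log = CommitLog()
for event in ["account-opened", "deposit:100", "withdraw:30"]:
    log.append(event)

# Two independent readers keep their own offsets into the same log.
print(log.read(0))                  # ['account-opened', 'deposit:100', 'withdraw:30']
print(log.read(2))                  # ['withdraw:30']
print(replay_balance(log.read(0)))  # 70
```

Unlike this toy model, Kafka does not keep records forever by default. It deletes or compacts old records according to the topic's retention settings.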
Apache Kafka is widely used on the Twitter platform, where it supports sending and receiving tweets. On Twitter, logged-in users can view and post tweets, while users who are not logged in can only view them. For its stream processing, Twitter uses Storm-Kafka.
LinkedIn uses Apache Kafka for data streaming and operational metrics. Many LinkedIn products, such as LinkedIn Newsfeed and LinkedIn Today, rely on Apache Kafka, and its durability is a key reason LinkedIn uses it.
Netflix is another online platform that uses Apache Kafka to run its services, primarily for real-time event processing within its video-streaming platform.
Apache Kafka is also used by Mozilla, Oracle and many other enterprises.
Apache Kafka, now an Apache Software Foundation project, plays a vital role in managing real-world data feeds. It remains fault tolerant when a machine fails, and it is fast: a widely cited LinkedIn benchmark reported about two million writes per second on three commodity machines. It supports two messaging models, point-to-point (queue) and publish-subscribe, and its simple terminology makes the message-passing process easy to follow. This tutorial gives you the basic knowledge needed to get started with Apache Kafka.