Pulsar was originally developed at Yahoo in 2013 and was open-sourced in 2016. Since then, it has gained a lot of popularity, and many organizations have adopted it as their messaging platform. Some of its key features include:
When we discuss the performance of Apache Pulsar, we mostly talk about its low latency and high throughput. That performance is a result of Pulsar's architecture and configuration. In this blog, we will walk through the architecture in detail.
At the highest level, a Pulsar instance is made up of one or more Pulsar clusters, and the clusters within an instance can replicate data among themselves.
Let's learn about it further:
Furthermore, the clusters within an instance coordinate with one another. Features such as geo-replication and message replication involve multiple clusters.
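To make geo-replication a little more concrete, here is a minimal Python sketch. It is a toy in-memory model, not Pulsar's client or admin API: the `Cluster` and `Instance` classes, the cluster names, and the namespace name are all hypothetical. It assumes replication is configured per namespace, so a message published in one cluster is copied to every other cluster listed for that namespace.

```python
from collections import defaultdict


class Cluster:
    """Toy stand-in for one Pulsar cluster: topic name -> list of messages."""

    def __init__(self, name):
        self.name = name
        self.topics = defaultdict(list)


class Instance:
    """Toy Pulsar instance: a group of clusters plus per-namespace replication settings."""

    def __init__(self, cluster_names):
        self.clusters = {name: Cluster(name) for name in cluster_names}
        # namespace -> set of cluster names that should hold a copy of its messages
        self.replication = {}

    def set_replication_clusters(self, namespace, cluster_names):
        self.replication[namespace] = set(cluster_names)

    def publish(self, origin, namespace, topic, message):
        """Store the message in the origin cluster, then copy it to the other replication clusters."""
        full_topic = f"{namespace}/{topic}"
        self.clusters[origin].topics[full_topic].append(message)
        for name in sorted(self.replication.get(namespace, set()) - {origin}):
            self.clusters[name].topics[full_topic].append(message)


# Hypothetical cluster and namespace names, chosen only for illustration.
instance = Instance(["us-west", "us-east", "eu-central"])
instance.set_replication_clusters("shop/orders", ["us-west", "us-east"])
instance.publish("us-west", "shop/orders", "created", "order-1")

for name, cluster in instance.clusters.items():
    print(name, dict(cluster.topics))
```

In this sketch only `us-west` and `us-east` end up holding the message, because `eu-central` is not in the replication list for the namespace.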
A big reason behind Pulsar's popularity is its stateless brokers. New brokers can start handling traffic immediately when demand rises. A broker is called "stateless" because it doesn't store any messaging data; messages are stored in Apache BookKeeper instead, which we'll discuss later. Pulsar assigns each topic partition to a broker. The broker to which a topic partition is assigned is called the owner broker of that partition. Producers and consumers connect to the owner broker of a topic partition to produce and consume messages.
If a broker fails, Pulsar automatically moves the topic partitions it owned to the remaining brokers in the cluster. One thing to be clear about: when a topic partition moves to a different broker, only its ownership is transferred. No data is copied during this process.
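The ownership handover described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how Pulsar's load manager actually works: the `ToyCluster` class, the "least loaded broker" rule, and the broker and partition names are all hypothetical. The point it illustrates is that a failure changes only the owner mapping, while the stored messages stay where they are.

```python
class ToyCluster:
    """Toy model: brokers own topic partitions, but the data lives in separate storage."""

    def __init__(self, brokers):
        self.brokers = list(brokers)   # brokers that are currently alive
        self.owner = {}                # topic partition -> owning broker
        self.storage = {}              # topic partition -> messages (stands in for BookKeeper)

    def _least_loaded(self):
        # Assumed policy: pick the live broker that owns the fewest partitions.
        load = {broker: 0 for broker in self.brokers}
        for broker in self.owner.values():
            load[broker] += 1
        return min(self.brokers, key=lambda broker: load[broker])

    def lookup(self, partition):
        """Return the owner broker, assigning one on first use."""
        if partition not in self.owner:
            self.owner[partition] = self._least_loaded()
            self.storage.setdefault(partition, [])
        return self.owner[partition]

    def publish(self, partition, message):
        broker = self.lookup(partition)          # clients go through the owner broker
        self.storage[partition].append(message)  # the broker itself keeps no message data
        return broker

    def fail(self, broker):
        """Remove a broker and hand its partitions to the remaining brokers."""
        self.brokers.remove(broker)
        orphaned = [p for p, b in self.owner.items() if b == broker]
        for partition in orphaned:
            del self.owner[partition]
        for partition in orphaned:
            self.lookup(partition)   # only ownership changes; storage is untouched


cluster = ToyCluster(["broker-1", "broker-2", "broker-3"])
for i in range(3):
    cluster.publish(f"orders-partition-{i}", f"msg-{i}")

failed = cluster.owner["orders-partition-0"]
before = dict(cluster.storage)
cluster.fail(failed)

print(cluster.owner["orders-partition-0"] != failed)  # ownership moved to another broker
print(cluster.storage == before)                      # stored messages are unchanged
```

Because the brokers in this model hold no messages, failing one of them never touches `storage`, which is the property that lets a stateless broker be replaced quickly.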
The metadata store holds the metadata of a cluster, such as topic metadata, schemas, and broker load data. Pulsar uses ZooKeeper for metadata storage, cluster configuration, and coordination. Each cluster has its own ZooKeeper, which stores cluster-specific configuration and coordination data such as metadata, ownership, BookKeeper ledger metadata, and much more.
Pulsar uses a system called Apache BookKeeper to store and manage the messages. BookKeeper is a distributed system that provides a number of significant benefits:
A ledger is an append-only data structure with a single writer, and it is assigned to multiple BookKeeper storage nodes, called bookies. Its entries are replicated to multiple bookies. A Pulsar broker is responsible for creating a ledger, appending entries to it, and closing it. Once a ledger has been closed, either explicitly or because the writer process crashed, it can only be opened in read-only mode. Later, when its entries are no longer needed, the whole ledger can be deleted.
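The ledger lifecycle above (create, append, close, read, delete) can be illustrated with a small Python sketch. It is a simplified in-memory model, not the BookKeeper client API: the `ToyLedger` and `LedgerClosedError` names are hypothetical, and it assumes every bookie keeps a full copy of every entry, which is simpler than BookKeeper's real replication scheme.

```python
class LedgerClosedError(Exception):
    """Raised when something tries to append to a closed ledger."""


class ToyLedger:
    """Append-only log with a single writer, copied to several bookies."""

    def __init__(self, ledger_id, bookies):
        self.ledger_id = ledger_id
        # Simplifying assumption: each bookie keeps its own full copy of the entries.
        self.replicas = {bookie: [] for bookie in bookies}
        self.closed = False

    def append(self, entry):
        """Add an entry to every replica and return its entry id."""
        if self.closed:
            raise LedgerClosedError(f"ledger {self.ledger_id} is read-only")
        for entries in self.replicas.values():
            entries.append(entry)
        return len(next(iter(self.replicas.values()))) - 1

    def close(self):
        """Seal the ledger, whether explicitly or after the writer crashed."""
        self.closed = True

    def read(self, bookie):
        """Return a copy of the entries held by one bookie."""
        return list(self.replicas[bookie])

    def delete(self):
        """Drop all entries once the ledger is no longer needed."""
        for entries in self.replicas.values():
            entries.clear()


ledger = ToyLedger(ledger_id=1, bookies=["bookie-1", "bookie-2", "bookie-3"])
ledger.append("entry-a")
ledger.append("entry-b")
ledger.close()

try:
    ledger.append("entry-c")   # rejected: a closed ledger only allows reads
except LedgerClosedError as error:
    print(error)

print(ledger.read("bookie-2"))
```

Any bookie can serve the read here because each one holds the same entries, and the rejected append shows why a closed ledger is safe to treat as immutable.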
Segment-centric storage is one of Pulsar's strongest features, and it resolves many common storage problems. The layered architecture and segment-centric storage are the two key design choices in Pulsar. Check out their benefits:
These are some of the key benefits of segment-centric storage.
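As a rough illustration of what "segment-centric" means, the following Python sketch stores a topic partition as a sequence of small segments, each placed on a subset of the bookies. It is a toy model under stated assumptions: the `ToySegmentedLog` class, the rotating placement rule, the segment size, and the replica count are hypothetical and are not Pulsar's or BookKeeper's actual placement policy.

```python
from itertools import cycle, islice


class ToySegmentedLog:
    """A topic partition stored as a sequence of fixed-size segments."""

    def __init__(self, bookies, entries_per_segment=2, replicas=2):
        self.bookies = list(bookies)
        self.entries_per_segment = entries_per_segment
        self.replicas = replicas
        self.segments = []  # each segment: {"bookies": [...], "entries": [...]}

    def _new_segment(self):
        # Assumed policy: rotate the starting bookie so segments spread over all bookies.
        start = len(self.segments) % len(self.bookies)
        chosen = list(islice(cycle(self.bookies), start, start + self.replicas))
        self.segments.append({"bookies": chosen, "entries": []})

    def append(self, entry):
        """Write to the newest segment, rolling over to a new one when it is full."""
        if not self.segments or len(self.segments[-1]["entries"]) >= self.entries_per_segment:
            self._new_segment()
        self.segments[-1]["entries"].append(entry)

    def add_bookie(self, bookie):
        # Existing segments stay where they are; only future segments can use the new bookie.
        self.bookies.append(bookie)


log = ToySegmentedLog(["bookie-1", "bookie-2", "bookie-3"])
for i in range(4):
    log.append(f"msg-{i}")
placement_before = [list(segment["bookies"]) for segment in log.segments]

log.add_bookie("bookie-4")
for i in range(4, 8):
    log.append(f"msg-{i}")

for number, segment in enumerate(log.segments):
    print(number, segment["bookies"], segment["entries"])
```

In this model, adding `bookie-4` does not move any existing segment; the new node simply starts receiving newly created segments, so storage capacity grows without copying old data around.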
All the configuration of a Pulsar instance, such as clusters, tenants, namespaces, and partitioned topics, is stored in the configuration store. A Pulsar instance can consist of a single local cluster, multiple local clusters, or multiple cross-region clusters, and the configuration store shares this configuration across all the clusters in the instance. The configuration store can be deployed on a separate ZooKeeper cluster or on an existing one.
One of the core benefits of Pulsar's architecture is guaranteed message delivery: if a message reaches the broker successfully, it will be delivered to its intended target.
This guarantee requires that unacknowledged messages are stored durably until they can be delivered to and acknowledged by the consumer. This mode of messaging is what we call persistent storage.
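The rule that a message is kept until the consumer acknowledges it can be shown with a short Python sketch. This is an in-memory toy, not Pulsar's client API: the `ToySubscription` class and its method names are hypothetical, and a real deployment would keep these messages durably in BookKeeper rather than in a dictionary.

```python
class ToySubscription:
    """Keeps every message until the consumer acknowledges it."""

    def __init__(self):
        self.next_id = 0
        self.unacked = {}  # message id -> payload, in publish order

    def store(self, payload):
        """Called once the broker has accepted a message."""
        message_id = self.next_id
        self.next_id += 1
        self.unacked[message_id] = payload
        return message_id

    def receive(self):
        """Return the oldest unacknowledged message, or None if there is none."""
        for message_id, payload in self.unacked.items():
            return message_id, payload
        return None

    def acknowledge(self, message_id):
        """Only an acknowledgement allows the message to be discarded."""
        self.unacked.pop(message_id, None)


subscription = ToySubscription()
subscription.store("payment-1")
subscription.store("payment-2")

first = subscription.receive()
# Suppose the consumer crashes before acknowledging: the same message is handed out again.
redelivered = subscription.receive()
print(first == redelivered)

subscription.acknowledge(first[0])
print(subscription.receive())   # the next unacknowledged message
```

Nothing is dropped on delivery alone; a message leaves the store only after `acknowledge` is called, which is the essence of the guarantee described above.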
By now, you should have a good understanding of the architecture of Apache Pulsar. Keep your own requirements in mind before working with Pulsar, and try out the other options that are available, because practical experience is better than theoretical knowledge. In our opinion, Pulsar is the best option on the market right now; its many benefits can make your work a lot easier.
Madhuri is a Senior Content Creator at MindMajix. She has written about a range of topics on various technologies, including Splunk, TensorFlow, Selenium, and CEH. She spends most of her time researching technology and startups.