Introduction to HBase for Hadoop
HBase comes into the picture where Hadoop leaves off.
HBase is an open source, non-relational, distributed database built on top of HDFS.
HBase is modeled on Google's Bigtable, which was designed as a store for structured data.
Hadoop's primary storage mechanism, HDFS, is not suited to random reads and random writes.
To provide real-time random reads and writes, we use HBase, which is a column-oriented database.
In typical RDBMS databases such as MySQL and PostgreSQL, the columns are fixed by the schema and we cannot add or delete them at runtime.
In a web log table, by contrast, we need to add or delete columns at runtime.
At that point a row-oriented database does not fit well, because we end up with a fixed-size row-column combination.
The primary client interface to HBase is the HTable class in the org.apache.hadoop.hbase.client package.
It provides the user with all the functionality needed to store and retrieve data from HBase, as well as to delete obsolete values.
All operations that mutate data are guaranteed to be atomic on a per-row basis.
It does not matter how many columns are written for a particular row; all of them are covered by this guarantee of atomicity.
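To make the per-row guarantee concrete, here is a minimal Python sketch of a toy in-memory table. It is not the HBase client API: the class name ToyTable, its methods, and the sample column names are all hypothetical. Every column of a single put is applied while holding one lock for that row, so a reader sees either none of the new columns or all of them.

```python
import threading
from collections import defaultdict

class ToyTable:
    """Toy in-memory table illustrating per-row atomic mutations (not HBase code)."""

    def __init__(self):
        self._rows = defaultdict(dict)             # row key -> {column: value}
        self._locks = defaultdict(threading.Lock)  # one lock per row key
        self._guard = threading.Lock()             # protects the lock table itself

    def _row_lock(self, row_key):
        with self._guard:
            return self._locks[row_key]

    def put(self, row_key, columns):
        # Apply every column of this mutation while holding the row's lock.
        with self._row_lock(row_key):
            self._rows[row_key].update(columns)

    def get(self, row_key):
        # Readers take the same lock, so they never see a half-applied put.
        with self._row_lock(row_key):
            return dict(self._rows[row_key])

table = ToyTable()
table.put(b"row1", {"data:a": b"1", "data:b": b"2", "data:c": b"3"})
snapshot = table.get(b"row1")
```

Note that the lock is per row: two puts to different rows do not wait on each other, which mirrors the fact that the guarantee covers one row at a time and nothing wider.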
HBase is a column store database where all access is driven by the intersection of row key and column name.
HBase components and their rough Hadoop counterparts:
ZooKeeper – JobTracker, TaskTracker
HMaster – NameNode
Region Server – DataNode
Regions – Blocks
ZooKeeper:- HBase depends on ZooKeeper, and by default it manages a ZooKeeper instance as the authority on cluster state.
HMaster (master node):-
Assigns regions to region servers using ZooKeeper
Handles load balancing
Is not part of the data path
Holds metadata and schema
Region Server:-
Handles reads and writes
Handles region splitting
HBase depends closely on ZooKeeper rather than on the HMaster.
Region assignment is a one-time HMaster job; the resulting state is stored in ZooKeeper, so the HMaster acts only as a temporary master.
HBase stores information under column family names.
Ex:- Column family name: Hadoop
Hadoop:MapReduce (column name)
Hadoop:Pig (column name)
Hadoop:Hive (column name)
HBase Clients:-
There are a number of client options for interacting with an HBase cluster.
HBase, like Hadoop, is written in Java.
The primary client interface to HBase is the HTable class in the org.apache.hadoop.hbase.client package.
To create a table, we first need to create an instance of HBaseAdmin and then ask it to create a table named test with a single column family named data.
To operate on a table, we will need an instance of org.apache.hadoop.hbase.client.HTable.
After creating an HTable, we then create an instance of org.apache.hadoop.hbase.client
Next, we create an org.apache.hadoop.hbase.client.Get, and then use an org.apache.hadoop.hbase.client.Scan to scan over the just-created table and print out the results.
HBase classes and utilities in the org.apache.hadoop.hbase.mapreduce package facilitate using HBase as a source and/or sink in MapReduce jobs.
The TableInputFormat class makes splits on region boundaries, so each map is handed a single region to work on.
The TableOutputFormat class writes the result of the reduce into HBase.
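The split-per-region idea can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the TableInputFormat implementation: the function name splits_for_regions and the sample region start keys are made up for illustration. Each region covers the half-open key range from its start key up to the next region's start key, and one split is produced per region.

```python
def splits_for_regions(region_start_keys):
    """Return one (start, end) key range per region; b"" means 'unbounded'."""
    starts = sorted(region_start_keys)
    # Each region ends where the next begins; the last region is open-ended.
    ends = starts[1:] + [b""]
    return list(zip(starts, ends))

# Hypothetical table with three regions; the first region starts at the empty key.
regions = [b"", b"row-300", b"row-600"]
for start, end in splits_for_regions(regions):
    print(f"map task scans [{start!r}, {end!r})")
```

With three regions we get three splits, so three map tasks, each reading only the rows of its own region.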
Avro, REST and Thrift:-
HBase ships with Avro, REST and Thrift interfaces.
These are useful when the interacting application is written in a language other than Java.
In all cases, a Java server hosts an instance of the HBase client, brokering the application's Avro, REST and Thrift requests in and out of the HBase cluster.
REST: To put up a Stargate instance (Stargate is the name for the HBase REST service), start it using the following command:
% hbase-daemon.sh start rest
This will start a server instance, by default on port 8080, and catch any emissions by the server in log files under the HBase logs directory.
To stop the REST server:
% hbase-daemon.sh stop rest
Thrift: Start a Thrift service by putting up a server to field Thrift clients by running the following:
% hbase-daemon.sh start thrift
This will start the server instance, by default on port 9090, and catch any emissions by the server in log files under the HBase logs directory.
The HBase Thrift IDL can be found at
src/main/resources/org/apache/hadoop/hbase/thrift/Hbase.thrift
in the HBase source code.
To stop the Thrift server, type:
% hbase-daemon.sh stop thrift
To check the services for HBase:
# cd /usr/lib/hadoop/conf
Cmd: /conf# ls master
Cmd: /conf# cat master
o/p: localhost
Cmd: /conf# ls slaves
Cmd: /conf# cat slaves
o/p: localhost
For the HBase shell, change directory to the HBase bin directory:
Cmd: # cd /usr/lib/hbase/bin
To show the tables.
If you get an error here, it means the parent instance services are not running, even though the child instance services are running.
To start the parent instance services, we first have to kill the child instance services and then start the parent.
/usr/lib/hbase/bin # jps
# kill [peer process id]
# ./start-hbase.sh
# hbase shell
Whirlwind tour of the data model:-
Applications store data into labeled tables.
Tables are made of rows and columns.
Table cells, the intersection of row and column coordinates, are versioned.
By default, their version is a timestamp auto-assigned by HBase at the time of cell insertion.
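A small Python sketch may help picture a versioned cell. It is a toy model only: the class VersionedCell, its method names, and the sample values are hypothetical and are not part of HBase. Each write is stored under a timestamp, the timestamp defaults to the current time in milliseconds when the caller does not supply one, and reads return the newest version first.

```python
import time

class VersionedCell:
    """Toy model of one cell: timestamp -> value, newest version first on read."""

    def __init__(self):
        self._versions = {}  # timestamp (ms) -> value

    def put(self, value, timestamp=None):
        # When no timestamp is supplied, assign the current time in milliseconds.
        if timestamp is None:
            timestamp = int(time.time() * 1000)
        self._versions[timestamp] = value
        return timestamp

    def get(self, max_versions=1):
        # Return up to max_versions (timestamp, value) pairs, newest first.
        newest_first = sorted(self._versions.items(), reverse=True)
        return newest_first[:max_versions]

cell = VersionedCell()
cell.put(b"21.5", timestamp=1000)
cell.put(b"22.0", timestamp=2000)
latest = cell.get()                 # [(2000, b"22.0")]
history = cell.get(max_versions=2)  # [(2000, b"22.0"), (1000, b"21.5")]
```

Writing a new value does not overwrite the old one; it adds a newer version, and a plain read simply picks the most recent.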
Table row keys are byte arrays, so theoretically anything can serve as a row key, from strings to binary representations of longs or even serialized data structures.
Table rows are sorted by row key, which is the table's primary key, and all table accesses are via the table primary key.
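Because rows are ordered by the raw bytes of the key, the way a key is encoded decides where its row lands. The following Python sketch uses only the standard library and made-up sample keys (no HBase involved). It shows that numbers stored as decimal strings sort differently from the same numbers stored as fixed-width big-endian longs.

```python
import struct

numbers = [1, 2, 10, 100]

# Keys encoded as decimal strings sort lexicographically, byte by byte.
string_keys = sorted(str(n).encode("ascii") for n in numbers)

# Keys encoded as 8-byte big-endian longs sort in numeric order
# (for non-negative values).
long_keys = sorted(struct.pack(">q", n) for n in numbers)
decoded = [struct.unpack(">q", k)[0] for k in long_keys]

print(string_keys)  # [b'1', b'10', b'100', b'2']
print(decoded)      # [1, 2, 10, 100]
```

This is why row key design matters: a scan over a key range only returns rows in the order we expect if the byte encoding preserves that order.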
Row columns are grouped into column families, and all column family members have a common prefix.
For example, the columns temperature:air and temperature:dew-point are both members of the temperature column family, whereas station:identifier belongs to the station family.
The column family prefix must be composed of printable characters, while the column family qualifier can be made of arbitrary bytes.
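The family prefix and qualifier convention can be illustrated with a short Python sketch. The helper name split_column is hypothetical, and the code only models the naming scheme; it does not talk to HBase. A column name is split at the first colon, and columns are then grouped by their family.

```python
from collections import defaultdict

def split_column(name: bytes):
    """Split b'family:qualifier' at the first colon into (family, qualifier)."""
    family, _, qualifier = name.partition(b":")
    return family, qualifier

columns = [b"temperature:air", b"temperature:dew-point", b"station:identifier"]

by_family = defaultdict(list)
for col in columns:
    family, qualifier = split_column(col)
    by_family[family].append(qualifier)
```

After the loop, by_family holds two entries: the temperature family with the qualifiers air and dew-point, and the station family with the qualifier identifier.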
Just as HDFS and MapReduce are built of clients, slaves and a coordinating master:
- NameNode and DataNodes in HDFS, and
- JobTracker and TaskTrackers in MapReduce,
HBase is modeled with an HBase master node orchestrating a cluster of one or more region server slaves.
The HBase master is responsible for assigning regions to registered region servers and for recovering from region server failures.
The region servers carry zero or more regions and field client read/write requests.
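The division of labour between master and region servers can be sketched as follows. This is a simplified Python model under loud assumptions: the function names, the server names rs1 and rs2, and the round-robin policy are all hypothetical, and the real master's assignment logic is more involved. The first function plays the master handing out regions; the second shows how a row key maps to the one region, and therefore the one server, that fields requests for it.

```python
import bisect
from itertools import cycle

def assign_regions(region_start_keys, servers):
    """Round-robin assignment of regions (identified by start key) to servers."""
    return dict(zip(sorted(region_start_keys), cycle(servers)))

def locate(row_key, assignment):
    """Find the region whose key range contains row_key and the server carrying it.

    Assumes the first region starts at the empty key b"", so every key is covered.
    """
    starts = sorted(assignment)
    # The owning region is the one with the greatest start key <= row_key.
    index = bisect.bisect_right(starts, row_key) - 1
    start = starts[index]
    return start, assignment[start]

assignment = assign_regions([b"", b"g", b"p"], ["rs1", "rs2"])
region, server = locate(b"hbase", assignment)
```

Here the row key hbase falls in the region starting at g, which this toy assignment placed on rs2. The master decided the placement, but the lookup and the read or write itself never pass through it.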
HBase persists data via the Hadoop filesystem API. Since there are multiple implementations of the filesystem interface (one for the local filesystem, one for the KFS filesystem, Amazon's S3, and HDFS), HBase can persist to any of these implementations.
The local filesystem is fine for experimenting with your initial HBase install.