
Hadoop HBase Schema and Versioning

by Ravindra Savaram
Last modified: January 29th 2021

Schema Creation:-

  • An HBase schema can be created or updated with the HBase Shell, or by using HBaseAdmin in the Java API.
  • Tables must be disabled when making column-family modifications.


Configuration conf = HBaseConfiguration.create();

HBaseAdmin admin = new HBaseAdmin(conf);

String table = "myTable";

admin.disableTable(table);

HColumnDescriptor cf1 = new HColumnDescriptor("Hadoop");

admin.addColumn(table, cf1);      // adding a new column family

HColumnDescriptor cf2 = new HColumnDescriptor("Hadoop");

admin.modifyColumn(table, cf2);   // modifying an existing column family

admin.enableTable(table);


When changes are made to table or column-family settings (e.g., region size, block size), they take effect at the next major compaction, when the store files are rewritten.

HBase Versioning:-

Maximum number of versions:-

  • The maximum number of row versions to store is configured per column family via HColumnDescriptor.
  • The default value for max versions is 3.
  • As discussed previously, HBase does not overwrite row values; rather, it stores multiple values per row, distinguished by timestamp.
  • Excess versions are removed during major compactions.
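As a sketch of the above, the per-family version limit can be raised from the default of 3 using HColumnDescriptor.setMaxVersions() while the table is disabled. The table name "myTable" and family name "Hadoop" here are illustrative.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class MaxVersionsExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);

        // The table must be disabled before modifying a column family.
        admin.disableTable("myTable");

        HColumnDescriptor cf = new HColumnDescriptor("Hadoop");
        cf.setMaxVersions(5);            // keep up to 5 versions instead of the default 3
        admin.modifyColumn("myTable", cf);

        admin.enableTable("myTable");
        admin.close();
    }
}
```

Versions beyond this limit are only physically removed at the next major compaction, as noted above.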

Minimum number of versions:-

  • Like the max number of row versions, the minimum number of row versions is configured per column family via HColumnDescriptor.
  • The default value for minimum versions is 0, which means the feature is disabled.
  • The minimum number of row versions parameter is used together with the time-to-live parameter and can be combined with the number of row versions parameter, to allow configurations such as: keep the last T minutes' worth of data, at most N versions, but keep at least M versions around, where M is the value for the minimum number of row versions.
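The "T minutes, at most N, at least M" configuration described above can be sketched as follows; the family name "Hadoop" and the specific values are illustrative.

```java
import org.apache.hadoop.hbase.HColumnDescriptor;

public class MinVersionsExample {
    public static void main(String[] args) {
        HColumnDescriptor cf = new HColumnDescriptor("Hadoop");

        cf.setTimeToLive(600);   // T: cells older than 10 minutes become eligible for expiry
        cf.setMaxVersions(5);    // N: keep at most 5 versions of each cell
        cf.setMinVersions(2);    // M: but always retain at least 2 versions, even past the TTL

        // The descriptor would then be applied with admin.modifyColumn(table, cf)
        // on a disabled table, as shown earlier.
    }
}
```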

MapReduce Integration with HBase:-

  • MapReduce is meant for parallel processing.
  • TableInputFormat is the class that supplies the table splits.
  • When you work with MapReduce programming on HBase, HBase acts as either a source or a sink: MapReduce either takes its input data from HBase regions, or writes its results on top of HBase regions after processing the data.



  • To run a MapReduce job that needs classes from libraries not shipped with Hadoop or the MapReduce framework, we need to make those libraries available before the job is executed.
  • We have two choices: static preparation of all task nodes, or supplying everything needed along with the job.
  • For static preparation, it is useful to permanently install the JAR file locally on the TaskTracker machines, i.e., the machines that will run the map-reduce tasks.
  • This is done as follows:

1. Copy the JAR files into a common location on all nodes.
2. Add the JAR files, with their full paths, to the HADOOP_CLASSPATH variable in the hadoop-env.sh configuration file.

HBase as Source:-

  • When HBase acts as a source for MapReduce programming, it takes all the split information from the class org.apache.hadoop.hbase.mapreduce.TableInputFormat.
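A minimal sketch of the source side: a mapper extending TableMapper receives one (row key, Result) pair per row, and TableMapReduceUtil wires it to the source table. The table name "myTable" and the emitted key/value types are illustrative assumptions.

```java
import java.io.IOException;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

public class SourceMapper extends TableMapper<Text, IntWritable> {
    @Override
    protected void map(ImmutableBytesWritable rowKey, Result result, Context context)
            throws IOException, InterruptedException {
        // Emit a count of 1 for each row scanned from the HBase regions.
        context.write(new Text(Bytes.toString(rowKey.get())), new IntWritable(1));
    }

    public static void configure(Job job) throws IOException {
        // TableInputFormat computes the splits behind this call.
        TableMapReduceUtil.initTableMapperJob("myTable", new Scan(),
                SourceMapper.class, Text.class, IntWritable.class, job);
    }
}
```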

HBase as Sink:-

  • In this case, the input data can be taken either from HDFS or HBase, but the final output after processing the data is stored on top of HBase regions. For this, the class org.apache.hadoop.hbase.mapreduce.TableOutputFormat is used.
  • For the mapper business logic, there is a class to extend called TableMapper.
  • For the reducer business logic, the class to extend is TableReducer.
  • To load bulk data, or if the HBase shell is not working, the table can be created programmatically with HBaseAdmin and a column descriptor:

Configuration conf = HBaseConfiguration.create();
HBaseAdmin admin = new HBaseAdmin(conf);
HTableDescriptor desc = new HTableDescriptor("Gopal");
desc.addFamily(new HColumnDescriptor("cf"));   // column-family name is illustrative
admin.createTable(desc);
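The sink side described above can be sketched with a reducer extending TableReducer that writes Put objects into the output table; TableOutputFormat is set up behind initTableReducerJob. The table name "outputTable" and the family/qualifier names are illustrative assumptions.

```java
import java.io.IOException;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

public class SinkReducer extends TableReducer<Text, IntWritable, ImmutableBytesWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        // Each Put is written into a region of the sink table.
        Put put = new Put(Bytes.toBytes(key.toString()));
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("count"), Bytes.toBytes(sum));
        context.write(null, put);
    }

    public static void configure(Job job) throws IOException {
        // TableOutputFormat is configured behind this call.
        TableMapReduceUtil.initTableReducerJob("outputTable", SinkReducer.class, job);
    }
}
```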


About the Author


Ravindra Savaram is a Content Lead at Mindmajix.com. His passion lies in writing articles on the most popular IT platforms including Machine learning, DevOps, Data Science, Artificial Intelligence, RPA, Deep Learning, and so on. You can stay up to date on all these technologies by following him on LinkedIn and Twitter.