Hadoop HBase Schema and Versioning

Schema Creation:-

An HBase schema can be created or updated with the HBase Shell, or with HBaseAdmin in the Java API.

Tables must be disabled when making column family modifications.


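import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

// pre-1.0 style client API (HBase 1.0+ deprecates HBaseAdmin(Configuration))
Configuration config = HBaseConfiguration.create();
HBaseAdmin admin = new HBaseAdmin(config);
String table = "myTable";               // table names cannot contain spaces
admin.disableTable(table);              // take the table offline before changing column families
HColumnDescriptor cf1 = new HColumnDescriptor("Hadoop");
admin.addColumn(table, cf1);            // adding a new column family
HColumnDescriptor cf2 = new HColumnDescriptor("Hadoop");
cf2.setMaxVersions(5);                  // example change: keep up to 5 versions per cell
admin.modifyColumn(table, cf2);         // modifying the existing column family
admin.enableTable(table);               // bring the table back online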

When changes are made to either tables or column families (e.g., region size, block size), these changes take effect the next time there is a major compaction and the store files get rewritten.

HBase Versioning:-

Maximum number of versions:-

The maximum number of row versions to store is configured per column family via HColumnDescriptor.

The default value for max versions is 3 in HBase releases before 0.96; from 0.96 onward the default is 1.

As discussed previously, HBase does not overwrite row values, but rather stores different values per row by time (and qualifier).

Excess versions are removed during major compactions.
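To make the "store by time, trim at compaction" behaviour concrete, here is a minimal in-memory model of a single cell. It is a sketch, not the HBase API: the class VersionedCell and its methods are hypothetical. Each put with a new timestamp adds a version instead of replacing the old value, get() returns the newest value, and majorCompact() drops everything beyond maxVersions. In real HBase, reads also hide versions beyond the limit even before a compaction physically removes them; allVersions() here shows the raw stored state.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

/**
 * Hypothetical model of one cell in a column family (not HBase code):
 * versions are keyed by timestamp, and a simulated major compaction keeps
 * only the newest maxVersions of them.
 */
class VersionedCell {
    private final int maxVersions; // stands in for the family's max versions setting
    // timestamp -> value, ordered newest first
    private final TreeMap<Long, String> versions = new TreeMap<>(Collections.reverseOrder());

    VersionedCell(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    /** A put with a new timestamp adds a version; older values stay stored. */
    void put(long timestamp, String value) {
        versions.put(timestamp, value);
    }

    /** Returns the newest value, or null if the cell is empty. */
    String get() {
        return versions.isEmpty() ? null : versions.firstEntry().getValue();
    }

    /** Returns every stored value, newest first. */
    List<String> allVersions() {
        return new ArrayList<>(versions.values());
    }

    /** Simulated major compaction: discard versions beyond maxVersions. */
    void majorCompact() {
        while (versions.size() > maxVersions) {
            versions.pollLastEntry(); // last entry = oldest timestamp
        }
    }

    public static void main(String[] args) {
        VersionedCell cell = new VersionedCell(3);
        for (long ts = 1; ts <= 5; ts++) {
            cell.put(ts, "v" + ts);
        }
        System.out.println(cell.allVersions()); // [v5, v4, v3, v2, v1]
        cell.majorCompact();
        System.out.println(cell.allVersions()); // [v5, v4, v3]
        System.out.println(cell.get());         // v5
    }
}
```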

Minimum number of versions:-

Like the maximum number of row versions, the minimum number of row versions is configured per column family via HColumnDescriptor.

The default value for minimum versions is 0, which means the feature is disabled.

The minimum number of row versions parameter is used together with the time-to-live parameter and can be combined with the number of row versions parameter to allow configurations such as "keep the last T minutes worth of data, at most N versions, but keep at least M versions around", where M is the value for the minimum number of row versions and M < N.
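The combined rule can be expressed as a small retention check. The sketch below is a simplified model, not HBase's compaction code: the class RetentionPolicy is hypothetical, and time is in milliseconds for simplicity (HBase itself configures the TTL in seconds). Versions are ordered newest first; a version survives if it is within the newest N and is either younger than T or among the newest M.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical model of "keep the last T worth of data, at most N versions,
 * but keep at least M versions around" for one cell (M < N).
 */
class RetentionPolicy {
    final int maxVersions; // N
    final int minVersions; // M (0 disables the feature)
    final long ttlMillis;  // T, in milliseconds for this model

    RetentionPolicy(int maxVersions, int minVersions, long ttlMillis) {
        this.maxVersions = maxVersions;
        this.minVersions = minVersions;
        this.ttlMillis = ttlMillis;
    }

    /** Returns the surviving timestamps; input must be sorted newest first. */
    List<Long> survivors(List<Long> timestampsNewestFirst, long now) {
        List<Long> kept = new ArrayList<>();
        // never look past the newest N versions
        for (int i = 0; i < timestampsNewestFirst.size() && i < maxVersions; i++) {
            long ts = timestampsNewestFirst.get(i);
            boolean expired = now - ts > ttlMillis;
            // the newest M versions are kept even after their TTL has passed
            if (!expired || i < minVersions) {
                kept.add(ts);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // N = 3, M = 1, T = one minute
        RetentionPolicy policy = new RetentionPolicy(3, 1, 60_000L);
        long now = 1_000_000L;
        List<Long> stale = Arrays.asList(900_000L, 800_000L, 700_000L, 600_000L);
        System.out.println(policy.survivors(stale, now)); // [900000]
        List<Long> fresh = Arrays.asList(999_000L, 998_000L, 997_000L, 996_000L);
        System.out.println(policy.survivors(fresh, now)); // [999000, 998000, 997000]
    }
}
```

With every version expired, the minimum-versions setting still keeps the newest one; with fresh data, the maximum-versions limit is what trims the list.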

MapReduce Integration with HBase:-

MapReduce is meant for parallel processing.

TableInputFormat is the class that supplies HBase table data as input to a MapReduce job.

When you use MapReduce with HBase, HBase acts as a source, a sink, or both: the MapReduce job either reads its input from HBase regions, or writes its results to HBase regions after processing the data.


To run a MapReduce job that needs classes from libraries not shipped with Hadoop or the MapReduce framework, we need to make those libraries available before the job is executed.

We have two choices: static preparation of all task nodes, or supplying everything needed along with the job.

For static preparation, it is useful to permanently install the library's JAR files locally on the TaskTracker machines, that is, the machines that run the MapReduce tasks.

This is done by doing the following:

1. Copy the JAR files into a common location on all nodes.
2. Add the JAR files, with their full paths, to the HADOOP_CLASSPATH variable in the hadoop-env.sh configuration file.
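Step 2 amounts to one line in hadoop-env.sh on every node. The small Java helper below only builds that line, as a sketch of its expected shape; the class name and the JAR paths are hypothetical examples. Entries are colon-separated, and the existing $HADOOP_CLASSPATH is appended so that earlier settings are preserved.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that formats the HADOOP_CLASSPATH line for hadoop-env.sh. */
class HadoopClasspathLine {
    static String exportLine(List<String> jarPaths) {
        // colon-separated full paths, followed by the previous value of the variable
        return "export HADOOP_CLASSPATH=" + String.join(":", jarPaths) + ":$HADOOP_CLASSPATH";
    }

    public static void main(String[] args) {
        // example paths in the common location from step 1
        List<String> jars = Arrays.asList("/opt/libs/mylib-1.0.jar", "/opt/libs/otherlib-2.3.jar");
        System.out.println(exportLine(jars));
        // export HADOOP_CLASSPATH=/opt/libs/mylib-1.0.jar:/opt/libs/otherlib-2.3.jar:$HADOOP_CLASSPATH
    }
}
```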

HBase as Source:-

When HBase acts as a source for MapReduce, the job takes all of its split information from the class org.apache.hadoop.hbase.mapreduce.TableInputFormat.
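By default, TableInputFormat creates one split per region, so a table with R regions produces R map tasks, each covering that region's row-key range. The model below is a simplified sketch of that mapping (the class RegionSplits is hypothetical): keys are Strings rather than the byte[] keys HBase uses, the empty key marks the start or end of the table, and region server locations are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of one-split-per-region input splitting. */
class RegionSplits {
    /** A split covering row keys in [startRow, endRow); "" means table start/end. */
    static final class Split {
        final String startRow;
        final String endRow;

        Split(String startRow, String endRow) {
            this.startRow = startRow;
            this.endRow = endRow;
        }

        @Override
        public String toString() {
            return "['" + startRow + "', '" + endRow + "')";
        }
    }

    /** regionStartKeys must be sorted; the first region starts at "". */
    static List<Split> splitsFor(List<String> regionStartKeys) {
        List<Split> splits = new ArrayList<>();
        for (int i = 0; i < regionStartKeys.size(); i++) {
            String start = regionStartKeys.get(i);
            // a region ends where the next one starts; the last region runs to the table end
            String end = (i + 1 < regionStartKeys.size()) ? regionStartKeys.get(i + 1) : "";
            splits.add(new Split(start, end));
        }
        return splits;
    }

    public static void main(String[] args) {
        // example table with three regions starting at "", "g" and "p"
        List<Split> splits = splitsFor(Arrays.asList("", "g", "p"));
        System.out.println(splits.size() + " map tasks: " + splits);
        // 3 map tasks: [['', 'g'), ['g', 'p'), ['p', '')]
    }
}
```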

HBase as Sink:-

In this case, the input data can be taken from either HDFS or HBase, but the final output is stored in HBase regions after the data is processed. To do this, the job uses the following class:

org.apache.hadoop.hbase.mapreduce.TableOutputFormat

For the mapper business logic, write a class that extends TableMapper.

For the reducer business logic, write a class that extends TableReducer.
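The division of work between a TableMapper subclass and a TableReducer subclass can be sketched without any Hadoop classes. The plain-Java model below (class TableJobModel is hypothetical, and it only mimics the data flow, not the real APIs) counts how often each value of one column occurs: map() is called once per source row, the grouping map stands in for the framework's shuffle, and reduce() writes one row per key into the sink table.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical model of a source-table -> mapper -> reducer -> sink-table job. */
class TableJobModel {
    // Mapper step: one call per source row; emits (column value, 1).
    // rowKey is unused here but is the key a TableMapper receives for each row.
    static void map(String rowKey, Map<String, String> columns, String column,
                    Map<String, List<Integer>> shuffled) {
        String value = columns.get(column);
        if (value != null) {
            // grouping by key stands in for the framework's shuffle/sort
            shuffled.computeIfAbsent(value, k -> new ArrayList<>()).add(1);
        }
    }

    // Reducer step: one call per key; the result becomes one row in the sink.
    static void reduce(String key, List<Integer> counts, Map<String, Integer> sinkTable) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        sinkTable.put(key, sum); // row key = column value, cell = count
    }

    static Map<String, Integer> run(Map<String, Map<String, String>> sourceTable, String column) {
        Map<String, List<Integer>> shuffled = new TreeMap<>();
        for (Map.Entry<String, Map<String, String>> row : sourceTable.entrySet()) {
            map(row.getKey(), row.getValue(), column, shuffled);
        }
        Map<String, Integer> sinkTable = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffled.entrySet()) {
            reduce(group.getKey(), group.getValue(), sinkTable);
        }
        return sinkTable;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> source = new TreeMap<>();
        source.put("r1", Collections.singletonMap("cf:city", "Pune"));
        source.put("r2", Collections.singletonMap("cf:city", "Delhi"));
        source.put("r3", Collections.singletonMap("cf:city", "Pune"));
        source.put("r4", Collections.emptyMap()); // row without the column is skipped
        System.out.println(run(source, "cf:city")); // {Delhi=1, Pune=2}
    }
}
```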

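To load bulk data, or if the HBase Shell is not working, the table can be created from the Java API with the following calls:

Configuration conf = HBaseConfiguration.create();
HBaseAdmin admin = new HBaseAdmin(conf);
HTableDescriptor desc = new HTableDescriptor("Gopal");   // table name from the original example
desc.addFamily(new HColumnDescriptor("cf"));             // "cf" is a placeholder column family name
admin.createTable(desc);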

