- An HBase schema can be created or updated with the HBase Shell or by using HBaseAdmin in the Java API.
- Tables must be disabled when making column family modifications.
Configuration conf = HBaseConfiguration.create();
HBaseAdmin admin = new HBaseAdmin(conf);
String table = "myTable";
admin.disableTable(table); // disable the table before changing its column families
HColumnDescriptor cf1 = new HColumnDescriptor("Hadoop");
admin.addColumn(table, cf1); // adds a new column family
HColumnDescriptor cf2 = new HColumnDescriptor("Hadoop");
admin.modifyColumn(table, cf2); // modifies an existing column family
admin.enableTable(table);
When changes are made to either tables or column families (e.g., region size, block size), they take effect the next time a major compaction runs and the store files are rewritten.
HBase Versioning:-
Maximum number of versions:-
- The maximum number of row versions to store is configured per column family via HColumnDescriptor.
- The default value for max versions is 3 in releases before HBase 0.96; from 0.96 onward it is 1.
- As discussed previously, HBase does not overwrite row values but rather stores different values per row by time (and qualifier).
- Excess versions are removed during major compactions.
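The versioning behaviour described above can be sketched with a small toy model. This is not the HBase API: the class `VersionedCell` and its methods are hypothetical names chosen for illustration, and `majorCompact()` only stands in for what a real major compaction does to excess versions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

// Toy model of one cell (row + column) in a column family; not the HBase API.
final class VersionedCell {
    private final int maxVersions;
    // Timestamp -> value, ordered with the newest timestamp first.
    private final TreeMap<Long, String> versions = new TreeMap<>(Collections.reverseOrder());

    VersionedCell(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    // A put with a new timestamp adds a version instead of overwriting older ones.
    void put(long timestamp, String value) {
        versions.put(timestamp, value);
    }

    // A plain read returns the newest version, or null if the cell is empty.
    String get() {
        return versions.isEmpty() ? null : versions.firstEntry().getValue();
    }

    // Stand-in for a major compaction: drop everything beyond the newest maxVersions.
    void majorCompact() {
        while (versions.size() > maxVersions) {
            versions.pollLastEntry(); // last entry = oldest timestamp
        }
    }

    // All stored values, newest first.
    List<String> allVersions() {
        return new ArrayList<>(versions.values());
    }
}
```

With `maxVersions` set to 3, four puts leave four stored versions until `majorCompact()` is called, after which only the three newest remain.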
Minimum number of versions:-
- Like the maximum number of row versions, the minimum number of row versions is configured per column family via HColumnDescriptor.
- The default value for minimum versions is 0, which means the feature is disabled.
- The minimum number of row versions parameter is used together with the time-to-live parameter and can be combined with the maximum number of row versions parameter to allow configurations such as "keep the last T minutes' worth of data, at most N versions, but keep at least M versions around", where M is the value for the minimum number of row versions and M < N.
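The "last T minutes, at most N, at least M" rule can be worked through in a short sketch. This is a simplified model, not HBase code: `RetentionPolicy` and `retain` are hypothetical names, and the time-to-live is expressed in milliseconds here purely for convenience.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy model of the retention rule for one cell; not HBase code.
final class RetentionPolicy {
    final int minVersions; // M: keep at least this many, even if expired
    final int maxVersions; // N: never keep more than this many
    final long ttlMillis;  // T: versions older than this are expired

    RetentionPolicy(int minVersions, int maxVersions, long ttlMillis) {
        this.minVersions = minVersions;
        this.maxVersions = maxVersions;
        this.ttlMillis = ttlMillis;
    }

    // Returns the timestamps that would survive at time `now`, newest first.
    List<Long> retain(List<Long> timestamps, long now) {
        List<Long> sorted = new ArrayList<>(timestamps);
        sorted.sort(Comparator.reverseOrder());
        List<Long> kept = new ArrayList<>();
        for (long ts : sorted) {
            if (kept.size() >= maxVersions) {
                break; // at most N versions
            }
            boolean expired = now - ts > ttlMillis;
            if (!expired || kept.size() < minVersions) {
                kept.add(ts); // fresh, or needed to reach M versions
            }
        }
        return kept;
    }
}
```

For example, with M = 1, N = 3 and a one-minute TTL, a cell whose versions have all expired still keeps its newest version, while a cell with four fresh versions keeps only the newest three.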
MapReduce Integration with HBase:-
- MapReduce is meant only for parallel processing.
- TableInputFormat is the class MapReduce uses to read its input from an HBase table.
- When you use MapReduce with HBase, HBase acts as either the source or the sink: the MapReduce job either reads its data from HBase regions or writes its results to HBase regions after processing the data.
- To run a MapReduce job that needs classes from libraries not shipped with Hadoop or the MapReduce framework, we need to make those libraries available before the job is executed.
- We have two choices: static preparation of all task nodes, or supplying everything needed along with the job.
- For static preparation, it is useful to permanently install the library's JAR file locally on the TaskTracker machines, that is, the machines that run the MapReduce tasks.
- This is done as follows:
1. Copy the JAR files into a common location on all nodes.
2. Add the JAR files, with their full location, to the HADOOP_CLASSPATH variable in the hadoop-env.sh configuration file.
HBase as Source:-
- When HBase acts as a source for a MapReduce job, the job takes all its split information from the class org.apache.hadoop.hbase.mapreduce.TableInputFormat.
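To make "split information" more concrete: by default, TableInputFormat produces one input split per table region that overlaps the scan's row range. The sketch below is a simplified model of that idea, not the HBase implementation. `RegionSplits`, `Range` and `splitsFor` are hypothetical names, and an empty string is assumed to mean an unbounded start or end key.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of "one split per region"; not the HBase implementation.
final class RegionSplits {
    // Row keys in [startKey, endKey); an empty string means unbounded.
    static final class Range {
        final String startKey;
        final String endKey;

        Range(String startKey, String endKey) {
            this.startKey = startKey;
            this.endKey = endKey;
        }

        @Override
        public String toString() {
            return "[" + startKey + ", " + endKey + ")";
        }
    }

    // Returns one split per region that overlaps the scan, clipped to the scan range.
    static List<Range> splitsFor(List<Range> regions, Range scan) {
        List<Range> splits = new ArrayList<>();
        for (Range r : regions) {
            boolean startsBeforeScanEnd =
                scan.endKey.isEmpty() || r.startKey.compareTo(scan.endKey) < 0;
            boolean endsAfterScanStart =
                r.endKey.isEmpty() || r.endKey.compareTo(scan.startKey) > 0;
            if (!startsBeforeScanEnd || !endsAfterScanStart) {
                continue; // region lies entirely outside the scan
            }
            // Later of the two start keys ("" sorts before every other key).
            String start = r.startKey.compareTo(scan.startKey) >= 0 ? r.startKey : scan.startKey;
            // Earlier of the two end keys, treating "" as "no upper bound".
            String end;
            if (r.endKey.isEmpty()) {
                end = scan.endKey;
            } else if (scan.endKey.isEmpty()) {
                end = r.endKey;
            } else {
                end = r.endKey.compareTo(scan.endKey) <= 0 ? r.endKey : scan.endKey;
            }
            splits.add(new Range(start, end));
        }
        return splits;
    }
}
```

In this model, a table with three regions and a full-table scan yields three splits, so three map tasks can read the regions in parallel.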
HBase as Sink:-
- In this case, the input data can be taken either from HDFS or from HBase, but the final output after processing is stored in HBase regions. To do so, the job uses the class below:
- org.apache.hadoop.hbase.mapreduce.TableOutputFormat
- For mapper business logic, write a class that extends TableMapper.
- For reducer business logic, write a class that extends TableReducer.
- To load bulk data, or if the HBase shell is not working, we can use the Java API as below:
Configuration conf = HBaseConfiguration.create();
HBaseAdmin admin = new HBaseAdmin(conf); // HBaseAdmin is the administrative class
admin.createTable(new HTableDescriptor("Gopal"));
HColumnDescriptor