Mindmajix

An Overview of Hadoop Hive

Hadoop Hive

- Hive is a component of Hadoop that is built on top of HDFS and provides a data-warehouse-like system on Hadoop.

- Hive is used for data summarization, ad hoc querying, and query-language processing.

- Hive was first developed at Facebook (2007) and is now developed under the ASF, i.e. the Apache Software Foundation.

- Apache Hive supports analysis of large datasets stored in Hadoop-compatible file systems such as the Amazon S3 file system.

- Hive provides an SQL-like language called HiveQL, while also maintaining full support for MapReduce.

- Hive does not mandate that data be read or written in a "Hive format"; there is no such thing. Hive works equally well on Thrift, control-delimited, and your own specialized data formats.

- Hive is not designed for OLTP workloads and does not offer real-time queries or row-level updates.

- It is best used for batch jobs over large sets of append-only data.
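As a quick illustration of HiveQL's SQL-like flavor for append-only, batch-style work, here is a minimal sketch (the table and column names are hypothetical):

```sql
-- Hypothetical append-only table of page views.
CREATE TABLE page_views (user_id STRING, url STRING, view_time BIGINT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

-- A batch summarization query; Hive compiles this into MapReduce jobs.
SELECT url, COUNT(*) AS views
FROM page_views
GROUP BY url;
```

Note that the SELECT runs as a full scan over the table, which is exactly the access pattern Hive is designed for.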

Hive Architecture:-

[Hive architecture diagram]

- Driver: manages the life cycle of a HiveQL query as it moves through Hive, and also manages the session handle and session statistics.

- Compiler: compiles HiveQL into a directed acyclic graph (DAG) of map/reduce tasks.

- Execution engine: executes the compiled tasks in proper dependency order and interacts with Hadoop.

- Hive server: provides a Thrift interface and JDBC/ODBC for integrating other applications.

- Client components: the CLI, the web interface, and the JDBC/ODBC interface.

- Extensibility interfaces: include SerDe (serializer/deserializer), user-defined functions (UDFs), and user-defined aggregate functions (UDAFs).
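To give a feel for the extensibility interface, here is a sketch of registering a custom UDF from HiveQL; the JAR path, class name, function name, and table are all hypothetical, and the JAR would have to contain a class implementing Hive's UDF API:

```sql
-- Hypothetical: /tmp/my_udfs.jar and com.example.hive.Lower are
-- placeholder names for a user-supplied JAR and UDF class.
ADD JAR /tmp/my_udfs.jar;
CREATE TEMPORARY FUNCTION my_lower AS 'com.example.hive.Lower';

-- Once registered, the UDF is used like any built-in function.
SELECT my_lower(name) FROM my_table;
```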

Meta store:-

     The metastore is Hive's internal database, which stores all the table definitions, column-level information, and partition IDs.

- By default, Hive uses the Derby database as its metastore.

- We can also configure MySQL as the metastore database, or expose the metastore as a standalone Thrift service.

- The metastore is divided into two pieces: the service and the backing store for the data.

- By default, the metastore service runs in the same JVM as the Hive service and contains an embedded Derby database instance backed by the local disk. This is called the embedded metastore configuration.

- Using an embedded metastore is a simple way to get started with Hive; however, only one embedded Derby database can access the database files on disk at any one time, which means you can have only one Hive session open at a time sharing the same metastore.

- The solution to supporting multiple sessions is to use a standalone database. This configuration is referred to as a local metastore, since the metastore service still runs in the same process as the Hive service but connects to a database running in a separate process, either on the same machine or on a remote machine.

- MySQL is a popular choice for the standalone metastore.

- In this case, the JDBC driver JAR file for MySQL must be on Hive's classpath.
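For a local metastore backed by MySQL, hive-site.xml would contain properties along these lines (the host, port, database name, and credentials below are placeholders to adapt to your setup):

```xml
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost:3306/metastore</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hiveuser</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>password</value>
</property>
```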

Installing Hive:-

- Installation of Hive is straightforward, and Java 1.6 is a prerequisite.

- If you are installing on Windows, you will need Cygwin too.

- You also need to have the same version of Hadoop that your cluster is running installed locally, either in standalone or pseudo-distributed mode, when getting started with Hive.

Steps:-

  1. Download the Hive release (a tarball file) from http://hive.apache.org/.
  2. Unpack the tarball in a suitable place in your Hadoop installation environment:
     $ tar -xzvf hive-0.8.1.tar.gz
  3. Set the environment variable HIVE_HOME to point to the installation directory:
     $ cd hive-0.8.1
     $ export HIVE_HOME=`pwd`
  4. Add $HIVE_HOME/bin to your PATH:
     $ export PATH=$HIVE_HOME/bin:$PATH

Data base Setup:-

- Install MySQL server; versions 5.1.46 and 5.1.48 are the developed and tested versions.

- Once you have MySQL up and running, use the MySQL command-line tool to add the hive user and the Hive metastore database.

- Pick a password for your hive user and substitute it for 'password' in the commands below.

- Log in to MySQL:

$ mysql
mysql> CREATE DATABASE metastore;
mysql> USE metastore;
mysql> SOURCE /usr/lib/hive/scripts/metastore/upgrade/mysql/hive-schema-0.7.0.mysql.sql;
mysql> CREATE USER 'hiveuser'@'%' IDENTIFIED BY 'password';
mysql> GRANT SELECT, INSERT, UPDATE, DELETE ON metastore.* TO 'hiveuser'@'%';
mysql> REVOKE ALTER, CREATE ON metastore.* FROM 'hiveuser'@'%';

- To start the MySQL service:

$ cd /etc/init.d
$ ./mysql start

To stop it:

$ ./mysql stop

- To start the Hive server:

$ /usr/bin/hive --service hiveserver

Configuration of Hive:-

- Hive is configured using an XML configuration file, like Hadoop; the file is called hive-site.xml.

- hive-site.xml is located in Hive's conf directory.

- The same directory contains hive-default.xml, which documents the properties that Hive exposes and their default values.

Configure hive-site.xml as below:

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby:;databaseName=/var/lib/hive/metastore/metastore_db;create=true</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>

Comparison with Traditional Databases:-

- Hive looks very much like a traditional database, with SQL access.

- However, because Hive is based on Hadoop and MapReduce operations, there are several key differences.

- In a traditional database, a table's schema is enforced at data load time.

- If the data being loaded doesn't conform to the schema, it is rejected.

- Hive, on the other hand, doesn't verify the data when it is loaded, but rather when a query is issued.

- Updates, transactions, and indexes are mainstays of traditional databases.

- Yet, until recently, these features have not been considered part of Hive's feature set. This is because Hive was built to operate over HDFS data using MapReduce, where full-table scans are the norm and a table update is achieved by transforming the data into a new table.
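The load-time versus query-time distinction, and update-by-transformation, can be sketched in HiveQL as follows (the file path and table names are hypothetical):

```sql
-- LOAD DATA just moves files into place; the schema is not verified here.
-- Mismatched rows only surface as NULLs when a query reads them.
LOAD DATA INPATH '/data/raw/users.tsv' INTO TABLE users;

-- There is no row-level UPDATE in classic Hive; an "update" rewrites the
-- whole table from a transforming query instead.
INSERT OVERWRITE TABLE users
SELECT id, lower(email) AS email
FROM users_staging;
```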

Hive CLI Options

$HIVE_HOME/bin/hive is a shell utility which can be used to run Hive queries in either interactive or batch mode.

Hive command-line options

- To get help for Hive options, run "hive -H" or "hive --help".

- Command-line options as of Hive 0.9.0:

  1. hive -d,--define <key=value>: variable substitution to apply to Hive commands, e.g. -d A=B or --define A=B
  2. hive -e <quoted-query-string>: SQL from the command line
  3. hive -f <filename>: SQL from a file
  4. hive -h <hostname>: connect to the Hive server on the remote host
  5. --hiveconf <property=value>: use the given value for the given property
  6. --hivevar <key=value>: variable substitution to apply to Hive commands, e.g. --hivevar A=B
  7. hive -i <filename>: initialization SQL file
  8. hive -p <port>: connect to the Hive server on the given port number
  9. hive -S,--silent: silent mode in interactive shell
  10. hive -v,--verbose: verbose mode (echo executed SQL to the console)

- As of Hive 0.10.0, there is one additional command-line option, hive --database <dbname>: specify the database to use.

Examples:-

- Example of running a query from the command line:

$HIVE_HOME/bin/hive -e 'select a.col from tab1 a'

- Example of setting Hive configuration variables:

$HIVE_HOME/bin/hive -e 'select a.col from tab1 a' --hiveconf hive.exec.scratchdir=/home/my/hive --hiveconf mapred.reduce.tasks=32

- Example of dumping data out from a query into a file using silent mode:

$HIVE_HOME/bin/hive -S -e 'select a.col from tab1 a' > a.txt

- Example of running a script non-interactively:

$HIVE_HOME/bin/hive -f /home/my/hive-script.sql

- Example of running an initialization script before entering interactive mode:

$HIVE_HOME/bin/hive -i /home/my/hive-init.sql

- The hiverc file:

           The CLI, when invoked without the -i option, will attempt to load $HIVE_HOME/bin/.hiverc and $HOME/.hiverc as initialization files.

Hive Batch mode commands:-

- When $HIVE_HOME/bin/hive is run with the -e or -f option, it executes SQL commands in batch mode.

hive -e '<query-string>': executes the query string.

hive -f <filepath>: executes one or more SQL queries from a file.

Interactive shell mode:-

- When $HIVE_HOME/bin/hive is run without either the -e or -f option, it enters interactive shell mode:

# hive
hive>

- Use ';' to terminate commands.

- Comments in scripts can be specified using the '--' prefix.
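For instance, a small script fragment using the '--' comment prefix (the query contents are illustrative):

```sql
-- Count the rows in tab1 (this whole line is a comment and is ignored).
SELECT COUNT(*) FROM tab1;  -- inline comments after a statement work too
```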

Commands in interactive shell:-

  1. quit or exit: leaves the interactive shell.

  2. reset: resets the configuration to the default values.

  3. set <key>=<value>: sets the value of a particular configuration variable (key). Note: if you misspell the variable name, the CLI will not show an error.

  4. set: prints a list of configuration variables that are overridden by the user or Hive.

  5. set -v: prints all Hadoop and Hive configuration variables.

  6. add FILE[S] <filepath> <filepath>*
     add JAR[S] <filepath> <filepath>*
     add ARCHIVE[S] <filepath> <filepath>*
     Adds one or more files, JARs, or archives to the list of resources in the distributed cache.

  7. list FILE[S]
     list JAR[S]
     list ARCHIVE[S]
     Lists the resources that are already added to the distributed cache.

  8. list FILE[S] <filepath>*
     list JAR[S] <filepath>*
     list ARCHIVE[S] <filepath>*
     Checks whether the given resources are already added to the distributed cache or not.

  9. delete FILE[S] <filepath>*
     delete JAR[S] <filepath>*
     delete ARCHIVE[S] <filepath>*
     Removes the resource(s) from the distributed cache.

  10. ! <command>: executes a shell command from the Hive shell.

  11. dfs <command>: executes a dfs command from the Hive shell.

  12. <query string>: executes a Hive query and prints results to the standard output.

  13. source <filepath>: executes a script file inside the CLI.

For Example:-

hive> set mapred.reduce.tasks=32;
hive> set;
hive> select a.* from tab1 a;
hive> !ls;
