
Hadoop Hive Data Types with Examples

Data Types

 
Hive supports many of the primitive data types you find in relational databases.
 

Primitive Data types

Hive supports several sizes of integer and floating-point types, a Boolean type, and character strings of arbitrary length.

Hive v0.8.0 added types for timestamps and binary fields.
The primitive types supported by Hive are:
 
Type      Size                                              Example
TINYINT   1-byte signed integer                             20
SMALLINT  2-byte signed integer                             20
INT       4-byte signed integer                             20
BIGINT    8-byte signed integer                             20
BOOLEAN   Boolean true or false                             TRUE
FLOAT     Single-precision floating point                   3.14159
DOUBLE    Double-precision floating point                   3.14159
STRING    Sequence of characters (single or double quotes)  'Now is the time', "for all good men"
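The byte sizes in the table translate directly into value ranges. As a rough illustration in Python (this is not Hive code, and the helper name `signed_range` is made up for this sketch), a signed integer stored in n bytes can hold values from -2^(8n-1) to 2^(8n-1) - 1:

```python
def signed_range(num_bytes):
    """Return (min, max) for a two's-complement signed integer of the given size."""
    bits = 8 * num_bytes
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


# Byte sizes taken from the table of primitive types.
HIVE_INTEGER_SIZES = {"TINYINT": 1, "SMALLINT": 2, "INT": 4, "BIGINT": 8}

for type_name, size in HIVE_INTEGER_SIZES.items():
    low, high = signed_range(size)
    print(f"{type_name:<9}{low} to {high}")
```

So the literal 20 fits in every integer type, while a value such as 200 is already too large for a TINYINT and needs at least a SMALLINT.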
     
 

Collection Data Types:-

Hive supports columns that are structs, maps, and arrays.
 
 
Type: STRUCT
Description: Analogous to a C struct or an "object". Fields can be accessed using the "dot" notation. For example, if a column name is of type STRUCT {first STRING, last STRING}, then the first name field can be referenced using name.first.
Example: struct('John', 'Doe')

Type: MAP
Description: A collection of key-value pairs, where the fields are accessed using array notation (e.g., ['key']). For example, if a column name is of type MAP with key-value pairs 'first', 'John' and 'last', 'Doe', then the last name can be referenced using name['last'].
Example: map('first', 'John', 'last', 'Doe')

Type: ARRAY
Description: Ordered sequences of the same type that are indexable using zero-based integers.
Example: array('John', 'Doe')
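For readers who think in a general-purpose language, the three access styles can be sketched with Python containers. This is only an analogy: Python's namedtuple, dict, and list are not Hive types, and the variable names here are made up for the illustration.

```python
from collections import namedtuple

# STRUCT {first STRING, last STRING}: named fields, read with dot notation.
Name = namedtuple("Name", ["first", "last"])
name_struct = Name("John", "Doe")

# MAP: key-value pairs, read with the key in brackets.
name_map = {"first": "John", "last": "Doe"}

# ARRAY: ordered values of one type, read with a zero-based index.
name_array = ["John", "Doe"]

print(name_struct.first)  # Hive equivalent: name.first
print(name_map["last"])   # Hive equivalent: name['last']
print(name_array[1])      # Hive equivalent: name[1]
```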
 

Text File Encoding:

Text files delimited with commas or tabs are called comma-separated values (CSV) or tab-separated values (TSV) files, respectively.
Hive can use those formats, but there is a drawback to both.
   
       We have to be careful about commas or tabs embedded in text and not intended as field or column delimiters.
 
      For this reason, Hive uses various control characters by default, which are less likely to appear in value strings.
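A small Python sketch shows the problem. The sample values are made up for illustration, and the naive `split` stands in for any reader that treats every comma as a delimiter:

```python
# A record with two fields: a name, and an address that contains a comma.
fields = ["John Doe", "1 Michigan Ave., Chicago"]

# Naive comma-delimited encoding: the embedded comma becomes a false delimiter.
csv_line = ",".join(fields)
naive_fields = csv_line.split(",")

# The same record delimited with the control character ^A (octal 001).
CTRL_A = "\x01"
hive_line = CTRL_A.join(fields)
hive_fields = hive_line.split(CTRL_A)

print(len(naive_fields))  # 3 pieces for a 2-field record
print(len(hive_fields))   # 2 pieces, as intended
```

Because a control character is very unlikely to appear inside ordinary text values, splitting on it recovers the original fields without any quoting or escaping rules.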
 
Hive uses the term field when overriding the default delimiter.
 
Delimiter  Description
\n         For text files, each line is a record, so the line feed character separates records.
^A         Separates all fields (columns). Written using the octal code \001 when explicitly specified in CREATE TABLE statements.
^B         Separates the elements in an ARRAY or STRUCT, or the key-value pairs in a MAP. Written using the octal code \002 when explicitly specified in CREATE TABLE statements.
^C         Separates the key from the corresponding value in MAP key-value pairs. Written using the octal code \003 when explicitly specified in CREATE TABLE statements.
 
Overriding default delimiters:-
 
Table declaration with all the format defaults explicitly specified:
 
CREATE TABLE employees (
  name         STRING,
  salary       FLOAT,
  subordinates ARRAY<STRING>,
  deductions   MAP<STRING, FLOAT>,
  -- STRUCT field names are assumed here, inferred from the sample record
  address      STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\001'
COLLECTION ITEMS TERMINATED BY '\002'
MAP KEYS TERMINATED BY '\003'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
 
The ROW FORMAT DELIMITED sequence of keywords must appear before any of the other clauses, with the exception of the STORED AS clause.
 
\001 is the octal code for the ^A character, which is used to separate fields.
 
Similarly, \002 is the octal code for ^B and \003 is the octal code for ^C.
 
Hive uses the ^B character to separate collection items and ^C character to separate map keys from values.
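The link between the caret names and the octal codes is simple arithmetic: ^A is character code 1, ^B is 2, and ^C is 3. A minimal Python sketch (the helper name `control_char` is made up for this illustration) prints the mapping:

```python
def control_char(letter):
    """Return the control character written in caret notation as ^<letter>."""
    # Caret notation: ^A is code 1, ^B is code 2, ... (the letter's code minus 64).
    return chr(ord(letter.upper()) - 64)


for letter in "ABC":
    ch = control_char(letter)
    # Format the character code as a three-digit octal escape.
    print(f"^{letter} -> \\{ord(ch):03o}")
```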
 
The clauses LINES TERMINATED BY '…' and STORED AS … do not require the ROW FORMAT DELIMITED keywords.
 
Example: a text file record for the table created above:
 
John Doe^A100000.0^AMary Smith^BTodd Jones^AFederal Taxes^C.2^BState Taxes^C.05^BInsurance^C.1^A1 Michigan Ave.^BChicago^BIL^B606
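To see how the three delimiters nest, here is a minimal Python sketch that splits a cleaned-up copy of the sample record into its five columns. It is a plain illustration of the layout, not how Hive itself reads the file; the function name `parse_employee` and the dictionary keys are assumptions made for this example.

```python
FIELD_SEP = "\x01"       # ^A separates fields
COLLECTION_SEP = "\x02"  # ^B separates ARRAY/STRUCT elements and MAP entries
MAP_KEY_SEP = "\x03"     # ^C separates a MAP key from its value


def parse_employee(line):
    """Split one delimited text line into the five columns of the table."""
    name, salary, subordinates, deductions, address = (
        line.rstrip("\n").split(FIELD_SEP)
    )
    return {
        "name": name,
        "salary": float(salary),
        "subordinates": subordinates.split(COLLECTION_SEP),
        "deductions": {
            key: float(value)
            for key, value in (
                entry.split(MAP_KEY_SEP)
                for entry in deductions.split(COLLECTION_SEP)
            )
        },
        "address": address.split(COLLECTION_SEP),
    }


# The sample record, with ^A, ^B and ^C written as real control characters.
record = (
    "John Doe\x01100000.0\x01Mary Smith\x02Todd Jones\x01"
    "Federal Taxes\x03.2\x02State Taxes\x03.05\x02Insurance\x03.1\x01"
    "1 Michigan Ave.\x02Chicago\x02IL\x02606\n"
)

employee = parse_employee(record)
print(employee["subordinates"])
print(employee["deductions"]["State Taxes"])
```

The order of the splits matters: fields are separated first, then collection items within a field, and only then map keys from their values.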
