
Hadoop Hive Data Types with Examples


Data Types

Hive supports many of the primitive data types found in relational databases.

Primitive Data types

  • Hive supports several sizes of integer and floating-point types, a Boolean type, and character strings of arbitrary length.
  • Hive v0.8.0 added the TIMESTAMP and BINARY types.

Primitive types supported by Hive are:

Type       Size                                 Example
TINYINT    1-byte signed integer                20
SMALLINT   2-byte signed integer                20
INT        4-byte signed integer                20
BIGINT     8-byte signed integer                20
BOOLEAN    Boolean true or false                TRUE
FLOAT      Single-precision floating point      3.14159
DOUBLE     Double-precision floating point      3.14159
STRING     Sequence of characters; single or    'Now is the time', "for all good men"
           double quotes can be used
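The value ranges of the integer types follow directly from their sizes: an n-byte signed (two's complement) integer holds values from -2^(8n-1) to 2^(8n-1) - 1. The short Python sketch below (an illustration, not Hive code) computes those bounds and uses the standard struct module to show that 3.14159 stored in a 4-byte single-precision FLOAT is only approximated, whereas it round-trips unchanged through an 8-byte DOUBLE, which has the same layout as a Python float. The names INTEGER_SIZES, signed_range, and round_trip are hypothetical helpers.

```python
import struct

# Sizes in bytes of Hive's signed integer types (from the table above).
INTEGER_SIZES = {"TINYINT": 1, "SMALLINT": 2, "INT": 4, "BIGINT": 8}


def signed_range(num_bytes: int) -> tuple[int, int]:
    """Return (min, max) for a two's-complement signed integer of num_bytes bytes."""
    bits = 8 * num_bytes
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def round_trip(value: float, fmt: str) -> float:
    """Pack value as a 4-byte ('f') or 8-byte ('d') IEEE 754 float and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


if __name__ == "__main__":
    for type_name, size in INTEGER_SIZES.items():
        low, high = signed_range(size)
        print(f"{type_name:<8} {low} .. {high}")

    print("FLOAT :", round_trip(3.14159, "f"))   # single precision: slightly off
    print("DOUBLE:", round_trip(3.14159, "d"))   # same 64-bit format as a Python float
```

This is why a value such as 200 does not fit in a TINYINT, and why FLOAT columns can show small rounding differences that DOUBLE columns do not.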

Collection Data Types

Hive supports columns that are structs, maps, and arrays.

  • STRUCT: Analogous to a C struct or an "object". Fields can be accessed using the "dot" notation. For example, if a column name is of type STRUCT {first STRING; last STRING}, then the first name field can be referenced using name.first. Example: struct('John', 'Doe')
  • MAP: A collection of key-value tuples, where the fields are accessed using array notation (e.g., ['key']). For example, if a column name is of type MAP with the key-value pairs ('first', 'John') and ('last', 'Doe'), then the last name can be referenced using name['last']. Example: map('first', 'John', 'last', 'Doe')
  • ARRAY: Ordered sequences of the same type that are indexable using zero-based integers. Example: array('John', 'Doe')
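The three access notations map closely onto familiar Python constructs. The sketch below is only an analogy, not Hive code: a NamedTuple stands in for a STRUCT (dot notation), a dict for a MAP (bracket lookup by key), and a list for an ARRAY (zero-based index). The class Name and the variable names are hypothetical.

```python
from typing import NamedTuple


class Name(NamedTuple):
    """Stand-in for a Hive STRUCT<first:STRING, last:STRING> column value."""
    first: str
    last: str


# STRUCT: fields are read with dot notation, like name.first in Hive.
struct_name = Name("John", "Doe")

# MAP: values are read by key with bracket notation, like name['last'] in Hive.
map_name = {"first": "John", "last": "Doe"}

# ARRAY: elements are read by zero-based integer index, like name[0] in Hive.
array_name = ["John", "Doe"]

if __name__ == "__main__":
    print(struct_name.first)   # John
    print(map_name["last"])    # Doe
    print(array_name[1])       # Doe
```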

Text File Encoding

Text files delimited with commas or tabs are called CSV (comma-separated values) or TSV (tab-separated values) files, respectively.

Hive can use those formats, but both have a drawback.

  • We have to be careful about commas or tabs embedded in text and not intended as field or column delimiters.
  • For this reason, Hive uses various control characters by default, which are less likely to appear in value strings.
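The first drawback is easy to reproduce. In this Python sketch (the record values are made up for illustration), a naive comma split turns a value that itself contains a comma into two columns, while the same record joined with Hive's default field delimiter, the ^A control character (\x01), splits back into the intended three columns.

```python
# Three intended columns; the second value contains an embedded comma.
columns = ["John Doe", "Chicago, IL", "100000.0"]

# Naive CSV-style encoding: join with commas, then split on commas.
csv_line = ",".join(columns)
csv_fields = csv_line.split(",")          # the embedded comma creates a 4th field

# Hive's default field delimiter is the control character ^A (octal \001).
CTRL_A = "\x01"
hive_line = CTRL_A.join(columns)
hive_fields = hive_line.split(CTRL_A)     # control characters rarely occur in text values

if __name__ == "__main__":
    print(len(csv_fields), csv_fields)
    print(len(hive_fields), hive_fields)
```

Full CSV readers work around this with quoting rules; Hive's default format avoids the problem by choosing delimiters that are unlikely to appear in the data at all.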

Hive uses the term field (as in the FIELDS TERMINATED BY clause) when overriding the default delimiter.

Delimiter   Description
\n          For text files, each line is a record, so the line feed character separates records.
^A          Separates all fields (columns). Written using the octal code \001 when explicitly specified in CREATE TABLE statements.
^B          Separates the elements in an ARRAY or STRUCT, or the key-value pairs in a MAP. Written using the octal code \002 when explicitly specified in CREATE TABLE statements.
^C          Separates the key from the corresponding value in MAP key-value pairs. Written using the octal code \003 when explicitly specified in CREATE TABLE statements.
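To make the delimiter hierarchy concrete, here is a minimal Python sketch of how one record of a hypothetical table with a string, an array, and a map column could be written with Hive's default delimiters: ^A between fields, ^B between collection items and between map entries, ^C between a map key and its value, and a line feed at the end of the record. The function names are illustrative; this is not Hive's own serializer, and it handles only one level of nesting.

```python
FIELD_SEP = "\x01"    # ^A, octal \001: separates fields (columns)
ITEM_SEP = "\x02"     # ^B, octal \002: separates ARRAY/STRUCT elements and MAP entries
KEY_SEP = "\x03"      # ^C, octal \003: separates a MAP key from its value
RECORD_SEP = "\n"     # line feed: separates records


def encode_value(value) -> str:
    """Encode one column value: dicts become MAPs, lists/tuples become ARRAYs/STRUCTs."""
    if isinstance(value, dict):
        return ITEM_SEP.join(f"{k}{KEY_SEP}{v}" for k, v in value.items())
    if isinstance(value, (list, tuple)):
        return ITEM_SEP.join(str(item) for item in value)
    return str(value)


def encode_record(columns) -> str:
    """Join the encoded column values with ^A and terminate the record with a line feed."""
    return FIELD_SEP.join(encode_value(c) for c in columns) + RECORD_SEP


if __name__ == "__main__":
    line = encode_record(["Mary Smith", ["Todd Jones"], {"Federal Taxes": 0.2}])
    # Show the control characters as ^A/^B/^C, the way many text editors display them.
    print(line.replace(FIELD_SEP, "^A").replace(ITEM_SEP, "^B").replace(KEY_SEP, "^C"), end="")
```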

Overriding Default Delimiters

Table declaration with all the format defaults explicitly specified:

    CREATE TABLE employees (
      name         STRING,
      salary       FLOAT,
      subordinates ARRAY<STRING>,
      deductions   MAP<STRING, FLOAT>,
      address      STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>
    )
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY '\001'
    COLLECTION ITEMS TERMINATED BY '\002'
    MAP KEYS TERMINATED BY '\003'
    LINES TERMINATED BY '\n'
    STORED AS TEXTFILE;

  • The ROW FORMAT DELIMITED sequence of keywords must appear before any of the other clauses, with the exception of the STORED AS … clause.
  • \001 is the octal code for ^A, the character used to separate fields.
  • Similarly, \002 is the octal code for ^B and \003 is the octal code for ^C.
  • Hive uses the ^B character to separate collection items and the ^C character to separate map keys from values.
  • The LINES TERMINATED BY … and STORED AS … clauses do not require the ROW FORMAT DELIMITED keywords.

Example: a record in the text file for the table created above:

John Doe^A100000.0^AMary Smith^BTodd Jones^AFederal Taxes^C.2^BState Taxes^C.05^BInsurance^C.1^A1 Michigan Ave.^BChicago^BIL^B606
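Reading such a line back is the reverse process. The Python sketch below splits the sample record (with the ^A, ^B, and ^C notation replaced by the real control characters \x01, \x02, and \x03) into the five columns of the employees table. The function parse_employee is hypothetical; it assumes exactly the column order of the CREATE TABLE statement, assumes the address struct has street, city, state, and zip fields (those field names are an assumption), and does no error handling.

```python
FIELD_SEP, ITEM_SEP, KEY_SEP = "\x01", "\x02", "\x03"   # ^A, ^B, ^C

# The sample record, with the ^A/^B/^C notation replaced by real control characters.
SAMPLE = (
    "John Doe\x01100000.0\x01Mary Smith\x02Todd Jones"
    "\x01Federal Taxes\x03.2\x02State Taxes\x03.05\x02Insurance\x03.1"
    "\x011 Michigan Ave.\x02Chicago\x02IL\x02606"
)


def parse_employee(line: str) -> dict:
    """Split one text-file record into the five columns of the employees table."""
    name, salary, subordinates, deductions, address = line.rstrip("\n").split(FIELD_SEP)
    # Assumed STRUCT field order: street, city, state, zip (positional in the file).
    street, city, state, zip_code = address.split(ITEM_SEP)
    return {
        "name": name,                                    # STRING
        "salary": float(salary),                         # FLOAT
        "subordinates": subordinates.split(ITEM_SEP),    # ARRAY<STRING>
        "deductions": {                                  # MAP<STRING, FLOAT>
            key: float(value)
            for key, value in (entry.split(KEY_SEP) for entry in deductions.split(ITEM_SEP))
        },
        "address": {"street": street, "city": city, "state": state, "zip": int(zip_code)},
    }


if __name__ == "__main__":
    print(parse_employee(SAMPLE))
```

Note that the struct's field names never appear in the file; only the order of the values does, which is why the table definition is needed to interpret the record.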
