Mindmajix

Hadoop Hive Data Types with Examples

Data Types

Hive supports many of the primitive data types found in relational databases.

Primitive Data Types

Hive supports several sizes of integer and floating-point types, a Boolean type, and character strings of arbitrary length.

Hive v0.8.0 added types for timestamps and binary fields.

The primitive types supported by Hive are:

Type       Size                                 Example
TINYINT    1-byte signed integer                20
SMALLINT   2-byte signed integer                20
INT        4-byte signed integer                20
BIGINT     8-byte signed integer                20
BOOLEAN    Boolean true or false                TRUE
FLOAT      Single-precision floating point      3.14159
DOUBLE     Double-precision floating point      3.14159
STRING     Sequence of characters; single or    'Now is the time', "for all good men"
           double quotes can be used
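
To make the byte sizes in the table concrete, the following Python sketch computes the value range each integer type can hold. It assumes two's-complement signed integers (Hive's integer types are backed by the corresponding Java types); the helper name signed_range is made up for this illustration.

```python
# Byte widths come from the table above; the dictionary and the helper
# name are illustrative and not part of Hive.
INTEGER_TYPE_BYTES = {"TINYINT": 1, "SMALLINT": 2, "INT": 4, "BIGINT": 8}


def signed_range(num_bytes):
    """Return (min, max) for a two's-complement signed integer of this width."""
    bits = 8 * num_bytes
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


for type_name, width in INTEGER_TYPE_BYTES.items():
    low, high = signed_range(width)
    print(f"{type_name:<9} {low} to {high}")
```

The example value 20 used in the table fits in every one of these types, including the 1-byte TINYINT, whose range is -128 to 127.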

Collection Data Types

Hive supports columns that are structs, maps, and arrays.

Type: STRUCT
Description: Analogous to a C struct or an "object". Fields can be accessed using the "dot" notation. For example, if a column name is of type STRUCT {first STRING, last STRING}, then the first name field can be referenced using name.first.
Example: struct('John', 'Doe')

Type: MAP
Description: A collection of key-value tuples, where the fields are accessed using array notation (e.g., ['key']). For example, if a column name is of type MAP with key -> value pairs 'first' -> 'John' and 'last' -> 'Doe', then the last name can be referenced using name['last'].
Example: map('first', 'John', 'last', 'Doe')

Type: ARRAY
Description: Ordered sequences of the same type that are indexable using zero-based integers.
Example: array('John', 'Doe')
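
The three access styles described above can be sketched with rough Python analogues. This is only an analogy to show dot, key, and zero-based index access side by side; it is not how Hive stores or evaluates these types, and the variable names are made up for the illustration.

```python
from collections import namedtuple

# STRUCT {first STRING, last STRING}: fields are read with dot notation.
Name = namedtuple("Name", ["first", "last"])
name_struct = Name("John", "Doe")

# MAP of STRING keys to STRING values: values are read by key.
name_map = {"first": "John", "last": "Doe"}

# ARRAY of STRING: elements are read with zero-based integer indexes.
name_array = ["John", "Doe"]

print(name_struct.first)  # John (like name.first in Hive)
print(name_map["last"])   # Doe  (like name['last'] in Hive)
print(name_array[1])      # Doe  (index 1 is the second element)
```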

Text File Encoding

Text files delimited with commas or tabs are called CSV (comma-separated values) or TSV (tab-separated values) files, respectively.

Hive can use those formats, but both have a drawback.

We have to be careful about commas or tabs that are embedded in the text and not intended as field or column delimiters.

For this reason, Hive uses various control characters by default, which are less likely to appear in value strings.
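
The drawback can be seen in a few lines of Python. This is a minimal sketch with made-up sample values: splitting on commas breaks as soon as a value contains a comma, while splitting on the Control-A character (written "\x01" in Python) recovers the original values.

```python
# Two field values that happen to contain commas (hypothetical sample data).
fields = ["Doe, John", "1 Michigan Ave., Chicago"]

comma_line = ",".join(fields)      # CSV-style line without any quoting
ctrl_a_line = "\x01".join(fields)  # same values joined with Control-A

# The embedded commas are mistaken for delimiters, so 2 fields become 4.
print(len(comma_line.split(",")))

# Control-A does not occur inside the values, so the split is lossless.
print(ctrl_a_line.split("\x01") == fields)
```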

Hive uses the term "field" (as in FIELDS TERMINATED BY) when overriding the default delimiter.

Delimiter: \n
Description: For text files, each line is a record, so the line feed character separates records.

Delimiter: ^A
Description: Separates all fields (columns). Written using the octal code \001 when explicitly specified in CREATE TABLE statements.

Delimiter: ^B
Description: Separates the elements in an ARRAY or STRUCT, or the key-value pairs in a MAP. Written using the octal code \002 when explicitly specified in CREATE TABLE statements.

Delimiter: ^C
Description: Separates the key from the corresponding value in MAP key-value pairs. Written using the octal code \003 when explicitly specified in CREATE TABLE statements.
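
The relationship between the caret names and the octal codes can be checked with a short Python sketch. Caret notation names a control character by the letter whose character code is 64 higher, so ^A is code 1, ^B is code 2, and ^C is code 3. The helper name caret_to_char is made up for this illustration.

```python
def caret_to_char(letter):
    """Map the letter in caret notation (e.g. 'A' for ^A) to its control character."""
    return chr(ord(letter) - 64)


# Python accepts the same octal escapes that appear in the table above.
FIELD_DELIMITER = "\001"            # ^A
COLLECTION_ITEM_DELIMITER = "\002"  # ^B
MAP_KEY_DELIMITER = "\003"          # ^C

for letter in "ABC":
    char = caret_to_char(letter)
    # Print the caret name next to the three-digit octal code of the character.
    print(f"^{letter} = octal {ord(char):03o}")
```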

Overriding Default Delimiters

Table declaration with all the format defaults explicitly specified:

CREATE TABLE Employees (
  name STRING,
  Salary FLOAT,
  Subordinates ARRAY<STRING>,
  Deduction MAP<STRING, FLOAT>,
  Address STRUCT<Street:STRING, City:STRING, State:STRING, Zip:INT>
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\001'
COLLECTION ITEMS TERMINATED BY '\002'
MAP KEYS TERMINATED BY '\003'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;

The ROW FORMAT DELIMITED sequence of keywords must appear before any of the other clauses, with the exception of the STORED AS clause.

\001 is the octal code for the ^A character, which separates fields.

Similarly, \002 is the octal code for ^B and \003 is the octal code for ^C.

Hive uses the ^B character to separate collection items and the ^C character to separate map keys from values.

The clauses LINES TERMINATED BY and STORED AS do not require the ROW FORMAT DELIMITED keywords.

Example: a text file record for the table created above (one record per line):

John Doe^A100000.0^AMary Smith^BTodd Jones^AFederal Taxes^C.2^BState Taxes^C.05^BInsurance^C.1^A1 Michigan Ave.^BChicago^BIL^B606
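
To show how the three delimiters nest, here is a minimal Python sketch that splits a record shaped like the one above into the five columns of the Employees table. It is not how Hive itself reads the file; the function name parse_employee and the dictionary layout are assumptions made for this illustration, and "\x01", "\x02", and "\x03" stand for ^A, ^B, and ^C.

```python
FIELD, ITEM, KEY = "\x01", "\x02", "\x03"  # ^A, ^B, ^C


def parse_employee(line):
    """Split one delimited text line into the five Employees columns."""
    # ^A separates the top-level columns.
    name, salary, subordinates, deductions, address = line.rstrip("\n").split(FIELD)
    # ^B separates the members of the STRUCT, in declaration order.
    street, city, state, zip_code = address.split(ITEM)
    return {
        "name": name,
        "salary": float(salary),
        # ^B separates the elements of the ARRAY.
        "subordinates": subordinates.split(ITEM),
        # ^B separates the MAP entries; ^C separates each key from its value.
        "deductions": {
            key: float(value)
            for key, value in (pair.split(KEY) for pair in deductions.split(ITEM))
        },
        "address": {"street": street, "city": city, "state": state, "zip": int(zip_code)},
    }


record = (
    "John Doe\x01100000.0\x01Mary Smith\x02Todd Jones\x01"
    "Federal Taxes\x03.2\x02State Taxes\x03.05\x02Insurance\x03.1\x01"
    "1 Michigan Ave.\x02Chicago\x02IL\x02606"
)

employee = parse_employee(record)
print(employee["subordinates"])  # ['Mary Smith', 'Todd Jones']
print(employee["deductions"]["State Taxes"])  # 0.05
```

Note that the sketch assumes every column is present and non-empty; it does no handling of missing or NULL values.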

 


 
