Hadoop Data Types with Examples

Hadoop Articles

Hadoop Quiz

Test and Explore your knowledge

fdHadoop supports various data types for defining column or field types in Hive tables. In this blog, we will help you gain a strong knowledge of Hadoop Hive data types with detailed examples.

Majorly, Hadoop Data Types are categorized into five types as:

Enroll in our Hadoop Certification course today and develop a strong foundation in Big Data

Hadoop Data Types

#1. Primitive Data Types

These data types and their sizes are similar to SQL/Java primitive data types and sizes.

Primary Data Types are further divided into four types as:

Numeric Data Type
Date/Time Data Type
String Data Type
Miscellaneous Data Type.

i) Numeric Data Type

It supports both integral and floating data types. These are equivalent to Java’s primitive types.

Below chart explains all the numeric types with their sizes and their ranges.

Integral Data Type:

Type	Size	Range
TINYINT	1 BYTE	-128 to 127
-128 to 127	2 BYTE	-32, 768 to 32, 767
INT	4 BYTE	–2,147,483,648to 2,147,483,647
BIGINT	8 BYTE	–9,223,372,036,854,775,808 to 9,223,372,036,854,775,807

Checkout Hadoop Tutorial

Floating Data Type:

Type	Size
FLOAT	4 BYTE
DOUBLE	8 BYTE
DECIMAL	17 BYTE

ii) Date/Time Data Type

It supports timestamp, date, and interval data types. Hive provides Timestamp and Date data types to UNIX timestamp format.

TIMESTAMP- It uses nanosecond precision and is denoted by yyyy-mm-dd hh:mm: ss format.
DATE- These are represented as YYYY-MM-DD
INTERVAL

The below table shows how to cast string and timestamp values to Date format:

Cast Type	Result
cast(date as date)	Same date value
cast(date as string)	Date formatted to ‘YYYY-MM-DD’.
cast(string as date)	Midnight of the year/month/day of the date value is returned as a timestamp.
cast(date as timestamp)	If the string is in the form of ‘YYYY-MM-DD’, then a date value related to that is returned. If the string value does not match this format, then NULL is returned.
cast(timestamp as date)	The year/month/day of the timestamp is returned as a date value.

iii) String Data Type

It supports string, varchar, and char data types.

STRING (Unbounded variable-length character string) - Either single or double quotes can be used to enclose characters.
VARCHAR (Variable-length character string) - Maximum length is specified in braces and allowed up to 65355 bytes.
CHAR (Fixed-length character string) - ‘m’ or “m”

iv) Miscellaneous Data Type

Miscellaneous data types support both boolean and binary data types.

BOOLEAN - It stores True or false values
BINARY - It is an array of bytes.

Frequently asked Hadoop Interview Questions

#2. Complex Data Types

In Hive, complex data types are further classified into Array, MAP, STRUCT, and UNION.

i) ARRAY:

an Ordered collection of similar types of fields that are indexable using zero-based integers. (These are just like arrays in java).

Syntax: ARRAY<data_type>

E.g. array (5, 6)

ii) MAP:

Unordered collection of key-value pairs. Keys may be values, primitives or any type.

For a specific map, the keys and values must be the same type.

Syntax: MAP<primitive_type, data_type>

E.g. MAP(‘m’, 8, ‘n’, 9).

iii) STRUCT:

Collection of named fields. This fields may be of various types. (It's similar to STRUCT in C language)

Syntax: STRUCT<col_name : data_type [COMMENT col_comment],…..>

E.g. struct(‘m', 1 1.0),[n] named_struct(‘col1', ‘m', ‘col2', 1, ‘col3', 1.0)

iv) UNION:

Union type can hold any data type that may be one of the specified data types. (It’s similar to Unions in C)

Syntax: UNIONTYPE<data_type, data_type, …>

E.g. create_union(5, ‘b', 60)

#3. Columns

In Hive, columns support integral type, string, timestamp, date, decimal, and union data types.

i) Integral type:

By default, the integral type is considered as int unless the range of the number exceeds.

INT/INTEGER
SMALLINT- Its data range is less than the range of INT.
BIGINT- Its data range exceeds the range of INT.
TINYINT- Its data range is less than SMALLINT

Table	Postfix	Example
BIGINT	L	100L
SMALLINT	S	100S
TINYINT	Y	100Y

ii) Strings:

These are represented with either single(‘) or double(“) quotes and classified as:

VARCHAR- These are built with a length specifier (1-65535). It represents the maximum number of characters passed in the character string.
Char- These are same as VARCHAR but of fixed-length. Maximum fixed length is 255.

iii) Timestamp:

It maintains traditional UNIX timestamp with operational nanosecond precision.

The supported Timestamp format is ”YYYY-MM-DD HH:MM: SS.(.f...)” in text files.

Timestamps are further categorized as:

Integer numeric type	UNIX timestamp in seconds
Floating-point numeric type	UNIX timestamp in seconds with decimal precision
String	Supports java.sql.Timestamp
	format "YYYY-MM-DD HH:MM: SS.fffffffff"

iv) Dates:

DATE value represents a particular year/month/day as YYYY-MM-DD format. For eg: DATE ‘2019-08-02’

It does not contain a time of day time component. It supports a range of 0000-01-01 to 9999-12-31.

v) Decimals:

The DECIMAL type is similar to Java’s Big Decimal format which denotes immutable arbitrary precision.

Apache Hive 0.11 and 0.12 have the precision of the DECIMAL type fixed and defined to 38 digits.
Apache Hive 0.13 users can specify scale and precision while designing tables with the DECIMAL datatype by a syntax DECIMAL(precision, scale).
If precision is not specified its default value is 10, and if the scale is not specified its default value is 0.

not specified its default value is 0.

Eg:

CREATE TABLE foo (

a DECIMAL, -- Defaults to decimal(10,0)

b DECIMAL(10, 6)

)

vi) Union Types:

Union Hive data types are the set of independent data types. We can create an instance of this type by create_union UDF.

#4. Literals

Literals support both floating-point types and decimal types.

Floating Point Types - These are assumed to be DOUBLE data types in the Hive.
Decimal Types - These are assumed to be higher value than float point value with a range more than DOUBLE data type. It ranges between -10-308 to 10308.

#5. Null Values

In Hive, missing values are denoted by the specific value called NULL.

Conclusion

Therefore, on this page, we have stated all the data types clearly. Hope this information will be helpful for you to gain a firm base in Hadoop Hive data types.

Explore Hadoop Sample Resumes! Download & Edit, Get Noticed by Top Employers!Download Now!

List of Big Data Courses:

Hadoop Administration	MapReduce
Big Data On AWS	Informatica Big Data Integration
Bigdata Greenplum DBA	Informatica Big Data Edition
Hadoop Hive	Impala
Hadoop Testing	Apache Mahout

On-Job Support Service

Online Work Support for your on-job roles.

@Learner@SME

Our work-support plans provide precise options as per your project tasks. Whether you are a newbie or an experienced professional seeking assistance in completing project tasks, we are here with the following plans to meet your custom needs:

Pay Per Hour
Pay Per Week
Monthly

Learn MoreContact us

Course Schedule

Name	Dates
Hadoop Training	Aug 02 to Aug 17	View Details
Hadoop Training	Aug 05 to Aug 20	View Details
Hadoop Training	Aug 09 to Aug 24	View Details
Hadoop Training	Aug 12 to Aug 27	View Details

Last updated: 04 Apr 2023

About Author

Madhuri Yerukala

Madhuri is a Senior Content Creator at MindMajix. She has written about a range of different topics on various technologies, which include, Splunk, Tensorflow, Selenium, and CEH. She spends most of her time researching on technology, and startups. Connect with her via LinkedIn and Twitter .

read less

Recommended Courses

Denodo Training

4.6

532

Elasticsearch Training

4.6

824

1 / 15