Hadoop supports various data types for defining column or field types in Hive tables. In this blog, we will help you gain a strong knowledge of Hadoop Hive data types with detailed examples.
Majorly, Hadoop Data Types are categorized into five types as:
Enroll in our Hadoop Certification course today and develop a strong foundation in Big Data.
These data types and their sizes are similar to SQL/Java primitive data types and sizes.
Primary Data Types are further divided into four types as:
It supports both integral and floating data types. These are equivalent to Java’s primitive types.
Below chart explains all the numeric types with their sizes and their ranges.
Integral Data Type:
|TINYINT||1 BYTE||-128 to 127|
|-128 to 127||2 BYTE||-32, 768 to 32, 767|
|INT||4 BYTE||–2,147,483,648to 2,147,483,647|
|BIGINT||8 BYTE||–9,223,372,036,854,775,808 to 9,223,372,036,854,775,807|
Floating Data Type:
It supports timestamp, date, and interval data types. Hive provides Timestamp and Date data types to UNIX timestamp format.
The below table shows how to cast string and timestamp values to Date format:
|cast(date as date)||Same date value|
|cast(date as string)||Date formatted to ‘YYYY-MM-DD’.|
|cast(string as date)||Midnight of the year/month/day of the date value is returned as a timestamp.|
|cast(date as timestamp)||If the string is in the form of ‘YYYY-MM-DD’, then a date value related to that is returned. If the string value does not match this format, then NULL is returned.|
|cast(timestamp as date)||The year/month/day of the timestamp is returned as a date value.|
It supports string, varchar, and char data types.
Miscellaneous data types support both boolean and binary data types.
In Hive, complex data types are further classified into Array, MAP, STRUCT, and UNION.
an Ordered collection of similar types of fields that are indexable using zero-based integers. (These are just like arrays in java).
E.g. array (5, 6)
Unordered collection of key-value pairs. Keys may be values, primitives or any type.
For a specific map, the keys and values must be the same type.
Syntax: MAP<primitive_type, data_type>
E.g. MAP(‘m’, 8, ‘n’, 9).
Collection of named fields. This fields may be of various types. (It's similar to STRUCT in C language)
Syntax: STRUCT<col_name : data_type [COMMENT col_comment],…..>
E.g. struct(‘m', 1 1.0),[n] named_struct(‘col1', ‘m', ‘col2', 1, ‘col3', 1.0)
Union type can hold any data type that may be one of the specified data types. (It’s similar to Unions in C)
Syntax: UNIONTYPE<data_type, data_type, …>
E.g. create_union(5, ‘b', 60)
In Hive, columns support integral type, string, timestamp, date, decimal, and union data types.
By default, the integral type is considered as int unless the range of the number exceeds.
These are represented with either single(‘) or double(“) quotes and classified as:
It maintains traditional UNIX timestamp with operational nanosecond precision.
The supported Timestamp format is ”YYYY-MM-DD HH:MM: SS.(.f...)” in text files.
Timestamps are further categorized as:
|Integer numeric type||UNIX timestamp in seconds|
|Floating-point numeric type||UNIX timestamp in seconds with decimal precision|
|format "YYYY-MM-DD HH:MM: SS.fffffffff"|
DATE value represents a particular year/month/day as YYYY-MM-DD format. For eg: DATE ‘2019-08-02’
It does not contain a time of day time component. It supports a range of 0000-01-01 to 9999-12-31.
The DECIMAL type is similar to Java’s Big Decimal format which denotes immutable arbitrary precision.
not specified its default value is 0.
CREATE TABLE foo (
a DECIMAL, -- Defaults to decimal(10,0)
b DECIMAL(10, 6)
Union Hive data types are the set of independent data types. We can create an instance of this type by create_union UDF.
Literals support both floating-point types and decimal types.
In Hive, missing values are denoted by the specific value called NULL.
Therefore, on this page, we have stated all the data types clearly. Hope this information will be helpful for you to gain a firm base in Hadoop Hive data types.
|Hadoop Training||Jun 25 to Jul 10|
|Hadoop Training||Jun 28 to Jul 13|
|Hadoop Training||Jul 02 to Jul 17|
|Hadoop Training||Jul 05 to Jul 20|
Madhuri is a Senior Content Creator at MindMajix. She has written about a range of different topics on various technologies, which include, Splunk, Tensorflow, Selenium, and CEH. She spends most of her time researching on technology, and startups. Connect with her via LinkedIn and Twitter .