If no database is specified, tables belong to the default database.
A Hive table is logically made up of the data being stored and the associated metadata describing the layout of the data in the table.
hive> CREATE TABLE emp (empid INT, ename STRING, esal DOUBLE) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' STORED AS TEXTFILE;
To display the description of the table, we use:
hive> DESC emp;
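For the emp table above, the output lists each column with its type, along these lines:
empid    int
ename    string
esal     double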
In Hive, we have two types of tables: managed tables and external tables.
Managed tables are the ones that are managed in the Hive warehouse, i.e. whenever we create a managed table definition, it will be stored under the default location of the Hive warehouse, i.e. /user/hive/warehouse.
Syntax for creating a Hive managed table:
hive> CREATE TABLE managed_tab (empid INT, ename STRING, esal INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' STORED AS TEXTFILE;
# hadoop fs -ls /user/hive/warehouse
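Each table gets its own subdirectory under the warehouse, so an illustrative listing (permission, owner, and date columns elided) would include entries such as:
... /user/hive/warehouse/emp
... /user/hive/warehouse/managed_tab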
We can load the data in two ways:
In local mode, the syntax is
hive> LOAD DATA LOCAL INPATH '/home/newBatch/input1.txt' INTO TABLE managed_tab;
For HDFS mode, the syntax is
hive> LOAD DATA INPATH '/user/ramesh/Hive/input2.txt' INTO TABLE managed_tab;
Once the file is loaded successfully, it is moved from its original HDFS path (so the source file disappears from that location), and it can be seen under /user/hive/warehouse.
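To confirm the load, a quick query works (assuming input1.txt held tab-separated rows matching the schema):
hive> SELECT * FROM managed_tab;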
Along with managed tables, Hive also supports external tables.
Whenever the keyword 'external' appears in the table definition, Hive will not manage the table's data, i.e. the external table will not be managed by the Hive warehouse system.
Along with the external keyword, we can also mention a 'location' in the table definition, specifying where exactly the table data will be stored.
When you drop an external table, Hive leaves the data untouched and only deletes the metadata.
Syntax:
hive> CREATE EXTERNAL TABLE external_tab (empid INT, ename STRING, esal DOUBLE) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' STORED AS TEXTFILE LOCATION '/user/Ramesh/Hive-external';
The location directory will be created automatically if it does not already exist.
Hive>load data in path’/Ramesh/input data.txt’ into table external-tab;
One of the main differences between managed and external tables in Hive is that when an external table is dropped, the data associated with it does not get deleted; only the metadata (number of columns, types of columns, terminators, etc.) gets dropped from the Hive metastore.
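A minimal sketch of the contrast, using the table names from the examples above:
hive> DROP TABLE managed_tab;   -- deletes the metadata and the data under /user/hive/warehouse/managed_tab
hive> DROP TABLE external_tab;  -- deletes the metadata only; the files at the LOCATION path are left untouched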
hive> CREATE EXTERNAL TABLE loginfotab (logid INT, logerror STRING, logerrorcount INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE LOCATION '/user/externallocation';
hive> SELECT * FROM loginfotab;
We get the result from the file at the location path we specified.
Cmd: ALTER TABLE log_messages RENAME TO logmsgs;
You can rename a column and change its position, type, or comment.
Syntax:
ALTER TABLE log_messages CHANGE COLUMN hms hours_minutes_seconds INT COMMENT 'The hours, minutes, and seconds part of the timestamp' AFTER severity;
You can add new columns to the end of the existing columns, before any partition columns.
Example: ALTER TABLE log_messages ADD COLUMNS (app_name STRING COMMENT 'Application name', session_id BIGINT);
The REPLACE statement can only be used with tables that use one of the native SerDe modules: DynamicSerDe or MetadataTypedColumnsetSerDe.
Ex: ALTER TABLE log_messages REPLACE COLUMNS (hours_mins_secs INT, severity STRING, message STRING);
This statement effectively renames the original hms column and removes the server and process_id columns from the original schema definition.
As with all ALTER statements, only the table metadata is changed.
Hive organizes tables into partitions, a way of dividing a table into coarse-grained parts based on the value of a partition column, such as a date.
Ex:
hive> CREATE TABLE partytable (logid INT, logerror STRING) PARTITIONED BY (logdt STRING, country STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' STORED AS TEXTFILE;
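When loading data into a partitioned table, the partition values must be supplied explicitly; the file path and values below are illustrative:
hive> LOAD DATA LOCAL INPATH '/home/user/logs.txt' INTO TABLE partytable PARTITION (logdt='2016-01-01', country='IN');
hive> SHOW PARTITIONS partytable;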
There are two reasons why you might want to organize your tables (or partitions) into buckets: bucketing enables more efficient queries (a join of two tables bucketed on the join column can be performed as a map-side join), and it makes sampling more efficient.
Example:
To tell Hive that a table should be bucketed, we use the CLUSTERED BY clause to specify the columns to bucket on and the number of buckets:
hive> CREATE TABLE bucketed_users (id INT, name STRING) CLUSTERED BY (id) INTO 4 BUCKETS;
Here we are using the user ID to determine the bucket, which Hive does by hashing the value and reducing it modulo the number of buckets, so any particular bucket will effectively have a random set of users in it.
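To populate the bucketed table, the usual approach is to insert from an existing (unbucketed) table, here a hypothetical users table with the same columns; on older Hive versions, bucketing enforcement also had to be switched on first:
hive> SET hive.enforce.bucketing = true;  -- not needed on Hive 2.x and later, where it is always on
hive> INSERT OVERWRITE TABLE bucketed_users SELECT * FROM users;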
The syntax for declaring that a table has sorted buckets is:
hive> CREATE TABLE bucketed_users (id INT, name STRING) CLUSTERED BY (id) SORTED BY (id ASC) INTO 4 BUCKETS;
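One payoff of bucketing is efficient sampling: TABLESAMPLE can read a single bucket instead of scanning the whole table. For example:
hive> SELECT * FROM bucketed_users TABLESAMPLE(BUCKET 1 OUT OF 4 ON id);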