How to Insert Data into Tables from Queries


Inserting Data into Tables from Queries

The INSERT statement is used to load data into a table from a query.

Below is an example for the state of Oregon, where we presume the data is already in another table called staged_employees.

Let us use different names for the country and state fields in staged_employees, calling them cnty and st.

Ex:- INSERT OVERWRITE TABLE employees PARTITION (country = 'US', state = 'OR') SELECT * FROM staged_employees se WHERE se.cnty = 'US' AND se.st = 'OR';

With OVERWRITE, the previous contents of the partition (or whole table) are replaced.

If you use INTO instead of OVERWRITE, Hive appends the data rather than replacing it; this option is available in Hive 0.8.0 or later.

You can mix INSERT OVERWRITE clauses and INSERT INTO clauses as well.
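For example, a single scan of the source table can feed several partitions at once, mixing OVERWRITE and INTO clauses (a sketch based on the staged_employees table used above):

Ex:- FROM staged_employees se
INSERT OVERWRITE TABLE employees PARTITION (country = 'US', state = 'OR')
SELECT * WHERE se.cnty = 'US' AND se.st = 'OR'
INSERT INTO TABLE employees PARTITION (country = 'US', state = 'CA')
SELECT * WHERE se.cnty = 'US' AND se.st = 'CA';

Here the OR partition is replaced while rows are appended to the CA partition, and staged_employees is read only once.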

Dynamic Partition Inserts

If we have a lot of partitions to create, we would have to write a lot of SQL.

Hive supports a dynamic partition feature, where it can infer the partitions to create based on the query parameters.

Examples for dynamic and static partitions are:

Ex:- INSERT OVERWRITE TABLE employees PARTITION (country, state)
SELECT ..., se.cnty, se.st FROM staged_employees se;

Hive determines the values of the partition keys, country and state, from the last two columns in the SELECT clause.

This is why we use different names in staged_employees: to emphasize the relationship between the source column values and the output partition values.

You can also mix dynamic and static partitions. The following variation of the previous query specifies a static value for the country (US) and a dynamic value for the state:

Ex:- INSERT OVERWRITE TABLE employees PARTITION (country = 'US', state)
SELECT ..., se.cnty, se.st FROM staged_employees se WHERE se.cnty = 'US';

The static partition keys must come before the dynamic partition keys.

Dynamic partitioning is not enabled by default.

To enable the dynamic partition feature, we set the desired properties as below.

hive> SET hive.exec.dynamic.partition=true;

Enables dynamic partitioning; it is false by default.

hive> SET hive.exec.dynamic.partition.mode=nonstrict;

Allows all partitions to be determined dynamically; by default, the mode is strict.

hive> SET hive.exec.max.dynamic.partitions.pernode=1000;

The maximum number of dynamic partitions that can be created by each mapper and reducer; the default is 100.

hive> SET hive.exec.max.created.files=...;

The maximum total number of files that can be created globally; a Hadoop counter is used to track the number of files created.

Creating Tables and Loading Them in One Query:-

You can also create a table and insert query results into it in one statement.

Ex:- CREATE TABLE ca_employees AS SELECT name, salary, address FROM employees WHERE state = 'CA';

This table contains just the name, salary, and address columns from the employees table, for the records of employees in California.

The schema for the new table is taken from the SELECT clause.

Mainly, this feature is used to extract a convenient subset of data from a larger and more unwieldy table.

Querying Data in Hive

1) SELECT ... FROM Clause:-

SELECT is the projection operator in SQL. The FROM clause identifies from which table, view, or nested query we select records.

Ex:- hive> CREATE TABLE filter_sal_tab AS SELECT empid, empname, esal FROM use_tab WHERE esal > 14000;
hive> SELECT * FROM filter_sal_tab;

2) GROUP BY Clause:-

The GROUP BY statement is often used in conjunction with aggregate functions to group the result set by one or more columns and then perform an aggregation over each group.

Ex:- hive> SELECT year(ymd), avg(price_close) FROM stocks
WHERE exchange = 'NASDAQ' AND symbol = 'AAPL' GROUP BY year(ymd);

3) HAVING Clause:-

The HAVING clause is used to constrain groups produced by GROUP BY in a way that would otherwise require a subquery.

Below is the previous query with an additional HAVING clause that limits the results to years where the average closing price was greater than $50.0:

Ex:- hive> SELECT year(ymd), avg(price_close) FROM stocks WHERE exchange = 'NASDAQ' AND symbol = 'AAPL' GROUP BY year(ymd) HAVING avg(price_close) > 50.0;

4) JOINs:-

Hive supports the classic SQL JOIN statement, but only equi-joins are supported.

Inner Join:

In an inner join, records are discarded unless the join criteria find matching records in every table being joined.

Ex:- SELECT a.ymd, a.price_close, b.price_close FROM stocks a JOIN stocks b ON a.ymd = b.ymd WHERE a.symbol = 'AAPL' AND b.symbol = 'IBM';

The ON clause specifies the conditions for joining records between the two tables.

JOIN Optimizations

Hive can apply an optimization that joins all three tables in a single MapReduce job.

When joining three or more tables, if every ON clause uses the same join key, a single MapReduce job will be used.
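As a sketch of this rule (the splits table here is hypothetical; stocks and dividends are the tables used elsewhere in this article), every ON clause joins on ymd and symbol, so Hive can evaluate both joins in one MapReduce job:

Ex:- SELECT s.ymd, s.symbol, s.price_close, d.dividend
FROM stocks s JOIN dividends d ON s.ymd = d.ymd AND s.symbol = d.symbol
JOIN splits sp ON s.ymd = sp.ymd AND s.symbol = sp.symbol
WHERE s.symbol = 'AAPL';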

Hive also assumes that the last table in the query is the largest table; it attempts to buffer the other tables and then stream the last table through, while performing joins on individual records.

Hence, we should switch the positions of stocks and dividends so that the larger stocks table comes last:

Ex:- SELECT s.ymd, s.symbol, s.price_close, d.dividend FROM dividends d JOIN stocks s ON s.ymd = d.ymd
AND s.symbol = d.symbol WHERE s.symbol = 'AAPL';

You don't have to put the largest table last in the query; Hive also provides a hint mechanism to tell the query optimizer which table should be streamed.

Ex:- SELECT /*+ STREAMTABLE(s) */ s.ymd, s.symbol, s.price_close, d.dividend FROM stocks s JOIN dividends d ON s.ymd = d.ymd AND s.symbol = d.symbol WHERE s.symbol = 'AAPL';

Now Hive will attempt to stream the stocks table, even though it is not the last table in the query.


Left Outer Join:-

The left outer join is indicated by adding the LEFT OUTER keywords.

Ex:- hive> SELECT s.ymd, s.symbol, s.price_close, d.dividend
FROM stocks s LEFT OUTER JOIN dividends d ON
s.ymd = d.ymd AND s.symbol = d.symbol WHERE s.symbol = 'AAPL';

In this join, all the records from the left-hand table that match the WHERE clause are returned; NULL is used for the fields of the right-hand table when there is no matching record.


Right Outer Join:-

A right outer join returns all the records in the right-hand table that match the WHERE clause; NULL is used for fields of missing records in the left-hand table.

Ex:- SELECT s.ymd, s.symbol, s.price_close, d.dividend FROM dividends d RIGHT OUTER JOIN stocks s ON
d.ymd = s.ymd AND d.symbol = s.symbol WHERE s.symbol = 'AAPL';


Left Semi-Join:-

A left semi-join returns records from the left-hand table if records are found in the right-hand table that satisfy the ON predicates.

It is a special, optimized case of the more general inner join.

Ex:- hive> SELECT s.ymd, s.symbol, s.price_close FROM stocks s LEFT SEMI JOIN dividends d ON
s.ymd = d.ymd AND s.symbol = d.symbol;

Full Outer Join:-

Returns all records from all the tables that match the WHERE clause.

Ex:- hive> SELECT s.ymd, s.symbol, s.price_close, d.dividend FROM dividends d FULL OUTER JOIN stocks s ON
d.ymd = s.ymd AND d.symbol = s.symbol
WHERE s.symbol = 'AAPL';

Cartesian Product Joins:-

In Hive, the query below computes the full Cartesian product before applying the WHERE clause, so it can take a long time to finish.

When the property hive.mapred.mode is set to strict, Hive prevents users from inadvertently issuing a Cartesian product query.

Ex:- hive> SELECT * FROM stocks JOIN dividends
WHERE stocks.symbol = dividends.symbol AND
stocks.symbol = 'AAPL';

Map-side Joins:-

Hive can do the join entirely on the map side, since it can look up every possible match against the small tables in memory, thereby eliminating the reduce step required in the more common join scenarios.

On smaller data sets, this optimization is faster than the normal join.

Ex:- SELECT /*+ MAPJOIN(d) */ s.ymd, s.symbol, s.price_close,
d.dividend FROM stocks s JOIN dividends d ON
s.ymd = d.ymd AND s.symbol = d.symbol WHERE s.symbol = 'AAPL';

The hint still works, but it is now deprecated as of Hive 0.7.0.

However, you still have to set a property as below before Hive attempts the optimization.

hive> SET hive.auto.convert.join=true;
hive> SELECT s.ymd, s.symbol, s.price_close, d.dividend
FROM stocks s JOIN dividends d ON s.ymd = d.ymd
AND s.symbol = d.symbol WHERE s.symbol = 'AAPL';

Map joins can take advantage of bucketed tables, since a mapper working on a bucket of the left table only needs to load the corresponding buckets of the right table to perform the join.

However, you need to enable the optimization with the property:

hive> SET hive.optimize.bucketmapjoin=true;


ORDER BY and SORT BY:-

The ORDER BY clause is familiar from other SQL dialects.

It performs a total ordering of the query result set.

This means all the data is passed through a single reducer, which can take a long time to execute for larger data sets.

Hive adds an alternative, SORT BY, that orders the data only within each reducer, thereby performing a local ordering.

Example for ORDER BY:-

hive> SELECT s.ymd, s.symbol, s.price_close FROM stocks s
ORDER BY s.ymd ASC, s.symbol DESC;

Example for SORT BY:-

hive> SELECT s.ymd, s.symbol, s.price_close FROM stocks s SORT BY s.ymd ASC, s.symbol DESC;

The two queries look almost identical, but if more than one reducer is invoked, the output will be sorted differently.

While each reducer's output files are sorted, the data will probably overlap with the output of other reducers.

Hive will require a LIMIT clause with ORDER BY if the property hive.mapred.mode is set to strict.
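For example (a sketch using the stocks table from the earlier examples), in strict mode an unbounded ORDER BY is rejected, while adding a LIMIT makes the same query acceptable:

hive> SET hive.mapred.mode=strict;
hive> SELECT s.ymd, s.symbol FROM stocks s ORDER BY s.ymd; (rejected in strict mode)
hive> SELECT s.ymd, s.symbol FROM stocks s ORDER BY s.ymd LIMIT 100; (accepted)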


DISTRIBUTE BY with SORT BY:-

DISTRIBUTE BY controls how map output is divided among reducers.

All data that flows through a MapReduce job is organized into key-value pairs.

Hive must use this feature internally.

We can use DISTRIBUTE BY to ensure that the records for each stock symbol go to the same reducer, then use SORT BY to order the data the way we want.

The following query demonstrates this technique:

hive> SELECT s.ymd, s.symbol, s.price_close FROM stocks s
DISTRIBUTE BY s.symbol SORT BY s.symbol ASC, s.ymd ASC;


CLUSTER BY:-

CLUSTER BY is a short way of expressing DISTRIBUTE BY combined with SORT BY.


In the previous example, the s.symbol column was used in the DISTRIBUTE BY clause, and the s.symbol and s.ymd columns in the SORT BY clause.

Ex:- hive> SELECT s.ymd, s.symbol, s.price_close FROM stocks s CLUSTER BY s.symbol;


UNION ALL:-

UNION ALL combines the results of two or more queries.

Each subquery of the union query must produce the same number of columns, and for each column, its type must match the type of the column in the same position in the other queries.

For example, if the second column is a FLOAT, then the second column of all the other query results must be a FLOAT.

The example below shows the merging of log data:

SELECT log.ymd, log.level, log.message
FROM (SELECT l1.ymd, l1.level, l1.message, 'log1' AS source FROM log1 l1
UNION ALL
SELECT l2.ymd, l2.level, l2.message, 'log2' AS source FROM log2 l2) log
SORT BY log.ymd ASC;

UNION may also be used when each clause selects from the same source table.
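For instance (a sketch using the stocks table from the earlier examples), both branches of the union can read the same table:

Ex:- SELECT s.ymd, s.price_close FROM (
SELECT ymd, price_close FROM stocks WHERE symbol = 'AAPL'
UNION ALL
SELECT ymd, price_close FROM stocks WHERE symbol = 'IBM') s;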

Exporting Data:-

If the data files are already formatted the way you want, then it is simple enough to copy the directories or files:

Cmd# hadoop fs -cp source-path target-path

Otherwise, you can use INSERT ... DIRECTORY, as in this example:

hive> INSERT OVERWRITE LOCAL DIRECTORY '/tmp/ca-employees'
SELECT name, salary, address FROM employees WHERE state = 'CA';

One or more files will be written to /tmp/ca-employees, depending on the number of reducers invoked.
