Hive hiveQL examples
Hive is a database technology that can define databases and tables to analyze structured data. The theme for structured data analysis is to store the data in a tabular manner, and pass queries to analyze it. This post explains how to create Hive database and hiveQL queries. Hive contains a default database named default.
Step 1 - Change the directory to /usr/local/hive/bin
Step 2 - Enter into hive shell
Database
Create Database is a statement used to create a database in Hive. The usage of SCHEMA and DATABASE are same.
OR
The following query is used to verify a databases list
Table
Create Table is a statement used to create a table in Hive.
Create sample.txt file
Add following lines to sample.txt file. Save and close.
Load
We can insert data using the LOAD DATA statement. While inserting data into Hive, it is better to use LOAD DATA to store bulk records. There are two ways to load data: one is from local file system and second is from Hadoop file system.
Verify
Alter
How to alter the attributes of a table such as changing its table name, changing column names, adding columns, and deleting or replacing columns.
Add Columns
Replace Columns
Partitioning
Hive organizes tables into partitions. It is a way of dividing a table into related parts based on the values of partitioned columns such as date, city, and department. Using partition, it is easy to query a portion of the data. Tables or partitions are sub-divided into buckets, to provide extra structure to the data that may be used for more efficient querying. Bucketing works based on the value of hash function of some column of a table.
Built-in operators of Hive
There are four types of operators in Hive
1. Relational Operators
2. Arithmetic Operators
3. Logical Operators
4. Complex Operators
View
How to create and manage views. Views are generated based on user requirements. You can save any result set data as a view. The usage of view in Hive is same as that of the view in SQL.
Index
An Index is nothing but a pointer on a particular column of a table. Creating an index means creating a pointer on a particular column of a table.
HiveQL Syntax
Select and Where Query
Create a employee.txt file
Add following lines to employee.txt file. Save and close.
Create a table called employee
Load employee.txt data into table employee.
OrderBy
The ORDER BY clause is used to retrieve the details based on one column and sort the result set by ascending or descending order.
GroupBy
The GROUP BY clause is used to group all the records in a result set using a particular collection column. It is used to query a group of records.
Drop a table
View tables
Drop a database
OR
View databases
Joins
JOINS is a clause that is used for combining specific fields from two tables by using values common to each one. It is used to combine records from two or more tables in the database. It is more or less similar to SQL JOINS.
Create a table customers
Create a customers.txt file
Add following lines to customers.txt file. Save and close.
Load customers.txt data into customers table.
Create a table orders
Create a orders.txt file
Add following lines to customers.txt file. Save and close.
Load orders.txt data into orders table.
JOIN
LEFT OUTER JOIN
RIGHT OUTER JOIN
FULL OUTER JOIN
Please share this blog post and follow me for latest updates on
Hive is a database technology that can define databases and tables to analyze structured data. The theme for structured data analysis is to store the data in a tabular manner, and pass queries to analyze it. This post explains how to create Hive database and hiveQL queries. Hive contains a default database named default.
Step 1 - Change the directory to /usr/local/hive/bin
$ cd $HIVE_HOME/bin
$ hive
Create Database is a statement used to create a database in Hive. The usage of SCHEMA and DATABASE are same.
hive> CREATE DATABASE [IF NOT EXISTS] userdb;
hive> CREATE SCHEMA userdb;
hive> SHOW DATABASES;
Create Table is a statement used to create a table in Hive.
CREATE TABLE IF NOT EXISTS employee ( eid int, name String, > salary String, destination String) > COMMENT 'Employee details' > ROW FORMAT DELIMITED > FIELDS TERMINATED BY ' ' > LINES TERMINATED BY '\n' > STORED AS TEXTFILE;
sample.txt
1201 Gopal 45000 Technical manager 1202 Manisha 45000 Proof reader 1203 Masthanvali 40000 Technical writer 1204 Krian 40000 Hr Admin 1205 Kranthi 30000 Op Admin
We can insert data using the LOAD DATA statement. While inserting data into Hive, it is better to use LOAD DATA to store bulk records. There are two ways to load data: one is from local file system and second is from Hadoop file system.
LOAD DATA LOCAL INPATH '/home/hduser/Desktop/HIVE/sample.txt' OVERWRITE INTO TABLE employee;
select * from employee;
How to alter the attributes of a table such as changing its table name, changing column names, adding columns, and deleting or replacing columns.
hive> ALTER TABLE employee RENAME TO emp; hive> ALTER TABLE employee CHANGE name ename String; hive> ALTER TABLE employee CHANGE salary salary Double;
hive> ALTER TABLE employee ADD COLUMNS ( > dept STRING COMMENT 'Department name');
hive> ALTER TABLE employee REPLACE COLUMNS (ename String name String);
Hive organizes tables into partitions. It is a way of dividing a table into related parts based on the values of partitioned columns such as date, city, and department. Using partition, it is easy to query a portion of the data. Tables or partitions are sub-divided into buckets, to provide extra structure to the data that may be used for more efficient querying. Bucketing works based on the value of hash function of some column of a table.
hive> ALTER TABLE employee > ADD PARTITION (year=̢۪2013̢۪) > location '/2013/part2013'; hive> ALTER TABLE employee > ADD PARTITION (year=̢۪2012̢۪) > location '/2012/part2012'; hive> ALTER TABLE employee PARTITION (year=̢۪2013̢۪) > RENAME TO PARTITION (Yoj=̢۪2013̢۪); ALTER TABLE employee DROP [IF EXISTS] > PARTITION (year=̢۪2013̢۪);
There are four types of operators in Hive
1. Relational Operators
2. Arithmetic Operators
3. Logical Operators
4. Complex Operators
SELECT * FROM employee WHERE Id=1205;
SELECT * FROM employee WHERE Salary>=40000;
SELECT 20+30 ADD FROM temp;
SELECT * FROM employee WHERE Salary>40000 && Dept=TP;
How to create and manage views. Views are generated based on user requirements. You can save any result set data as a view. The usage of view in Hive is same as that of the view in SQL.
hive> CREATE VIEW emp_30000 AS SELECT * FROM employee WHERE salary>30000;
select * from emp_30000;
hive> DROP VIEW emp_30000;
An Index is nothing but a pointer on a particular column of a table. Creating an index means creating a pointer on a particular column of a table.
hive> CREATE INDEX inedx_salary ON TABLE employee(salary) AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler';
hive> DROP INDEX index_salary ON employee;
SELECT [ALL | DISTINCT] select_expr, select_expr, ... FROM table_reference [WHERE where_condition] [GROUP BY col_list] [HAVING having_condition] [CLUSTER BY col_list | [DISTRIBUTE BY col_list] [SORT BY col_list]] [LIMIT number];
Create a employee.txt file
employee.txt
1201 Gopal 45000 TechnicalManager TP 1202 Manisha 45000 ProofReader PR 1203 Masthanvali 40000 TechnicalWriter TP 1204 Krian 40000 HrAdmin HR 1205 Kranthi 30000 OpAdmin Admin
hive> CREATE TABLE IF NOT EXISTS employee (id int, name String, salary String, destination String, dept String) COMMENT 'Employee details' ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' LINES TERMINATED BY '\n' STORED AS TEXTFILE;
hive> LOAD DATA LOCAL INPATH '/home/hduser/Desktop/HIVE/employee.txt' OVERWRITE INTO TABLE employee;
hive> SELECT * FROM employee WHERE salary>30000;
The ORDER BY clause is used to retrieve the details based on one column and sort the result set by ascending or descending order.
SELECT Id, Name, Dept FROM employee ORDER BY DEPT;
The GROUP BY clause is used to group all the records in a result set using a particular collection column. It is used to query a group of records.
SELECT Dept,count(*) FROM employee GROUP BY DEPT;
hive> DROP TABLE IF EXISTS employee;
hive> SHOW TABLES;
hive> DROP DATABASE IF EXISTS userdb;
hive> DROP DATABASE IF EXISTS userdb CASCADE;
hive> DROP SCHEMA userdb;
hive> SHOW databases;
JOINS is a clause that is used for combining specific fields from two tables by using values common to each one. It is used to combine records from two or more tables in the database. It is more or less similar to SQL JOINS.
Create a table customers
hive> CREATE TABLE IF NOT EXISTS customers (id int, name String, age int, address String, salary String) COMMENT 'customer details' ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' LINES TERMINATED BY '\n' STORED AS TEXTFILE;
customers.txt
1 Ramesh 32 Ahmedabad 2000.00 2 Khilan 25 Delhi 1500.00 3 kaushik 23 Kota 2000.00 4 Chaitali 25 Mumbai 6500.00 5 Hardik 27 Bhopal 8500.00 6 Komal 22 MP 4500.00 7 Muffy 24 Indore 10000.00
LOAD DATA LOCAL INPATH '/home/hduser/Desktop/HIVE/customers.txt' OVERWRITE INTO TABLE customers;
CREATE TABLE IF NOT EXISTS orders (oid int, odate String, id int, amount String) COMMENT 'order details' ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' LINES TERMINATED BY '\n' STORED AS TEXTFILE;
orders.txt
102 2009-10-08-00:00:00 3 3000 100 2009-10-08-00:00:00 3 1500 101 2009-11-20-00:00:00 2 1560 103 2008-05-20-00:00:00 4 2060
LOAD DATA LOCAL INPATH '/home/hduser/Desktop/HIVE/orders.txt' OVERWRITE INTO TABLE orders;
SELECT c.ID, c.NAME, c.AGE, o.AMOUNT FROM CUSTOMERS c JOIN ORDERS o ON (c.id = o.id);
SELECT c.ID, c.NAME, o.AMOUNT, o.ODATE FROM CUSTOMERS c LEFT OUTER JOIN ORDERS o ON (c.ID = o.ID);
SELECT c.ID, c.NAME, o.AMOUNT, o.ODATE FROM CUSTOMERS c RIGHT OUTER JOIN ORDERS o ON (c.ID = o.ID);
SELECT c.ID, c.NAME, o.AMOUNT, o.ODATE FROM CUSTOMERS c FULL OUTER JOIN ORDERS o ON (c.ID = o.ID);
Comments
Post a Comment