Apache Hive is a data warehouse infrastructure built on top of
Hadoop for data summarization, query, and analysis. Hive
provides an SQL-like interface for querying data stored in the various databases
and file systems that integrate with Hadoop. Without Hive, SQL queries
must be implemented in the MapReduce Java API to run SQL
applications and queries over distributed data. Hive provides the
necessary SQL abstraction to integrate SQL-like queries (HiveQL) into
the underlying Java API, so queries do not have to be written against the
low-level Java API. Because most data warehousing applications work
with SQL-based query languages, Hive makes it easy to port
SQL-based applications to Hadoop.
Prerequisites
1) A machine running the Ubuntu 14.04 LTS operating system
2) Apache Hadoop 2.6.4 preinstalled (How to install Hadoop on Ubuntu 14.04)
3) Apache Hive 2.1.0 preinstalled (How to Install Hive on Ubuntu 14.04)
Hive Command Line Interface (CLI) Usage
Step 1 - Change the directory to /usr/local/hive/bin
$ cd /usr/local/hive/bin
CLI help command
$ hive --service cli --help
To use the default database
$ hive --database default
To execute a HiveQL query
$ hive -e 'select * from word_counts'
To connect to a remote Hive server at port number 10000
$ hive -h remotehiveserver -p 10000
To execute a HiveQL script stored in the local file system
$ hive -f /home/user/test.hql
To execute a HiveQL script stored in HDFS
$ hive -f hdfs://localhost:9000/user/hduser/wordcount.hql
Syntax
$ hive -e 'sql queries'
$ hive -f 'filepath'
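To tie the -e and -f options together, here is a minimal sketch of a shell script that writes a small HiveQL file and then runs it. The script path, the output file name, and the word_counts table are assumptions for illustration only (the table name is borrowed from the example above), so adjust them for your own setup. The -S flag runs the CLI in silent mode so that only the query result is written to the output file.

```shell
#!/bin/sh
# Sketch only: the file paths and the word_counts table are assumptions
# for illustration; change them to match your environment.
HQL_FILE="${TMPDIR:-/tmp}/test.hql"

# Write a small HiveQL script to the local file system.
cat > "$HQL_FILE" <<'EOF'
-- List the tables in the default database, then sample one of them.
USE default;
SHOW TABLES;
SELECT * FROM word_counts LIMIT 10;
EOF

if command -v hive >/dev/null 2>&1; then
    # Run the whole script non-interactively.
    hive -f "$HQL_FILE"
    # Run a single inline query in silent mode and save the result to a file.
    hive -S -e 'SELECT * FROM word_counts LIMIT 10' > "${TMPDIR:-/tmp}/word_counts_sample.txt"
else
    # Hive is not installed or not on PATH, so only the script file is created.
    echo "hive not found on PATH; skipping execution of $HQL_FILE"
fi
```

The check for hive on the PATH is there only so that the sketch fails gracefully on a machine without Hive. On the setup described in the prerequisites you can run the two hive commands directly from /usr/local/hive/bin.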