
Hive Installation on Ubuntu 14.04 With MySQL Database Metastore

Apache Hive is a data warehouse infrastructure built on top of Hadoop that provides data summarization, querying, and analysis. Hive gives an SQL-like interface to data stored in the various databases and file systems that integrate with Hadoop. Without Hive, SQL-style queries would have to be implemented against the MapReduce Java API to run over distributed data. Hive provides the necessary abstraction by translating SQL-like queries (HiveQL) into jobs on the underlying execution engine, with no need to write low-level Java code. Since most data warehousing applications are built on SQL-based query languages, Hive makes it easy to port them to Hadoop.
Prerequisites
1) A machine with Ubuntu 14.04 LTS operating system
2) Apache Hadoop 2.6.4 pre installed (How to install Hadoop on Ubuntu 14.04)
3) Apache Hive 2.1.0 Software (Download Here)
Hive Installation With MySQL Database Metastore
NOTE
Hive versions 1.2 onward require Java 1.7 or newer. Hive versions 0.14 to 1.1 work with Java 1.6 as well.
Hadoop 2.x (preferred), 1.x (not supported by Hive 2.0.0 onward). Hive versions up to 0.13 also supported Hadoop 0.20.x, 0.23.x.
Hive Installation Steps
Step 1 - Installing MySQL Server. Open a terminal (CTRL + ALT + T) and type the following sudo command.

$ sudo apt-get install mysql-server
During the mysql-server installation you will be asked to set a password for the MySQL root user; choose 'root' or something else. In this tutorial it is set to 'root' so it is easy to remember.
Step 2 - Installing the MySQL Java Connector. This installs the JDBC driver library (mysql-connector-java.jar) into the /usr/share/java/ folder, which Java applications use to connect to MySQL.
$ sudo apt-get install libmysql-java
Step 3 - Enter the MySQL command line interface (CLI). Open a terminal (CTRL + ALT + T) and type the following command.
$ mysql -u root -p
Enter password: ****
Step 4 - Creating a new MySQL user 'hduser' with password 'hduser'
mysql> CREATE USER 'hduser'@'%' IDENTIFIED BY 'hduser';
Step 5 - Grant all privileges to the new user. The extra 'hduser'@'localhost' entry matters because the '%' wildcard does not cover local socket connections.
mysql> GRANT all on *.* to 'hduser'@'localhost' identified by 'hduser';
Step 6 - Flush the privilege tables so the changes take effect
mysql> flush privileges;
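The three MySQL statements above can also be applied non-interactively. A minimal sketch that writes them to a script file (the /tmp filename is illustrative) which can then be piped into the mysql client:

```shell
# Write the user-creation statements to a script file (illustrative path).
cat > /tmp/hive-metastore-user.sql <<'SQL'
CREATE USER 'hduser'@'%' IDENTIFIED BY 'hduser';
GRANT ALL ON *.* TO 'hduser'@'localhost' IDENTIFIED BY 'hduser';
FLUSH PRIVILEGES;
SQL
# Show what will be executed before feeding it to the server.
cat /tmp/hive-metastore-user.sql
```

Feed it to the server with `mysql -u root -p < /tmp/hive-metastore-user.sql`.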
Step 7 - Creating the Hive installation directory. Open a new terminal (CTRL + ALT + T) and enter the following command.
$ sudo mkdir /usr/local/hive
Step 8 - Change the ownership and permissions of the directory /usr/local/hive. Here 'hduser' is an Ubuntu username.
$ sudo chown -R hduser /usr/local/hive
$ sudo chmod -R 755 /usr/local/hive
Step 9 - Switch user. su starts a shell with the privileges of another user account; switch to 'hduser' here.
$ su hduser
Step 10 - Change to the directory containing the download. In this case the apache-hive-2.1.0-bin.tar.gz file is in the /home/hduser/Desktop folder; it may be in your Downloads folder instead, so check there.
$ cd /home/hduser/Desktop/
Step 11 - Untar the apache-hive-2.1.0-bin.tar.gz file.
$ tar xzf apache-hive-2.1.0-bin.tar.gz
Step 12 - Move the contents of apache-hive-2.1.0-bin folder to /usr/local/hive
$ mv apache-hive-2.1.0-bin/* /usr/local/hive
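Steps 11 and 12 can be rehearsed safely with a throwaway archive before touching the real download. Every name below (demo-bin, target) is a stand-in, not the actual Hive tarball or install path:

```shell
# Build a small stand-in archive, then extract and move it the same way.
work=$(mktemp -d)
cd "$work"
mkdir -p demo-bin/bin
echo "hello" > demo-bin/bin/marker
tar czf demo-bin.tar.gz demo-bin   # stand-in for apache-hive-2.1.0-bin.tar.gz
rm -rf demo-bin
tar xzf demo-bin.tar.gz            # x = extract, z = gunzip, f = archive file
mkdir target                       # stand-in for /usr/local/hive
mv demo-bin/* target/              # move the *contents*, as in Step 12
cat target/bin/marker              # prints: hello
```

Moving the contents (`demo-bin/*`) rather than the folder itself is what leaves bin/, lib/, conf/ directly under the install directory.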
Step 13 - Edit the $HOME/.bashrc file to add the Hive path.
$ sudo gedit $HOME/.bashrc
Add the following lines to the $HOME/.bashrc file:
export HIVE_HOME=/usr/local/hive
export PATH=$HIVE_HOME/bin:$HIVE_HOME/lib:$PATH
Step 14 - Reload your changed $HOME/.bashrc settings
$ source $HOME/.bashrc
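Prepending $HIVE_HOME/bin to PATH is what makes the bare hive command resolve. The effect can be seen with a throwaway stub instead of a real install (every path and the stub script below are illustrative):

```shell
# A directory prepended to PATH wins command lookup; demonstrate with a stub.
stub_home=$(mktemp -d)             # stand-in for /usr/local/hive
mkdir -p "$stub_home/bin"
printf '#!/bin/sh\necho stub-hive\n' > "$stub_home/bin/hive"
chmod +x "$stub_home/bin/hive"
PATH="$stub_home/bin:$PATH"        # same pattern as the .bashrc line above
command -v hive                    # resolves inside $stub_home/bin
hive                               # prints: stub-hive
```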
Step 15 - Change the directory to /usr/local/hive/conf
$ cd $HIVE_HOME/conf
Step 16 - Copy the default hive-env.sh.template to hive-env.sh
$ cp hive-env.sh.template hive-env.sh
Step 17 - Edit hive-env.sh file.
$ gedit hive-env.sh
Step 18 - Add the lines below to the hive-env.sh file, then save and close. (The template's self-referencing HIVE_CONF_DIR and HIVE_AUX_JARS_PATH assignments are no-ops; point HIVE_CONF_DIR at the actual conf directory instead.)
export HADOOP_HOME=/usr/local/hadoop
export HIVE_CONF_DIR=/usr/local/hive/conf
Step 19 - Copy the default hive-default.xml.template to hive-site.xml
$ cp hive-default.xml.template hive-site.xml
Step 20 - Edit hive-site.xml file.
$ gedit hive-site.xml
Step 21 - Add or update below properties in hive-site.xml file.
Put the following at the beginning of hive-site.xml:
<property>
<name>system:java.io.tmpdir</name>
<value>/tmp/hive/java</value>
</property>

<property>
<name>system:user.name</name>
<value>${user.name}</value>
</property>
 
<property>       
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost/metastore?createDatabaseIfNotExist=true</value>
<description>metadata is stored in a MySQL server</description>
</property>

<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>MySQL JDBC driver class</description>
</property>

<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hduser</value>
<description>user name for connecting to mysql server</description>
</property>

<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hduser</value>
<description>password for connecting to mysql server</description>
</property>

<property>
<name>hive.metastore.warehouse.dir</name>
<value>hdfs://localhost:9000/user/hive/warehouse</value>
<description>location of default database for the warehouse</description> 
</property>
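hive-site.xml must remain well-formed XML after all this editing; a stray or unclosed tag will make Hive fail at startup with a parse error. A quick sanity check using Python's standard-library parser (the sample file below is illustrative; run the same one-liner against your real hive-site.xml):

```shell
# Write a sample property block, then confirm it parses as XML.
cat > /tmp/sample-site.xml <<'EOF'
<configuration>
  <property>
    <name>system:java.io.tmpdir</name>
    <value>/tmp/hive/java</value>
  </property>
</configuration>
EOF
python3 -c 'import sys, xml.dom.minidom; xml.dom.minidom.parse(sys.argv[1])' \
  /tmp/sample-site.xml && echo "well-formed"
```

After saving your edits, point the same check at $HIVE_HOME/conf/hive-site.xml.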
Step 22 - Remove the property below from the hive-site.xml file; it duplicates javax.jdo.option.ConnectionUserName and would conflict with the 'hduser' value set above. Save and close.
<property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>APP</value>
    <description>Username to use against metastore database</description>
</property>
  
Step 23 - Copy the MySQL connector jar from /usr/share/java/ to the $HIVE_HOME/lib/ folder. The versioned filename (e.g. mysql-connector-java-5.1.28.jar) varies by release; the mysql-connector-java.jar symlink always points at the installed version.
$ cp /usr/share/java/mysql-connector-java.jar $HIVE_HOME/lib/
Step 24 - Change the directory to /usr/local/hadoop/sbin
$ cd /usr/local/hadoop/sbin
Step 25 - Start all hadoop daemons.
$ ./start-all.sh
Step 26 - Before you can create a table in Hive, use the HDFS commands below to create /tmp and /user/hive/warehouse (the hive.metastore.warehouse.dir set earlier) and make them writable. The -p flag creates the missing parent directories of /user/hive/warehouse.
$ hdfs dfs -mkdir /tmp
$ hdfs dfs -chmod 777 /tmp
$ hdfs dfs -mkdir -p /user/hive/warehouse
$ hdfs dfs -chmod g+w /user/hive/warehouse
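The g+w bit granted here is the same POSIX group-write permission used on local filesystems, so its effect can be inspected locally on a throwaway directory (illustrative only; the real commands above act on HDFS):

```shell
# chmod g+w adds the group-write bit; inspect the mode string before and after.
d=$(mktemp -d)
chmod 755 "$d"
stat -c '%A' "$d"    # drwxr-xr-x
chmod g+w "$d"
stat -c '%A' "$d"    # drwxrwxr-x
```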
Step 27 - Change the directory to /usr/local/hive/bin
$ cd $HIVE_HOME/bin
Step 28 - We need to run the schematool command below as an initialization step. For example, we can use "mysql" as db type.
$ schematool -initSchema -dbType mysql
Step 29 - Start the Hive command line interface (CLI) from the shell.
$ hive
Step 30 - List all the tables in the current Hive database.
hive> show tables;
Step 31 - Enter the MySQL command line interface (CLI). Open a terminal (CTRL + ALT + T) and type the following command.
$ mysql -u hduser -p 
Enter password: hduser
Step 32 - Switch to the metastore database.
mysql> use metastore;
Step 33 - List the Hive table metadata stored in the MySQL metastore database.
mysql> select * from TBLS;
