
Hive Installation With Pre Built Derby Database

Hive 1.2 onward requires Java 1.7 or newer; Hive 0.14 to 1.1 also work with Java 1.6.
Hadoop 2.x is preferred; Hadoop 1.x is not supported from Hive 2.0.0 onward. Hive versions up to 0.13 also supported Hadoop 0.20.x and 0.23.x.
Hive Installation on Ubuntu 14.04 With Pre Built Derby Database
Hive Installation Steps
Step 1 - Create the hive directory. Open a new terminal (CTRL + ALT + T) and enter the following command.
$ sudo mkdir /usr/local/hive
Step 2 - Change the ownership and permissions of the /usr/local/hive directory. Here 'hduser' is the Ubuntu username.
$ sudo chown -R hduser /usr/local/hive
$ sudo chmod -R 755 /usr/local/hive
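Mode 755 grants the owner full access and everyone else read and execute only. A minimal sketch of what `chmod -R 755` does, using a throwaway directory instead of the real /usr/local/hive (which needs sudo):

```shell
# Use a scratch directory to illustrate chmod -R 755 (stand-in for /usr/local/hive)
dir=$(mktemp -d)
chmod -R 755 "$dir"
stat -c '%a' "$dir"   # prints 755: rwx for owner, r-x for group and others
rmdir "$dir"
```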
Step 3 - Switch user. The su command executes commands with the privileges of another user account; here we switch to 'hduser'.
$ su hduser
Step 4 - Change the directory to /home/hduser/Desktop. In this example the downloaded apache-hive-2.1.0-bin.tar.gz file is in the /home/hduser/Desktop folder; it may be in your Downloads folder instead, so check its location first.
$ cd /home/hduser/Desktop/
Step 5 - Untar the apache-hive-2.1.0-bin.tar.gz file.
$ tar xzf apache-hive-2.1.0-bin.tar.gz
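The flags mean decompress gzip (z), extract (x), from the named file (f). A self-contained sketch, with a tiny throwaway archive standing in for the real apache-hive-2.1.0-bin.tar.gz:

```shell
# Build a tiny stand-in archive, then extract it with the same flags as Step 5
work=$(mktemp -d) && cd "$work"
mkdir -p apache-hive-2.1.0-bin/bin
echo '#!/bin/sh' > apache-hive-2.1.0-bin/bin/hive
tar czf apache-hive-2.1.0-bin.tar.gz apache-hive-2.1.0-bin
rm -r apache-hive-2.1.0-bin
tar xzf apache-hive-2.1.0-bin.tar.gz   # same command as the real extraction
ls apache-hive-2.1.0-bin/bin           # prints: hive
```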
Step 6 - Move the contents of apache-hive-2.1.0-bin folder to /usr/local/hive
$ mv apache-hive-2.1.0-bin/* /usr/local/hive
Step 7 - Edit the $HOME/.bashrc file by adding the Hive path.
$ sudo gedit $HOME/.bashrc
Add the following lines to the $HOME/.bashrc file:
export HIVE_HOME=/usr/local/hive
export PATH=$HIVE_HOME/bin:$HIVE_HOME/lib:$PATH
Step 8 - Reload your changed $HOME/.bashrc settings
$ source $HOME/.bashrc
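After reloading, you can confirm the variables took effect. A small sketch that re-exports the Step 7 values and checks that the Hive bin directory is on PATH:

```shell
# Re-export the values from Step 7 (normally done by sourcing ~/.bashrc)
export HIVE_HOME=/usr/local/hive
export PATH=$HIVE_HOME/bin:$HIVE_HOME/lib:$PATH
echo "$HIVE_HOME"
# Check PATH by wrapping it in colons so the match is exact, not a substring
case ":$PATH:" in
  *":$HIVE_HOME/bin:"*) echo "hive bin is on PATH" ;;
  *)                    echo "hive bin is MISSING from PATH" ;;
esac
```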
Step 9 - Change the directory to /usr/local/hive/conf
$ cd $HIVE_HOME/conf
Step 10 - Copy the default hive-env.sh.template to hive-env.sh
$ cp hive-env.sh.template hive-env.sh
Step 11 - Edit hive-env.sh file.
$ gedit hive-env.sh
Step 12 - Add the lines below to the hive-env.sh file. Save and close. (Note the template's HIVE_CONF_DIR line is a self-referencing no-op; point it at the conf directory explicitly.)
export HADOOP_HOME=/usr/local/hadoop
export HIVE_CONF_DIR=/usr/local/hive/conf
export HIVE_AUX_JARS_PATH=$HIVE_AUX_JARS_PATH
Step 13 - Copy the default hive-default.xml.template to hive-site.xml
$ cp hive-default.xml.template hive-site.xml
Step 14 - Edit hive-site.xml file.
$ gedit hive-site.xml
Step 15 - Add or update the properties below in the hive-site.xml file. Save and close.
<property>
  <name>hive.metastore.schema.verification</name>
  <value>false</value>
  <description>Disables metastore schema version verification, avoiding version mismatch errors against metastore_db</description>
</property>
<property>
  <name>hive.exec.scratchdir</name>
  <value>/tmp/hive</value>
  <description>HDFS root scratch dir for Hive jobs, created with write-all (733) permission</description>
</property>
<property>
  <name>hive.exec.local.scratchdir</name>
  <value>/tmp/${user.name}</value>
  <description>Local scratch space for Hive jobs</description>
</property>
<property>
  <name>hive.downloaded.resources.dir</name>
  <value>/tmp/${user.name}_resources</value>
  <description>Temporary local directory for added resources in the remote file system</description>
</property>
<property>
  <name>hive.scratch.dir.permission</name>
  <value>733</value>
  <description>The permission for the user-specific scratch directories that get created</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby://localhost:1527/metastore_db;create=true</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>
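After editing, it is worth confirming a property actually landed in hive-site.xml. A sketch using grep against a minimal stand-in file created in a scratch directory (your real hive-site.xml lives in $HIVE_HOME/conf):

```shell
# Create a minimal stand-in hive-site.xml, then grep one property out of it
work=$(mktemp -d) && cd "$work"
cat > hive-site.xml <<'EOF'
<configuration>
  <property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
  </property>
</configuration>
EOF
# -A1 also prints the line after the match, where the value sits
grep -A1 '<name>hive.metastore.schema.verification</name>' hive-site.xml \
  | grep -o '<value>.*</value>'    # prints: <value>false</value>
```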
Step 16 - Change the directory to /usr/local/hadoop/sbin
$ cd /usr/local/hadoop/sbin
Step 17 - Start all hadoop daemons.
$ ./start-all.sh
Step 18 - Before you can create a table in Hive, use the HDFS commands below to create /tmp and /user/hive/warehouse (the default hive.metastore.warehouse.dir) and make them group-writable (chmod g+w).
$ hdfs dfs -mkdir -p /tmp
$ hdfs dfs -chmod 777 /tmp
$ hdfs dfs -mkdir -p /user/hive/warehouse
$ hdfs dfs -chmod g+w /user/hive/warehouse
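chmod g+w adds group write without opening the directory to everyone, unlike 777. The semantics are the same on the local filesystem, so a quick local illustration (using a scratch directory, not HDFS):

```shell
# mktemp -d creates a directory with mode 700; g+w adds group write -> 720
dir=$(mktemp -d)
stat -c '%a' "$dir"   # prints 700
chmod g+w "$dir"
stat -c '%a' "$dir"   # prints 720
rmdir "$dir"
```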
Step 19 - Change the directory to /usr/local/hive/bin
$ cd $HIVE_HOME/bin
Step 20 - Run the schematool command below as a one-time initialization step, specifying "derby" as the database type.
$ schematool -initSchema -dbType derby
Step 21 - Start the Hive command line interface (CLI) from the shell.
$ ./hive
Step 22 - List all the tables present in the Derby-backed metastore. Run this inside the Hive CLI, not the shell:
hive> show tables;
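Once the CLI is up, a short hypothetical session can confirm the metastore works end to end; the table name here is purely illustrative:

```sql
-- Smoke test inside the Hive CLI (table name is a made-up example)
CREATE TABLE demo_test (id INT, name STRING);
SHOW TABLES;          -- should now list demo_test
DROP TABLE demo_test;
```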
