
How to install Hive on VMware

HIVE Installation

Step 1: Java and Hadoop must be preinstalled on your system.
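A quick way to confirm both prerequisites (these commands only print versions; the exact output depends on your installed versions):
  $ java -version
  $ hadoop version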
Step 2: Download Hive from http://apache.petsads.us/hive/hive-0.14.0/.
The archive is saved in the /usr/download folder.
Check for the files
  $ cd /usr/download
If the download is successful, the ls command will show the file below.
  $ ls
  apache-hive-0.14.0-bin.tar.gz
Unzip it
  $ tar zxvf apache-hive-0.14.0-bin.tar.gz
Copy File
Move the extracted directory to /usr/local/hive as the root user.
  $ su -
  passwd:
  $ mv /usr/download/apache-hive-0.14.0-bin /usr/local/hive
Environment for Hive
Add the lines below to the ~/.bashrc file.
  export HIVE_HOME=/usr/local/hive
  export PATH=$PATH:$HIVE_HOME/bin
  export CLASSPATH=$CLASSPATH:/usr/local/hadoop/lib/*:.
  export CLASSPATH=$CLASSPATH:/usr/local/hive/lib/*:.
Now source the ~/.bashrc file so the changes take effect.
  $ source ~/.bashrc
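A quick sanity check that the variables took effect (which hive should print a path under /usr/local/hive/bin):
  $ echo $HIVE_HOME
  $ which hive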
Configuring Hive
To configure Hive, edit hive-env.sh. It is created from the template shipped in $HIVE_HOME/conf.
  $ cd $HIVE_HOME/conf
  $ cp hive-env.sh.template hive-env.sh
Add the line below to hive-env.sh.
  export HADOOP_HOME=/usr/local/hadoop
Step 3: Derby Database
Hive uses an external database server for its Metastore; here, that is Apache Derby.
Downloading Apache Derby:
The following commands download Apache Derby. The download takes some time.
  $ cd ~
  $ wget http://archive.apache.org/dist/db/derby/db-derby-10.4.2.0/db-derby-10.4.2.0-bin.tar.gz
Unzip it:
  $ tar zxvf db-derby-10.4.2.0-bin.tar.gz
Copy File:
Move the extracted directory to /usr/local/derby (it was downloaded to the home directory above).
  $ mv ~/db-derby-10.4.2.0-bin /usr/local/derby
Set up the environment
Add the lines below to the ~/.bashrc file.
  export DERBY_HOME=/usr/local/derby
  export PATH=$PATH:$DERBY_HOME/bin
  export CLASSPATH=$CLASSPATH:$DERBY_HOME/lib/derby.jar:$DERBY_HOME/lib/derbytools.jar
To apply the changes, type
  $ source ~/.bashrc
Create a directory to store Metastore
Create a directory named data in the $DERBY_HOME directory to store Metastore data.
  $ mkdir $DERBY_HOME/data
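Because the Metastore connection URL in the next step uses Derby's network JDBC driver on port 1527, the Derby Network Server must be running. One way to start it in the background (startNetworkServer ships in $DERBY_HOME/bin, which is on the PATH after the step above; binding to 0.0.0.0 is an assumption that allows connections on all interfaces):
  $ nohup startNetworkServer -h 0.0.0.0 &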
Step 4: Configuring Metastore of Hive
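hive-site.xml lives in $HIVE_HOME/conf. If it does not exist yet, it can be created from the bundled template:
  $ cd $HIVE_HOME/conf
  $ cp hive-default.xml.template hive-site.xml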
Edit hive-site.xml and append the following lines between the <configuration> and </configuration> tags:
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby://localhost:1527/metastore_db;create=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
Create a file named jpox.properties in $HIVE_HOME/conf and add the following lines into it (the connection URL must match the one in hive-site.xml, so localhost is used here too):
  javax.jdo.PersistenceManagerFactoryClass = org.jpox.PersistenceManagerFactoryImpl
  org.jpox.validateTables = false
  org.jpox.validateColumns = false
  org.jpox.validateConstraints = false
  org.jpox.storeManagerType = rdbms
  org.jpox.autoCreateSchema = true
  org.jpox.autoStartMechanismMode = checked
  org.jpox.transactionIsolation = read_committed
  javax.jdo.option.DetachAllOnCommit = true
  javax.jdo.option.NontransactionalRead = true
  javax.jdo.option.ConnectionDriverName = org.apache.derby.jdbc.ClientDriver
  javax.jdo.option.ConnectionURL = jdbc:derby://localhost:1527/metastore_db;create=true
  javax.jdo.option.ConnectionUserName = APP
  javax.jdo.option.ConnectionPassword = mine
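To check that the Derby Network Server is reachable with this URL, Derby's ij tool can be used (ij ships in $DERBY_HOME/bin; exit; ends the session):
  $ ij
  ij> connect 'jdbc:derby://localhost:1527/metastore_db;create=true';
  ij> exit;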
Step 5: Verify Hive Installation
Create the /tmp folder and a separate Hive folder in HDFS. Here, we use the /user/hive/warehouse folder. You need to set group write permission (chmod g+w) on these newly created folders as shown below.
Set them up in HDFS using the following commands (-p creates parent directories as needed):
  $ $HADOOP_HOME/bin/hadoop fs -mkdir -p /tmp
  $ $HADOOP_HOME/bin/hadoop fs -mkdir -p /user/hive/warehouse
  $ $HADOOP_HOME/bin/hadoop fs -chmod g+w /tmp
  $ $HADOOP_HOME/bin/hadoop fs -chmod g+w /user/hive/warehouse
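To confirm the folders now carry group write permission (each listing should show drwxrwxr-x or similar):
  $ $HADOOP_HOME/bin/hadoop fs -ls /
  $ $HADOOP_HOME/bin/hadoop fs -ls /user/hive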
The following commands are used to verify the Hive installation:
  $ cd $HIVE_HOME
  $ bin/hive
On success, the Hive prompt appears:
  hive>
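From the hive> prompt, any simple statement works as a smoke test; for example, listing databases (a fresh install should show only the default database):
  hive> show databases;
  OK
  default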
