
Hive WordCount HiveQL Execution

Hive WordCount HiveQL Example
Step 1 - Change the directory to /usr/local/hadoop/sbin
$ cd /usr/local/hadoop/sbin
Step 2 - Start all Hadoop daemons.
$ ./start-all.sh
Step 3 - Create employee.txt file.
employee.txt
Step 4 - Add the following lines to the employee.txt file. Save and close.
1201 Gopal 45000 TechnicalManager TP
1202 Manisha 45000 ProofReader PR
1203 Masthanvali 40000 TechnicalWriter TP
1204 Krian 40000 HrAdmin HR
1205 Kranthi 30000 OpAdmin Admin
Step 5 - Copy employee.txt from the local file system into HDFS. Note that it is stored in HDFS as employee123.txt, which is the path the Hive script loads in Step 8.
$ hdfs dfs -copyFromLocal /home/hduser/Desktop/employee.txt /user/hduser/employee123.txt
Step 6 - Change the directory to /usr/local/hive/bin
$ cd $HIVE_HOME/bin
Step 7 - Create the wordcount Hive query file. The file should have a .hql extension.
wordcount.hql
Step 8 - Add the following lines to wordcount.hql. Save and close.
CREATE TABLE docs (line STRING);
LOAD DATA INPATH 'hdfs://localhost:9000/user/hduser/employee123.txt' OVERWRITE INTO TABLE docs;
CREATE TABLE word_counts AS
SELECT word, count(1) AS count FROM
(SELECT explode(split(line, '\\s')) AS word FROM docs) w
GROUP BY word
ORDER BY word;
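The query above splits each line on whitespace (split with the '\\s' regex), turns each token into its own row (explode), then groups and counts. As a sanity check, here is a minimal local sketch in Python that mimics the same logic on the employee.txt lines from Step 4; it is an illustration of what Hive computes, not part of the Hive job itself.

```python
from collections import Counter

# Lines from employee.txt, as loaded into the docs table.
docs = [
    "1201 Gopal 45000 TechnicalManager TP",
    "1202 Manisha 45000 ProofReader PR",
    "1203 Masthanvali 40000 TechnicalWriter TP",
    "1204 Krian 40000 HrAdmin HR",
    "1205 Kranthi 30000 OpAdmin Admin",
]

# explode(split(line, '\\s')): one row per whitespace-separated token.
words = [w for line in docs for w in line.split()]

# GROUP BY word ... ORDER BY word: count each token, sorted by word.
word_counts = sorted(Counter(words).items())

for word, count in word_counts:
    print(word, count)
```

If the Hive query runs on the same data, word_counts in Hive should likewise contain 22 distinct words, with 40000, 45000, and TP each counted twice.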
Step 9 - Execute the wordcount.hql HiveQL script.
$ hive -f /home/hduser/Desktop/HIVE/wordcount.hql
Step 10 - Execute a SELECT HiveQL query to view the results.
$ hive -e 'select * from word_counts'
Optionally, set these Hive execution parameters in hive-site.xml.
  <property>
     <name>mapred.reduce.tasks</name>
     <value>-1</value>
     <description>The default number of reduce tasks per job.</description>
  </property>

  <property>
     <name>hive.exec.scratchdir</name>
     <value>/tmp/mydir</value>
     <description>Scratch space for Hive jobs</description>
  </property>

  <property>
     <name>hive.metastore.warehouse.dir</name>
     <value>/user/hive/warehouse</value>
     <description>location of default database for the warehouse</description>
  </property>

  <property>
     <name>hive.enforce.bucketing</name>
     <value>true</value>
     <description>Whether bucketing is enforced. If true, while inserting into the table, bucketing is enforced. </description>
  </property>
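Instead of editing hive-site.xml, these parameters can also be overridden for a single session with the SET command from the Hive CLI. A sketch, using the same properties listed above:

hive> SET mapred.reduce.tasks=-1;
hive> SET hive.enforce.bucketing=true;
hive> SET hive.exec.scratchdir;

Running SET with a property name and no value, as in the last line, prints the current value, which is a quick way to confirm what the session is actually using.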
