HBase Pseudo Distributed Mode Installation on Ubuntu 14.04

Apache HBase is an open source, non-relational, distributed database modeled after Google's BigTable and is written in Java. It is developed as part of Apache Software Foundation's Apache Hadoop project and runs on top of HDFS (Hadoop Distributed File System), providing BigTable-like capabilities for Hadoop. That is, it provides a fault-tolerant way of storing large quantities of sparse data (small amounts of information caught within a large collection of empty or unimportant data, such as finding the 50 largest items in a group of 2 billion records, or finding the non-zero items representing less than 0.1% of a huge collection).
Pre Requirements
1) A machine with Ubuntu 14.04 LTS operating system.
2) Apache Hadoop pre installed (How to install Hadoop on Ubuntu 14.04)
3) Apache HBase 1.2.3 Software (Download Here)
HBase Pseudo Distributed Mode Installation
Before proceeding with HBase, configure Hadoop and HDFS on your local system or on a remote system and make sure they are running.
A pseudo-distributed mode is simply a distributed mode run on a single host. Use this configuration for testing and prototyping HBase. Do not use it in production or for evaluating HBase performance.
HBase Installation Steps
Step 1 - Install Java 7. Open a terminal (CTRL + ALT + T) and run the following command.

$ sudo apt-get install openjdk-7-jdk
Verify Installation
$ java -version
Step 2 - Edit /etc/hosts file.
$ sudo gedit /etc/hosts
In the /etc/hosts file, add your machine's IP address and hostname. Save and close.
127.0.0.1 localhost
127.0.0.1 praveen
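To confirm the entries took effect, you can query the resolver (an optional sanity check; 'praveen' above is the author's hostname, so substitute your own):

```shell
# List all addresses known for localhost; 127.0.0.1 should be among them
getent ahosts localhost
```

Run the same check against your own hostname to make sure it also resolves to 127.0.0.1.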
Step 3 - Creating /usr/local/hbase directory.
$ sudo mkdir /usr/local/hbase
Step 4 - Change the ownership and permissions of the directory /usr/local/hbase. Here 'hduser' is an Ubuntu username.
$ sudo chown -R hduser /usr/local/hbase
$ sudo chmod -R 755 /usr/local/hbase
Step 5 - Creating /var/hbase/pids directory.
$ sudo mkdir -p /var/hbase/pids
Step 6 - Change the ownership and permissions of the directory /var/hbase/pids. Here 'hduser' is an Ubuntu username.
$ sudo chown -R hduser /var/hbase/pids
$ sudo chmod -R 755 /var/hbase/pids
Step 7 - Change the directory to /home/hduser/Desktop. In my case the downloaded hbase-1.2.3-bin.tar.gz file is in the /home/hduser/Desktop folder; yours might be in the Downloads folder instead, so check.
$ cd /home/hduser/Desktop/
Step 8 - Untar the hbase-1.2.3-bin.tar.gz file.
$ tar xzf hbase-1.2.3-bin.tar.gz
Step 9 - Move the contents of hbase-1.2.3 folder to /usr/local/hbase
$ mv hbase-1.2.3/* /usr/local/hbase
Step 10 - Edit the $HOME/.bashrc file by adding the HBase path.
$ sudo gedit $HOME/.bashrc
In the $HOME/.bashrc file, add the following lines:
export HBASE_HOME=/usr/local/hbase
PATH=$PATH:$HBASE_HOME/bin
export CLASSPATH=$CLASSPATH:/usr/local/hbase/lib/*
Step 11 - Reload your changed $HOME/.bashrc settings
$ source $HOME/.bashrc
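A quick way to confirm the variables are in place — a minimal sketch that re-applies the same exports, assuming the /usr/local/hbase layout from the steps above:

```shell
# Apply the .bashrc additions to the current shell
export HBASE_HOME=/usr/local/hbase
PATH=$PATH:$HBASE_HOME/bin

# Verify that HBASE_HOME is set and its bin directory is on the PATH
echo "$HBASE_HOME"
case ":$PATH:" in
  *":$HBASE_HOME/bin:"*) echo "PATH OK" ;;
  *) echo "PATH is missing $HBASE_HOME/bin" ;;
esac
```

If the second line does not print "PATH OK", re-check the lines you added to $HOME/.bashrc.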
Step 12 - Change the directory to /usr/local/hbase/conf
$ cd /usr/local/hbase/conf
Step 13 - Edit hbase-env.sh file.
$ gedit hbase-env.sh
Step 14 - Add the below lines to the hbase-env.sh file. Save and close.
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export HBASE_REGIONSERVERS=/usr/local/hbase/conf/regionservers
export HBASE_MANAGES_ZK=true
export HBASE_PID_DIR=/var/hbase/pids
Step 15 - Edit hbase-site.xml file.
$ gedit hbase-site.xml
Add the below properties inside the <configuration> element of the hbase-site.xml file. Save and close.
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://localhost:9000/user/hduser/hbase</value>
</property>

<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>

<property>
  <name>hbase.zookeeper.quorum</name>
  <value>localhost</value>
</property>

<property>
  <name>hbase.master</name>
  <value>localhost:60010</value>
</property>

<property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value>2181</value>
</property>

<property>
  <name>hbase.zookeeper.property.dataDir</name>
  <value>hdfs://localhost:9000/user/hduser/zookeeper</value>
</property>
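Note that these properties must sit inside the file's <configuration> element. Assuming the stock hbase-site.xml shipped with the tarball, the edited file would be shaped like this (only the first property is shown in full):

```xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:9000/user/hduser/hbase</value>
  </property>
  <!-- ...the five remaining <property> blocks from above go here... -->
</configuration>
```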
Step 16 - Make a new /user/hduser/hbase directory in HDFS.
$ hdfs dfs -mkdir -p /user/hduser/hbase
Step 17 - Make a new /user/hduser/zookeeper directory in HDFS.
$ hdfs dfs -mkdir -p /user/hduser/zookeeper
Step 18 - Change the directory to /usr/local/hbase/bin
$ cd /usr/local/hbase/bin
Step 19 - Start all hbase daemons.
$ ./start-hbase.sh
Step 20 - Verify the running daemons with jps (the Java Virtual Machine Process Status Tool, which reports only on JVMs it has access permissions for).
$ jps
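Because hbase-env.sh sets HBASE_MANAGES_ZK=true, HBase starts its own ZooKeeper, so alongside the Hadoop daemons you should see three HBase processes. A typical listing looks roughly like this (PIDs will differ):

```
$ jps
4321 NameNode
4502 DataNode
4689 SecondaryNameNode
5120 HQuorumPeer
5244 HMaster
5401 HRegionServer
5623 Jps
```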
Once HBase is up and running, check the HBase Master web UI at the URL below.
http://localhost:16010
Step 21 - Change the directory to /usr/local/hbase
$ cd /usr/local/hbase
Step 22 - HBase creates its directory in HDFS. To see the created directory, run the following command.
$ hdfs dfs -ls /user/hduser/hbase
Step 23 - To start up the initial HBase cluster.
$ ./start-hbase.sh
Step 24 - To start up one or more extra backup masters on the same server, run local-master-backup.sh with port offsets. Offset '1' means use ports 60001 & 60011, and that backup master's logfile will be at logs/hbase-${USER}-1-master-${HOSTNAME}.log. You can start up to 9 backup masters (10 total). The command below starts two backup masters, with offsets 1 and 2.
$ ./bin/local-master-backup.sh start 1 2
Step 25 - To start up more regionservers, run local-regionservers.sh. Offset '1' means use ports 60201 & 60301, and its logfile will be at logs/hbase-${USER}-1-regionserver-${HOSTNAME}.log. Up to 99 extra regionservers are supported (100 total).
$ ./bin/local-regionservers.sh start 1
To add three more regionservers in addition to the one you just started, run:
$ ./bin/local-regionservers.sh start 2 3 4
To Enter into HBase Shell
$ hbase shell
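A short smoke test inside the shell; the 'test' table and 'cf' column family are hypothetical names chosen for illustration:

```
hbase(main):001:0> create 'test', 'cf'
hbase(main):002:0> put 'test', 'row1', 'cf:a', 'value1'
hbase(main):003:0> scan 'test'
hbase(main):004:0> disable 'test'
hbase(main):005:0> drop 'test'
hbase(main):006:0> exit
```

The scan should return row1 with the value just written; note that a table must be disabled before it can be dropped.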
Step 26 - To stop an individual regionserver
$ ./bin/local-regionservers.sh stop 1
Step 27 - To stop an individual backup master server
$ ./bin/local-master-backup.sh stop 1
OR
Step 28 - To stop all HBase daemons.
$ ./bin/stop-hbase.sh
