HBase Pseudo Distributed Mode Installation on Ubuntu 14.04

Apache HBase is an open source, non-relational, distributed database modeled after Google's BigTable and is written in Java. It is developed as part of Apache Software Foundation's Apache Hadoop project and runs on top of HDFS (Hadoop Distributed File System), providing BigTable-like capabilities for Hadoop. That is, it provides a fault-tolerant way of storing large quantities of sparse data (small amounts of information caught within a large collection of empty or unimportant data, such as finding the 50 largest items in a group of 2 billion records, or finding the non-zero items representing less than 0.1% of a huge collection).
Pre Requirements
1) A machine with Ubuntu 14.04 LTS operating system.
2) Apache Hadoop pre installed (How to install Hadoop on Ubuntu 14.04)
3) Apache HBase 1.2.3 Software (Download Here)
HBase Pseudo Distributed Mode Installation
Before proceeding with HBase, configure Hadoop and HDFS on your local system or on a remote system and make sure they are running.
A pseudo-distributed mode is simply a distributed mode run on a single host. Use this configuration for testing and prototyping HBase. Do not use it in production or for evaluating HBase performance.
HBase Installation Steps
Step 1 - Install Java 7. Open a terminal (CTRL + ALT + T) and run the following command.

$ sudo apt-get install openjdk-7-jdk
Verify Installation
$ java -version
Step 2 - Edit /etc/hosts file.
$ sudo gedit /etc/hosts
In the /etc/hosts file, add your machine's IP address and hostname. Save and close.
127.0.0.1 localhost
127.0.0.1 praveen
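To confirm the entries took effect, you can query the resolver (an optional sanity check; 'praveen' above is the author's hostname, so substitute your own):

```shell
# List all addresses known for localhost; 127.0.0.1 should be among them
getent ahosts localhost
```

Run the same check against your own hostname to make sure it also resolves to 127.0.0.1.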
Step 3 - Creating /usr/local/hbase directory.
$ sudo mkdir /usr/local/hbase
Step 4 - Change the ownership and permissions of the directory /usr/local/hbase. Here 'hduser' is an Ubuntu username.
$ sudo chown -R hduser /usr/local/hbase
$ sudo chmod -R 755 /usr/local/hbase
Step 5 - Creating /var/hbase/pids directory.
$ sudo mkdir -p /var/hbase/pids
Step 6 - Change the ownership and permissions of the directory /var/hbase/pids. Here 'hduser' is an Ubuntu username.
$ sudo chown -R hduser /var/hbase/pids
$ sudo chmod -R 755 /var/hbase/pids
Step 7 - Change the directory to /home/hduser/Desktop. In my case the downloaded hbase-1.2.3-bin.tar.gz file is in the /home/hduser/Desktop folder; yours might be in the Downloads folder instead, so check.
$ cd /home/hduser/Desktop/
Step 8 - Untar the hbase-1.2.3-bin.tar.gz file.
$ tar xzf hbase-1.2.3-bin.tar.gz
Step 9 - Move the contents of hbase-1.2.3 folder to /usr/local/hbase
$ mv hbase-1.2.3/* /usr/local/hbase
Step 10 - Edit the $HOME/.bashrc file by adding the HBase path.
$ sudo gedit $HOME/.bashrc
In the $HOME/.bashrc file, add the following lines:
export HBASE_HOME=/usr/local/hbase
PATH=$PATH:$HBASE_HOME/bin
export CLASSPATH=$CLASSPATH:/usr/local/hbase/lib/*
Step 11 - Reload your changed $HOME/.bashrc settings
$ source $HOME/.bashrc
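A quick way to confirm the variables are in place — a minimal sketch that re-applies the same exports, assuming the /usr/local/hbase layout from the steps above:

```shell
# Apply the .bashrc additions to the current shell
export HBASE_HOME=/usr/local/hbase
PATH=$PATH:$HBASE_HOME/bin

# Verify that HBASE_HOME is set and its bin directory is on the PATH
echo "$HBASE_HOME"
case ":$PATH:" in
  *":$HBASE_HOME/bin:"*) echo "PATH OK" ;;
  *) echo "PATH is missing $HBASE_HOME/bin" ;;
esac
```

If the second line does not print "PATH OK", re-check the lines you added to $HOME/.bashrc.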
Step 12 - Change the directory to /usr/local/hbase/conf
$ cd /usr/local/hbase/conf
Step 13 - Edit hbase-env.sh file.
$ gedit hbase-env.sh
Step 14 - Add the below lines to the hbase-env.sh file. Save and close.
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export HBASE_REGIONSERVERS=/usr/local/hbase/conf/regionservers
export HBASE_MANAGES_ZK=true
export HBASE_PID_DIR=/var/hbase/pids
Step 15 - Edit hbase-site.xml file.
$ gedit hbase-site.xml
Add the below properties inside the <configuration> element of the hbase-site.xml file. Save and close.
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://localhost:9000/user/hduser/hbase</value>
</property>

<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>

<property>
  <name>hbase.zookeeper.quorum</name>
  <value>localhost</value>
</property>

<property>
  <name>hbase.master</name>
  <value>localhost:60010</value>
</property>

<property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value>2181</value>
</property>

<property>
  <name>hbase.zookeeper.property.dataDir</name>
  <value>hdfs://localhost:9000/user/hduser/zookeeper</value>
</property>
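Note that these properties must sit inside the file's <configuration> element. Assuming the stock hbase-site.xml shipped with the tarball, the edited file would be shaped like this (only the first property is shown in full):

```xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:9000/user/hduser/hbase</value>
  </property>
  <!-- ...the five remaining <property> blocks from above go here... -->
</configuration>
```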
Step 16 - Make a new /user/hduser/hbase directory in HDFS.
$ hdfs dfs -mkdir -p /user/hduser/hbase
Step 17 - Make a new /user/hduser/zookeeper directory in HDFS.
$ hdfs dfs -mkdir -p /user/hduser/zookeeper
Step 18 - Change the directory to /usr/local/hbase/bin
$ cd /usr/local/hbase/bin
Step 19 - Start all hbase daemons.
$ ./start-hbase.sh
Step 20 - Verify the running daemons with jps (the Java Virtual Machine Process Status Tool, which reports only on JVMs it has access permissions for).
$ jps
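Because hbase-env.sh sets HBASE_MANAGES_ZK=true, HBase starts its own ZooKeeper, so alongside the Hadoop daemons you should see three HBase processes. A typical listing looks roughly like this (PIDs will differ):

```
$ jps
4321 NameNode
4502 DataNode
4689 SecondaryNameNode
5120 HQuorumPeer
5244 HMaster
5401 HRegionServer
5623 Jps
```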
Once HBase is up and running, check the HBase Master web UI at the URL below.
http://localhost:16010
Step 21 - Change the directory to /usr/local/hbase
$ cd /usr/local/hbase
Step 22 - HBase creates its directory in HDFS. To see the created directory, run the following command.
$ hdfs dfs -ls /user/hduser/hbase
Step 23 - To start up the initial HBase cluster.
$ ./start-hbase.sh
Step 24 - To start up one or more extra backup masters on the same server, run local-master-backup.sh with port offsets. Offset '1' means use ports 60001 & 60011, and that backup master's logfile will be at logs/hbase-${USER}-1-master-${HOSTNAME}.log. You can start up to 9 backup masters (10 total). The command below starts two backup masters, with offsets 1 and 2.
$ ./bin/local-master-backup.sh start 1 2
Step 25 - To start up more regionservers, run local-regionservers.sh. Offset '1' means use ports 60201 & 60301, and its logfile will be at logs/hbase-${USER}-1-regionserver-${HOSTNAME}.log. Up to 99 extra regionservers are supported (100 total).
$ ./bin/local-regionservers.sh start 1
To add three more regionservers in addition to the one you just started, run:
$ ./bin/local-regionservers.sh start 2 3 4
To Enter into HBase Shell
$ hbase shell
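A short smoke test inside the shell; the 'test' table and 'cf' column family are hypothetical names chosen for illustration:

```
hbase(main):001:0> create 'test', 'cf'
hbase(main):002:0> put 'test', 'row1', 'cf:a', 'value1'
hbase(main):003:0> scan 'test'
hbase(main):004:0> disable 'test'
hbase(main):005:0> drop 'test'
hbase(main):006:0> exit
```

The scan should return row1 with the value just written; note that a table must be disabled before it can be dropped.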
Step 26 - To stop an individual regionserver
$ ./bin/local-regionservers.sh stop 1
Step 27 - To stop an individual backup master server
$ ./bin/local-master-backup.sh stop 1
OR
Step 28 - To stop all HBase daemons.
$ ./bin/stop-hbase.sh
