
HBase Fully Distributed Mode Installation on Ubuntu 14.04

Apache HBase is an open source, non-relational, distributed database modeled after Google's BigTable and is written in Java. It is developed as part of Apache Software Foundation's Apache Hadoop project and runs on top of HDFS (Hadoop Distributed File System), providing BigTable-like capabilities for Hadoop. That is, it provides a fault-tolerant way of storing large quantities of sparse data (small amounts of information caught within a large collection of empty or unimportant data, such as finding the 50 largest items in a group of 2 billion records, or finding the non-zero items representing less than 0.1% of a huge collection).
Prerequisites
1) Machines with the Ubuntu 14.04 LTS operating system.
2) Apache Hadoop pre-installed (How to install Hadoop on Ubuntu 14.04)
3) Apache HBase 1.2.3 software (Download Here)
HBase Fully Distributed Mode Installation on Ubuntu
To run a fully distributed operation on more than one host, make the following configurations. (See also: Pseudo Distributed Mode Setup.)
HBase Installation Steps
On all machines - (masterhbase, regionserver1, regionserver2)
Step 1 - Update the package index. Open a terminal (CTRL + ALT + T) and run the following command. It is advisable to run this before installing any package, and it is necessary for picking up the latest updates even if you have not added or removed any software sources.

$ sudo apt-get update
Step 2 - Install Java 7 (OpenJDK).

$ sudo apt-get install openjdk-7-jdk
Step 3 - Install the OpenSSH server. SSH (Secure Shell) is a cryptographic network protocol for operating network services securely over an unsecured network; its best-known application is remote login to computer systems.

$ sudo apt-get install openssh-server
Step 4 - Edit /etc/hosts file.
$ sudo gedit /etc/hosts
In the /etc/hosts file, add the IP address and hostname of every machine in the cluster. Save and close. (A quick resolution check is shown after the list.)
10.10.10.1 masterhbase
10.10.10.2 regionserver1
10.10.10.3 regionserver2
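An easy way to confirm that the hostnames resolve correctly is to ping each machine once from every node (an optional check; substitute your own hostnames if they differ):
$ ping -c 1 masterhbase
$ ping -c 1 regionserver1
$ ping -c 1 regionserver2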
Step 5 - Create the /usr/local/hbase directory.
$ sudo mkdir /usr/local/hbase
Step 6 - Change the ownership and permissions of the directory /usr/local/hbase. Here 'hduser' is an Ubuntu username.
$ sudo chown -R hduser /usr/local/hbase
$ sudo chmod -R 755 /usr/local/hbase
Step 7 - Create the /var/hbase/pids directory (the -p flag also creates the parent /var/hbase directory if it does not exist).
$ sudo mkdir -p /var/hbase/pids
Step 8 - Change the ownership and permissions of the directory /var/hbase/pids. Here 'hduser' is an Ubuntu username.
$ sudo chown -R hduser /var/hbase/pids
$ sudo chmod -R 755 /var/hbase/pids
Step 9 - Generating a new SSH public and private key pair on your local computer is the first step towards authenticating with a remote server without a password. Unless there is a good reason not to, you should always authenticate using SSH keys.
$ ssh-keygen -t rsa -P ""
Step 10 - Now add the public key to the authorized_keys file.
$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
Step 11 - Add the machine's hostname to the list of known hosts. Connecting once makes sure the hostname is added to known_hosts, so that later script executions are not interrupted by a prompt about the computer's authenticity.
$ ssh hostname 
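As an optional check, you can confirm that key-based login works without a password prompt; with BatchMode enabled, ssh fails instead of prompting if the key setup is incomplete:
$ ssh -o BatchMode=yes $(hostname) exit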
Only on masterhbase machine

Step 16 - ssh-copy-id is a small script which copies your SSH public key to a remote host, appending it to the remote user's authorized_keys file.
$ ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@10.10.10.2
Step 17 - ssh is a program for logging into a remote machine and for executing commands on it. Check that passwordless remote login works.
$ ssh 10.10.10.2
Step 18 - Exit from remote login.
$ exit 
Repeat steps 16, 17 and 18 for the other machine (regionserver2).
$ ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@10.10.10.3
$ ssh 10.10.10.3
$ exit 
Step 19 - Change the directory to /home/hduser/Desktop. In my case the downloaded hbase-1.2.3-bin.tar.gz file is in the /home/hduser/Desktop folder; for you it might be in the Downloads folder, so check where it was saved.
$ cd /home/hduser/Desktop/
Step 20 - Untar the hbase-1.2.3-bin.tar.gz file.
$ tar xzf hbase-1.2.3-bin.tar.gz
Step 21 - Move the contents of the hbase-1.2.3 folder to /usr/local/hbase.
$ mv hbase-1.2.3/* /usr/local/hbase
Step 22 - Edit the $HOME/.bashrc file to add the HBase path.
$ sudo gedit $HOME/.bashrc
In the $HOME/.bashrc file, add the following lines. Save and close.
export HBASE_HOME=/usr/local/hbase
PATH=$PATH:$HBASE_HOME/bin
export CLASSPATH=$CLASSPATH:/usr/local/hbase/lib/*
Step 23 - Reload your changed $HOME/.bashrc settings.
$ source $HOME/.bashrc
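To confirm that the new settings took effect, you can print HBASE_HOME and ask HBase for its version (the exact banner will vary; this assumes steps 19-21 already placed HBase under /usr/local/hbase):
$ echo $HBASE_HOME
$ hbase version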
Step 24 - Change the directory to /usr/local/hbase/conf
$ cd /usr/local/hbase/conf
Step 25 - Edit hbase-env.sh file.
$ gedit hbase-env.sh
Step 26 - Add the lines below to the hbase-env.sh file. Save and close.
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export HBASE_REGIONSERVERS=/usr/local/hbase/conf/regionservers
export HBASE_MANAGES_ZK=true
export HBASE_PID_DIR=/var/hbase/pids
Step 27 - Edit regionservers file.
$ gedit regionservers
Step 28 - Add the lines below to the regionservers file. Save and close.
10.10.10.2 
10.10.10.3 
Step 29 - Edit backup-masters file.
$ gedit backup-masters
Step 30 - Add the line below to the backup-masters file. Save and close.
10.10.10.2
Step 31 - Make a new /user/hduser/hbase directory in HDFS (the -p flag also creates the parent directories if they do not exist).
$ hdfs dfs -mkdir -p /user/hduser/hbase
Step 32 - Make a new /user/hduser/zookeeper directory in HDFS.
$ hdfs dfs -mkdir -p /user/hduser/zookeeper
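A quick listing confirms that both directories now exist in HDFS:
$ hdfs dfs -ls /user/hduser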
Step 33 - Edit hbase-site.xml file.
$ gedit hbase-site.xml
Add the following properties between the <configuration> and </configuration> tags in the hbase-site.xml file. Save and close.
<property>
<name>hbase.rootdir</name>
<value>hdfs://namenode:9000/user/hduser/hbase</value>
</property>

<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>

<property>
<name>hbase.zookeeper.quorum</name>
<value>10.10.10.1,10.10.10.2,10.10.10.3</value>
</property>

<property> 
<name>hbase.master</name> 
<value>10.10.10.1:60010</value> 
</property>

<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
</property>

<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>hdfs://namenode:9000/user/hduser/zookeeper</value>
</property>
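Note that the hbase.rootdir value must use the same NameNode host and port as fs.defaultFS (or the older fs.default.name) in Hadoop's core-site.xml. Assuming Hadoop is installed under /usr/local/hadoop, as in the linked Hadoop tutorial, you can cross-check it with the command below (adjust the path if your installation differs):
$ grep -A 1 "fs.default" /usr/local/hadoop/etc/hadoop/core-site.xml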
Step 34 - Secure copy, or SCP, is a means of securely transferring computer files between a local host and a remote host, or between two remote hosts. Here we are transferring the configured HBase files from the master to the region server nodes.
$ scp -r /usr/local/hbase/* hduser@10.10.10.2:/usr/local/hbase
$ scp -r /usr/local/hbase/* hduser@10.10.10.3:/usr/local/hbase
Step 35 - Here we are transferring the configured .bashrc file from the master to the region server nodes.
$ scp -r $HOME/.bashrc hduser@10.10.10.2:$HOME/.bashrc
$ scp -r $HOME/.bashrc hduser@10.10.10.3:$HOME/.bashrc
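To confirm the files reached both region servers, you can list the copied configuration directory remotely from the master (an optional check):
$ ssh hduser@10.10.10.2 ls /usr/local/hbase/conf
$ ssh hduser@10.10.10.3 ls /usr/local/hbase/conf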
Step 36 - Change the directory to /usr/local/hbase/bin
$ cd /usr/local/hbase/bin
Step 37 - Start all HBase daemons.
$ ./start-hbase.sh
Step 38 - Check the running daemons with jps (the Java Virtual Machine Process Status Tool). Note that jps reports only the JVMs for which it has access permissions.
$ jps
20355 Jps
20071 HQuorumPeer
20137 HMaster
Once HBase is up and running, check the web UIs of the components at the URLs below.
First master
http://masterhbase:16010
Secondary master
http://regionserver1:16010
Region server1
http://regionserver1:16030
Region server2
http://regionserver2:16030
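If you are working on a machine without a browser, the same UIs can be checked from the command line; a 200 status code means the page is being served (this assumes curl is installed):
$ curl -s -o /dev/null -w "%{http_code}\n" http://masterhbase:16010
$ curl -s -o /dev/null -w "%{http_code}\n" http://regionserver1:16030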
Only on regionserver1 machine

$ jps
15930 HRegionServer
16194 Jps
15838 HQuorumPeer
16010 HMaster
Only on regionserver2 machine

$ jps
13901 Jps
13639 HQuorumPeer
13737 HRegionServer
Only on masterhbase machine

$ cd /usr/local/hbase/bin
Step 39 - Stop all HBase daemons.
$ ./stop-hbase.sh
