Commissioning new DataNode to existing Hadoop Cluster
Given below are the steps to be followed for adding new nodes to a Hadoop cluster.

On All machines - (HadoopMaster, HadoopSlave1, HadoopSlave2, HadoopSlave3)
Step 1 - Edit /etc/hosts file.
Add every machine's IP address and hostname. Save and close.
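The entries can be staged in a scratch file first (cluster-hosts.txt is an arbitrary name; the IPs are the example addresses used throughout this tutorial), so appending to /etc/hosts takes a single sudo command:

```shell
# Stage the cluster's host entries for review before touching /etc/hosts.
cat > cluster-hosts.txt <<'EOF'
192.168.2.14 HadoopMaster
192.168.2.15 HadoopSlave1
192.168.2.16 HadoopSlave2
192.168.2.17 HadoopSlave3
EOF
# Then append them (requires root, not run here):
#   sudo sh -c 'cat cluster-hosts.txt >> /etc/hosts'
```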
Only on new machine - (HadoopSlave3)
Step 2 - Update the package index. Open a terminal (CTRL + ALT + T) and run the sudo command below. It is advisable to run this before installing any package, and it is necessary for installing the latest updates, even if you have not added or removed any software sources.
Step 3 - Installing Java 7.
Step 4 - Install open-ssh server.
SSH is a cryptographic network protocol for operating network services securely over an unsecured network. Its best-known application is remote login to computer systems.
Step 5 - Create a group. We will create a group, configure its sudo permissions, and then add the user to it. Here 'hadoop' is the group name and 'hduser' is a user in the group.
Step 6 - Configure the sudo permissions for 'hduser'.
visudo opens the sudoers file in nano, Ubuntu's default terminal editor. Add the permission line below, press CTRL + O followed by Enter to save, then CTRL + X to exit.
Step 7 - Creating hadoop directory.
Step 8 - Change the ownership and permissions of the directory /usr/local/hadoop. Here 'hduser' is an Ubuntu username.
Step 9 - Creating /app/hadoop/tmp directory.
Step 10 - Change the ownership and permissions of the directory /app/hadoop/tmp. Here 'hduser' is an Ubuntu username.
Step 11 - Switch user. The su command is used to execute commands with the privileges of another user account.
Step 12 - Generating a new SSH public and private key pair
on your local computer is the first step towards authenticating with a
remote server without a password. Unless there is a good reason not to,
you should always authenticate using SSH keys.
Step 13 - Now you can add the public key to the authorized_keys file.
Step 14 - Adding hostname to list of known hosts.
A quick way of making sure that 'localhost' is added to the list of
known hosts so that a script execution doesn't get interrupted by a
question about trusting localhost's authenticity.
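A related sketch: if the passwordless login set up in these steps still prompts for a password, the usual cause is permissions on hduser's key files, since sshd refuses an authorized_keys file whose directory or file is group- or world-writable. The commands below create the paths if missing and tighten the modes:

```shell
# Create the key directory and authorized_keys if missing, then tighten
# the modes sshd requires; overly-permissive modes make sshd silently
# ignore the key and fall back to password prompts.
mkdir -p "$HOME/.ssh"
touch "$HOME/.ssh/authorized_keys"
chmod 700 "$HOME/.ssh"
chmod 600 "$HOME/.ssh/authorized_keys"
```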
Only on HadoopMaster Machine
Step 15 - Switch user. The su command is used to execute commands with the privileges of another user account.
Step 16 - ssh-copy-id is a small script that copies your SSH public key to a remote host, appending it to the remote authorized_keys file.
Step 17 - ssh is a program for logging into a remote machine and executing commands on it. Check that the remote login works.
Step 18 - Exit from remote login.
Step 19 - Change the directory to /usr/local/hadoop/etc/hadoop
Step 20 - Edit slaves file.
Step 21 - Add the below line to slaves file. Save and Close.
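The slaves file can also be generated non-interactively; the sketch below writes the tutorial's three slave addresses to a scratch file (slaves.new is an arbitrary name; move it over the real slaves file after review):

```shell
# One address per line, exactly as the slaves file expects.
printf '%s\n' 192.168.2.15 192.168.2.16 192.168.2.17 > slaves.new
```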
Step 22 - Secure copy or SCP
is a means of securely transferring computer files between a local host
and a remote host or between two remote hosts. Here we are transferring
configured hadoop files from master to slave nodes.
Step 23 - Here we are transferring configured .bashrc file from master to slave nodes.
Only on new machine - (HadoopSlave3)
Step 24 - Change the directory to /usr/local/hadoop
Step 25 - Start datanode daemon
Step 26 - Start NodeManager daemon
Step 27 - Run jps (the Java Virtual Machine Process Status Tool) to verify that the daemons are running; jps reports only on JVMs for which it has access permissions.
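Step 27's check can be scripted. The helper below is a sketch (the function name is ours): jps prints one '<pid> <name>' line per JVM, so a word-grep for the daemon name is enough:

```shell
# Returns success if the named daemon appears in a jps listing piped in.
daemon_running() {
    grep -qw "$1"
}
# Intended use on the new node (not run here):
#   jps | daemon_running DataNode    && echo "DataNode up"
#   jps | daemon_running NodeManager && echo "NodeManager up"
```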
Decommissioning existing DataNode from Hadoop Cluster
We can remove a node from a cluster on the fly, while it is running, without any data loss. HDFS provides a decommissioning feature, which ensures that removing a node is performed safely. To use it, follow the steps as given below:
Only on HadoopMaster Machine
Step 1 - Change the directory to /usr/local/hadoop/etc/hadoop
Step 2 - Edit hdfs-site.xml file.
Step 3 - Add the below lines to hdfs-site.xml file. Save and Close.
Step 4 - Change the directory to /usr/local/hadoop
Step 5 - Create hdfs_exclude.txt file and open for editing
Step 6 - Add the following line to hdfs_exclude.txt file. Save and close.
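If you prefer a non-interactive step, the exclude entry (the tutorial's example new-node address) can be written in one command:

```shell
# Write the address of the node being decommissioned into the exclude file.
# 192.168.2.17 is HadoopSlave3 in this tutorial's example cluster.
echo "192.168.2.17" > hdfs_exclude.txt
```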
Step 7 - Change the directory to /usr/local/hadoop/sbin
Step 8 - Refresh all nodes.
Only on new machine - (HadoopSlave3)
Step 9 - Check with jps whether the NodeManager is still running; if it is, stop it.
Step 10 - Change the directory to /usr/local/hadoop/sbin
Step 11 - Stop NodeManager daemon.
After the decommission process completes, the decommissioned hardware can be safely shut down for maintenance. Run the dfsadmin report command to check the status of decommissioning; the command below describes the status of the decommissioned node and of the nodes connected to the cluster.
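The report is long; a small hedged filter (the function name is ours, and it assumes the Hadoop 2.x report layout, in which each node block contains "Hostname:" and "Decommission Status :" lines) keeps just the decommissioning view:

```shell
# Keep only each node's name and decommission state from a dfsadmin report.
decomm_status() {
    grep -E "Hostname:|Decommission Status"
}
# Intended use (not run here):
#   $HADOOP_HOME/bin/hadoop dfsadmin -report | decomm_status
```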
Only on HadoopMaster Machine
Listed below are the commands for the commissioning steps described above.
On All machines - (HadoopMaster, HadoopSlave1, HadoopSlave2, HadoopSlave3)
Step 1 - Edit /etc/hosts file.
$ sudo gedit /etc/hosts
192.168.2.14 HadoopMaster
192.168.2.15 HadoopSlave1
192.168.2.16 HadoopSlave2
192.168.2.17 HadoopSlave3
Step 2 - Update the package index.
$ sudo apt-get update
Step 3 - Installing Java 7.
$ sudo apt-get install openjdk-7-jdk
Step 4 - Install open-ssh server.
$ sudo apt-get install openssh-server
Step 5 - Create a group and add a user to it.
$ sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hduser
Step 6 - Configure the sudo permissions for 'hduser'. Add the permission line below, press CTRL + O and Enter to save, then CTRL + X to exit.
$ sudo visudo
hduser ALL=(ALL) ALL
Step 7 - Creating hadoop directory.
$ sudo mkdir /usr/local/hadoop
Step 8 - Change the ownership and permissions of /usr/local/hadoop.
$ sudo chown -R hduser /usr/local/hadoop
$ sudo chmod -R 755 /usr/local/hadoop
Step 9 - Creating /app/hadoop/tmp directory.
$ sudo mkdir -p /app/hadoop/tmp
Step 10 - Change the ownership and permissions of /app/hadoop/tmp.
$ sudo chown -R hduser /app/hadoop/tmp
$ sudo chmod -R 755 /app/hadoop/tmp
Step 11 - Switch to user 'hduser'.
$ su hduser
Step 12 - Generate an SSH key pair.
$ ssh-keygen -t rsa -P ""
Step 13 - Add the public key to the authorized_keys file.
$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
Step 14 - Adding hostname to list of known hosts.
$ ssh hostname
Step 15 - Switch to user 'hduser'.
$ su hduser
Step 16 - Copy the SSH public key to the new node.
$ ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@192.168.2.17
Step 17 - Check that remote login works.
$ ssh 192.168.2.17
Step 18 - Exit from remote login.
$ exit
Step 19 - Change the directory to /usr/local/hadoop/etc/hadoop
$ cd $HADOOP_HOME/etc/hadoop
Step 20 - Edit slaves file.
$ sudo gedit slaves
Step 21 - Add the below lines to slaves file. Save and Close.
192.168.2.15
192.168.2.16
192.168.2.17
Step 22 - Copy the configured hadoop files from the master to the new node, and the updated slaves file to the existing slave nodes.
$ scp -r /usr/local/hadoop/* hduser@192.168.2.17:/usr/local/hadoop
$ scp -r $HADOOP_HOME/etc/hadoop/slaves hduser@192.168.2.15:/usr/local/hadoop/etc/hadoop
$ scp -r $HADOOP_HOME/etc/hadoop/slaves hduser@192.168.2.16:/usr/local/hadoop/etc/hadoop
Step 23 - Copy the configured .bashrc file from the master to the new node.
$ scp -r $HOME/.bashrc hduser@192.168.2.17:$HOME/.bashrc
Step 24 - Change the directory to /usr/local/hadoop
$ cd /usr/local/hadoop
Step 25 - Start datanode daemon.
$ ./sbin/hadoop-daemon.sh start datanode
Step 26 - Start NodeManager daemon.
$ ./sbin/yarn-daemon.sh start nodemanager
Step 27 - Verify with jps that the daemons are running.
$ jps
Listed below are the commands for the decommissioning steps described above.
Only on HadoopMaster Machine
Step 1 - Change the directory to /usr/local/hadoop/etc/hadoop
$ cd $HADOOP_HOME/etc/hadoop
Step 2 - Edit hdfs-site.xml file.
$ sudo gedit hdfs-site.xml
Step 3 - Add the below lines to hdfs-site.xml file. Save and Close.
<property>
  <name>dfs.hosts.exclude</name>
  <value>/usr/local/hadoop/hdfs_exclude.txt</value>
  <description>DFS exclude</description>
</property>
Step 4 - Change the directory to /usr/local/hadoop
$ cd $HADOOP_HOME
Step 5 - Create hdfs_exclude.txt file and open for editing.
$ gedit hdfs_exclude.txt
Step 6 - Add the following line to hdfs_exclude.txt file. Save and close.
192.168.2.17
Step 7 - Change the directory to /usr/local/hadoop/sbin
$ cd $HADOOP_HOME/sbin
Step 8 - Refresh all nodes.
$ hadoop dfsadmin -refreshNodes
Step 9 - Check with jps whether the NodeManager is still running; if it is, stop it.
$ jps
Step 10 - Change the directory to /usr/local/hadoop/sbin
$ cd $HADOOP_HOME/sbin
Step 11 - Stop NodeManager daemon.
$ ./yarn-daemon.sh stop nodemanager
Only on HadoopMaster Machine
$ $HADOOP_HOME/bin/hadoop dfsadmin -report