Pseudo Distributed Mode (Single Node Cluster)
The Hadoop daemons run on a local machine, simulating a cluster on a small scale. Each Hadoop daemon runs in a separate JVM instance, but all on a single machine, and HDFS is used instead of the local filesystem.
Step 1 - Update. Open a terminal (CTRL + ALT + T) and run the following command. It is advisable to run this before installing any package, and necessary for installing the latest updates, even if you have not added or removed any software sources.
$ sudo apt-get update
Step 2 - Installing Java 7.
$ sudo apt-get install openjdk-7-jdk
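To confirm that Java installed correctly, you can check the reported version (the exact version string will vary):
$ java -version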
Step 3 - Install the OpenSSH server. SSH is a cryptographic network protocol for operating network services securely over an unsecured network; its best-known application is remote login to computer systems.
$ sudo apt-get install openssh-server
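As a quick optional check, you can confirm that the SSH service is running (the exact status wording depends on your Ubuntu release):
$ sudo service ssh status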
Step 4 - Create a group and user. We will create a group, configure its sudo permissions, and then add a user to the group. Here 'hadoop' is the group name and 'hduser' is a user in that group.
$ sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hduser
Step 5 - Configure the sudo permissions for 'hduser'.
$ sudo visudo
Since Ubuntu's default terminal editor is nano, add the following line to the sudoers file, then press CTRL + O to save and CTRL + X to exit.
hduser ALL=(ALL) ALL
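Optionally, you can verify the new entry by listing the sudo privileges granted to 'hduser':
$ sudo -l -U hduser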
Step 6 - Creating the hadoop directory.
$ sudo mkdir /usr/local/hadoop
Step 7 - Change the ownership and permissions of the directory /usr/local/hadoop. Here 'hduser' is the Ubuntu username.
$ sudo chown -R hduser /usr/local/hadoop
$ sudo chmod -R 755 /usr/local/hadoop
Step 8 - Switch user. The su command executes commands with the privileges of another user account; here we switch to 'hduser'.
$ su hduser
Step 9 - Change the directory to /home/hduser/Desktop. In this example the downloaded hadoop-2.6.4.tar.gz file is in /home/hduser/Desktop; yours might be in the Downloads folder, so check first.
$ cd /home/hduser/Desktop/
Step 10 - Untar the hadoop-2.6.4.tar.gz file.
$ tar xzf hadoop-2.6.4.tar.gz
Step 11 - Move the contents of the hadoop-2.6.4 folder to /usr/local/hadoop.
$ mv hadoop-2.6.4/* /usr/local/hadoop
Step 12 - Edit the $HOME/.bashrc file, adding the Java and Hadoop paths.
$ sudo gedit $HOME/.bashrc
Add the following lines:
# Set Hadoop-related environment variables
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=/usr/local/hadoop/lib/native"

# Set JAVA_HOME (we will also configure JAVA_HOME directly for Hadoop later on)
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
Step 13 - Reload your changed $HOME/.bashrc settings.
$ source $HOME/.bashrc
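A quick way to confirm that the new paths were picked up is to ask for the Hadoop version; with the PATH set correctly this should report 2.6.4:
$ hadoop version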
Step 14 - Generate a new SSH public and private key pair on your local computer; this is the first step towards authenticating with a remote server without a password. Unless there is a good reason not to, you should always authenticate using SSH keys. The -P "" option creates the key with an empty passphrase.
$ ssh-keygen -t rsa -P ""
Step 15 - Now you can add the public key to the authorized_keys file.
$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
Step 16 - Adding localhost to the list of known hosts. This is a quick way of making sure that 'localhost' is added to the list of known hosts, so that a script execution doesn't get interrupted by a question about trusting localhost's authenticity.
$ ssh localhost
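Answer yes when asked about the host's authenticity. If the key setup worked, no password is requested, and you can leave the SSH session again:
$ exit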
Step 17 - Change the directory to /usr/local/hadoop/etc/hadoop.
$ cd $HADOOP_HOME/etc/hadoop
Step 18 - Edit hadoop-env.sh file.
$ sudo gedit hadoop-env.sh
Step 19 - Uncomment the JAVA_HOME line in hadoop-env.sh and change it as below. Save and close.
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
Step 20 - Edit core-site.xml file.
$ sudo gedit core-site.xml
Step 21 - Add the below lines between the <configuration> tags of core-site.xml. Save and close. (In Hadoop 2.x the property fs.default.name is a deprecated alias of fs.defaultFS, but the old name still works in 2.6.4.)
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
  <description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The URI's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The URI's authority is used to determine the host, port, etc. for a filesystem.</description>
</property>
Step 22 - Edit hdfs-site.xml file.
$ sudo gedit hdfs-site.xml
Step 23 - Add the below lines between the <configuration> tags of hdfs-site.xml. Save and close.
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.name.dir</name>
  <value>/app/hadoop/tmp/namenode</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/app/hadoop/tmp/datanode</value>
</property>
Step 24 - Edit yarn-site.xml file.
$ sudo gedit yarn-site.xml
Step 25 - Add the below lines between the <configuration> tags of yarn-site.xml. Save and close.
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
Step 26 - Copy the default mapred-site.xml.template to mapred-site.xml.
$ cp mapred-site.xml.template mapred-site.xml
Step 27 - Edit mapred-site.xml file.
$ sudo gedit mapred-site.xml
Step 28 - Add the below lines between the <configuration> tags of mapred-site.xml. Save and close.
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
Step 29 - Edit slaves file.
$ sudo gedit slaves
Step 30 - Add the below line to the slaves file. Save and close.
localhost
Step 31 - Creating the /app/hadoop/tmp directory. The -p flag also creates the intermediate /app and /app/hadoop directories, which do not exist yet.
$ sudo mkdir -p /app/hadoop/tmp
Step 32 - Change the ownership and permissions of the directory /app/hadoop/tmp. Here 'hduser' is the Ubuntu username.
$ sudo chown -R hduser /app/hadoop/tmp
$ sudo chmod -R 755 /app/hadoop/tmp
Step 33 - Change the directory to /usr/local/hadoop/sbin.
$ cd /usr/local/hadoop/sbin
Step 34 - Format the namenode. (The hadoop namenode command still works but is deprecated; hdfs namenode -format is the current equivalent.)
$ hadoop namenode -format
Step 35 - Start NameNode daemon and DataNode daemon.
$ start-dfs.sh
Step 36 - Start the YARN daemons.
$ start-yarn.sh
OR
Instead of steps 35 and 36 you can use the command below, although it is deprecated now.
$ start-all.sh
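As an extra check, the daemons also expose web interfaces in Hadoop 2.x: the NameNode on http://localhost:50070 and the ResourceManager on http://localhost:8088. For example, assuming curl is installed, you can see whether the NameNode UI responds:
$ curl -s http://localhost:50070 | head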
Step 37 - Check which daemons are running. The jps (Java Virtual Machine Process Status) tool is limited to reporting information on JVMs for which it has access permissions.
$ jps
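If everything started correctly, the listing should include the five Hadoop daemons, similar to the sample below (the process IDs will differ on your machine):
2721 NameNode
2847 DataNode
3012 SecondaryNameNode
3162 ResourceManager
3287 NodeManager
3390 Jps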
Step 38 - Make the HDFS directories required to execute MapReduce jobs.
$ hdfs dfs -mkdir /user
$ hdfs dfs -mkdir /user/hduser
Step 39 - Copy the input files into the distributed filesystem.
$ hdfs dfs -put /usr/local/hadoop/etc/hadoop /user/hduser/input
Step 40 - Run some of the examples provided.
$ hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.4.jar grep /user/hduser/input /user/hduser/output 'dfs[a-z.]+'
Step 41 - Examine the output files.
$ hdfs dfs -cat /user/hduser/output/*
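Alternatively, you can copy the output files from HDFS to the local filesystem and examine them there:
$ hdfs dfs -get /user/hduser/output output
$ cat output/*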
Step 42 - Stop NameNode daemon and DataNode daemon.
$ stop-dfs.sh
Step 43 - Stop the YARN daemons.
$ stop-yarn.sh
OR
Instead of steps 42 and 43 you can use the command below, although it is deprecated now.
$ stop-all.sh