Apache Spark is an open-source cluster computing framework.
Originally developed at the University of California, Berkeley's AMPLab,
the Spark codebase was later donated to the Apache Software Foundation,
which has maintained it since. Spark provides an interface for
programming entire clusters with implicit data parallelism and
fault tolerance.
Prerequisites
1) A machine with the Ubuntu 14.04 LTS operating system
2) Apache Hadoop 2.6.4 pre-installed (see How to install Hadoop on Ubuntu 14.04)
3) Apache Spark 1.6.1 pre-installed (see How to install Spark on Ubuntu 14.04)
Spark WordCount Scala Example
Step 1 - Change the directory to /usr/local/spark/sbin.
$ cd /usr/local/spark/sbin
Step 2 - Start all Spark daemons.
$ ./start-all.sh
Step 3 - Verify that the daemons are running. Note that the JPS (Java Virtual Machine Process Status Tool) is limited to reporting information on JVMs for which it has access permissions.
$ jps
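The word count program below reads its input from HDFS, so an input file has to be uploaded first. Assuming a local text file named in.txt, a command along these lines places it at the path the code expects:
$ hdfs dfs -put in.txt /user/hduser/in.txt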
SparkWordCount.scala
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark._

object SparkWordCount {
  def main(args: Array[String]) {
    /* spark://127.0.0.1:7077 = master URL; "Word Count" = application name;
       /usr/local/spark = Spark home; Nil = jars to distribute;
       Map() = environment variables for worker nodes */
    val sc = new SparkContext("spark://127.0.0.1:7077", "Word Count",
      "/usr/local/spark", Nil, Map(), Map())

    /* Create an input RDD by reading the text file (in.txt) through the Spark context */
    val inputfile = sc.textFile("/user/hduser/in.txt")

    /* Transform the input RDD into an RDD of (word, count) pairs */
    val counts = inputfile.flatMap(line => line.split(" "))
                          .map(word => (word, 1))
                          .reduceByKey(_ + _)

    /* saveAsTextFile is an action; it triggers the computation and writes the RDD to HDFS */
    counts.saveAsTextFile("/user/hduser/outfile")
    println("OK")
  }
}
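The six-argument SparkContext constructor above dates from very early Spark releases, and hard-coding the master there conflicts with the --master yarn flag passed to spark-submit in Step 5 (settings made inside the application take precedence). A minimal sketch of the same job in the SparkConf style usual for Spark 1.6, leaving the master to spark-submit, might look like this:

import org.apache.spark.{SparkConf, SparkContext}

object SparkWordCount {
  def main(args: Array[String]): Unit = {
    // The master is deliberately not set here, so the --master flag
    // given to spark-submit takes effect
    val conf = new SparkConf().setAppName("Word Count")
    val sc = new SparkContext(conf)

    val counts = sc.textFile("/user/hduser/in.txt")
      .flatMap(line => line.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    counts.saveAsTextFile("/user/hduser/outfile")
    sc.stop()
  }
}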
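Step 4 packages compiled classes, so the program must be compiled first. Assuming scalac (Scala 2.10, matching the Spark 1.6 build) is installed and the assembly jar sits at the path used in Step 4, a compile step along these lines produces the required .class files:
$ scalac -classpath /usr/local/spark/lib/spark-assembly-1.6.1-hadoop2.6.0.jar SparkWordCount.scala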
Step 4 - Create a jar file.
$ jar -cvf /home/hduser/Desktop/1.6\ SPARK/SparkWordCountScala.jar SparkWordCount*.class /usr/local/spark/lib/spark-core_2.10-0.9.0-incubating.jar /usr/local/spark/lib/spark-assembly-1.6.1-hadoop2.6.0.jar
Step 5 - Run the application.
$ spark-submit --class SparkWordCount --master yarn --deploy-mode cluster --executor-cores 1 --num-executors 1 /home/hduser/Desktop/1.6\ SPARK/SparkWordCountScala.jar
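Once the job completes, the word counts are written under /user/hduser/outfile in HDFS. Assuming the standard part-file naming used by saveAsTextFile, the result can be inspected with:
$ hdfs dfs -cat /user/hduser/outfile/part-*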