Spark spark-submit Script Usage

Apache Spark is an open source cluster computing framework. Originally developed at the University of California, Berkeley's AMPLab, the Spark codebase was later donated to the Apache Software Foundation, which has maintained it since. Spark provides an interface for programming entire clusters with implicit data parallelism and fault-tolerance.
Pre Requirements
1) A machine with Ubuntu 14.04 LTS operating system
2) Apache Hadoop 2.6.4 pre installed (How to install Hadoop on Ubuntu 14.04)
3) Apache Spark 1.6.1 pre installed (How to install Spark on Ubuntu 14.04)
Spark spark-submit script
The spark-submit script in Sparkâ€™s bin directory is used to launch applications on a cluster. It can use all of Sparkâ€™s supported cluster managers through a uniform interface so you donâ€™t have to configure your application specially for each one.

./bin/spark-submit \
  --class <main-class> \
  --master <master-url> \
  --deploy-mode <deploy-mode> \
  --conf <key>=<value> \
  ... # other options
  <application-jar> \
  [application-arguments]

Some of the commonly used options are:

--class: The entry point for your application (e.g. org.apache.spark.examples.SparkPi)
--master: The master URL for the cluster (e.g. spark://23.195.26.187:7077)
--deploy-mode: Whether to deploy your driver on the worker nodes (cluster) or locally as an external client (client) (default: client) â€ 
--conf: Arbitrary Spark configuration property in key=value format. For values that contain spaces wrap â€œkey=valueâ€ in quotes (as shown).
application-jar: Path to a bundled jar including your application and all dependencies. The URL must be globally visible inside of your cluster, for instance, an hdfs:// path or a file:// path that is present on all nodes.
application-arguments: Arguments passed to the main method of your main class, if any

Execution on Standalone and Cluster mode
1) Run application locally on 8 cores.

./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master local[8] \
  /path/to/examples.jar \
  100

2) Run on a Spark standalone cluster in client deploy mode.

./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://207.184.161.138:7077 \
  --executor-memory 20G \
  --total-executor-cores 100 \
  /path/to/examples.jar \
  1000

3) Run on a Spark standalone cluster in cluster deploy mode with supervise

./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://207.184.161.138:7077 \
  --deploy-mode cluster \
  --supervise \
  --executor-memory 20G \
  --total-executor-cores 100 \
  /path/to/examples.jar \
  1000

4) Run a Python application on a Spark standalone cluster

./bin/spark-submit \
  --master spark://207.184.161.138:7077 \
  examples/src/main/python/pi.py \
  1000

Execution on YARN
1) Run on a YARN cluster in cluster deploy mode

./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode cluster \  # can be client for client mode
  --executor-memory 20G \
  --num-executors 50 \
  /path/to/examples.jar \
  1000

2) Run on a YARN cluster in client deploy mode

./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode client \  
  --executor-memory 20G \
  --num-executors 50 \
  /path/to/examples.jar \
  1000

Execution on Mesos
1) Run on a Mesos cluster in cluster deploy mode with supervise

./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master mesos://207.184.161.138:7077 \
  --deploy-mode cluster \
  --supervise \
  --executor-memory 20G \
  --total-executor-cores 100 \
  http://path/to/examples.jar \
  1000

Big Data Analysis

Search This Blog

Spark spark-submit Script Usage

Comments

Post a Comment

Popular posts from this blog

Apache Spark WordCount scala example

Hive hiveserver2 and Web UI usage

Apache Spark Shell Usage