Driven Agent Guide: Driven Agent for Apache Spark
version 2.2.6

- 1. Prerequisites
- 2. Installing the Driven Agent
- 3. Configuring the Driven Agent
  - 3.1. Testing the Agent
  - 3.2. Agent Common Options
- 4. Driven Agent for MapReduce
  - 4.1. Running on Hadoop or YARN
  - 4.2. MapReduce Versions
- 5. Driven Agent for Hive
  - 5.1. Hive Version Requirements
  - 5.2. Metadata Support
- 6. Driven Agent for Apache Spark
  - 6.1. Spark Version Requirements
  - 6.2. Supported APIs
  - 6.3. Spark Runtimes
  - 6.4. Supported Runtimes
- 7. Using Driven Agent with Apache Oozie
- 8. Advanced Installation
- 9. Troubleshooting the Driven Agent
Driven Agent for Apache Spark
The agent for Apache Spark enables Driven to perform real-time monitoring of any application written with the Spark API.
For instructions on downloading and installing the Driven Agent, see the Installing the Driven Agent section.
Note
The current release is part of our Early Access Program (EAP) releases; supported versions, APIs, and runtimes are subject to change before the final release.
Spark Version Requirements
The Driven Agent can be used with the following versions of Spark:
| Spark Version |
|---------------|
| 1.6.x |
| 1.5.x |
| 1.4.x |
Supported APIs
| API | Context | Supported | Comments |
|---|---|---|---|
| Spark Batch | SparkContext | yes | |
| Spark Streaming | StreamingContext | yes | partial test coverage |
| DataFrames | N/A | no | planned |
| SQL | SqlContext | no | based on DataFrames |
| Hive | HiveContext | no | based on DataFrames |
Supported Runtimes
| Runtime | Master Param | Supported | Comments |
|---|---|---|---|
| Hadoop YARN Client | yarn-client | yes | |
| Hadoop YARN Cluster | yarn-cluster | yes | |
| Spark Standalone | spark://IP:PORT | yes | |
| Apache Mesos | mesos://IP:PORT | untested | |
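The same `--driver-java-options` agent switch shown later in this guide also applies when submitting to a Spark Standalone master. The following is a hedged sketch, not a verified invocation: the master URL, agent jar path, Driven host, and API key are placeholders you must replace with your own values.

```shell
# Sketch: submit the SparkPi example to a Spark Standalone master with the agent attached.
# spark://IP:PORT, /path/to, <version>, <driven host>, and <driven api key> are placeholders.
spark-submit \
  --master spark://IP:PORT \
  --driver-java-options "-javaagent:/path/to/driven-agent-spark-<version>.jar=drivenHosts=<driven host>;drivenAPIkey=<driven api key>" \
  --class org.apache.spark.examples.SparkPi \
  "${SPARK_HOME}/lib/spark-examples*.jar" 100
```

Because the driver runs on the submitting machine in standalone client mode, the agent jar only needs to be present on the local filesystem.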
Apache YARN
The agent can be used with an existing Apache YARN cluster to deploy Spark applications via the spark-submit shell script in $SPARK_HOME/bin.
If using the --master yarn-client switch to submit a Spark application, set the following switch:
--driver-java-options "-javaagent:/path/to/driven-agent-spark-<version>.jar=drivenHosts=<driven host>;drivenAPIkey=<driven api key>"
For example:
spark-submit \
--master yarn-client \
--num-executors 3 \
--driver-memory 4g \
--executor-memory 2g \
--executor-cores 1 \
--driver-java-options "-javaagent:/path/to/driven-agent-spark-<version>.jar=drivenHosts=<driven host>;drivenAPIkey=<driven api key>" \
--class org.apache.spark.examples.SparkPi \
"${SPARK_HOME}/lib/spark-examples*.jar" 100
The option --master yarn-client runs the main function of the Spark application locally.
If using the --master yarn-cluster switch to submit a Spark application, set the following additional switches to ensure the agent jar is uploaded to the YARN cluster:
--driver-java-options "-javaagent:driven-agent-spark-<version>.jar=drivenHosts=<driven host>;drivenAPIkey=<driven api key>" \
--jars "/path/to/driven-agent-spark-<version>.jar"
For example:
spark-submit \
--master yarn-cluster \
--num-executors 3 \
--driver-memory 4g \
--executor-memory 2g \
--executor-cores 1 \
--driver-java-options "-javaagent:driven-agent-spark-<version>.jar=drivenHosts=<driven host>;drivenAPIkey=<driven api key>" \
--jars "/path/to/driven-agent-spark-<version>.jar" \
--class org.apache.spark.examples.SparkPi \
"${SPARK_HOME}/lib/spark-examples*.jar" 100
Note
When using a properties file to configure the agent, you must also set the --files switch referencing the properties file so that it is uploaded to the YARN cluster.
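The note above can be illustrated as follows. This is a hedged sketch only: the file name driven-agent.properties is a hypothetical example, and the agent option that tells the agent to read that file is not shown here (see the Agent Common Options section); the sketch only demonstrates adding the --files switch alongside --jars in yarn-cluster mode.

```shell
# Sketch: ship both the agent jar (--jars) and a hypothetical properties file
# (--files) to the YARN cluster. Paths, <version>, <driven host>, and
# <driven api key> are placeholders.
spark-submit \
  --master yarn-cluster \
  --driver-java-options "-javaagent:driven-agent-spark-<version>.jar=drivenHosts=<driven host>;drivenAPIkey=<driven api key>" \
  --jars "/path/to/driven-agent-spark-<version>.jar" \
  --files "/path/to/driven-agent.properties" \
  --class org.apache.spark.examples.SparkPi \
  "${SPARK_HOME}/lib/spark-examples*.jar" 100
```

Files passed via --files are placed in the working directory of each YARN container, so the agent can resolve the properties file by its bare name on the cluster.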