Driven Administrator Guide
version 2.0.5
Installing and Configuring the Driven Plugin
Overview
To enable Driven, you must install and configure the Driven Plugin for your Hadoop environment so that telemetry data can be collected and sent to the Driven Server. Unlike the online Driven Plugin that sends the telemetry data to the driven.io service, the plugin for an on-site Driven deployment must be configured to direct data to the correct location.
If you are participating in the driven.io online trial program, use the Driven Plugin setup procedures in the online trial Quick Start Guide instead of the following steps.
Prerequisites
-
The Driven Server must be installed and configured before you can set up the Driven Plugin.
-
Your data applications must have network access to your installation of the Driven Server.
-
Client environment variables, such as HADOOP_CLIENT, YARN_CLIENT_OPTS, and YARN_HEAPSIZE, should be set to the following commonly used values at a minimum. If sufficient memory is available on your system, you might want to allocate more memory to the JVM.
-Xmx2G -Xms2G -XX:MaxPermSize=256m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSPermGenSweepingEnabled -XX:+CMSClassUnloadingEnabled
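As a minimal sketch of the last prerequisite, the options above can be appended to one of the client variables without discarding options that are already set. The variable DRIVEN_CLIENT_JVM_OPTS is a hypothetical helper name used only for this illustration, and YARN_CLIENT_OPTS is chosen as one example of the variables named above; which variables your distribution actually reads is an assumption to verify. Note that the PermGen-related flags apply to older JVMs and are ignored on Java 8 and later.

```shell
# Minimum JVM options from this guide, defined once.
# DRIVEN_CLIENT_JVM_OPTS is a hypothetical helper variable, not a name that Driven or Hadoop reads.
DRIVEN_CLIENT_JVM_OPTS="-Xmx2G -Xms2G -XX:MaxPermSize=256m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSPermGenSweepingEnabled -XX:+CMSClassUnloadingEnabled"

# Append to any options that are already set instead of overwriting them.
# The ${VAR:+...} expansion adds the existing value and a separating space only if VAR is non-empty.
export YARN_CLIENT_OPTS="${YARN_CLIENT_OPTS:+$YARN_CLIENT_OPTS }${DRIVEN_CLIENT_JVM_OPTS}"
```

If more memory is available, raise the -Xmx and -Xms values in the helper variable before exporting.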
Installing the Plugin
The following steps document how to obtain and configure the Driven Plugin. See the documentation for Amazon Elastic MapReduce or Maven and Gradle if you use those environments. If you plan to package the Driven Plugin directly in your application, you can add the Maven spec to your project file as a dependency. Adding the plugin to your Maven repositories is particularly useful if you plan to use Driven from your IDE as you develop a Cascading application.
-
On UNIX, Linux, and Mac: Run the following command:
$ wget -i http://files.concurrentinc.com/driven/2.0/driven-plugin/latest-jar.txt
-
On Windows: Download the latest JAR file: driven-plugin-2.0.5.jar
The JAR file includes all dependencies.
Configuring the Plugin
Step 1: Make the plugin accessible to your application
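Create or update the cascading-service.properties file and ensure that it is on your HADOOP_CLASSPATH. The easiest way to do this is to place the file in your Hadoop configuration directory ($HADOOP_CONF). For Hadoop 1.x, this might be $HADOOP_HOME/conf. For Hadoop 2.x, the directory might be $HADOOP_INSTALL/etc/hadoop. In the following commands, ${PATH} is a placeholder for the directory that contains the plugin JAR file, not the shell's PATH variable; substitute the actual directory.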
$ echo cascading.management.service.jar=${PATH}/driven-plugin-2.0.5.jar >> ${HADOOP_CONF}/cascading-service.properties
Alternatively, set the path to the plugin by adding it to the HADOOP_CLASSPATH environment variable:
$ export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:${PATH}/driven-plugin-2.0.5.jar
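Because the echo command with >> appends a new line on every run, repeating it (for example, after a plugin upgrade) leaves more than one entry for the same key in the file. The following sketch shows one way to avoid that. The function name set_property and the example directory in the usage comment are hypothetical and are not part of Driven or Hadoop.

```shell
# set_property FILE KEY VALUE
# Replaces an existing "KEY=..." line in FILE, or appends one if KEY is absent.
# Hypothetical helper for illustration only.
set_property() {
  sp_file=$1
  sp_key=$2
  sp_value=$3
  touch "$sp_file" || return 1
  sp_tmp=$(mktemp) || return 1
  # Copy every line that does not begin with "KEY=" (literal match, no regex).
  awk -v k="${sp_key}=" 'index($0, k) != 1' "$sp_file" > "$sp_tmp"
  # Add the single, current entry for KEY.
  printf '%s=%s\n' "$sp_key" "$sp_value" >> "$sp_tmp"
  # Write back in place so the original file keeps its ownership and permissions.
  cat "$sp_tmp" > "$sp_file"
  rm -f "$sp_tmp"
}

# Usage (the /opt/driven directory is an assumed example location):
#   set_property "${HADOOP_CONF}/cascading-service.properties" \
#     cascading.management.service.jar /opt/driven/driven-plugin-2.0.5.jar
```

The same helper can be reused for the host and API key properties in the later steps.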
Step 2: Set the Driven Server host URL
Specify the URL of the Driven Server in the cascading-service.properties file by using the following command format:
$ echo cascading.management.document.service.hosts=${DRIVEN_SERVER_URL} >> ${HADOOP_CONF}/cascading-service.properties
Alternatively, set the host in an environment variable. Example:
$ export DRIVEN_SERVER_HOSTS=${DRIVEN_SERVER_URL}
Step 3 (Optional): Use a Driven API key
Each Driven Team has a unique API key. By associating an application with an API key, you make the application searchable by other members of your team.
-
Log in to the Driven Server to open the Driven web interface.
-
Hover over your user name in the upper-right corner of the window.
-
Click My Teams.
-
Record the API key for the team of your choice.
-
Configure the Driven Plugin with the Driven API key by one of the following methods:
Store the API key parameter in the cascading-service.properties file. Example:
$ echo cascading.management.document.service.apikey=${API_KEY} >> ${HADOOP_CONF}/cascading-service.properties
Alternatively, you can set the API key in an environment variable. Example:
$ export DRIVEN_API_KEY=${API_KEY}
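After completing the steps, it can help to review what has actually been configured. The following sketch lists the cascading.management entries in a properties file and reports whether the two environment variables from this guide are set, while hiding the API key value. The function name show_driven_settings is hypothetical. The sketch does not determine which source the plugin prefers when a setting appears in both places.

```shell
# show_driven_settings FILE
# Prints the Driven-related entries in FILE with the API key value hidden,
# then reports the environment variables named in this guide if they are set.
# Hypothetical helper for illustration only.
show_driven_settings() {
  sds_file=$1
  if [ -f "$sds_file" ]; then
    # Mask the API key value, then print only lines that start with "cascading.management."
    sed -n \
      -e 's/^\(cascading\.management\.document\.service\.apikey\)=.*/\1=<hidden>/' \
      -e '/^cascading\.management\./p' "$sds_file"
  else
    echo "No properties file at: $sds_file"
  fi
  if [ -n "${DRIVEN_SERVER_HOSTS:-}" ]; then
    echo "DRIVEN_SERVER_HOSTS=${DRIVEN_SERVER_HOSTS}"
  fi
  if [ -n "${DRIVEN_API_KEY:-}" ]; then
    # Report presence only, so the key does not end up in terminal logs.
    echo "DRIVEN_API_KEY is set"
  fi
  return 0
}

# Usage:
#   show_driven_settings "${HADOOP_CONF}/cascading-service.properties"
```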
Additional Configuration Settings
The Driven Plugin has additional configuration options that generally relate to administrative tasks. For more information, see Driven Plugin Configuration Parameters.