
Driven Administrator Guide

version 1.3.8

Installing and Configuring the Driven Plugin

Overview

To enable Driven, you must install and configure the Driven Plugin for your Hadoop environment so that telemetry data can be collected and sent to the Driven Server. Unlike the hosted Driven Plugin, which sends telemetry data to the driven.io service, the plugin for a self-hosted Driven deployment must be configured to send its data to your own Driven Server.

For information about Driven Plugin setup in environments that use the driven.io service as part of the hosted program, see Getting Started with Hosted Driven.

Prerequisites

  • The Driven Server must be installed and configured before you can set up the Driven Plugin.

  • Your data applications must have network access to your installation of the Driven Server.

  • Client environment variables, such as HADOOP_CLIENT_OPTS, YARN_CLIENT_OPTS, and YARN_HEAPSIZE, should provide at least the following commonly used JVM settings. If sufficient memory is available on your system, you might want to allocate more memory to the JVM.

    -Xmx2G -Xms2G -XX:MaxPermSize=256m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
    -XX:+CMSPermGenSweepingEnabled -XX:+CMSClassUnloadingEnabled
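As a minimal sketch, the settings above could be applied to client processes by exporting them before you launch an application. The choice of HADOOP_CLIENT_OPTS and YARN_CLIENT_OPTS as the variables to set, and the helper variable DRIVEN_JVM_OPTS, are assumptions for illustration; use whichever variables your Hadoop distribution reads. The heap options are written in the standard JVM form (-Xmx2G, with no equals sign).

```shell
# Sketch: apply the recommended minimum JVM settings to Hadoop client processes.
# Assumption: your distribution reads HADOOP_CLIENT_OPTS and YARN_CLIENT_OPTS.
# DRIVEN_JVM_OPTS is a hypothetical helper variable, not a Driven setting.
DRIVEN_JVM_OPTS="-Xmx2G -Xms2G -XX:MaxPermSize=256m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSPermGenSweepingEnabled -XX:+CMSClassUnloadingEnabled"

# Append to any options that are already set instead of overwriting them.
export HADOOP_CLIENT_OPTS="${HADOOP_CLIENT_OPTS:-} ${DRIVEN_JVM_OPTS}"
export YARN_CLIENT_OPTS="${YARN_CLIENT_OPTS:-} ${DRIVEN_JVM_OPTS}"
```

To allocate more memory, raise the -Xmx and -Xms values in one place and re-export.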

Installing the Plugin

The following steps document how to obtain and configure the Driven Plugin. See the documentation for Amazon Elastic MapReduce or Maven and Gradle if you use those environments. If you plan to package the Driven Plugin directly in your application, you can add the Maven spec to your project file as a dependency. Adding the plugin to your Maven repositories is particularly useful if you plan to use Driven from your IDE as you develop a Cascading application.

  • On UNIX, Linux, and Mac: Run the following command:

$ wget -i http://files.concurrentinc.com/driven/1.3/driven-plugin/latest-jar.txt

The JAR file includes all dependencies.

Configuring the Plugin

Step 1: Make the plugin accessible to your application

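Create or update the cascading-service.properties file and ensure that it is on your HADOOP_CLASSPATH. The easiest way to do this is to place it in your Hadoop configuration directory ($HADOOP_CONF). For Hadoop 1.x, this might be $HADOOP_HOME/conf. For Hadoop 2.x, the directory might be $HADOOP_INSTALL/etc/hadoop. In the commands that follow, ${PATH} stands for the directory that contains the plugin JAR; substitute the actual directory rather than relying on the shell's own PATH variable, which holds the executable search path.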

$ echo cascading.management.service.jar=${PATH}/driven-plugin-1.3.8.jar >> ${HADOOP_CONF}/cascading-service.properties

Alternatively, set the path to the plugin by adding it to the HADOOP_CLASSPATH environment variable:

$ export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:${PATH}/driven-plugin-1.3.8.jar
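Because >> appends, rerunning an echo command like the one shown earlier leaves duplicate entries in cascading-service.properties. The following is a sketch of one way to avoid that; set_prop is a hypothetical helper written for this guide, not part of Driven or Cascading, and the JAR location in the usage comment is a placeholder.

```shell
# Hypothetical helper: set KEY=VALUE in a properties file, replacing any
# existing line for KEY instead of appending a duplicate.
set_prop() {
  file=$1; key=$2; value=$3
  touch "$file"
  tmp=$(mktemp)
  # Copy every line except an existing assignment that starts with "KEY=".
  awk -v k="$key" 'index($0, k "=") != 1' "$file" > "$tmp"
  # Add the new assignment at the end.
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  # Write back in place so the original file's permissions are kept.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Example usage (placeholder JAR location):
# set_prop "${HADOOP_CONF}/cascading-service.properties" \
#   cascading.management.service.jar /path/to/driven-plugin-1.3.8.jar
```

The same helper could be used for the other properties described in the following steps.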

Step 2: Set the Driven Server host URL

Add the URL of the Driven Server to the cascading-service.properties file with a command of the following form:

$ echo cascading.management.document.service.hosts=${DRIVEN_SERVER_URL} >> ${HADOOP_CONF}/cascading-service.properties

Alternatively, set the host in an environment variable. Example:

$ export DRIVEN_SERVER_HOSTS=${DRIVEN_SERVER_URL}
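Since the prerequisites require network access from your applications to the Driven Server, it can help to confirm that the URL is reachable from the client machine before you run a job. This is only a sketch: check_driven_server is a hypothetical helper, and it assumes that curl is installed and that the server answers a plain HTTP request at the given URL with a non-error status.

```shell
# Hypothetical helper: succeed only if the given URL answers an HTTP request
# with a non-error status within 10 seconds.
# Assumption: the Driven Server responds to a plain GET at this URL.
check_driven_server() {
  url=$1
  curl --silent --show-error --fail --max-time 10 --output /dev/null "$url"
}

# Example usage:
# check_driven_server "${DRIVEN_SERVER_URL}" && echo "Driven Server is reachable"
```

A non-zero exit status indicates a connection failure, a timeout, or an HTTP error response.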

Step 3 (Optional): Use a Driven API key

Each Driven Team has a unique API key. By associating an application with an API key, you make the application searchable by other members of your team.

  1. Log in to the Driven Server to open the Driven web interface.

  2. Hover over your user name in the upper-right corner of the window.

  3. Click My Teams.

  4. Record an API key for the team of your choice.

  5. Configure the Driven Plugin with the Driven API key by one of the following methods:

Store the API key parameter in the cascading-service.properties file. Example:

$ echo cascading.management.document.service.apikey=${API_KEY} >> ${HADOOP_CONF}/cascading-service.properties

Alternatively, you can set the API key in an environment variable. Example:

$ export DRIVEN_API_KEY=${API_KEY}
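After completing the steps, you may want to confirm what the properties file actually contains. The sketch below uses a hypothetical helper, check_driven_config, that looks only at the file; it does not detect settings supplied through HADOOP_CLASSPATH, DRIVEN_SERVER_HOSTS, or DRIVEN_API_KEY. Treating the JAR and hosts entries as required in the file is an assumption made for this example.

```shell
# Hypothetical helper: report which Driven settings appear in a
# cascading-service.properties file. Returns non-zero if the JAR or hosts
# entry is missing from the file; the API key is optional.
check_driven_config() {
  file=$1
  status=0
  for key in cascading.management.service.jar \
             cascading.management.document.service.hosts; do
    if grep -q "^${key}=" "$file" 2>/dev/null; then
      echo "ok: ${key}"
    else
      echo "missing: ${key}"
      status=1
    fi
  done
  # Step 3 is optional, so a missing API key is reported but is not an error.
  if grep -q '^cascading.management.document.service.apikey=' "$file" 2>/dev/null; then
    echo "ok: cascading.management.document.service.apikey"
  else
    echo "note: no API key set (optional)"
  fi
  return $status
}

# Example usage:
# check_driven_config "${HADOOP_CONF}/cascading-service.properties"
```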

Additional Configuration Settings

The Driven Plugin has additional configuration options that generally relate to administrative tasks. For more information, see Driven Plugin Configuration Parameters.
