Using Driven Agent with Apache Oozie

version 1.3.8

Hive Agent Bundle Configuration

Apache Oozie is a popular workflow management solution in the Hadoop ecosystem. Oozie can run a variety of technologies as so-called actions within a workflow. One such action is the HiveAction, which runs Hive queries within a workflow without requiring a Hive server. Oozie uses a client-server architecture, but the Oozie server does not run any user code itself. Instead, it uses a so-called LauncherMapper to drive each action in a given workflow. As a result, any node of the cluster can potentially be the machine that drives a given Hive query, so every machine must have access to the Driven Agent for Hive and must be able to reach the Driven Server. Please adapt your firewall settings accordingly.

Instead of installing the Driven Agent for Hive on every machine of the cluster, you can install it once in Oozie’s sharelib on HDFS.

Given a sharelib directory on HDFS of /user/oozie/share/lib/lib_20150721160609, the installation of the Driven Agent for Hive works as follows:

> hadoop fs -mkdir /user/oozie/share/lib/lib_20150721160609/driven
> hadoop fs -copyFromLocal /path/to/driven-agent-hive-bundle-<version>.jar /user/oozie/share/lib/lib_20150721160609/driven/

Some distributions require a restart of the Oozie server after modifying the sharelib. Please check the documentation of your distribution.

Now that the Driven Agent for Hive is available on HDFS, the Agent must be configured either globally for the workflow or in the XML of a single action.

This property passes a -javaagent option to the launcher JVM so that it loads the agent. The jar file name must match the name of the file on HDFS:

            <property>
                <name>oozie.launcher.mapred.child.java.opts</name>
                <value>-javaagent:$PWD/driven-agent-hive-bundle-<version>.jar</value>
            </property>

This property configures the Oozie HiveAction to include jars from the hive and driven sub-directories of the currently active sharelib:

            <property>
                <name>oozie.action.sharelib.for.hive</name>
                <value>hive,driven</value>
            </property>
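
As a rough sketch of where these two properties can live at the action level, the fragment below places them in the configuration of a single Hive action. The action name, script name, transition targets, schema version, and the ${jobTracker} and ${nameNode} parameters are assumptions for illustration, and VERSION stands in for the actual agent version.

```xml
<!-- Sketch only: action name, script, transitions and schema version are illustrative assumptions. -->
<action name="hive-query">
    <hive xmlns="uri:oozie:hive-action:0.5">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <configuration>
            <!-- Load the agent in the launcher JVM; replace VERSION so the name matches the jar on HDFS. -->
            <property>
                <name>oozie.launcher.mapred.child.java.opts</name>
                <value>-javaagent:$PWD/driven-agent-hive-bundle-VERSION.jar</value>
            </property>
            <!-- Include jars from the hive and driven sub-directories of the sharelib. -->
            <property>
                <name>oozie.action.sharelib.for.hive</name>
                <value>hive,driven</value>
            </property>
        </configuration>
        <script>query.hql</script>
    </hive>
    <ok to="end"/>
    <error to="fail"/>
</action>
```

Configured this way, only this one action loads the agent. Other Hive actions in the same workflow would need the same two properties.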

These properties configure the Driven Server location and the API key for the Agent to use. Depending on your deployment and needs, you can freely choose the level at which to set these properties. Setting them at the workflow level enables the Agent for all HiveActions in that workflow. Setting them at the action level enables the Agent only for that HiveAction.

            <property>
                <name>cascading.management.document.service.hosts</name>
                <value>http://<hostname>:<port>/</value>
            </property>
            <property>
                <name>cascading.management.document.service.apikey</name>
                <value><apikey></value>
            </property>
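
To illustrate the workflow-level option, the following sketch sets the Driven Server properties in the workflow's global section, so that they apply to every action in the workflow. The workflow name and schema version are assumptions, HOSTNAME, PORT, and APIKEY are placeholders for your own values, and the workflow nodes themselves are omitted.

```xml
<!-- Sketch only: workflow name and schema version are illustrative assumptions. -->
<workflow-app name="driven-hive-example" xmlns="uri:oozie:workflow:0.5">
    <global>
        <configuration>
            <!-- Location of the Driven Server; replace HOSTNAME and PORT. -->
            <property>
                <name>cascading.management.document.service.hosts</name>
                <value>http://HOSTNAME:PORT/</value>
            </property>
            <!-- API key used by the agent; replace APIKEY. -->
            <property>
                <name>cascading.management.document.service.apikey</name>
                <value>APIKEY</value>
            </property>
        </configuration>
    </global>
    <!-- The start node, Hive actions, kill node and end node follow here. -->
</workflow-app>
```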

Map Reduce Bundle Configuration