
Driven Administrator Guide

version 2.0.5

Troubleshooting Driven

This Troubleshooting section is a list of tips for solving common issues that users encounter while deploying and running Driven. If you still have problems after reading this page, feel free to post your questions to the Driven Forums or email us at support@concurrentinc.com.

Tip
See the Version Requirements page for interdependencies among Driven, Cascading, JDK, and other components that you need for the deployment.

The Driven Server installation generates a Java incompatibility error

The Driven Server installation requires Java 7. Ensure that Java 7 is installed on your system.
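If several JDKs are installed, it can help to confirm which Java version a given `java` command actually runs. The following is a minimal sketch, not a Driven tool; the class name `JavaVersionCheck` is hypothetical. It reads the standard `java.specification.version` system property and prints the major version so that you can compare it with the Version Requirements page.

```java
public final class JavaVersionCheck {
    // Converts a java.specification.version value ("1.7", "1.8", "11", ...)
    // into a major version number (7, 8, 11, ...).
    static int majorVersion(String spec) {
        String s = spec.startsWith("1.") ? spec.substring(2) : spec;
        int dot = s.indexOf('.');
        return Integer.parseInt(dot >= 0 ? s.substring(0, dot) : s);
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        System.out.println("Java specification version: " + spec
                + " (major version " + majorVersion(spec) + ")");
    }
}
```

Run it with the same `java` executable that the Driven Server installation picks up (for example, the one on your `PATH`).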

Cascading applications cannot send data to the Driven Server

This happens when your client application (Cascading application) cannot communicate with the Driven Server.

  • Verify that your Driven application is running by logging in to its web interface in a browser.

  • Ensure that the Driven URL is reachable from your Hadoop cluster. The configured URL is stored in the $HADOOP_CONF/cascading-service.properties file.
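To test reachability from a cluster node without a browser, you could run a small HTTP probe such as the sketch below. It is an illustration under assumptions: the class name `DrivenReachability` is hypothetical, and because this page does not name the property key in cascading-service.properties, the sketch takes the Driven URL as a command-line argument instead of reading the file. Any HTTP response, whatever its status code, is treated as proof that the server answered.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLConnection;

public final class DrivenReachability {
    // Returns true if the URL answers with any HTTP response within the timeout;
    // false for a malformed or non-HTTP URL, a refused connection, or a timeout.
    static boolean isReachable(String url, int timeoutMillis) {
        HttpURLConnection conn = null;
        try {
            URLConnection raw = URI.create(url).toURL().openConnection();
            if (!(raw instanceof HttpURLConnection)) {
                return false; // e.g. a file: URL
            }
            conn = (HttpURLConnection) raw;
            conn.setConnectTimeout(timeoutMillis);
            conn.setReadTimeout(timeoutMillis);
            conn.setRequestMethod("HEAD");
            conn.getResponseCode(); // any status code means the server answered
            return true;
        } catch (IOException | IllegalArgumentException e) {
            return false;
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: java DrivenReachability <driven-url>");
            System.exit(2);
        }
        boolean ok = isReachable(args[0], 5000);
        System.out.println(args[0] + (ok ? " is reachable" : " is NOT reachable"));
    }
}
```

Run it on a Hadoop node with the URL copied from cascading-service.properties. A "NOT reachable" result points to a network, firewall, or DNS problem between the cluster and the Driven Server rather than to the Cascading application.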

Driven Plugin runs out of memory

  • Make sure that you are on the latest version of Driven.

  • Ensure that you are running your application with adequate memory for the JVM. The Driven Plugin stores an internal queue of rich metadata until it is sent to the Driven application. Client environment variables, such as HADOOP_CLIENT_OPTS, YARN_CLIENT_OPTS, and YARN_HEAPSIZE, should be set so that the client JVM runs with at least the following commonly used values. If sufficient memory is available on your system, you might benefit from allocating more memory to the JVM.

    -Xmx2G -Xms2G -XX:MaxPermSize=256m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
    -XX:+CMSPermGenSweepingEnabled -XX:+CMSClassUnloadingEnabled
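To check that the heap setting actually reached the client JVM, you can print the maximum heap that the JVM reports through `Runtime.getRuntime().maxMemory()`. This is a minimal sketch; the class name `HeapReport` and the 10% tolerance are assumptions. The tolerance exists because some garbage collectors report slightly less than the `-Xmx` value (for example, by excluding a survivor space), so an exact comparison with 2 GB could raise a false alarm.

```java
public final class HeapReport {
    // Converts bytes to whole mebibytes for readable output.
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    // True when the reported maximum heap is at least (1 - tolerance) of the expected size.
    static boolean roughlyAtLeast(long reportedBytes, long expectedBytes, double tolerance) {
        return reportedBytes >= (long) (expectedBytes * (1.0 - tolerance));
    }

    public static void main(String[] args) {
        long max = Runtime.getRuntime().maxMemory();
        long expected = 2L * 1024 * 1024 * 1024; // matches -Xmx2G
        System.out.println("Max heap: " + toMiB(max) + " MiB");
        System.out.println(roughlyAtLeast(max, expected, 0.1)
                ? "Heap meets the 2 GB minimum."
                : "Heap is below 2 GB; check the client JVM options.");
    }
}
```

Launch it the same way as your application (for example, packaged in a JAR and started through `hadoop jar`) so that it inherits the same client environment variables.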

Driven Server crashes with an OOM exception

  • Verify that the following Java configuration parameters are set for your Tomcat server:

    -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
    -XX:+CMSPermGenSweepingEnabled -XX:+CMSClassUnloadingEnabled
  • Allocate at least 4 GB of memory.
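A small check such as the following can confirm that none of the four garbage-collection flags above were dropped from the Tomcat configuration. It is a sketch under assumptions: the class name `GcFlagCheck` is hypothetical, and it only compares strings. With no arguments it inspects the JVM it runs in, using `ManagementFactory.getRuntimeMXBean().getInputArguments()`; to audit Tomcat itself, pass the Tomcat JVM's options as arguments (for example, copied from the process listing). It does not verify the 4 GB memory allocation.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public final class GcFlagCheck {
    // Flags recommended above for the Tomcat JVM that hosts the Driven Server.
    static final List<String> REQUIRED = Arrays.asList(
            "-XX:+UseParNewGC", "-XX:+UseConcMarkSweepGC",
            "-XX:+CMSPermGenSweepingEnabled", "-XX:+CMSClassUnloadingEnabled");

    // Returns the required flags that are absent from the given JVM arguments, in order.
    static List<String> missingFlags(List<String> jvmArgs) {
        List<String> missing = new ArrayList<String>();
        for (String flag : REQUIRED) {
            if (!jvmArgs.contains(flag)) {
                missing.add(flag);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> jvmArgs = args.length > 0
                ? Arrays.asList(args)
                : ManagementFactory.getRuntimeMXBean().getInputArguments();
        List<String> missing = missingFlags(jvmArgs);
        System.out.println(missing.isEmpty()
                ? "All recommended GC flags are set."
                : "Missing flags: " + missing);
    }
}
```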

Elasticsearch crashes with an OOM exception

Verify that the field index size is not set to unlimited.

Driven Plugin is making the application slow

A Cascading application can run slowly for many reasons: the Driven Plugin may be taking extra time to collect slice data from the Hadoop NameNode, the application may lack memory resources, or the cluster may have lost connectivity to the Driven Server.

  • The Driven Support team has frequently observed that customer sites underprovision their Hadoop NameNodes. Because the Driven Plugin collects the job execution data associated with each slice from the NameNode, an underprovisioned NameNode slows down this collection.

  • The cluster has lost connectivity with the Driven Server.

  • The Cascading application is complex, processes large volumes of data, and runs on a large cluster. These factors determine how much data is collected and transmitted. In this case, you can reduce the number of events that the Driven Plugin polls from the Hadoop NameNode and the volume of telemetry data that it sends to the Driven Server.

To begin, you can suppress transmission of the slice data by setting the following property in the cascading.properties file:

    driven.protocol.slice.suppress=true
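A misspelled key or a stray value such as `yes` leaves slice transmission enabled without any warning. The sketch below is a hypothetical helper (`SliceSuppressionCheck` is not part of Driven) that loads a properties file with `java.util.Properties` and reports how the value would be read. It assumes the value is interpreted as a Java boolean and that a missing key means slice data is still sent.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

public final class SliceSuppressionCheck {
    static final String KEY = "driven.protocol.slice.suppress";

    // Missing key -> false; trailing whitespace is trimmed before parsing.
    static boolean isSliceSuppressed(Properties props) {
        return Boolean.parseBoolean(props.getProperty(KEY, "false").trim());
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "cascading.properties";
        Properties props = new Properties();
        try (InputStream in = new FileInputStream(path)) {
            props.load(in);
        }
        System.out.println(path + ": " + KEY + " = " + isSliceSuppressed(props));
    }
}
```

Point it at the cascading.properties file that your application actually loads; if it prints `false`, the property is missing or misspelled in that file.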

Note that the Driven Plugin collects and sends telemetry data independently of the execution of your application on the cluster. Your client application does not finish until both the application processing and the Driven Plugin transmission are complete, but because the processing on the cluster is not held up by the transmission, any additional latency that data transmission introduces does not compromise SLAs.
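The decoupling described above follows a common producer/consumer pattern. The sketch below is a generic illustration, not the Driven Plugin's code; `TelemetryPipeline`, `record`, and `drainAndStop` are hypothetical names. Application code calls `record`, which only enqueues an event and returns, while a background thread delivers events in order. At exit, `drainAndStop` blocks until every queued event has been delivered, which mirrors a client that waits for both processing and transmission. Because the queue is unbounded, events accumulate in memory whenever delivery is slower than production, which shows why, in a design like this, an internal telemetry queue needs adequate heap.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical illustration of a decoupled telemetry sender; not Driven Plugin code.
public final class TelemetryPipeline {
    // Unique sentinel, compared by identity, that tells the sender thread to stop.
    private static final Object END = new Object();

    private final BlockingQueue<Object> queue = new LinkedBlockingQueue<Object>(); // unbounded
    private final List<String> delivered = new ArrayList<String>();
    private final Thread sender;

    public TelemetryPipeline() {
        sender = new Thread(new Runnable() {
            @Override
            public void run() {
                try {
                    while (true) {
                        Object event = queue.take(); // waits for the next event
                        if (event == END) {
                            return;
                        }
                        deliver((String) event);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "telemetry-sender");
        sender.start();
    }

    // Stand-in for the network call to a server; here it only records the event.
    private void deliver(String event) {
        synchronized (delivered) {
            delivered.add(event);
        }
    }

    // Called by application code: enqueues the event and returns immediately.
    public void record(String event) {
        queue.add(event);
    }

    // Called at client exit: blocks until all queued events have been delivered.
    public List<String> drainAndStop() {
        queue.add(END);
        try {
            sender.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        synchronized (delivered) {
            return new ArrayList<String>(delivered);
        }
    }

    public static void main(String[] args) {
        TelemetryPipeline pipeline = new TelemetryPipeline();
        for (int i = 0; i < 3; i++) {
            pipeline.record("step-" + i + "-completed"); // the application keeps running
        }
        System.out.println("delivered: " + pipeline.drainAndStop());
    }
}
```

The `main` method prints all three events because `drainAndStop` waits for the sender thread before returning.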

An unresponsive application stops sending telemetry data but Driven still displays app status as RUNNING

If an application hangs during execution, Driven can suddenly stop receiving telemetry data from it while continuing to display the app execution instance in RUNNING status. When an application is unresponsive, change its status in Driven from RUNNING to STOPPED. If you do not mark the application instance as stopped to reflect that telemetry data is no longer being received, Driven might display misleading information about the app.

Step 1: Display all applications in RUNNING status

If your Status View in Driven is not already filtered to show running applications, set the search Status filter menu to Running.

Step 2: Open app details about the unresponsive application

In the Details Table, click the Name link for the unresponsive application. The app details page appears, showing the application’s units of work and overall status.

Step 3: Stop the unresponsive application

After you confirm that this is the specific application you want to stop, click Mark as stopped.

Note
The stopped time that Driven records for the application is the time at which the Driven Server last received an update from the Driven Plugin, not the time at which you mark the application as stopped. For example, if you mark as stopped an application that began to hang three days ago, Driven shows the stopped time as three days ago if that is when the Driven Plugin sent its last update.

This is the last topic in the Driven Administrator Guide.