Driven Administrator Guide

version 1.3.8

Troubleshooting Driven

This Troubleshooting section is a list of tips for solving common issues that users encounter while deploying and running Driven. If you still have problems after reading this page, feel free to post your queries to Driven Forums or email us at support@concurrentinc.com.

Tip
See the Driven Compatibility Matrix section of the Planning a Driven Deployment page for version compatibilities among Driven, Cascading, JDK, and other components that you need for the deployment.

The Driven Server installation generates a Java incompatibility error

Driven Server installation requires Java 7. Ensure that Java 7 is installed on your system.
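To confirm which Java version a given `java` binary actually runs, you can compile and run a small check like the one below with the same `java` that the installer uses. This is a minimal sketch that assumes the installer expects Java 7 exactly. Consult the Driven Compatibility Matrix for the versions your release supports. The class name `JavaVersionCheck` is hypothetical.

```java
// Minimal sketch: reports the Java version of the JVM running it and whether it is Java 7.
// Assumption: the Driven Server installer expects Java 7 exactly (see the Compatibility Matrix).
public class JavaVersionCheck {

    // Java 7 reports its specification version as "1.7"; Java 8 reports "1.8",
    // and Java 9 and later report a bare major number such as "11".
    static boolean isJava7(String specVersion) {
        return "1.7".equals(specVersion);
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        System.out.println("Java specification version: " + spec);
        System.out.println("Java home: " + System.getProperty("java.home"));
        if (isJava7(spec)) {
            System.out.println("OK: this JVM is Java 7.");
        } else {
            System.out.println("WARNING: this JVM is not Java 7.");
            System.exit(1);
        }
    }
}
```

The `Java home` line shows which installation was picked up. This helps when several JDKs are present and the wrong one is first on the `PATH`.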

Cascading applications cannot send data to the Driven Server

This happens when your client application (Cascading application) cannot communicate with the Driven Server.

  • Verify that the Driven Server is running by logging in to its web interface in a browser.

  • Ensure that the Driven URL location is reachable from your Hadoop cluster. The configured location is stored in the $HADOOP_CONF/cascading-service.properties file.
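To test reachability from a cluster node, you can run a small TCP connection check like the one below on that node. It is a minimal sketch under stated assumptions. This guide does not name the property key in cascading-service.properties that holds the Driven URL, so the sketch takes the key as a command-line argument. It also assumes the value is a single absolute URL such as `http://host:port`. The class `DrivenReachability` is hypothetical. A successful TCP connection shows only that the host and port are reachable, not that the Driven Server is healthy.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.URI;
import java.util.Properties;

// Minimal sketch: reads a URL from a properties file and tries to open a TCP connection to it.
// Assumption: the property value is one absolute URL; the property key is supplied by you.
public class DrivenReachability {

    // Returns true if a TCP connection to the URL's host and port succeeds within the timeout.
    static boolean isReachable(String url, int timeoutMillis) {
        URI uri = URI.create(url.trim());
        String host = uri.getHost();
        if (host == null) {
            return false; // not an absolute URL with a host component
        }
        int port = uri.getPort();
        if (port == -1) {
            port = "https".equalsIgnoreCase(uri.getScheme()) ? 443 : 80; // scheme default
        }
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false; // refused, timed out, or host name not resolvable
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: DrivenReachability <cascading-service.properties> <property-key>");
            System.exit(2);
        }
        Properties props = new Properties();
        try (InputStream in = new FileInputStream(args[0])) {
            props.load(in);
        }
        String url = props.getProperty(args[1]);
        if (url == null) {
            System.err.println("Property not found: " + args[1]);
            System.exit(2);
        }
        boolean ok = isReachable(url, 5000);
        System.out.println(url + (ok ? " is reachable" : " is NOT reachable"));
        System.exit(ok ? 0 : 1);
    }
}
```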

Driven Plugin runs out of memory

  • Make sure that you are on the latest version of Driven.

  • Ensure that you run your Cascading application with enough memory for the JVM. The Driven Plugin keeps an internal queue of rich metadata until that metadata is sent to the Driven Server, so the client JVM needs headroom for the queue. At a minimum, set client environment variables such as HADOOP_CLIENT_OPTS and YARN_CLIENT_OPTS to the following commonly used values. YARN_HEAPSIZE takes a heap size in megabytes rather than JVM flags, so set it to 2048 to match. If your system has memory to spare, you might benefit from allocating more to the JVM.

    -Xmx2g -Xms2g -XX:MaxPermSize=256m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
    -XX:+CMSPermGenSweepingEnabled -XX:+CMSClassUnloadingEnabled
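To confirm that the heap settings actually reached the JVM, you can log `Runtime.maxMemory()` at startup. Do this in your Cascading application's `main` method or in a standalone class launched with the same options. The sketch below is a minimal example. The class name `HeapCheck` and the 10% tolerance are assumptions for illustration, not Driven requirements.

```java
// Minimal sketch: checks that the running JVM received roughly the suggested 2 GB heap.
// HeapCheck and its 10% tolerance are illustrative assumptions.
public class HeapCheck {

    static final long TWO_GB = 2L * 1024 * 1024 * 1024;

    // Runtime.maxMemory() usually reports somewhat less than -Xmx because
    // generational collectors exclude one survivor space, so allow a 10% margin.
    static boolean hasAdequateHeap(long maxHeapBytes, long targetBytes) {
        return maxHeapBytes >= targetBytes / 10 * 9;
    }

    public static void main(String[] args) {
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.println("Max heap: " + maxHeap / (1024 * 1024) + " MB");
        if (!hasAdequateHeap(maxHeap, TWO_GB)) {
            System.err.println("WARNING: heap is below the suggested 2 GB minimum.");
            System.exit(1);
        }
        System.out.println("OK: heap meets the suggested minimum.");
    }
}
```

For example, `java -Xmx2g -Xms2g HeapCheck` should pass. The same command with `-Xmx512m` should print the warning.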

Driven Plugin is making the application slow

A Cascading application can run slowly for many reasons: the Driven Plugin may be taking extra time to collect slice data from the Hadoop NameNode, memory resources may be insufficient, or connectivity to the Driven Server may have been lost.

  • The Driven Support team has frequently observed that customer sites underprovision their Hadoop NameNodes. The Driven Plugin collects the job execution data for each slice from the NameNode, so an underprovisioned NameNode slows down this collection.

  • The cluster may have lost connectivity with the Driven Server.

  • The Cascading application is complex and processes large volumes of data on a large cluster. These factors increase the amount of data that is collected and transmitted. In this case, you can reduce the number of events that the Driven Plugin polls from the Hadoop NameNode and the volume of telemetry data sent to the Driven Server.

To begin, you can suppress transmission of the slice data by setting the following property in the cascading.properties file:

driven.protocol.slice.suppress=true
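If you maintain several copies of cascading.properties, a quick check can confirm that a given file really enables slice suppression. The sketch below is a minimal example. The class name `SliceSuppressionCheck` is hypothetical. It assumes that a missing key means suppression is off and that the value is a plain `true`/`false` string.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

// Minimal sketch: reports whether driven.protocol.slice.suppress is set to true in a file.
// Assumption: an absent key means slice data is NOT suppressed.
public class SliceSuppressionCheck {

    static final String KEY = "driven.protocol.slice.suppress";

    // Boolean.parseBoolean accepts "true" in any letter case; trim() drops trailing spaces.
    static boolean isSliceSuppressed(Properties props) {
        String value = props.getProperty(KEY, "false");
        return Boolean.parseBoolean(value.trim());
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: SliceSuppressionCheck <path/to/cascading.properties>");
            System.exit(2);
        }
        Properties props = new Properties();
        try (InputStream in = new FileInputStream(args[0])) {
            props.load(in);
        }
        System.out.println(KEY + " is " + (isSliceSuppressed(props) ? "enabled" : "NOT enabled"));
    }
}
```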

Note that the Driven Plugin collects and sends telemetry data independently of the actual execution of your application on the cluster. Your client application does not exit until both the application processing and the Driven Plugin transmission have completed. Even so, the additional latency that data transmission may introduce does not compromise any SLAs.
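The decoupling described in this note can be pictured with a small producer/consumer model. The sketch below is a simplified illustration, not the Driven Plugin's actual implementation, and the class, thread, and event names are hypothetical. The application thread only enqueues events and keeps working. A background sender thread drains the queue. The client returns only after the sender has finished.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Illustrative model only; this is NOT how the Driven Plugin is implemented.
public class DecoupledTelemetryModel {

    private static final String END = "END"; // sentinel: no more events will be queued

    static List<String> runApplication(int steps) {
        final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        final List<String> transmitted = new ArrayList<>();

        // Hypothetical background sender: drains the queue and "transmits" each event.
        Thread sender = new Thread(new Runnable() {
            @Override
            public void run() {
                try {
                    while (true) {
                        String event = queue.take();
                        if (END.equals(event)) {
                            return;
                        }
                        transmitted.add(event); // stand-in for a network call to the server
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "telemetry-sender");
        sender.start();

        try {
            // Application work only enqueues events; it never waits for transmission.
            for (int i = 1; i <= steps; i++) {
                queue.put("step " + i + " complete");
            }
            queue.put(END);
            // The client is gated on both: it returns only after the sender finishes.
            sender.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the sender", e);
        }
        // Safe to read without locking: join() makes the sender's writes visible here.
        return transmitted;
    }

    public static void main(String[] args) {
        System.out.println(runApplication(3));
    }
}
```

Because the queue in this model is unbounded, events pile up in memory whenever the sender falls behind. This is one way to picture why the client JVM needs the heap headroom recommended in the memory tip.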

Transmission of Cascading application metrics data to Driven stops but app status remains in RUNNING state

Sometimes an application fails, or a functioning Driven deployment stops receiving telemetry data from a running application. If Driven stops receiving updates from an application unintentionally, change the application's status in Driven from RUNNING to STOPPED. An unresponsive application whose status remains RUNNING can distort the accuracy of the Status Timeline graph.

Step 1: Display all applications in RUNNING status

If the Status View in Driven is not already filtered to show running applications, set the search Status filter menu to Running.

Step 2: Open details about the unresponsive application

Click the Name link of the unresponsive application. The performance view appears, showing the application's flow and status. Figure 1 shows an example of the performance view.


Figure 1: Example of performance view window with warning in top right corner

The application's condition is displayed in the top right corner of the performance view in Figure 1. The warning there indicates that the application is unresponsive: the Driven Server is not receiving updates from the Driven Plugin.

Step 3: Stop the unresponsive application

After you confirm that this is the specific application you want to stop, click Mark as stopped.


Figure 2: The Mark as stopped button and application status

Note
The stop timestamp that Driven records for the application is the time of the last update that the Driven Server received from the Driven Plugin (which monitors application processes in the Hadoop infrastructure). It is not the time at which you click Mark as stopped. For example, suppose you mark an application as stopped at the current (system) time, but its last update arrived three months ago. Driven then records the stop time as the time of that update from three months ago.
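A short model can make the stop-time rule concrete. The following Java sketch is an illustration under stated assumptions, not Driven's implementation. The class name and its fields are hypothetical, and so is the 10-minute unresponsiveness threshold in particular, because this guide does not state when Driven flags an application as unresponsive.

```java
import java.util.concurrent.TimeUnit;

// Illustrative model of the stop-time rule described in the note; not Driven's code.
public class MarkAsStoppedModel {

    // HYPOTHETICAL threshold: this guide does not say when Driven flags an application.
    static final long UNRESPONSIVE_AFTER_MILLIS = TimeUnit.MINUTES.toMillis(10);

    String status = "RUNNING";
    long lastUpdateMillis;  // time of the last update received from the plugin
    long stopMillis = -1;   // -1 until the application is marked as stopped

    MarkAsStoppedModel(long lastUpdateMillis) {
        this.lastUpdateMillis = lastUpdateMillis;
    }

    // A RUNNING application counts as unresponsive when no update has arrived recently.
    boolean isUnresponsive(long nowMillis) {
        return "RUNNING".equals(status) && nowMillis - lastUpdateMillis > UNRESPONSIVE_AFTER_MILLIS;
    }

    // Marking as stopped records the last update time, not the time of the click.
    void markAsStopped() {
        status = "STOPPED";
        stopMillis = lastUpdateMillis;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        MarkAsStoppedModel app = new MarkAsStoppedModel(now - TimeUnit.DAYS.toMillis(90));
        System.out.println("Unresponsive: " + app.isUnresponsive(now));
        app.markAsStopped();
        System.out.println("Status: " + app.status + ", stop time is "
                + TimeUnit.MILLISECONDS.toDays(now - app.stopMillis) + " days before the click");
    }
}
```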

This is the last topic in the Driven Administrator Guide.