Driven User Guide: Using the App Details Page

version 2.2.6

1. Overview of Monitored Applications: 1.1. Logging In

1.2. Status Views
2. Searches, Saved Views, and Accessing Relevant Data: 2.1. Starting a Search

2.2. Saving Search Queries as Views

2.3. My Teams Views

2.4. Case Examples of Searches, Views, and Teams

2.5. Customizing Searches

2.6. Periodic Views

2.7. Counter Data and Other Metrics in Tables

2.8. Sharing and Limiting Access to Application Information
3. Using the App Details Page: 3.1. Searching App Details

3.2. Viewing Application Details

3.3. Understanding the Directed Acyclic Graph

3.4. Viewing the Graph

3.5. Real-Time Visibility into Your Application

3.6. Details Table
4. Understanding the Unit of Work Details Page: 4.1. Viewing Unit-of-Work Details

4.2. The Directed Acyclic Graph

4.3. Step Table and Slice Histograms

4.4. Viewing the Hadoop Dashboard
5. Managing Applications with Tags: 5.1. Best Practice for Tags

5.2. Assigning Tags as Application-Level Properties

5.3. Assigning Tags in the Driven Plugin Configuration File

5.4. Using Tags in a Search Query

5.5. Finding Tagged Applications with the Column Chooser
6. Configuring Teams for Collaboration: 6.1. Creating and Managing Teams

6.2. Team Details

6.3. Associating an Application with a Team
7. Using Annotations: 7.1. Creating Custom Annotations

7.2. Data Visibility
8. Execute Hive Queries as Cascading HiveFlow: 8.1. Using HiveFlow

8.2. Driven for HiveFlow
9. Execute Cascading MapReduce Flows
10. User Profile: 10.1. User Actions

10.2. User Credentials

10.3. User Statistics

10.4. Invitations

10.5. Teams

Using the App Details Page

The app details page shows all the units of work and steps that are part of an application execution. (You can confirm that you are in the app details page if units of work are listed in the Details Table.) Driven aggregates the performance of individual mappers and reducers and frames them as metrics in the context of the overall application execution. This insight can facilitate easier application optimization and monitoring on the Hadoop cluster.

Searching App Details

By default, all units of work are shown. A search box near the top of the page lets you filter which units of work are displayed. If you are inspecting a complex or long-running application and know the value of one of the unit-of-work parameters listed in the Units of Work filter (see Figure 1), the search can help pinpoint the runtime factors that matter to you. You can also filter the displayed search results by using the Status drop-down menu.

Tip

The unit-of-work search is especially helpful for parsing performance data from a high-volume Hive Server. The Driven agent that transforms the Hive Server telemetry data to the app details page treats the Hive Server as a single application, regardless of the number of Hive queries coming from the server.

Figure 1. Search field with Units of Work filter expanded

Viewing Application Details

The App section contains application detail panels with overall application information. Unlike the other sections, the view in these panels is not affected by the search bar.

The Status Panel

The Status panel displays a color-coded bar representing the time the application spent in each of the states it was in. The states illustrated by the bar are labeled with begin and end times (time entered and time exited), or dates for long-running applications. For detailed information on these states, see Understanding the Driven State Model.

Figure 2. Summary section showing the Status panel

Tip	If the timeline does not show a timestamp for a state, you can view state times in the application Details Table. Use the column chooser to add the state-times columns if they are not already included in the table.

Below the timeline is the Progress section, which displays the total number of units of work the application has, along with tables that categorize the application’s units of work by state. The Active table’s columns are named for the active states that a unit of work could be in, while the Completed table’s columns are named for the end states that a unit of work could be in. The values in these columns are simple counters that show how many units of work are in the specified state.
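The counting behind the Progress tables can be pictured with a minimal sketch. It assumes that the active states are Pending, Started, Submitted, and Running, and that the end states are Successful, Stopped, Skipped, and Failed; the upper-case state strings and the function name are hypothetical and are not part of Driven.

```python
from collections import Counter

# Assumed grouping of states; the names are illustrative, not Driven identifiers.
ACTIVE_STATES = ("PENDING", "STARTED", "SUBMITTED", "RUNNING")
COMPLETED_STATES = ("SUCCESSFUL", "STOPPED", "SKIPPED", "FAILED")


def progress_tables(unit_states):
    """Return (total, active_table, completed_table) for a list of unit-of-work states."""
    counts = Counter(unit_states)
    # Each table column is a simple counter of units of work in that state.
    active = {state: counts.get(state, 0) for state in ACTIVE_STATES}
    completed = {state: counts.get(state, 0) for state in COMPLETED_STATES}
    return len(unit_states), active, completed
```

For example, an application with two running units of work and one failed unit would show a total of 3, a 2 in the Running column of the Active table, and a 1 in the Failed column of the Completed table.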

If an application is still active, a Slice Rate graph appears below the Progress section, along with the current number of active slices and the time of last update. Mousing over the graph updates the time and number of active slices to those of the chosen point. For more information, see Counters.

The Metrics Panel

The Metrics panel displays application-level aggregated counters. These metrics are collected by the Driven plugin or agent on each slice or step and rolled up for an application-level view. For more information, see Counters.
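As a rough illustration of what rolling up means, the sketch below sums per-slice counter values into one application-level total. Summation is an assumption made for this example (some metrics may be aggregated differently), and the counter names are hypothetical.

```python
from collections import Counter


def roll_up(slice_counters):
    """Sum counter values with the same name across all slices."""
    total = Counter()
    for counters in slice_counters:
        # Counter.update adds the values for matching counter names.
        total.update(counters)
    return dict(total)


# Two slices reporting hypothetical counters.
app_level = roll_up([
    {"bytes_read": 100, "tuples_read": 5},
    {"bytes_read": 50},
])
```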

Figure 3. App section showing the Counters panel

The Properties Panel

The Properties panel displays a selection of information on the application. Some properties are collected by the Driven plugin or agent, while others are part of the plugin or agent’s configuration.

Figure 4. App section showing the Properties panel

The following properties are shown:

Last updated - Last time data was received about the application.
Team - The team(s) associated with this application.
Owner - The system user who executed the application.
Frameworks - The big-data framework being used by the application.
Tags - Tags assigned to the application through configuration.
Version - The application’s version as set by its developer. If not explicitly set, the Driven plugin attempts to extract it from the JAR name.

The Environment Panel

The Environment panel displays different information depending on whether a Driven plugin or agent has been installed. Both versions of the panel display the following information about the environment when available:

Host name and IP address
JVM maximum memory
Application process ID (PID)
Driven plugin version, JAR name, and path to the JAR
Application JAR path and name

The Environment panel for an application with the Driven agent also displays agent information: version, JAR name, and JAR path.

Note	Since the Driven agent is composed of the Driven plugin and wrapper code, the plugin information is still included in the panel for an application with the Driven agent.

Figure 5. Environment panel for an application with the Driven agent

The Environment panel for an application using Cascading includes the Cascading version in place of the Driven agent information.

Figure 6. Environment panel for an application with Cascading and the Driven plugin

Understanding the Directed Acyclic Graph

Each application instance is represented as a directed acyclic graph (DAG). The graph is rendered as an interactive diagram of the units of work and steps, which can reveal underlying slice performance issues. Unit-of-work, step, and slice information is particularly useful for monitoring instances of application execution over time as the application grows in complexity and size. In addition, unit-of-work and step details on the DAG and in the table below the graph can be used to:

Understand real-time dependencies between steps and units of work
Visualize your application, tracking steps in the graph to line numbers in your code
Investigate log error messages and stack exceptions
Tune application logic

When you execute your application, the underlying framework builds a state model to optimally execute the unit of work on the Hadoop cluster. The Driven Plugin transmits the state model to the Driven application, which represents the execution plan as a DAG.

Figure 7. Sample DAG of an application

The DAG representation of each application execution can be useful to stakeholders responsible for documenting how an application has been developed and has performed, such as a documentation analyst. Over a period of time, the analyst might not be able to track and record relevant application details. Because Driven has a persistence layer to store application execution data, past application performance can be recreated by generating a DAG on demand. Without such an interface, it can be difficult to map business needs to technical implementation, especially in work environments that involve large teams that are spread across different regions.

In the graph, each node corresponds to a step or a processing function in your application code. You can refer to the specific code for a step by clicking on the node link.

Viewing the Graph

The DAG on the app details page can be viewed in three different ways:

Contracted View - The Contracted View is useful for complex and large applications.

Logical View - The Logical View (default) shows all the steps and resources (excluding implicit resources) and built-in functions.

Physical View - The Physical View shows all the steps including the implicit resources and built-in functions. The Physical View may show more details than the Logical View, if any exist.

Driven uses a pipes metaphor: lines in the DAG connect steps. A step depends on another step only if it relies on the execution of that earlier step. The Driven plugin dynamically determines the dependencies between units of work. If the output (sink) of one unit of work is consumed by another unit of work (as a source), Driven marks that dependency by connecting the two units of work.
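The sink-to-source rule can be sketched as follows. This is an illustration of the idea only, not the plugin's implementation; the data layout, resource names, and function name are hypothetical.

```python
def derive_dependencies(units):
    """Return sorted (upstream, downstream) pairs of unit-of-work names.

    `units` maps a unit-of-work name to a dict with two sets of resource
    identifiers: "sources" (what it reads) and "sinks" (what it writes).
    """
    edges = []
    for up_name, up in units.items():
        for down_name, down in units.items():
            # A dependency exists when a sink of one unit is a source of another.
            if up_name != down_name and up["sinks"] & down["sources"]:
                edges.append((up_name, down_name))
    return sorted(edges)


# Hypothetical units of work: "report" reads what "ingest" writes.
example_units = {
    "ingest": {"sources": {"raw"}, "sinks": {"clean"}},
    "report": {"sources": {"clean"}, "sinks": {"summary"}},
}
example_edges = derive_dependencies(example_units)
```

Here the only edge is from "ingest" to "report", which is the line that would connect the two units of work in the graph.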

Visualizing your end-to-end application as a DAG along with operational data, such as read/write data processed at each step, can provide important insights into improving the performance of the application. For example, reviewing the DAG can expose opportunities to introduce Filter functions in your code upstream to reduce the volumes of the data being processed by the pipes or to make the Join functions more efficient.

Real-Time Visibility into Your Application

Driven can refresh the displayed information as updates stream in from the plugin. The display shows the real-time progress of your application, including the steps currently being executed, the number of completed steps, and read/write data.

Getting the most current information can be very useful. You might discover in Driven that a long-running job is not executing properly, which could be a signal to terminate the application. Similarly, if you see a sudden slowdown in the progress of your application, you may want to start investigating the cause immediately (for example, a network storm or a rogue job submitted to the cluster).

One of the most interesting insights is the ability to track the percentage of applications that have completed in real time. For long-running applications, it is often useful to spot-check the behavior to ensure that there are no anomalies.

Ensure that the Auto Update slider in the top right corner is enabled to allow the displayed Driven data to auto-refresh in real time.

Figure 8. Auto Update slider

Status State of the Application

Driven displays the status of each unit of work in an application in the rows of the Details Table. The status of an application instance is indicated by the icon at the top of the page under the application name.

The following is a list of the statuses:

Pending State - Pending status

Started State - Started status

Submitted State - Submitted status

Running State - Running status

Successful State - Successful status

Stopped State - Stopped status

Skipped State - Skipped status

Failed State - Failed status

Understanding the Driven State Model

The Driven process object model consists of an Application as the root parent.

Figure 9. Driven process object model

An Application can have any number of child Units of Work, which are analogous to Flows for those familiar with the Cascading model. Units of Work are composed of child Steps, which are in turn composed of Nodes. Slices are data concerning the actual instantiation and execution of logical Nodes on the compute cluster. For example, a logical mapper is a Node of the mapper type, which is instantiated multiple times and executed in parallel on the compute cluster in whatever slots can be allocated. Slice objects contain actual execution data such as counters and durations.
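The containment hierarchy described above can be pictured with a small sketch. The class and field names are hypothetical and chosen only to mirror the terms in this guide; they do not reflect Driven's internal data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Slice:
    """One execution of a logical Node on the cluster, with its execution data."""
    duration_ms: int
    counters: Dict[str, int] = field(default_factory=dict)


@dataclass
class Node:
    """A logical node, for example of the mapper type."""
    kind: str
    slices: List[Slice] = field(default_factory=list)


@dataclass
class Step:
    name: str
    nodes: List[Node] = field(default_factory=list)


@dataclass
class UnitOfWork:
    name: str
    steps: List[Step] = field(default_factory=list)


@dataclass
class Application:
    """Root parent of the process object model."""
    name: str
    units_of_work: List[UnitOfWork] = field(default_factory=list)


def count_slices(app):
    """Walk Application -> Unit of Work -> Step -> Node and count all Slices."""
    return sum(
        len(node.slices)
        for unit in app.units_of_work
        for step in unit.steps
        for node in step.nodes
    )
```

A single logical mapper Node that ran in three slots would hold three Slice objects, so the count for an application containing only that Node would be 3.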

All Driven process objects, such as App, Unit of Work, and Step, have a state and progress through those states using the state machine explained below.

Figure 10. Driven process state machine

As soon as the Application client JVM starts, the Application is created in a "Pending" state. As the other logical application objects such as Units of Work, Steps and Nodes are created, they enter the "Pending" state.

Once processing is initiated, process objects move from "Pending" to "Started". When the client actually submits work to the compute cluster, the related process objects move into the "Submitted" state, indicating that they are in the cluster queue. Applications cannot be in a "Submitted" state because they represent the JVM that is submitting the work to the cluster.

Once an instance of a logical Node is actually executing on the compute cluster, the associated child and parent objects move into the "Running" state.

When all children have a completed state (Stopped, Successful, Failed, Skipped), the parent is also moved into a completed state.
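The rules in the preceding paragraphs can be sketched as a small model. This is a simplified illustration, not Driven's implementation: the function names and the object-kind strings are hypothetical, and treating a parent with no children as not yet complete is an assumption made for the example.

```python
from enum import Enum


class State(Enum):
    PENDING = "Pending"
    STARTED = "Started"
    SUBMITTED = "Submitted"
    RUNNING = "Running"
    SUCCESSFUL = "Successful"
    STOPPED = "Stopped"
    SKIPPED = "Skipped"
    FAILED = "Failed"


# The four completed states named in this guide.
COMPLETED_STATES = frozenset(
    {State.SUCCESSFUL, State.STOPPED, State.SKIPPED, State.FAILED}
)


def may_enter(object_kind, state):
    """Applications never enter Submitted; other process objects may."""
    return not (object_kind == "Application" and state is State.SUBMITTED)


def parent_is_complete(child_states):
    """A parent moves to a completed state once every child has completed.

    Assumption: a parent with no children is not treated as complete.
    """
    children = list(child_states)
    return bool(children) and all(s in COMPLETED_STATES for s in children)
```

Which completed state the parent receives is not modeled here; the sketch only answers whether the parent is ready to leave the active states.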

The final Application state is based on the final state of the child Units of Work. Because there may be handled errors, Applications with failing Units of Work can be considered successful. Using Driven plugin properties, you can configure whether an Application with any failing Units of Work should be considered a failure; see Driven Plugin.

Because Slice data concerns actual execution, Slices are only ever in a running or completed state. For this reason, no Slice data appears in the Unit of Work detail view until the Slices are running.

Stack Trace

For applications in FAILED status, you can view the stack trace to investigate the errors further. Click the Show stack trace error button in the upper right corner to display the stack trace information.

In addition, the failure status icon appears with a small info icon whenever a more detailed failure message is available for that object. You may see this icon next to a failed unit of work in the Details Table. Click the icon to see the detailed failure message and stack trace.

Details Table

The table under the DAG provides a detailed breakdown of each unit of work in the application run. Some key monitoring assets of the tabular interface include the following capabilities:

Click on a hyperlinked unit of work name to focus on component slice performance, JobTrackers, and node statistics
Segment unit of work data with correlated JobTracker, step, and customized counter details

Uncovering Bottlenecks with the Timeline Column

Driven helps you visualize instrumentation counters in context so that you can tune your applications. The Timelines of the Details Table provide a detailed view of each unit of work that makes up the application, helping you quickly identify which part of your application needs attention (assuming you first attempt to tune the more resource-draining parts of the application).

Tip	Hover over a segment of the Timeline bar graph to see what status is represented by the color.

Timelines in the right column help you scan application units of work to uncover possible bottlenecks

Importing Counter Data and Other Metrics

Driven lets you customize most of the information that the table displays. Click the Select table columns icon to reveal or conceal columnar metrics. The Status and Name columns cannot be hidden.

The columnar metrics are categorized in the column chooser. Each category can be collapsed or expanded.

See Counter Data and Other Metrics in Tables for more information.