Driven User Guide: Understanding the Unit of Work Details Page

version 2.2.6

1. Overview of Monitored Applications

1.1. Logging In

2. Searches, Saved Views, and Accessing Relevant Data

2.1. Starting a Search

3. Using the App Details Page

3.1. Searching App Details

4. Understanding the Unit of Work Details Page

4.1. Viewing Unit-of-Work Details

5. Managing Applications with Tags

5.1. Best Practice for Tags

6. Configuring Teams for Collaboration

6.1. Creating and Managing Teams

7. Using Annotations

7.1. Creating Custom Annotations

8. Execute Hive Queries as Cascading HiveFlow

8.1. Using HiveFlow

9. Execute Cascading MapReduce Flows

10. User Profile

10.1. User Actions

10.4. Invitations

10.5. Teams

Understanding the Unit of Work Details Page

The Unit of Work details page can answer many questions about an application run. Typical questions include:

  • How does the application decompose into specific tasks?

  • Is there a particular cause for performance degradation: data skew, network storm, poor application logic, or inadequate cluster resource provisioning?

Viewing Unit-of-Work Details

The Unit of Work details section contains panels with overall Unit of Work information. In its title bar, this section displays the name of the Unit of Work, a copyable URL to the current view, and a status icon (see Status State of the Application for a list of icons).

Figure 1. Unit-of-work section showing the Status panel

The Status Panel

The Status panel displays a color-coded timeline bar representing the amount of time the Unit of Work spent in each of the states it entered, including the current one. Each state in the bar is labeled with its begin and end times (time entered and time exited), or with dates for long-running units of work. For detailed information on these states, see the State Model.

Tip
If the timeline does not show a timestamp for a state, you can view state times in the Unit of Work Details Table. Use the column chooser to add the state-times columns if they are not already included in the table.

Below the timeline is the Progress section, which displays the total number of steps in the unit of work, along with tables that categorize the unit of work’s steps by state. The Active table’s columns are named for the active states that a step can be in, while the Completed table’s columns are named for the end states that a step can be in. The values in these columns are simple counters that show how many steps are in the specified state.

If a unit of work is still active, a Slice Rate graph appears below the Progress section, along with the current number of active slices and the time of last update. Mousing over the graph updates the time and number of active slices to those of the chosen point. For more information, see Common Counters.

The Metrics Panel

The Metrics Panel displays unit-of-work-level metrics. These same metrics (from each unit of work) are aggregated to provide the values for the equivalent application-level metrics. For more information on Driven metrics, see Counters.
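
This rollup can be pictured as a simple sum over each unit of work’s counters. The sketch below is illustrative only; the metric names it receives are generic, and summation is an assumed aggregation rule, not Driven’s documented behavior for every metric.

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class MetricRollup {
        // Roll per-unit-of-work counters up into application-level
        // totals, assuming each metric aggregates by summation.
        public static Map<String, Long> aggregate(List<Map<String, Long>> unitMetrics) {
            Map<String, Long> appMetrics = new HashMap<>();
            for (Map<String, Long> metrics : unitMetrics)
                metrics.forEach((name, value) -> appMetrics.merge(name, value, Long::sum));
            return appMetrics;
        }
    }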

The Properties Panel

The Properties Panel reports the last time data was received about the unit of work.

If the unit of work is part of a Hive-based application, it may display a Statement property containing the SQL executed by the unit of work.

The Environment Panel

The Environment Panel shows the platform on which the unit of work is being executed.

The Directed Acyclic Graph

All Units of Work can be represented by a Directed Acyclic Graph.

Different platforms provide different vertex and edge data, so the DAG representations vary by platform.

All platforms will show data source and sink resource vertices with sanitized information about the resource URI and fields. They will also show at least one processing vertex, with appropriate details for that operation.
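
As a rough mental model of what the page renders (not Driven’s internal representation), a DAG can be thought of as three kinds of vertices joined by directed edges:

    import java.util.ArrayList;
    import java.util.List;

    // An illustrative model of the rendered DAG. Every platform
    // contributes SOURCE and SINK resource vertices plus at least
    // one PROCESSING vertex for the operation itself.
    class DagVertex {
        enum Kind { SOURCE, PROCESSING, SINK }

        final Kind kind;
        final String label;            // e.g. a sanitized resource URI or operation name
        final List<DagVertex> downstream = new ArrayList<>();

        DagVertex(Kind kind, String label) {
            this.kind = kind;
            this.label = label;
        }

        void connectTo(DagVertex next) {
            downstream.add(next);      // directed edge; the graph contains no cycles
        }
    }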

Hive Queries and the DAG

A Hive query DAG is drawn with a query icon between the source table(s) and output table(s). Click the source and sink resource nodes to see more detailed information about the tables touched by the query. Hive units of work have only one processing vertex, representing the HQL query. The details of the query are listed in the Properties tab of the status panel.

Figure 2. DAG example for a Hive query
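
As a concrete illustration, a query like the hypothetical one below (all table and column names are invented) would be rendered with two source table vertices, a single query vertex, and one sink table vertex:

    public class HiveDagExample {
        // A hypothetical HQL statement. In the rendered DAG, `orders` and
        // `customers` appear as source resource vertices, the query itself
        // as the single processing vertex, and `daily_totals` as the sink.
        static final String STATEMENT =
            "INSERT OVERWRITE TABLE daily_totals "
          + "SELECT c.region, SUM(o.amount) "
          + "FROM orders o JOIN customers c ON o.customer_id = c.id "
          + "GROUP BY c.region";
    }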

Native MapReduce and the DAG

Native MapReduce DAGs show map, shuffle, and reduce vertices between the source and sink resource vertices.

Figure 3. DAG example for MR Unit of Work

Spark and the DAG

Spark DAGs show resilient distributed dataset (RDD) vertices between the source and sink resource vertices.

Figure 4. DAG example for Spark Unit of Work

The Cascading Query Planner and the DAG

A key component of a Cascading application is the query planner. When the Cascading application executes, the query planner compiles all the data-processing steps, analyzes their dependencies, and develops the DAG for the application.

Figure 5. DAG rendering as compiled by the query planner
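
To make the planner’s input concrete, here is a minimal Cascading word-count assembly (a sketch; the paths and field names are hypothetical). When connect() is called, the query planner compiles these steps into a DAG like the one above:

    import java.util.Properties;

    import cascading.flow.Flow;
    import cascading.flow.FlowDef;
    import cascading.flow.hadoop.HadoopFlowConnector;
    import cascading.operation.aggregator.Count;
    import cascading.pipe.Every;
    import cascading.pipe.GroupBy;
    import cascading.pipe.Pipe;
    import cascading.scheme.hadoop.TextDelimited;
    import cascading.tap.SinkMode;
    import cascading.tap.Tap;
    import cascading.tap.hadoop.Hfs;
    import cascading.tuple.Fields;

    public class WordCountFlow {
        public static void main(String[] args) {
            // Source and sink taps become the resource vertices of the DAG.
            Tap source = new Hfs(new TextDelimited(new Fields("word"), "\t"), "hdfs:/tmp/words-in");
            Tap sink = new Hfs(new TextDelimited(new Fields("word", "count"), "\t"),
                               "hdfs:/tmp/words-out", SinkMode.REPLACE);

            // Each pipe is a data-processing step the planner analyzes.
            Pipe pipe = new Pipe("wordcount");
            pipe = new GroupBy(pipe, new Fields("word"));
            pipe = new Every(pipe, new Count());

            FlowDef flowDef = FlowDef.flowDef()
                .setName("wordcount")
                .addSource(pipe, source)
                .addTailSink(pipe, sink);

            // connect() runs the query planner, which compiles the steps
            // into the DAG; complete() executes the resulting flow.
            Flow flow = new HadoopFlowConnector(new Properties()).connect(flowDef);
            flow.complete();
        }
    }

On Hadoop, the planner packs the resulting graph into one or more MapReduce steps, which is why the rendered DAG associates steps with mappers and reducers, as described next.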

The Cascading query planner iterates through the DAG, breaking it into smaller and smaller graphs, called expression graphs, until the graph matches a pattern associated with a unit of work, such as a mapper or a reducer.

Figure 6. Steps associated with their mappers and reducers, as well as their expression graphs

Step Table and Slice Histograms

You can add more granular, slice-level metrics to your application by adding counters. Click the Add counters button to display the available counters, and select the desired counter by clicking its checkbox.

Figure 7. Adding counters to the slice performance dashboard
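
For example, a raw MapReduce job can publish a custom Hadoop counter, which is reported alongside the built-in counters; the group and counter names below are invented for this sketch.

    import java.io.IOException;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class TokenMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1);

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString().trim();
            if (line.isEmpty()) {
                // Custom counters are reported alongside the built-in
                // Hadoop counters; "MyApp" and "EMPTY_RECORDS" are
                // hypothetical names chosen for this example.
                context.getCounter("MyApp", "EMPTY_RECORDS").increment(1);
                return;
            }
            for (String token : line.split("\\s+"))
                context.write(new Text(token), ONE);
        }
    }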

Understanding Bottlenecks in Your Application

In the slice performance dashboard, you can view slice information (a slice is a unit of work such as a map or a reduce task) at the individual or the aggregate level.

Figure 8. This example shows skewed data at the slice level

Observe whether any of your slices are skewed. In a MapReduce application, the data is divided into equal-sized chunks for processing. If certain slices take noticeably longer to finish a similar type of task on (presumably) similarly sized data, that is an anomaly and could indicate application execution problems.

Often, these skews indicate that applications are processing a large number of small files, which usually means that you need to optimize the environment. In other cases, depending on the skew dimension, they could indicate a network issue, which can delay the shuffle-sort operations in MapReduce.
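
One simple way to reason about slice skew (purely illustrative, not a Driven feature) is to flag slices whose duration far exceeds the median:

    import java.util.Arrays;

    public class SkewCheck {
        // Flag slices whose duration exceeds the median by a chosen
        // factor. A factor of around 3 is a plausible starting point;
        // the threshold is illustrative, not a Driven default.
        public static boolean[] flagSkewed(long[] sliceDurationsMs, double factor) {
            long[] sorted = sliceDurationsMs.clone();
            Arrays.sort(sorted);
            double median = sorted.length % 2 == 1
                ? sorted[sorted.length / 2]
                : (sorted[sorted.length / 2 - 1] + sorted[sorted.length / 2]) / 2.0;

            boolean[] skewed = new boolean[sliceDurationsMs.length];
            for (int i = 0; i < sliceDurationsMs.length; i++)
                skewed[i] = sliceDurationsMs[i] > factor * median;
            return skewed;
        }
    }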

For further details on bottlenecks, click View Slice Waterfall to open a waterfall (Gantt-like) timeline of the slices in the selected node.

Figure 9. A slice waterfall node with good cluster utilization

In this view, hovering the mouse over the diagram shows the number of active slices running concurrently on the cluster at that point. A low number of active slices can indicate congestion on your cluster. Also look out for:

  • Holes in the timeline, which may indicate preemption from another job

  • A large number of very short slices, a symptom of the small-file problem

  • A small number of very long slices, which could be split up if they take too long

Viewing the Hadoop Dashboard

If there is a Hadoop dashboard for a step, the row for the step has a Job Tracker hyperlink.

Figure 10. Link to Hadoop dashboard