Driven User Guide: Understanding the Unit of Work Details Page
version 2.2.6
Understanding the Unit of Work Details Page
The Unit of Work details page can answer many questions about an application run. Typical questions include:
- How does the application decompose into specific tasks?
- Is there a particular cause for performance degradation: data skew, a network storm, poor application logic, or inadequate cluster resource provisioning?
Viewing Unit-of-Work Details
The Unit of Work details section contains panels with overall Unit of Work information. In its title bar, this section displays the name of the Unit of Work, a copyable URL to the current view, and a status icon (see Status State of the Application for a list of icons).
The Status Panel
The Status panel displays a color-coded timeline: a bar representing the amount of time the Unit of Work spent in each state it entered, including the current one. The states illustrated by the bar are labeled with begin and end times (time entered and time exited), or with dates for long-running units of work. For detailed information on these states, see the State Model.
Tip: If the timeline does not show a timestamp for a state, you can view state times in the Unit of Work Details Table. Use the column chooser to add the state-times columns if they are not already included in the table.
Below the timeline is the Progress section, which displays the total number of steps the unit of work has, along with tables that categorize the unit of work’s steps by state. The Active table’s columns are named for the active states that a step could be in, while the Completed table’s columns are named for the end states that a step could be in. The values in these columns are simple counters that show how many steps are in the specified state.
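The per-state counts in the Active and Completed tables are simple tallies. A minimal sketch of that bookkeeping, using hypothetical state names and step records (not the product's actual state model or API):

```python
from collections import Counter

# Hypothetical state names; see the State Model for the real ones.
ACTIVE_STATES = ("PENDING", "SUBMITTED", "RUNNING")
END_STATES = ("SUCCESSFUL", "FAILED", "STOPPED")

def progress_summary(steps):
    """Tally steps by state, split into Active and Completed tables."""
    counts = Counter(step["state"] for step in steps)
    return {
        "total": len(steps),
        "active": {s: counts.get(s, 0) for s in ACTIVE_STATES},
        "completed": {s: counts.get(s, 0) for s in END_STATES},
    }

steps = [{"state": "RUNNING"}, {"state": "RUNNING"},
         {"state": "SUCCESSFUL"}, {"state": "FAILED"}]
summary = progress_summary(steps)
print(summary)
```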
If a unit of work is still active, a Slice Rate graph appears below the Progress section, along with the current number of active slices and the time of last update. Mousing over the graph updates the time and number of active slices to those of the chosen point. For more information, see Common Counters.
The Metrics Panel
The Metrics Panel displays unit-of-work-level metrics. These same metrics (from each unit of work) are aggregated to provide the values for the equivalent application-level metrics. For more information on Driven metrics, see Counters.
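Assuming the aggregation is a simple summation of per-unit counters into application totals, it can be sketched as follows; the counter names here are made up for illustration:

```python
from collections import Counter

def aggregate_metrics(unit_metrics):
    """Sum counter dicts from each unit of work into app-level totals."""
    totals = Counter()
    for metrics in unit_metrics:
        totals.update(metrics)  # Counter.update adds counts
    return dict(totals)

units = [{"read_bytes": 100, "write_bytes": 40},
         {"read_bytes": 250, "write_bytes": 60}]
totals = aggregate_metrics(units)
print(totals)  # {'read_bytes': 350, 'write_bytes': 100}
```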
The Directed Acyclic Graph
All Units of Work can be represented by a Directed Acyclic Graph.
Different platforms provide different vertex and edge data, so the DAG representations vary.
All platforms will show data source and sink resource vertices with sanitized information about the resource URI and fields. They will also show at least one processing vertex, with appropriate details for that operation.
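The source/processing/sink structure described above can be pictured as a small directed graph. A sketch using an adjacency list and Kahn's topological sort; the vertex names and URIs are illustrative, not Driven's data model:

```python
from collections import deque

def topological_order(edges):
    """Kahn's algorithm: returns a valid ordering for any DAG,
    raises if the graph contains a cycle."""
    indegree = {v: 0 for v in edges}
    for targets in edges.values():
        for t in targets:
            indegree[t] += 1
    queue = deque(v for v, d in indegree.items() if d == 0)
    order = []
    while queue:
        v = queue.popleft()
        order.append(v)
        for t in edges[v]:
            indegree[t] -= 1
            if indegree[t] == 0:
                queue.append(t)
    if len(order) != len(edges):
        raise ValueError("cycle detected; not a DAG")
    return order

# Source and sink resource vertices around one processing vertex.
edges = {
    "source:hdfs://example/input": ["process:query"],
    "process:query": ["sink:hdfs://example/output"],
    "sink:hdfs://example/output": [],
}
order = topological_order(edges)
print(order)
```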
Hive Queries and the DAG
A Hive query DAG is drawn with a query icon between the source table(s) and output table(s). Click the source and sink resource nodes to see more detailed information about the tables touched by the query. Hive units of work have only one processing vertex, representing the HQL query. The details of the query are listed in the Properties tab of the status panel.
Native MapReduce and the DAG
Native MapReduce DAGs will show map, shuffle, and reduce vertices between the source and sink resource vertices.
Spark and the DAG
Spark DAGs will show resilient distributed dataset (RDD) vertices between the source and sink resource vertices.
The Cascading Query Planner and the DAG
A key component of a Cascading application is the query planner. When the Cascading application executes, the query planner compiles all the data-processing steps, analyzes dependencies of the steps, and develops the DAG for the application.
The Cascading query planner iterates through the DAG, breaking it into smaller and smaller graphs, called expression graphs, until the graph matches a pattern associated with a unit of work, such as a mapper or a reducer.
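One way to build an intuition for this is a toy pipeline splitter: operations before a grouping boundary land in a map phase, operations after it in a reduce phase. This is a drastic simplification of the real pattern-matching planner, with invented operation names:

```python
def split_into_phases(ops):
    """Split a linear op pipeline at 'groupby' boundaries.
    Each 'groupby' closes a map phase and opens a reduce phase."""
    phases, current = [], {"kind": "map", "ops": []}
    for op in ops:
        if op == "groupby":
            phases.append(current)
            current = {"kind": "reduce", "ops": []}
        else:
            current["ops"].append(op)
    phases.append(current)
    return phases

pipeline = ["parse", "filter", "groupby", "count"]
phases = split_into_phases(pipeline)
print([(p["kind"], p["ops"]) for p in phases])
# [('map', ['parse', 'filter']), ('reduce', ['count'])]
```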
Step Table and Slice Histograms
You can add more granular, slice-level metrics to your application view by adding counters. Click the Add counters button to display the available counters, then select the desired counters by clicking their checkboxes.
Understanding Bottlenecks in Your Application
In the slice performance dashboard, you can view slice information (a slice is a unit of work such as a map or reduce task) at the individual or aggregate level.
Check whether any of your slices are skewed. In a MapReduce application, the data is divided and processed in equal-sized chunks. If certain slices take noticeably longer to finish the same type of task on (presumably) similarly sized data, that anomaly could indicate application execution problems.
Often, these skews indicate that applications are processing a large number of small files, which usually means that you need to optimize the environment. In other cases, depending on the skew dimension, they could indicate a network issue, which can delay the shuffle-sort operations in MapReduce.
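A simple way to quantify skew of this kind is to compare each slice's duration against the median. A sketch, with the 2x threshold chosen arbitrarily and made-up durations:

```python
import statistics

def skewed_slices(durations_s, factor=2.0):
    """Flag slices whose duration exceeds factor * median duration.
    Outliers suggest data skew, small files, or network delays."""
    median = statistics.median(durations_s)
    return [(i, d) for i, d in enumerate(durations_s) if d > factor * median]

# Nine similar slices and one straggler taking ~6x the median time.
durations = [30, 32, 31, 29, 30, 33, 28, 31, 30, 180]
flagged = skewed_slices(durations)
print(flagged)  # [(9, 180)]
```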
For further details on bottlenecks, click View Slice Waterfall to open a waterfall, or gantt-like timeline, of the slices in the selected node.
Hovering the mouse over this view shows the number of slices running concurrently on the cluster. A low number of active slices can indicate congestion on your cluster. Also look out for:
- Holes in the timeline, which may indicate preemption by another job
- Many very short slices, a symptom of the small-file problem
- A few very long slices, which could be split up if they take too long
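The concurrency figure shown on hover can be derived from slice start/end times with a sweep over the interval endpoints; a sketch with hypothetical timestamps:

```python
def max_concurrency(intervals):
    """Peak number of overlapping (start, end) intervals."""
    events = []
    for start, end in intervals:
        events.append((start, 1))   # slice becomes active
        events.append((end, -1))    # slice finishes
    events.sort()  # at equal times, ends (-1) sort before starts (1)
    active = peak = 0
    for _, delta in events:
        active += delta
        peak = max(peak, active)
    return peak

slices = [(0, 10), (2, 8), (5, 12), (11, 15)]
peak = max_concurrency(slices)
print(peak)  # 3 slices overlap between t=5 and t=8
```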