Driven User Guide: Understanding the Unit of Work Details Page
version 2.2.6
- 1. Overview of Monitored Applications
1.1. Logging In
1.2. Status Views
- 2. Searches, Saved Views, and Accessing Relevant Data
2.1. Starting a Search
2.3. My Teams Views
2.5. Customizing Searches
2.6. Periodic Views
- 3. Using the App Details Page
3.4. Viewing the Graph
3.6. Details Table
- 4. Understanding the Unit of Work Details Page
- 5. Managing Applications with Tags
- 6. Configuring Teams for Collaboration
6.2. Team Details
- 7. Using Annotations
7.2. Data Visibility
- 8. Execute Hive Queries as Cascading HiveFlow
8.1. Using HiveFlow
8.2. Driven for HiveFlow
- 9. Execute Cascading MapReduce Flows
- 10. User Profile
10.1. User Actions
10.2. User Credentials
10.3. User Statistics
The Unit of Work details page can answer many questions about an application run. Typical questions include:
- How does the application decompose into specific tasks?
- Is there a particular cause for performance degradation, such as data skew, a network storm, poor application logic, or inadequate cluster resource provisioning?
The Unit of Work details section contains panels with overall Unit of Work information. In its title bar, this section displays the name of the Unit of Work, a copyable URL to the current view, and a status icon (see Status State of the Application for a list of icons).
The Status panel displays a color-coded timeline bar representing the amount of time the Unit of Work spent in each state it entered, including the current one. Each state segment of the bar is labeled with its begin and end times (time entered and time exited), or with dates for long-running units of work. For detailed information on these states, see the State Model.
Note: If the timeline does not show a timestamp for a state, you can view state times in the Unit of Work Details Table. Use the column chooser to add the state-time columns if they are not already included in the table.
Below the timeline is the Progress section, which displays the total number of steps the unit of work has, along with tables that categorize the unit of work’s steps by state. The Active table’s columns are named for the active states that a step could be in, while the Completed table’s columns are named for the end states that a step could be in. The values in these columns are simple counters that show how many steps are in the specified state.
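Conceptually, these per-state counters are just a tally of steps grouped by their current state. The following is a minimal, hypothetical sketch of that idea; the state names are examples, not a definitive list of Driven's states:

```java
import java.util.*;
import java.util.stream.*;

// Illustrative only: tally steps by their current state, as the
// Active and Completed tables do. Not Driven's actual data model.
public class StepCounters {
    public static Map<String, Long> tally(List<String> stepStates) {
        // TreeMap keeps the output ordering deterministic.
        return stepStates.stream()
                .collect(Collectors.groupingBy(s -> s, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> states = List.of("RUNNING", "RUNNING", "SUCCESSFUL", "PENDING");
        System.out.println(tally(states)); // prints {PENDING=1, RUNNING=2, SUCCESSFUL=1}
    }
}
```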
If a unit of work is still active, a Slice Rate graph appears below the Progress section, along with the current number of active slices and the time of the last update. Mousing over the graph updates the displayed time and active-slice count to those of the point under the cursor. For more information, see Common Counters.
The Properties Panel reports on the last time data was received about the unit of work.
If the unit of work is part of a Hive-based application, it may display a Statement property containing the SQL executed by the unit of work.
All Units of Work can be represented by a Directed Acyclic Graph.
Different platforms provide different vertex and edge data, so the DAG representations vary by platform.
All platforms will show data source and sink resource vertices with sanitized information about the resource URI and fields. They will also show at least one processing vertex, with appropriate details for that operation.
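The vertex structure described above can be pictured with a minimal sketch. Everything here (class, method, and vertex names) is hypothetical and for illustration only; it is not Driven's API:

```java
import java.util.*;

// Illustrative model of a unit-of-work DAG: source and sink resource
// vertices plus at least one processing vertex between them.
public class UnitOfWorkDag {
    public enum VertexKind { SOURCE, PROCESSING, SINK }

    private final Map<String, VertexKind> vertices = new LinkedHashMap<>();
    private final Map<String, List<String>> edges = new LinkedHashMap<>();

    public void addVertex(String name, VertexKind kind) {
        vertices.put(name, kind);
    }

    public void addEdge(String from, String to) {
        edges.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
    }

    public long count(VertexKind kind) {
        return vertices.values().stream().filter(k -> k == kind).count();
    }

    public static void main(String[] args) {
        UnitOfWorkDag dag = new UnitOfWorkDag();
        // Source and sink vertices carry the (sanitized) resource URI.
        dag.addVertex("hdfs://data/input", VertexKind.SOURCE);
        dag.addVertex("process", VertexKind.PROCESSING);
        dag.addVertex("hdfs://data/output", VertexKind.SINK);
        dag.addEdge("hdfs://data/input", "process");
        dag.addEdge("process", "hdfs://data/output");
        System.out.println(dag.count(VertexKind.PROCESSING)); // prints 1
    }
}
```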
A Hive query DAG is drawn with a query icon between the source table(s) and output table(s). Click a source or sink resource node to see more detailed information about the tables the query touches. Hive units of work have only one processing vertex, representing the HQL query; the details of the query are listed in the Properties tab of the status panel.
Native MapReduce DAGs will show map, shuffle, and reduce vertices between the source and sink resource vertices.
Spark DAGs will show resilient distributed dataset (RDD) vertices between the source and sink resource vertices.
A key component of a Cascading application is the query planner. When the Cascading application executes, the query planner compiles all the data-processing steps, analyzes dependencies of the steps, and develops the DAG for the application.
The Cascading query planner iterates over the DAG, breaking it into smaller and smaller graphs, called expression graphs, until each graph matches a pattern associated with a unit of work, such as a mapper or a reducer.
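The idea of partitioning a pipeline at pattern boundaries can be sketched in a greatly simplified form. The code below is a toy illustration, not Cascading's planner: it splits a linear list of operations into map-side and reduce-side units wherever a grouping operation (which forces a shuffle) occurs. All operation names are hypothetical:

```java
import java.util.*;

// Toy sketch of planner-style partitioning: split a linear pipeline
// into units at grouping boundaries. Not Cascading's actual algorithm.
public class PlannerSketch {
    public static List<List<String>> partition(List<String> ops) {
        List<List<String>> units = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String op : ops) {
            current.add(op);
            if (op.equals("groupBy")) {   // boundary: a shuffle is required here
                units.add(current);
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) units.add(current);
        return units;
    }

    public static void main(String[] args) {
        List<String> pipeline = List.of("parse", "filter", "groupBy", "count", "format");
        System.out.println(partition(pipeline));
        // prints [[parse, filter, groupBy], [count, format]]
    }
}
```

A real planner matches richer patterns than a single boundary operation, but the principle is the same: subdivide the expression graph until each piece maps onto a known execution unit.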