Driven User Guide: Topics for Cascading Applications

version 2.2.6

1. Overview of Monitored Applications

1.1. Logging In

2. Searches, Saved Views, and Accessing Relevant Data

2.1. Starting a Search

3. Using the App Details Page

3.1. Searching App Details

4. Understanding the Unit of Work Details Page

4.1. Viewing Unit-of-Work Details

5. Managing Applications with Tags

5.1. Best Practice for Tags

6. Configuring Teams for Collaboration

6.1. Creating and Managing Teams

7. Using Annotations

7.1. Creating Custom Annotations

8. Execute Hive Queries as Cascading HiveFlow

8.1. Using HiveFlow

9. Execute Cascading MapReduce Flows
10. User Profile

10.1. User Actions

10.4. Invitations

10.5. Teams

Using Annotations

Annotations display metadata about flow and step nodes on the application details and slice performance views of Driven. Click on a node in the directed acyclic graph (DAG) to view annotations about the operation, such as processing details for a tap, filter, or function of a flow.

Figure 1. Sample annotation displaying the Function type within the wc flow

Creating Custom Annotations

You choose which types of metadata are exposed in annotations when you develop the Cascading application.

Use Cascading 2.6 or later to assign annotations to Cascading functions and taps.

Also, refer to the Cascading 3.0 core API reference for details about the cascading.management.annotation package.

Data Visibility

Driven renders all the application metadata associated with annotations. For privacy or compliance reasons, however, you may want to restrict information about a particular Property to a subset of Driven users. This kind of access control matters most in a shared, multitenant cluster, where the visibility of some metadata attributes must be limited to comply with privacy or governance guidelines.

The visibility rule is applied to users based on their identity in Driven. In the following code example, the visibility rule is set to PUBLIC:

@Property(name = "scrubTextConvert", visibility = Visibility.PUBLIC)
@PropertyDescription("_my_property_description_")
...
@Property(name = "scrubText", visibility = Visibility.PUBLIC)
@PropertyDescription("_my_property_description_")
...
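
To make the shape of these annotations concrete, here is a minimal, self-contained sketch. It is not Driven's or Cascading's implementation: the Property, PropertyDescription, and Visibility types below are local stand-ins declared only so the example compiles without the Cascading JAR (a real application would import them from the cascading.management.annotation package), and ScrubOperation, its getters, the description strings, and describe are hypothetical names. The sketch shows how a name and a visibility level can be attached to a getter and read back at run time.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

public class AnnotationSketch {
    // Local stand-ins for illustration only; real applications import these
    // types from the cascading.management.annotation package instead.
    enum Visibility { PUBLIC, PROTECTED, PRIVATE }

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Property {
        String name();
        Visibility visibility();
    }

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface PropertyDescription {
        String value();
    }

    // Hypothetical operation whose getters carry the annotations.
    static class ScrubOperation {
        @Property(name = "scrubText", visibility = Visibility.PUBLIC)
        @PropertyDescription("Pattern removed from each input line")
        public String getScrubText() { return "[^a-z ]"; }

        @Property(name = "scrubTextConvert", visibility = Visibility.PRIVATE)
        @PropertyDescription("Replacement text")
        public String getScrubTextConvert() { return ""; }
    }

    // Collects property name -> visibility for every annotated method of a class.
    static Map<String, Visibility> describe(Class<?> type) {
        Map<String, Visibility> result = new TreeMap<>();
        for (Method m : type.getDeclaredMethods()) {
            Property p = m.getAnnotation(Property.class);
            if (p != null) {
                result.put(p.name(), p.visibility());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints the annotated property names with their visibility levels.
        System.out.println(describe(ScrubOperation.class));
    }
}
```

Marking one property PRIVATE here is only to show that each property carries its own level; the snippet above sets both to PUBLIC.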

Driven maps the visibility levels listed in the table below to the state of the user session.

Table 1. Visibility Levels

Property (user session)   Public Access (Anonymous)   Protected Access (Login)   Private Access (Team)
PUBLIC                    X                           X                          X
PROTECTED                 -                           X                          X
PRIVATE                   -                           -                          X

You can configure this mapping in the driven.properties file to meet your governance guidelines.

Under the default mapping, each visibility level controls who can view the annotated information as follows:
PUBLIC

Allow metadata attributes to be observed anonymously, by default.

PROTECTED

Allow metadata attributes to be viewed by users who log in to Driven.

PRIVATE

Allow metadata attributes to be viewed by members of a Driven team. This level is also used when access is restricted by role, such as Driven admin or team leader.
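
The default mapping described above can be modeled in a few lines. This is only an illustrative sketch under stated assumptions, not Driven's implementation: the Session enum, its constant names, and the visibleLevels and canView helpers are hypothetical, and the Visibility enum is a local stand-in rather than the Cascading type.

```java
import java.util.EnumSet;
import java.util.Set;

public class VisibilitySketch {
    // Mirrors the three levels named in Table 1 (local stand-in, not the Cascading enum).
    enum Visibility { PUBLIC, PROTECTED, PRIVATE }

    // Hypothetical names for the three user-session states in the table header.
    enum Session { ANONYMOUS, LOGIN, TEAM }

    // Returns the visibility levels a session may view under the default mapping.
    static Set<Visibility> visibleLevels(Session session) {
        switch (session) {
            case ANONYMOUS:
                return EnumSet.of(Visibility.PUBLIC);
            case LOGIN:
                return EnumSet.of(Visibility.PUBLIC, Visibility.PROTECTED);
            case TEAM:
                return EnumSet.allOf(Visibility.class);
            default:
                throw new IllegalArgumentException("unknown session: " + session);
        }
    }

    // True when a property with the given level is viewable in the given session.
    static boolean canView(Session session, Visibility level) {
        return visibleLevels(session).contains(level);
    }

    public static void main(String[] args) {
        for (Session s : Session.values()) {
            System.out.println(s + " -> " + visibleLevels(s));
        }
    }
}
```

Each wider session state sees everything the narrower ones see, so a PUBLIC property is viewable in every session while a PRIVATE property is viewable only by team members.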

Execute Hive Queries as Cascading HiveFlow

The Cascading execution framework can run Hive queries. This enables your Hive applications to benefit from the Cascading platform: dynamic management of all Hive objects, visibility into the end-to-end flow of the application, instrumentation, orchestration of your Hive modules for error recovery, and integration with major third-party systems such as Elasticsearch and Teradata.

Using HiveFlow

You can move your Hive Query Language (HQL) queries into production using an API from HiveFlow and the runtime monitoring capabilities of Driven.

HiveFlow is a Java wrapper that simplifies chaining multiple HQL statements into a single maintainable application. It transparently sends telemetry to Driven so that an HQL-based application can be managed and monitored in real time.

With HiveFlow, even applications based on multiple technologies, such as Hive, custom MapReduce, Cascading, and Scalding, can be chained together within the same application (an Apache Hadoop job JAR). The consolidation simplifies testing, deployment, maintenance, and monitoring.

Driven for HiveFlow

Figure 2. Sample usage of flow details page

Drill down to view an HQL statement. Click the small download icon to copy the statement to your clipboard.

Execute Cascading MapReduce Flows

You can run your existing MapReduce jobs through the Cascading MapReduceFlow class, which is a subclass of HadoopFlow. MapReduceFlow allows custom MapReduce jobs to be executed by Cascading.

After Driven receives data from application execution, you can see the directed acyclic graph (DAG) representation of the MapReduce job with flows and their dependencies.

Figure 3. Driven displays your MapReduce job DAG representation, which allows you to drill down to a specific flow
Note
Driven does not show any performance data in this example since the job was just launched.