Driven User Guide: Topics for Cascading Applications
version 2.2.6
Using Annotations
Annotations display metadata about flow and step nodes on the application details and slice performance views of Driven. Click on a node in the directed acyclic graph (DAG) to view annotations about the operation, such as processing details for a tap, filter, or function of a flow.
Creating Custom Annotations
The types of metadata that are exposed in annotations are selected as part of Cascading application development.
Use Cascading 2.6 or above to assign annotations to Cascading functions and taps.
Also, refer to the Cascading 3.0 core API reference for details about the cascading.management.annotation package.
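As a minimal sketch of how such annotations attach to application code, the example below declares stand-in annotation types that mirror the names in cascading.management.annotation, so that it compiles without Cascading on the classpath. The stand-ins, the ScrubFunction class, and the describe helper are assumptions made for illustration; in a real application you would import the annotation types from Cascading and annotate your own operation or tap class.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

// Stand-ins mirroring names in cascading.management.annotation (assumption:
// simplified element lists). Import the real Cascading types in production code.
enum Visibility { PRIVATE, PROTECTED, PUBLIC }

@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.FIELD, ElementType.METHOD})
@interface Property {
    String name();
    Visibility visibility();
}

@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.FIELD, ElementType.METHOD})
@interface PropertyDescription {
    String value();
}

// Hypothetical operation class; a real one would extend a Cascading operation.
class ScrubFunction {
    @Property(name = "scrubText", visibility = Visibility.PUBLIC)
    @PropertyDescription("Pattern removed from each input line")
    final String scrubText;

    ScrubFunction(String scrubText) {
        this.scrubText = scrubText;
    }
}

class AnnotationSketch {
    // Returns one "name=value (VISIBILITY): description" entry per annotated field.
    static List<String> describe(Object target) {
        List<String> entries = new ArrayList<>();
        for (Field field : target.getClass().getDeclaredFields()) {
            Property property = field.getAnnotation(Property.class);
            if (property == null) {
                continue; // field carries no annotation metadata
            }
            PropertyDescription description = field.getAnnotation(PropertyDescription.class);
            try {
                field.setAccessible(true);
                entries.add(property.name() + "=" + field.get(target)
                        + " (" + property.visibility() + "): "
                        + (description == null ? "" : description.value()));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        describe(new ScrubFunction("[^a-z]")).forEach(System.out::println);
    }
}
```

The reflective describe method only illustrates that the metadata is available at run time; how Cascading and Driven actually collect it is not covered here.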
Data Visibility
Driven renders all the application metadata associated with the annotations. However, you may want to restrict access to information about a certain Property to a subset of Driven users, for example to comply with privacy or governance guidelines in a shared, multitenant cluster.
Driven applies the visibility rule to users based on their identity in Driven. In the following code example, the visibility rule for both properties is set to PUBLIC:
@Property(name = "scrubTextConvert", visibility = Visibility.PUBLIC)
@PropertyDescription("_my_property_description_")
...
@Property(name = "scrubText", visibility = Visibility.PUBLIC)
@PropertyDescription("_my_property_description_")
...
Driven maps the visibility levels listed in the table below to the state of the user session.
Property (user session) | Public Access (Anonymous) | Protected Access (Login) | Private Access (Team) |
---|---|---|---|
PUBLIC | X | X | X |
PROTECTED | | X | X |
PRIVATE | | | X |
This mapping can be configured in the driven.properties file to match your governance guidelines. The descriptions below illustrate a typical use of the visibility levels.
Visibility level | Typical use |
---|---|
PUBLIC | Allow metadata attributes to be observed anonymously, by default. |
PROTECTED | Allow metadata attributes to be viewed by users who log in to Driven. |
PRIVATE | Allow metadata attributes to be viewed by members of a Driven team. This level is also used when access is restricted by role, such as Driven admin or team leader. |
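The typical use described above can be sketched as a small lookup from session state to the visibility levels that session may see. This is only an illustration, assuming that anonymous sessions see PUBLIC properties, logged-in sessions also see PROTECTED properties, and team sessions also see PRIVATE properties. The Level, Session, and VisibilitySketch names are hypothetical and are not Driven or Cascading classes.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical types for illustration only; not part of Driven or Cascading.
enum Level { PUBLIC, PROTECTED, PRIVATE }

enum Session { ANONYMOUS, LOGIN, TEAM }

class VisibilitySketch {
    // Returns the visibility levels a session may view under the assumed mapping.
    static Set<Level> visibleLevels(Session session) {
        switch (session) {
            case TEAM:
                return EnumSet.of(Level.PUBLIC, Level.PROTECTED, Level.PRIVATE);
            case LOGIN:
                return EnumSet.of(Level.PUBLIC, Level.PROTECTED);
            default:
                return EnumSet.of(Level.PUBLIC); // anonymous access
        }
    }

    // True when a property at the given level is visible to the session.
    static boolean canView(Session session, Level level) {
        return visibleLevels(session).contains(level);
    }

    public static void main(String[] args) {
        for (Session session : Session.values()) {
            System.out.println(session + " sees " + visibleLevels(session));
        }
    }
}
```

Because the real mapping is read from driven.properties, a deployment with different governance rules would yield a different lookup than the one hard-coded in this sketch.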
Execute Hive Queries as Cascading HiveFlow
The Cascading execution framework can run Hive queries. This enables your Hive applications to benefit from the Cascading platform: dynamic management of all Hive objects, visibility into the end-to-end flow of the application, instrumentation, orchestration of your Hive modules for error recovery, and integration with major third-party systems such as Elasticsearch and Teradata.
Using HiveFlow
You can move your Hive Query Language (HQL) queries into production using an API from HiveFlow and the runtime monitoring capabilities of Driven.
HiveFlow is a Java wrapper that simplifies the chaining of multiple HQL statements into a single maintainable application. It transparently sends telemetry to Driven so that an HQL-based application can be managed and monitored in real time.
With HiveFlow, even applications based on multiple technologies, such as Hive, custom MapReduce, Cascading, and Scalding, can be chained together within the same application (an Apache Hadoop job JAR). The consolidation simplifies testing, deployment, maintenance, and monitoring.
Execute Cascading MapReduce Flows
You can run your existing MapReduce jobs through the Cascading class MapReduceFlow, which is a HadoopFlow subclass.
MapReduceFlow allows custom MapReduce jobs to be executed by Cascading.
After Driven receives data from application execution, you can see the directed acyclic graph (DAG) representation of the MapReduce job with flows and their dependencies.
Note: Driven does not show any performance data in this example since the job was just launched.