Driven Administrator Guide: The Driven Command Line Client

version 2.2.6

The Driven Command Line Client

Installing the Driven CLI Client

The Driven CLI Client is an application for performing tasks such as backing up, monitoring, and, if required, integrating Driven with a centralized monitoring tool.

Note
The Driven Server and CLI Client are typically installed on the same machine. If you choose not to install on the same machine, then ensure that the CLI Client host machine has access to Driven Server ports 8080, 9200, and 9300.
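
For example, a quick connectivity check from the CLI Client host (a sketch assuming netcat is installed and the server host is named driven-server):

$ nc -zv driven-server 8080
$ nc -zv driven-server 9200
$ nc -zv driven-server 9300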

Step 1: Download and unzip the CLI Client

$ cd $DOWNLOAD_PATH
$ wget -i http://files.concurrentinc.com/driven/2.2/driven-client/latest.txt
$ cd $INSTALL_PATH
$ tar -xvf $DOWNLOAD_PATH/$ZIP_FILE

Step 2: Place the bin directory inside the unzipped CLI Client on your PATH

$ export PATH=$PATH:$INSTALL_PATH/driven/bin
$ driven --help
Usage: driven COMMAND [options]
where COMMAND is one of:
  backup  backup/restore operations
  scope   query status and runtimes
  reindex update elasticsearch data
driven COMMAND --help for options
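
To make the addition to PATH persistent across sessions, you might append the export line to your shell profile (the profile location is an assumption):

$ echo 'export PATH=$PATH:$INSTALL_PATH/driven/bin' >> ~/.bashrc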

Backing Up and Restoring Driven Repositories

To be prepared for disaster recovery, you must implement a policy for taking regular snapshots of the Driven repository. Use either Amazon S3 or a filesystem repository for the backup-and-restore system.

Prerequisites for Backup to S3 Repositories

If you want to back up your data to an Amazon S3 repository, the procedure for configuring Amazon Web Services (AWS) credentials depends on whether the Driven deployment uses an embedded or an external Elasticsearch datastore.

Embedded Elasticsearch Datastore: Either configure the following parameters in the driven.properties file:

driven.aws.accesskey=
driven.aws.secretkey=

Or set the following environment variables:

AWS_ACCESS_KEY_ID
AWS_SECRET_KEY
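
For example, the variables can be exported in the environment that starts the Driven Server (values are placeholders):

$ export AWS_ACCESS_KEY_ID=<your-access-key>
$ export AWS_SECRET_KEY=<your-secret-key>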

External Elasticsearch Cluster: The AWS credentials are set in the Elasticsearch layer, not in the Driven configuration. Install the AWS Cloud plugin on the Elasticsearch cluster. The credentials can be configured in several ways, including settings in the Elasticsearch YML file, environment variables, or IAM roles. See the AWS Cloud Plugin page on GitHub for more information.
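
For example, with the cloud-aws plugin the credentials are typically placed in elasticsearch.yml (the exact setting names depend on the plugin version; verify against the plugin documentation; values are placeholders):

cloud.aws.access_key: <your-access-key>
cloud.aws.secret_key: <your-secret-key>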

The backup Command

Use the Driven CLI Client to administer backup and recovery functions. The following CLI example shows information about options that can be used with the backup command.

$ driven backup --help

java driven.management.backup.Backup [options...]

Optional:
 env vars: DRIVEN_CLUSTER, DRIVEN_HOSTS

Option                       Description
------                       -----------
--bucket                     S3 bucket for repository
--cluster                    Driven cluster name (default: driven)
--debug [Boolean]            enable debugging (default: false)
--delete                     name of snapshot to delete from repository
--delete-older-than          delete snapshots older than supplied date ( yyyy-MM-dd )
--display-width <Integer>    width of display (default: 80)
--help
--hosts                      Driven server host(s) (default: localhost)
--json [Options$JsonOpts]    output data as json (default: values)
--list                       list repositories, or if --repository is used list
                               snapshots in repository
--no-header
--path                       specify filesystem path for repository
--prefix                     S3 prefix for repository bucket (default: snapshots)
--print                      print query parameters
--region                     AWS region for repository bucket (default: us-east-1)
--register-fs                register a local repository with name
--register-s3                register a remote S3 repository with name
--repository                 name of repository
--restore                    name of snapshot to restore from repository
--snapshot                   name of snapshot to create in repository
--unregister                 name of repository to unregister
--verbose                    logging level (default: info)

Backing Up Your Driven Data

The backup process generally entails creating a repository location and storing data snapshots there.

There are two ways to create a repository. You can either specify the repository in the driven.properties file, or you can use the Driven CLI Client to create the repository.

Step 1, Option 1: Configure the driven.properties file to create a repository

To create a single default repository, add the following to the driven.properties file, using either Amazon S3 or a filesystem repository:

For Amazon S3:

driven.backup.repository.type=s3
driven.backup.repository.bucket=myExampleBucketName
driven.backup.repository.region=us-east-example

For shared filesystem repository:

driven.backup.repository.type=fs
driven.backup.repository.path=/opt/driven-backup/snapshots

Step 1, Option 2: Create a repository with the CLI Client

For Amazon S3 (servers in a cluster must have appropriate AWS credentials for the bucket):

$ driven backup --register-s3 s3-repo --bucket driven-backup

registered S3 repository 's3-repo' at '[us-east-example] driven-backup/snapshots'

For shared filesystem repository (all the servers in a cluster must have read/write access to the registered path):

$ driven backup --register-fs fs-repo --path /opt/driven-backup/snapshots

registered FS repository 'fs-repo' at:
     '/opt/driven-backup/snapshots'

To list the repositories that you created:

$ driven backup --list

Repository            Type Location
----------            ---- --------
s3-repo               s3   s3://driven-backup/snapshots
fs-repo               fs   /opt/driven-backup/snapshots

Step 2: Create a snapshot

$ driven backup --repository s3-repo --snapshot snapshot_3
CREATING.........................
Snapshot                  State     Status    Started                          Finished
--------                  -----     ------    -------                          --------
snapshot_3                SUCCESS   OK        Thu Jun 19 10:35:08 PDT 2014     Thu Jun 19 10:35:33 PDT 2014

If no snapshot name is given, a name containing a universally unique identifier (UUID) is automatically assigned.
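
For example, to create a snapshot with an auto-assigned name:

$ driven backup --repository s3-repo --snapshot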

To list the snapshots that you created:

$ driven backup --repository s3-repo --list
Snapshot                 State     Status  Started                        Finished
--------                 -----     ------  -------                        --------
snapshot_1               SUCCESS   OK      Tue Jun 17 10:32:56 PDT 2014   Tue Jun 17 10:33:08 PDT 2014
snapshot_2               SUCCESS   OK      Tue Jun 17 11:33:02 PDT 2014   Tue Jun 17 11:33:15 PDT 2014
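
Regular snapshots can be scheduled with any job scheduler. A minimal sketch using cron (the install path and schedule are assumptions; GNU date is assumed, and % must be escaped in crontab entries):

# Nightly snapshot at 02:00; the name is auto-assigned with a UUID
0 2 * * * /opt/driven/bin/driven backup --repository s3-repo --snapshot
# Weekly pruning of snapshots older than 30 days
0 3 * * 0 /opt/driven/bin/driven backup --repository s3-repo --delete-older-than $(date -d '30 days ago' +\%Y-\%m-\%d)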

Restoring Data from a Snapshot

$ driven backup --repository s3-repo --restore snapshot_2
RESTORING........
snapshot snapshot_2 restore status 200

In the example above, "snapshot_2" is restored.

Extracting Data with the scope Command

In addition to using the Driven CLI Client for backups, you can use the client to extract data from Driven for integration with third-party monitoring applications. The scope command is useful for such integration.

$ driven scope --help

java driven.management.scope.Scope [options...]

Optional:
 env vars: DRIVEN_CLUSTER, DRIVEN_HOSTS

Option                                                       Description
------                                                       -----------
--between <natural language date/time>
--by-parent
--cause, --with-cause [cause or filter with * or ?]          all unique failure causes, or only those match filter
--child-id, --with-child-id [id or partial id]
--cluster                                                    driven cluster name (default: driven)
--counter <group and counter name, eg. 'foo:bar.counter'>
--debug [Boolean]                                            enable debugging (default: false)
--display-width <Integer>                                    width of display (default: 80)
--duration [[pending, started, submitted, running,           interval to calculate duration from (default: [started,
  finished]]                                                   finished])
--duration-interval                                          time period to filter values, eg. 5min:25min
--duration-period                                            time period to bucket values (default: PT15M)
--entity                                                     entity IDs to constrain results
--fields                                                     output field names, '*' denotes defaults (default: [type,
                                                               id, name, status, duration])
--from <Integer: offset from which to begin returning        (default: 0)
  results>
--help
--hosts                                                      driven server host(s) (default: localhost)
--id, --with-id [id or partial id]
--jmx
--json [Options$JsonOpts]                                    output data as json (default: values)
--limit <Integer: limit the number of results>               (default: 1000000)
--name, --with-name [name or filter with * or ?]             all unique names of type, or only those match filter
--no-header
--owner, --with-owner [owner or filter with * or ?]          all unique owners of type, or only those match filter
--parent-id, --with-parent-id [id or partial id]
--parent-name, --with-parent-name <name of parent>
--parent-status, --with-parent-status <Invertible:
  [pending, skipped, started, submitted, running,
  successful, stopped, failed, engaged, finished, all]>
--parent-type, --with-parent-type <ProcessType: [cluster,
  app, cascade, flow, step, slice, undefined]>
--print                                                      print query parameters
--since <natural language date/time, default 2 days from
  'till'>
--sort                                                       sort field names - default is none
--status, --with-status [Invertible: [pending, skipped,
  started, submitted, running, successful, stopped,
  failed, engaged, finished, all]]
--status-time [[pending, started, submitted, running,        date/time field to filter against. one of: [pending,
  finished]]                                                   started, submitted, running, finished] (default:
                                                               started)
--tag, --with-tag [tag name]                                 unique tags of type, or only those that match
--text-search                                                full search of pre-defined text fields - currently: ID,
                                                               name, owner
--till <natural language date/time, default is now>
--type [ProcessType: [cluster, app, cascade, flow, step,     the process type (default: app)
  slice, undefined]]
--verbose                                                    logging level (default: info)
--version

With the scope command, you can query to retrieve information about current and historical processes, where a process can be an application, cascade, flow, step, or slice (a generalization of a Hadoop task).

The command is useful for two types of tasks: discovery and monitoring. Discovery is finding specific process instances based on any metadata. Monitoring is observing changes in the metadata of specific process instances (for example, a flow has changed from RUNNING to FAILED status). The command also lets you report on a target process type while refining the results based on parent and target metadata.
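
For example, a monitoring integration might periodically poll for recently failed flows and consume the JSON output (the natural-language time expression is illustrative; see the --since and --json options in the help output above):

$ driven scope --type flow --status failed --since '1 hour ago' --json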

Examples of Command Usage

List all skipped flows:

$ driven scope --type flow --status skipped

List all skipped flows in a running application:

$ driven scope --type flow --status skipped --parent-type app --parent-status running

List the current statuses of all flows in all Running applications:

$ driven scope --type flow --status --parent-type app --parent-status running

Or more specifically, for each RUNNING application, list the statuses of their child flows, grouped by application:

$ driven scope --type flow --status --parent-type app --parent-status running --by-parent

Common Command-Line Options

Many CLI options begin with with-; for example, --with-name. These can be abbreviated by dropping the with- prefix, so you can use --name in place of --with-name.

Filters

Use the following command-line options for filters:

--type = app, cascade, flow, step, slice

--with-tag = user-defined data for filtering

--with-status = one or more of the following values: PENDING, STARTED, SUBMITTED, RUNNING, SUCCESSFUL, FAILED, STOPPED, SKIPPED. If blank, all status values are displayed as a chart.

The ^ (caret) before the option parameter means “not”. For example, ^running sets the filter condition to processes not in the RUNNING state. See the example after this list.

--with-id = filter for an identifier

--with-name = name or name filter

--with-parent-name = in tandem with --parent-type

--with-parent-status = in tandem with --parent-type

--with-parent-id = for listing children of type having the given parent ID, --parent-type is ignored

--status-time = which status time to filter against; one of pending, started, submitted, running, finished

--till = filter results to date/time

--since = filter results from date/time

--between = filter results between dates/times
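
For example, to list applications that are not in the RUNNING state and that started within the last two days (the caret is quoted to keep the shell from interpreting it; the time expression is illustrative):

$ driven scope --type app --status '^running' --since '2 days ago'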

Status

Most processes can be in one of the following states:

pending - the process has been created

started - the process has been notified that it may start work

submitted - the process, or a child process, has been submitted to a cluster

running - the process is actually executing the data pipeline

successful - the process has completed successfully

failed - the process has failed

stopped - the process, or a child process, received a stop notification

skipped - the flow was not executed, usually because the sinks were not stale

Pass --status with no value to show a summary of all status values.

Duration

To show a timeline of all durations, grouped by period, use the following options:

--duration = the interval to calculate duration from; for example, started:finished (the default)

--duration-period = the time in which to bucket the results. For example, 10sec, 15min, 2hrs, 1wk

--duration-interval = the range of time to display. For example, 15min:30min
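
Putting these options together, the following sketch buckets slice durations into 15-minute periods and keeps only durations between 5 and 25 minutes (argument formats follow the examples in the help output above):

$ driven scope --type slice --duration started:finished --duration-period 15min --duration-interval 5min:25min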

How-To Tips

How do I monitor a job in progress?

If you have already identified a step or a flow that you wish to monitor, enter:

$ driven scope --type slice --parent-type step --parent-id _000_ --status

This command summarizes all the slice statuses for the requested step.
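
To poll this continuously, you could wrap the command with watch (assuming watch is installed; _000_ is the step ID placeholder from the example above):

$ watch -n 30 "driven scope --type slice --parent-type step --parent-id _000_ --status"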

How do I list all users currently running applications?

To list all known users or process owners, enter:

$ driven scope --owner

To filter the list to include owners with running apps:

$ driven scope --owner --status running

Where in the code did the job fail?

If you have the app instance parent ID, you can list all the causes for the failure by entering:

$ driven scope --parent-id _000_ --type slice --cause

This command returns a list of all the exceptions and messages thrown.

For additional detailed information, enter the command:

$ driven scope --type slice --status failed \
  --fields id,failedBranchName,failedPipeLocation,failedCause,failedMethodLocation