Installing Driven Plugin on an Amazon EMR Master Node
version 2.1.4
Installing the Plugin on EMR
If you run your applications in an Amazon Web Services Elastic MapReduce (AWS EMR) cluster, use a bootstrap action to install the plugin. Bootstrapping works with both persistent and auto-terminating EMR clusters.
You can bootstrap by using either the AWS command-line interface (CLI) or the AWS Management Console. Bootstrapping installs the plugin on the EMR master node so that Cascading applications launched from the AWS CLI or the Management Console automatically operate with the plugin.
Amazon Web Services Command-Line Interface
The following code example shows you how to bootstrap Driven to an EMR cluster if you want to use the AWS CLI. Note the following about the code example:
-
The argument "--api-key,${DRIVEN_API_KEY}" appears in the following code. Using a Driven API key is optional. Omit this argument if you do not want to enable team features in Driven.
-
${DRIVEN_API_KEY} is a variable, which must be replaced with your real key if you want to pass the API key.
--bootstrap-actions Path=s3://files.concurrentinc.com/driven/2.1/hosted/driven-plugin/install-driven-plugin.sh,Args="--api-key,${DRIVEN_API_KEY}"
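Because the API key argument is optional, it can help to build the --bootstrap-actions value conditionally. The following is a minimal sketch, not part of the Driven distribution: the helper name driven_bootstrap_action is hypothetical, and the script only prints the fragment so that you can review it before appending it to your own aws emr create-cluster command.

```shell
#!/bin/sh
# Location of the install script, as given in this guide.
DRIVEN_INSTALL_SCRIPT="s3://files.concurrentinc.com/driven/2.1/hosted/driven-plugin/install-driven-plugin.sh"

# Hypothetical helper: prints the value for --bootstrap-actions.
# $1 is the Driven API key; pass an empty string to omit the argument.
driven_bootstrap_action() {
  if [ -n "$1" ]; then
    printf 'Path=%s,Args="--api-key,%s"\n' "$DRIVEN_INSTALL_SCRIPT" "$1"
  else
    printf 'Path=%s\n' "$DRIVEN_INSTALL_SCRIPT"
  fi
}

# Print the fragment for review; nothing is sent to AWS here.
echo "--bootstrap-actions $(driven_bootstrap_action "${DRIVEN_API_KEY:-}")"
```

If DRIVEN_API_KEY is unset or empty, the printed fragment contains only the Path, which matches the case where team features are not enabled.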
Amazon Web Services Management Console
To bootstrap the Driven Plugin using the AWS Management Console:
-
Navigate to the Create Cluster window.
-
Select the option to add a custom bootstrap action.
-
Enter the configuration for the new bootstrap action:
-
Name:
Driven Bootstrap Action
-
S3 Location:
s3://files.concurrentinc.com/driven/2.1/hosted/driven-plugin/install-driven-plugin.sh
-
Argument (Optional):
--api-key ${DRIVEN_API_KEY}
(This argument is required only if you want to enable the teams feature in Driven. Substitute ${DRIVEN_API_KEY} with your value.)
EMR 4.x
Amazon introduced a set of changes in EMR version 4.0 that directly affect how you install the Driven plugin. One important change is that bootstrap actions can no longer modify the Hadoop installation, because Hadoop is deployed only after all bootstrap actions have run. Instead, you apply user-defined settings to the Hadoop installation through the application configuration feature.
Driven provides a set of configurations that allow the plugin to work seamlessly on EMR 4.x clusters.
If you are using the command line to launch your EMR cluster, you have to use the --configurations switch:
--configurations http://files.concurrentinc.com/driven/2.1/hosted/driven-plugin/configurations-cascading.json
The AWS Management Console requires the s3 protocol for configuration files. This currently causes cross-account and cross-region access issues, even with publicly hosted files. You have two options for using the provided configuration file at http://files.concurrentinc.com/driven/2.1/hosted/driven-plugin/configurations-cascading.json with the AWS Management Console.
-
Download the configuration file and upload it to your own S3 bucket. Then enter the S3 URL of the file in your bucket in the option for loading the configuration from S3.
-
Download the configuration file, copy the provided JSON from the file, and then paste it into the JSON configuration text window.
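The first option above can be sketched as two commands: one download and one upload. The following is a minimal sketch under stated assumptions: the helper name print_stage_commands and the bucket name my-example-bucket are hypothetical, and the script prints the commands rather than running them so that you can check the bucket name first.

```shell
#!/bin/sh
# Configuration file URL, as given in this guide.
CONFIG_URL="http://files.concurrentinc.com/driven/2.1/hosted/driven-plugin/configurations-cascading.json"

# Hypothetical helper: prints (does not run) the commands that download the
# configuration file and copy it into the S3 bucket named in $1.
print_stage_commands() {
  file=$(basename "$CONFIG_URL")
  echo "curl -O $CONFIG_URL"
  echo "aws s3 cp $file s3://$1/$file"
}

# "my-example-bucket" is a placeholder; substitute a bucket you own.
print_stage_commands "my-example-bucket"
```

After you run the printed commands, the s3:// URL shown in the second command is the value to enter in the Management Console.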
Note
The --configurations switch for the EMR command-line client supports only the file:// and http(s):// protocols. Using the s3:// protocol, as in the AWS Management Console, causes an error.
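To catch the protocol restriction described in the note above before launching a cluster, a launch script can check the URL first. The following is a minimal sketch; the function name is_cli_configurations_url is hypothetical, and it checks only the URL prefix, not whether the file exists.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only for the protocols that the note lists
# as supported by the --configurations switch on the command line.
is_cli_configurations_url() {
  case "$1" in
    file://*|http://*|https://*) return 0 ;;
    *) return 1 ;;
  esac
}

CONFIG_URL="http://files.concurrentinc.com/driven/2.1/hosted/driven-plugin/configurations-cascading.json"

if is_cli_configurations_url "$CONFIG_URL"; then
  echo "--configurations $CONFIG_URL"
else
  echo "Unsupported protocol for --configurations: $CONFIG_URL" >&2
fi
```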
Start Monitoring Your Data Applications
You are now ready to see how your applications are doing. See the Using Driven section of the Getting Started page.