[ aws . emr ]

add-steps

Description

Add a list of steps to a cluster.

See ‘aws help’ for descriptions of global parameters.

Synopsis

  add-steps
--cluster-id <value>
--steps <value> [<value>...]

Options

--cluster-id (string)

A unique string that identifies a cluster. The create-cluster command returns this identifier. You can use the list-clusters command to get cluster IDs.

--steps (list)

Specifies a list of steps to be executed by the cluster. Steps run only on the master node after applications are installed and are used to submit work to a cluster. A step can be specified using the shorthand syntax, by referencing a JSON file or by specifying an inline JSON structure. Args supplied with steps should be a comma-separated list of values (Args=arg1,arg2,arg3 ) or a bracket-enclosed list of values and key-value pairs (Args=[arg1,arg2=value,arg4 ).

(structure)

Type -> (string)

The type of a step to be added to the cluster.

Name -> (string)

The name of the step.

ActionOnFailure -> (string)

The action to take if the cluster step fails.

Jar -> (string)

A path to a JAR file run during the step.

Args -> (list)

A list of command line arguments to pass to the step.

(string)

MainClass -> (string)

The name of the main class in the specified Java file. If not specified, the JAR file should specify a Main-Class in its manifest file.

Properties -> (string)

A list of Java properties that are set when the step runs. You can use these properties to pass key value pairs to your main function.

Shorthand Syntax:

Type=string,Name=string,ActionOnFailure=string,Jar=string,Args=string,string,MainClass=string,Properties=string ...

JSON Syntax:

[
  {
    "Type": "CUSTOM_JAR"|"STREAMING"|"HIVE"|"PIG"|"IMPALA",
    "Name": "string",
    "ActionOnFailure": "TERMINATE_CLUSTER"|"CANCEL_AND_WAIT"|"CONTINUE",
    "Jar": "string",
    "Args": ["string", ...],
    "MainClass": "string",
    "Properties": "string"
  }
  ...
]

See ‘aws help’ for descriptions of global parameters.

Examples

1. To add Custom JAR steps to a cluster

  • Command:

    aws emr add-steps --cluster-id j-XXXXXXXX --steps Type=CUSTOM_JAR,Name=CustomJAR,ActionOnFailure=CONTINUE,Jar=s3://mybucket/mytest.jar,Args=arg1,arg2,arg3 Type=CUSTOM_JAR,Name=CustomJAR,ActionOnFailure=CONTINUE,Jar=s3://mybucket/mytest.jar,MainClass=mymainclass,Args=arg1,arg2,arg3
    
  • Required parameters:

    Jar
    
  • Optional parameters:

    Type, Name, ActionOnFailure, Args
    
  • Output:

    {
        "StepIds":[
            "s-XXXXXXXX",
            "s-YYYYYYYY"
        ]
    }
    

2. To add Streaming steps to a cluster

  • Command:

    aws emr add-steps --cluster-id j-XXXXXXXX --steps Type=STREAMING,Name='Streaming Program',ActionOnFailure=CONTINUE,Args=[-files,s3://elasticmapreduce/samples/wordcount/wordSplitter.py,-mapper,wordSplitter.py,-reducer,aggregate,-input,s3://elasticmapreduce/samples/wordcount/input,-output,s3://mybucket/wordcount/output]
    
  • Required parameters:

    Type, Args
    
  • Optional parameters:

    Name, ActionOnFailure
    
  • JSON equivalent (contents of step.json):

     [
      {
        "Name": "JSON Streaming Step",
        "Args": ["-files","s3://elasticmapreduce/samples/wordcount/wordSplitter.py","-mapper","wordSplitter.py","-reducer","aggregate","-input","s3://elasticmapreduce/samples/wordcount/input","-output","s3://mybucket/wordcount/output"],
        "ActionOnFailure": "CONTINUE",
        "Type": "STREAMING"
      }
    ]
    

NOTE: JSON arguments must include options and values as their own items in the list.

  • Command (using step.json):

    aws emr add-steps --cluster-id j-XXXXXXXX --steps file://./step.json
    
  • Output:

    {
        "StepIds":[
            "s-XXXXXXXX",
            "s-YYYYYYYY"
        ]
    }
    

3. To add a Streaming step with multiple files to a cluster (JSON only)

  • JSON (multiplefiles.json):

    [
      {
         "Name": "JSON Streaming Step",
         "Type": "STREAMING",
         "ActionOnFailure": "CONTINUE",
         "Args": [
             "-files",
             "s3://mybucket/mapper.py,s3://mybucket/reducer.py",
             "-mapper",
             "mapper.py",
             "-reducer",
             "reducer.py",
             "-input",
             "s3://mybucket/input",
             "-output",
             "s3://mybucket/output"]
      }
    ]
    
  • Command:

    aws emr add-steps --cluster-id j-XXXXXXXX  --steps file://./multiplefiles.json
    
  • Required parameters:

    Type, Args
    
  • Optional parameters:

    Name, ActionOnFailure
    
  • Output:

    {
        "StepIds":[
            "s-XXXXXXXX",
        ]
    }
    

4. To add Hive steps to a cluster

  • Command:

    aws emr add-steps --cluster-id j-XXXXXXXX --steps Type=HIVE,Name='Hive program',ActionOnFailure=CONTINUE,Args=[-f,s3://mybucket/myhivescript.q,-d,INPUT=s3://mybucket/myhiveinput,-d,OUTPUT=s3://mybucket/myhiveoutput,arg1,arg2] Type=HIVE,Name='Hive steps',ActionOnFailure=TERMINATE_CLUSTER,Args=[-f,s3://elasticmapreduce/samples/hive-ads/libs/model-build.q,-d,INPUT=s3://elasticmapreduce/samples/hive-ads/tables,-d,OUTPUT=s3://mybucket/hive-ads/output/2014-04-18/11-07-32,-d,LIBS=s3://elasticmapreduce/samples/hive-ads/libs]
    
  • Required parameters:

    Type, Args
    
  • Optional parameters:

    Name, ActionOnFailure
    
  • Output:

    {
        "StepIds":[
            "s-XXXXXXXX",
            "s-YYYYYYYY"
        ]
    }
    

5. To add Pig steps to a cluster

  • Command:

    aws emr add-steps --cluster-id j-XXXXXXXX --steps Type=PIG,Name='Pig program',ActionOnFailure=CONTINUE,Args=[-f,s3://mybucket/mypigscript.pig,-p,INPUT=s3://mybucket/mypiginput,-p,OUTPUT=s3://mybucket/mypigoutput,arg1,arg2] Type=PIG,Name='Pig program',Args=[-f,s3://elasticmapreduce/samples/pig-apache/do-reports2.pig,-p,INPUT=s3://elasticmapreduce/samples/pig-apache/input,-p,OUTPUT=s3://mybucket/pig-apache/output,arg1,arg2]
    
  • Required parameters:

    Type, Args
    
  • Optional parameters:

    Name, ActionOnFailure
    
  • Output:

    {
        "StepIds":[
            "s-XXXXXXXX",
            "s-YYYYYYYY"
        ]
    }
    

6. To add Impala steps to a cluster

  • Command:

    aws emr add-steps --cluster-id j-XXXXXXXX --steps Type=IMPALA,Name='Impala program',ActionOnFailure=CONTINUE,Args=--impala-script,s3://myimpala/input,--console-output-path,s3://myimpala/output
    
  • Required parameters:

    Type, Args
    
  • Optional parameters:

    Name, ActionOnFailure
    
  • Output:

    {
        "StepIds":[
            "s-XXXXXXXX",
            "s-YYYYYYYY"
        ]
    }