create-inference-recommendations-job¶

Description¶

Starts a recommendation job. You can create either an instance recommendation or load test job.

Synopsis¶

  create-inference-recommendations-job
--job-name <value>
--job-type <value>
--role-arn <value>
--input-config <value>
[--job-description <value>]
[--stopping-conditions <value>]
[--tags <value>]
[--cli-input-json | --cli-input-yaml]
[--generate-cli-skeleton <value>]

Options¶

--job-name (string)

A name for the recommendation job. The name must be unique within the Amazon Web Services Region and within your Amazon Web Services account.

--job-type (string)

Defines the type of recommendation job. Specify Default to initiate an instance recommendation and Advanced to initiate a load test. If left unspecified, Amazon SageMaker Inference Recommender will run an instance recommendation (DEFAULT ) job.

Possible values:

Default

Advanced

--role-arn (string)

The Amazon Resource Name (ARN) of an IAM role that enables Amazon SageMaker to perform tasks on your behalf.

--input-config (structure)

Provides information about the versioned model package Amazon Resource Name (ARN), the traffic pattern, and endpoint configurations.

ModelPackageVersionArn -> (string)

The Amazon Resource Name (ARN) of a versioned model package.

JobDurationInSeconds -> (integer)

Specifies the maximum duration of the job, in seconds.>

TrafficPattern -> (structure)

Specifies the traffic pattern of the job.

TrafficType -> (string)

Defines the traffic patterns.

Phases -> (list)

Defines the phases traffic specification.

(structure)

Defines the traffic pattern.

InitialNumberOfUsers -> (integer)

Specifies how many concurrent users to start with.

SpawnRate -> (integer)

Specified how many new users to spawn in a minute.

DurationInSeconds -> (integer)

Specifies how long traffic phase should be.

ResourceLimit -> (structure)

Defines the resource limit of the job.

MaxNumberOfTests -> (integer)

Defines the maximum number of load tests.

MaxParallelOfTests -> (integer)

Defines the maximum number of parallel load tests.

EndpointConfigurations -> (list)

Specifies the endpoint configuration to use for a job.

(structure)

The endpoint configuration for the load test.

InstanceType -> (string)

The instance types to use for the load test.

InferenceSpecificationName -> (string)

The inference specification name in the model package version.

EnvironmentParameterRanges -> (structure)

The parameter you want to benchmark against.

CategoricalParameterRanges -> (list)

Specified a list of parameters for each category.

(structure)

Environment parameters you want to benchmark your load test against.

Name -> (string)

The Name of the environment variable.

Value -> (list)

The list of values you can pass.

(string)

JSON Syntax:

{
  "ModelPackageVersionArn": "string",
  "JobDurationInSeconds": integer,
  "TrafficPattern": {
    "TrafficType": "PHASES",
    "Phases": [
      {
        "InitialNumberOfUsers": integer,
        "SpawnRate": integer,
        "DurationInSeconds": integer
      }
      ...
    ]
  },
  "ResourceLimit": {
    "MaxNumberOfTests": integer,
    "MaxParallelOfTests": integer
  },
  "EndpointConfigurations": [
    {
      "InstanceType": "ml.t2.medium"|"ml.t2.large"|"ml.t2.xlarge"|"ml.t2.2xlarge"|"ml.m4.xlarge"|"ml.m4.2xlarge"|"ml.m4.4xlarge"|"ml.m4.10xlarge"|"ml.m4.16xlarge"|"ml.m5.large"|"ml.m5.xlarge"|"ml.m5.2xlarge"|"ml.m5.4xlarge"|"ml.m5.12xlarge"|"ml.m5.24xlarge"|"ml.m5d.large"|"ml.m5d.xlarge"|"ml.m5d.2xlarge"|"ml.m5d.4xlarge"|"ml.m5d.12xlarge"|"ml.m5d.24xlarge"|"ml.c4.large"|"ml.c4.xlarge"|"ml.c4.2xlarge"|"ml.c4.4xlarge"|"ml.c4.8xlarge"|"ml.p2.xlarge"|"ml.p2.8xlarge"|"ml.p2.16xlarge"|"ml.p3.2xlarge"|"ml.p3.8xlarge"|"ml.p3.16xlarge"|"ml.c5.large"|"ml.c5.xlarge"|"ml.c5.2xlarge"|"ml.c5.4xlarge"|"ml.c5.9xlarge"|"ml.c5.18xlarge"|"ml.c5d.large"|"ml.c5d.xlarge"|"ml.c5d.2xlarge"|"ml.c5d.4xlarge"|"ml.c5d.9xlarge"|"ml.c5d.18xlarge"|"ml.g4dn.xlarge"|"ml.g4dn.2xlarge"|"ml.g4dn.4xlarge"|"ml.g4dn.8xlarge"|"ml.g4dn.12xlarge"|"ml.g4dn.16xlarge"|"ml.r5.large"|"ml.r5.xlarge"|"ml.r5.2xlarge"|"ml.r5.4xlarge"|"ml.r5.12xlarge"|"ml.r5.24xlarge"|"ml.r5d.large"|"ml.r5d.xlarge"|"ml.r5d.2xlarge"|"ml.r5d.4xlarge"|"ml.r5d.12xlarge"|"ml.r5d.24xlarge"|"ml.inf1.xlarge"|"ml.inf1.2xlarge"|"ml.inf1.6xlarge"|"ml.inf1.24xlarge",
      "InferenceSpecificationName": "string",
      "EnvironmentParameterRanges": {
        "CategoricalParameterRanges": [
          {
            "Name": "string",
            "Value": ["string", ...]
          }
          ...
        ]
      }
    }
    ...
  ]
}

--job-description (string)

Description of the recommendation job.

--stopping-conditions (structure)

A set of conditions for stopping a recommendation job. If any of the conditions are met, the job is automatically stopped.

MaxInvocations -> (integer)

The maximum number of requests per minute expected for the endpoint.

ModelLatencyThresholds -> (list)

The interval of time taken by a model to respond as viewed from SageMaker. The interval includes the local communication time taken to send the request and to fetch the response from the container of a model and the time taken to complete the inference in the container.

(structure)

The model latency threshold.

Percentile -> (string)

The model latency percentile threshold.

ValueInMilliseconds -> (integer)

The model latency percentile value in milliseconds.

Shorthand Syntax:

MaxInvocations=integer,ModelLatencyThresholds=[{Percentile=string,ValueInMilliseconds=integer},{Percentile=string,ValueInMilliseconds=integer}]

JSON Syntax:

{
  "MaxInvocations": integer,
  "ModelLatencyThresholds": [
    {
      "Percentile": "string",
      "ValueInMilliseconds": integer
    }
    ...
  ]
}

--tags (list)

The metadata that you apply to Amazon Web Services resources to help you categorize and organize them. Each tag consists of a key and a value, both of which you define. For more information, see Tagging Amazon Web Services Resources in the Amazon Web Services General Reference.

(structure)

A tag object that consists of a key and an optional value, used to manage metadata for SageMaker Amazon Web Services resources.

You can add tags to notebook instances, training jobs, hyperparameter tuning jobs, batch transform jobs, models, labeling jobs, work teams, endpoint configurations, and endpoints. For more information on adding tags to SageMaker resources, see AddTags .

For more information on adding metadata to your Amazon Web Services resources with tagging, see Tagging Amazon Web Services resources . For advice on best practices for managing Amazon Web Services resources with tagging, see Tagging Best Practices: Implement an Effective Amazon Web Services Resource Tagging Strategy .

Key -> (string)

The tag key. Tag keys must be unique per resource.

Value -> (string)

The tag value.

Shorthand Syntax:

Key=string,Value=string ...

JSON Syntax:

[
  {
    "Key": "string",
    "Value": "string"
  }
  ...
]

--cli-input-json | --cli-input-yaml (string) Reads arguments from the JSON string provided. The JSON string follows the format provided by --generate-cli-skeleton. If other arguments are provided on the command line, those values will override the JSON-provided values. It is not possible to pass arbitrary binary values using a JSON-provided value as the string will be taken literally. This may not be specified along with --cli-input-yaml.

--generate-cli-skeleton (string) Prints a JSON skeleton to standard output without sending an API request. If provided with no value or the value input, prints a sample input JSON that can be used as an argument for --cli-input-json. Similarly, if provided yaml-input it will print a sample input YAML that can be used with --cli-input-yaml. If provided with the value output, it validates the command inputs and returns a sample output JSON for that command.

See ‘aws help’ for descriptions of global parameters.

Output¶

JobArn -> (string)

The Amazon Resource Name (ARN) of the recommendation job.

Table of Contents

Feedback

User Guide

create-inference-recommendations-job¶

Description¶

Synopsis¶

Options¶

Output¶