
create-dataset

Description

Creates a data set. A data set stores data retrieved from a data store by applying a “queryAction” (a SQL query) or a “containerAction” (executing a containerized application). This operation creates the skeleton of a data set. The data set can be populated manually by calling “CreateDatasetContent” or automatically according to a “trigger” you specify.

See also: AWS API Documentation

See ‘aws help’ for descriptions of global parameters.

Synopsis

  create-dataset
--dataset-name <value>
--actions <value>
[--triggers <value>]
[--content-delivery-rules <value>]
[--retention-period <value>]
[--versioning-configuration <value>]
[--tags <value>]
[--cli-input-json | --cli-input-yaml]
[--generate-cli-skeleton <value>]
[--cli-auto-prompt <value>]

Options

--dataset-name (string)

The name of the data set.

--actions (list)

A list of actions that create the data set contents.

(structure)

A “DatasetAction” object that specifies how data set contents are automatically created.

actionName -> (string)

The name of the data set action by which data set contents are automatically created.

queryAction -> (structure)

An “SqlQueryDatasetAction” object that uses an SQL query to automatically create data set contents.

sqlQuery -> (string)

A SQL query string.

filters -> (list)

Pre-filters applied to message data.

(structure)

Information used to filter message data according to the time frame in which it arrives.

deltaTime -> (structure)

Used to limit data to that which has arrived since the last execution of the action.

offsetSeconds -> (integer)

The number of seconds of estimated “in flight” lag time of message data. When you create data set contents using message data from a specified time frame, some message data may still be “in flight” when processing begins, and so will not arrive in time to be processed. Use this field to make allowances for the “in flight” time of your message data, so that data not processed from a previous time frame will be included with the next time frame. Without this, missed message data would be excluded from processing during the next time frame as well, because its timestamp places it within the previous time frame.

timeExpression -> (string)

An expression by which the time of the message data may be determined. This may be the name of a timestamp field, or a SQL expression which is used to derive the time the message data was generated.

containerAction -> (structure)

Information which allows the system to run a containerized application in order to create the data set contents. The application must be in a Docker container along with any needed support libraries.

image -> (string)

The ARN of the Docker container image stored in your account. The image contains the application and its support libraries and is used to generate data set contents.

executionRoleArn -> (string)

The ARN of the role that grants the system permission to access the resources needed to run the “containerAction”. At minimum, this includes permission to retrieve the data set contents that are the input to the containerized application.

resourceConfiguration -> (structure)

Configuration of the resource which executes the “containerAction”.

computeType -> (string)

The type of the compute resource used to execute the “containerAction”. Possible values are: ACU_1 (vCPU=4, memory=16GiB) or ACU_2 (vCPU=8, memory=32GiB).

volumeSizeInGB -> (integer)

The size (in GB) of the persistent storage available to the resource instance used to execute the “containerAction” (min: 1, max: 50).

variables -> (list)

The variables passed to the containerized application when it runs (in effect, the application's parameters). Each variable must have a name and a value given by one of “stringValue”, “doubleValue”, “datasetContentVersionValue”, or “outputFileUriValue”.

(structure)

A single variable passed to the “containerAction” execution.

name -> (string)

The name of the variable.

stringValue -> (string)

The value of the variable as a string.

doubleValue -> (double)

The value of the variable as a double (numeric).

datasetContentVersionValue -> (structure)

The value of the variable as a structure that specifies a data set content version.

datasetName -> (string)

The name of the data set whose latest contents are used as input to the notebook or application.

outputFileUriValue -> (structure)

The value of the variable as a structure that specifies an output file URI.

fileName -> (string)

The URI of the location where data set contents are stored, usually the URI of a file in an S3 bucket.

JSON Syntax:

[
  {
    "actionName": "string",
    "queryAction": {
      "sqlQuery": "string",
      "filters": [
        {
          "deltaTime": {
            "offsetSeconds": integer,
            "timeExpression": "string"
          }
        }
        ...
      ]
    },
    "containerAction": {
      "image": "string",
      "executionRoleArn": "string",
      "resourceConfiguration": {
        "computeType": "ACU_1"|"ACU_2",
        "volumeSizeInGB": integer
      },
      "variables": [
        {
          "name": "string",
          "stringValue": "string",
          "doubleValue": double,
          "datasetContentVersionValue": {
            "datasetName": "string"
          },
          "outputFileUriValue": {
            "fileName": "string"
          }
        }
        ...
      ]
    }
  }
  ...
]
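As a sketch of how the --actions structure fits together, the following Python builds a query action with an optional deltaTime pre-filter and checks an action against the limits described above (the two computeType values, volumeSizeInGB between 1 and 50, one value per variable). The helper names query_action and check_action are hypothetical, and the rule that an action sets exactly one of queryAction or containerAction is an assumption rather than something this page states.

```python
# Allowed compute types, as listed in the resourceConfiguration description.
COMPUTE_TYPES = {"ACU_1", "ACU_2"}
# A variable carries its value in exactly one of these keys.
VALUE_KEYS = ("stringValue", "doubleValue",
              "datasetContentVersionValue", "outputFileUriValue")


def query_action(action_name, sql, offset_seconds=None, time_expression=None):
    """Build a DatasetAction that runs a SQL query (hypothetical helper)."""
    query = {"sqlQuery": sql}
    # Add a deltaTime pre-filter only when both of its fields are supplied.
    if offset_seconds is not None and time_expression is not None:
        query["filters"] = [{"deltaTime": {
            "offsetSeconds": offset_seconds,
            "timeExpression": time_expression,
        }}]
    return {"actionName": action_name, "queryAction": query}


def check_action(action):
    """Return a list of problems found in one DatasetAction dict (empty if none)."""
    problems = []
    # Assumption: an action is either a query action or a container action.
    kinds = [k for k in ("queryAction", "containerAction") if k in action]
    if len(kinds) != 1:
        problems.append("expected exactly one of queryAction or containerAction")
    container = action.get("containerAction", {})
    resources = container.get("resourceConfiguration", {})
    if "computeType" in resources and resources["computeType"] not in COMPUTE_TYPES:
        problems.append("computeType must be ACU_1 or ACU_2")
    size = resources.get("volumeSizeInGB")
    if size is not None and not 1 <= size <= 50:
        problems.append("volumeSizeInGB must be between 1 and 50")
    for var in container.get("variables", []):
        given = [k for k in VALUE_KEYS if k in var]
        if "name" not in var or len(given) != 1:
            problems.append(f"variable {var.get('name')!r} needs a name and one value")
    return problems
```

For example, query_action("myDatasetAction", "SELECT * FROM mydatastore", 60, "from_unixtime(time)") yields an action whose filter allows 60 seconds of in-flight lag; the timeExpression assumes a hypothetical column named time that holds epoch seconds. The resulting list can be serialized with json.dumps and passed as the value of --actions.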

--triggers (list)

A list of triggers. A trigger causes data set contents to be populated at a specified time interval or when another data set’s contents are created. The list of triggers can be empty or contain up to five “DatasetTrigger” objects.

(structure)

The “DatasetTrigger” that specifies when the data set is automatically updated.

schedule -> (structure)

The “Schedule” that determines when the trigger is initiated.

expression -> (string)

The expression that defines when to trigger an update. For more information, see Schedule Expressions for Rules in the Amazon CloudWatch Events User Guide.

dataset -> (structure)

The data set whose content creation triggers the creation of this data set’s contents.

name -> (string)

The name of the data set whose content generation triggers the new data set content generation.

Shorthand Syntax:

schedule={expression=string},dataset={name=string} ...

JSON Syntax:

[
  {
    "schedule": {
      "expression": "string"
    },
    "dataset": {
      "name": "string"
    }
  }
  ...
]
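A short Python sketch of building a --triggers value and enforcing the documented limit of five triggers. The helper names are hypothetical, "rate(1 hour)" is an ordinary CloudWatch Events schedule expression, and the check that each trigger sets exactly one of schedule or dataset is an assumption drawn from the two alternatives described above.

```python
MAX_TRIGGERS = 5  # "up to five" triggers, per the --triggers description


def schedule_trigger(expression):
    """A trigger that fires on a schedule expression such as "rate(1 hour)"."""
    return {"schedule": {"expression": expression}}


def dataset_trigger(dataset_name):
    """A trigger that fires when another data set's contents are created."""
    return {"dataset": {"name": dataset_name}}


def check_triggers(triggers):
    """Raise ValueError if the list is too long or a trigger is ambiguous."""
    if len(triggers) > MAX_TRIGGERS:
        raise ValueError(f"at most {MAX_TRIGGERS} triggers allowed, got {len(triggers)}")
    for trigger in triggers:
        # Assumption: a trigger uses either a schedule or a data set, not both.
        if ("schedule" in trigger) == ("dataset" in trigger):
            raise ValueError("each trigger needs exactly one of schedule or dataset")
    return triggers
```

For example, check_triggers([schedule_trigger("rate(1 hour)"), dataset_trigger("upstream")]) returns the list unchanged, ready to be serialized as JSON; "upstream" is a hypothetical data set name.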

--content-delivery-rules (list)

When data set contents are created, they are delivered to the destinations specified here.

(structure)

A named rule that specifies the destination to which data set contents are delivered when they are created.

entryName -> (string)

The name of the data set content delivery rules entry.

destination -> (structure)

The destination to which data set contents are delivered.

iotEventsDestinationConfiguration -> (structure)

Configuration information for delivery of data set contents to AWS IoT Events.

inputName -> (string)

The name of the AWS IoT Events input to which data set contents are delivered.

roleArn -> (string)

The ARN of the role which grants AWS IoT Analytics permission to deliver data set contents to an AWS IoT Events input.

s3DestinationConfiguration -> (structure)

Configuration information for delivery of data set contents to Amazon S3.

bucket -> (string)

The name of the Amazon S3 bucket to which data set contents are delivered.

key -> (string)

The key of the data set contents object. Each object in an Amazon S3 bucket has exactly one key, which uniquely identifies it within the bucket. To produce a unique key, you can use “!{iotanalytics:scheduledTime}” to insert the time of the scheduled SQL query run, or “!{iotanalytics:versioned}” to insert a unique hash identifying the data set, for example: “/DataSet/!{iotanalytics:scheduledTime}/!{iotanalytics:versioned}.csv”.

glueConfiguration -> (structure)

Configuration information for coordination with the AWS Glue ETL (extract, transform and load) service.

tableName -> (string)

The name of the table in your AWS Glue Data Catalog which is used to perform the ETL (extract, transform and load) operations. (An AWS Glue Data Catalog table contains partitioned data and descriptions of data sources and targets.)

databaseName -> (string)

The name of the database in your AWS Glue Data Catalog in which the table is located. (An AWS Glue Data Catalog database is a container for Data Catalog tables.)

roleArn -> (string)

The ARN of the role which grants AWS IoT Analytics permission to interact with your Amazon S3 and AWS Glue resources.

JSON Syntax:

[
  {
    "entryName": "string",
    "destination": {
      "iotEventsDestinationConfiguration": {
        "inputName": "string",
        "roleArn": "string"
      },
      "s3DestinationConfiguration": {
        "bucket": "string",
        "key": "string",
        "glueConfiguration": {
          "tableName": "string",
          "databaseName": "string"
        },
        "roleArn": "string"
      }
    }
  }
  ...
]
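The following Python sketch assembles the two kinds of delivery rule in the JSON shape shown above and checks that an S3 key uses one of the placeholders described under “key”. The helper names are hypothetical; the key template is the one quoted in the “key” description.

```python
# Key template quoted in the "key" description above.
UNIQUE_KEY = "/DataSet/!{iotanalytics:scheduledTime}/!{iotanalytics:versioned}.csv"
PLACEHOLDERS = ("!{iotanalytics:scheduledTime}", "!{iotanalytics:versioned}")


def s3_delivery_rule(entry_name, bucket, key, role_arn,
                     glue_database=None, glue_table=None):
    """Build a content delivery rule that writes data set contents to Amazon S3."""
    s3 = {"bucket": bucket, "key": key, "roleArn": role_arn}
    # glueConfiguration is optional; add it only when both names are given.
    if glue_database is not None and glue_table is not None:
        s3["glueConfiguration"] = {"tableName": glue_table,
                                   "databaseName": glue_database}
    return {"entryName": entry_name,
            "destination": {"s3DestinationConfiguration": s3}}


def iot_events_delivery_rule(entry_name, input_name, role_arn):
    """Build a content delivery rule that sends data set contents to an IoT Events input."""
    return {"entryName": entry_name,
            "destination": {"iotEventsDestinationConfiguration": {
                "inputName": input_name, "roleArn": role_arn}}}


def key_has_placeholder(key):
    """True if the key contains at least one of the iotanalytics placeholders."""
    return any(p in key for p in PLACEHOLDERS)
```

For example, s3_delivery_rule("toS3", "my-bucket", UNIQUE_KEY, "arn:aws:iam::123456789012:role/MyDeliveryRole") builds an S3 rule; the bucket name and role ARN are hypothetical placeholders. A key without any placeholder would write every delivery to the same object.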

--retention-period (structure)

[Optional] How long, in days, versions of data set contents are kept for the data set. If not specified or set to null, versions of data set contents are retained for at most 90 days. The number of versions of data set contents retained is determined by the versioningConfiguration parameter. (For more information, see https://docs.aws.amazon.com/iotanalytics/latest/userguide/getting-started.html#aws-iot-analytics-dataset-versions)

unlimited -> (boolean)

If true, versions of data set contents are kept indefinitely.

numberOfDays -> (integer)

The number of days that versions of data set contents are kept. The “unlimited” parameter must be false.

Shorthand Syntax:

unlimited=boolean,numberOfDays=integer

JSON Syntax:

{
  "unlimited": true|false,
  "numberOfDays": integer
}
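Because “numberOfDays” is only valid when “unlimited” is false, a small builder can keep the two fields consistent. This Python sketch also renders a flat structure in the shorthand syntax shown above; the helper names are hypothetical, and requiring a positive day count is an assumption this page does not state.

```python
def retention_period(days=None):
    """Build a --retention-period value; None means keep contents indefinitely."""
    if days is None:
        return {"unlimited": True}
    # Assumption: numberOfDays should be a positive whole number.
    if isinstance(days, bool) or not isinstance(days, int) or days < 1:
        raise ValueError("numberOfDays must be a positive integer")
    # When numberOfDays is set, "unlimited" must be false.
    return {"unlimited": False, "numberOfDays": days}


def to_shorthand(struct):
    """Render a flat structure in shorthand syntax, e.g. unlimited=false,numberOfDays=30."""
    parts = []
    for name, value in struct.items():
        # Booleans are written in lowercase, matching the JSON true|false form.
        text = str(value).lower() if isinstance(value, bool) else str(value)
        parts.append(f"{name}={text}")
    return ",".join(parts)
```

For example, to_shorthand(retention_period(30)) gives "unlimited=false,numberOfDays=30". The same to_shorthand helper also works for a flat --versioning-configuration value such as {"unlimited": False, "maxVersions": 5}.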

--versioning-configuration (structure)

[Optional] How many versions of data set contents are kept. If not specified or set to null, only the latest version plus the latest succeeded version (if they are different) are kept for the time period specified by the “retentionPeriod” parameter. (For more information, see https://docs.aws.amazon.com/iotanalytics/latest/userguide/getting-started.html#aws-iot-analytics-dataset-versions)

unlimited -> (boolean)

If true, unlimited versions of data set contents will be kept.

maxVersions -> (integer)

How many versions of data set contents will be kept. The “unlimited” parameter must be false.

Shorthand Syntax:

unlimited=boolean,maxVersions=integer

JSON Syntax:

{
  "unlimited": true|false,
  "maxVersions": integer
}
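The default behavior described above (the latest version plus the latest succeeded version, if they differ) can be modeled in a few lines of Python. This is a conceptual sketch, not the service's implementation: the Version type and its fields are hypothetical, and versions are assumed to be listed from oldest to newest.

```python
from typing import NamedTuple


class Version(NamedTuple):
    """Hypothetical record of one data set content version."""
    version_id: str
    succeeded: bool


def default_kept_versions(versions):
    """Return the ids kept when no versioning configuration is given.

    versions must be ordered oldest to newest.
    """
    if not versions:
        return []
    latest = versions[-1]
    kept = [latest.version_id]
    # Walk backwards to find the most recent version that succeeded.
    latest_ok = next((v for v in reversed(versions) if v.succeeded), None)
    if latest_ok is not None and latest_ok.version_id != latest.version_id:
        kept.append(latest_ok.version_id)
    return kept
```

For example, if the newest version v3 failed and v2 succeeded, default_kept_versions([Version("v1", True), Version("v2", True), Version("v3", False)]) returns ["v3", "v2"]; if v3 had succeeded, only ["v3"] would be kept.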

--tags (list)

Metadata which can be used to manage the data set.

(structure)

A set of key/value pairs which are used to manage the resource.

key -> (string)

The tag’s key.

value -> (string)

The tag’s value.

Shorthand Syntax:

key=string,value=string ...

JSON Syntax:

[
  {
    "key": "string",
    "value": "string"
  }
  ...
]
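Tags are often kept as a plain mapping in code. This Python sketch converts such a mapping into the JSON list form and into shorthand tokens; the helper names are hypothetical, and the sketch does not quote or escape values that contain commas, spaces, or quotes.

```python
def tags_from_dict(tags):
    """Convert {"Environment": "Production"} into the --tags JSON list form."""
    return [{"key": k, "value": v} for k, v in tags.items()]


def tags_shorthand_args(tags):
    """Return one shorthand token per tag, e.g. "key=Environment,value=Production".

    Each token is a separate command-line argument following --tags.
    Values containing commas, spaces, or quotes are not escaped here.
    """
    return [f"key={k},value={v}" for k, v in tags.items()]
```

For example, the tokens returned for {"Environment": "Production", "Team": "Data"} can be appended after "--tags" in an argument list passed to subprocess.run, so that each tag becomes its own argument as the trailing "..." in the shorthand syntax implies.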

--cli-input-json | --cli-input-yaml (string) Reads arguments from the JSON string provided. The JSON string follows the format provided by --generate-cli-skeleton. If other arguments are provided on the command line, those values will override the JSON-provided values. It is not possible to pass arbitrary binary values using a JSON-provided value as the string will be taken literally. This may not be specified along with --cli-input-yaml.

--generate-cli-skeleton (string) Prints a JSON skeleton to standard output without sending an API request. If provided with no value or the value “input”, prints a sample input JSON that can be used as an argument for --cli-input-json. Similarly, if provided with the value “yaml-input”, it prints a sample input YAML that can be used with --cli-input-yaml. If provided with the value “output”, it validates the command inputs and returns a sample output JSON for that command.

--cli-auto-prompt (boolean) Automatically prompt for CLI input parameters.


Examples

To create a dataset

The following create-dataset example creates a dataset named mydataset with a single queryAction that runs SELECT * FROM mydatastore. The example keeps dataset contents indefinitely and adds an Environment tag. Because no trigger is specified, you populate the dataset manually by calling CreateDatasetContent.

aws iotanalytics create-dataset \
    --cli-input-json file://create-dataset.json

Contents of create-dataset.json:

{
    "datasetName": "mydataset",
    "actions": [
        {
            "actionName": "myDatasetAction",
            "queryAction": {
                "sqlQuery": "SELECT * FROM mydatastore"
            }
        }
    ],
    "retentionPeriod": {
        "unlimited": true
    },
    "tags": [
        {
            "key": "Environment",
            "value": "Production"
        }
    ]
}

Output:

{
    "datasetName": "mydataset",
    "retentionPeriod": {
        "unlimited": true
    },
    "datasetArn": "arn:aws:iotanalytics:us-west-2:123456789012:dataset/mydataset"
}

For more information, see CreateDataset in the AWS IoT Analytics API Reference.
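If the input file is generated by a script rather than written by hand, a minimal Python sketch can produce the same create-dataset.json shown above and the matching argument list. Running it writes the file to the current directory; the command list is only built here, and you would pass it to subprocess.run (with the AWS CLI installed and configured) to execute it.

```python
import json
from pathlib import Path

# Same content as create-dataset.json in the example above.
payload = {
    "datasetName": "mydataset",
    "actions": [{"actionName": "myDatasetAction",
                 "queryAction": {"sqlQuery": "SELECT * FROM mydatastore"}}],
    "retentionPeriod": {"unlimited": True},
    "tags": [{"key": "Environment", "value": "Production"}],
}

path = Path("create-dataset.json")
path.write_text(json.dumps(payload, indent=4))

# Argument list equivalent to the command shown above.
command = ["aws", "iotanalytics", "create-dataset",
           "--cli-input-json", f"file://{path}"]
```

Generating the file this way guarantees that it is valid JSON and that its keys match the --cli-input-json format produced by --generate-cli-skeleton.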

Output

datasetName -> (string)

The name of the data set.

datasetArn -> (string)

The ARN of the data set.

retentionPeriod -> (structure)

How long, in days, data set contents are kept for the data set.

unlimited -> (boolean)

If true, data set contents are kept indefinitely.

numberOfDays -> (integer)

The number of days that data set contents are kept. The “unlimited” parameter must be false.
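Scripts that consume this output often need the region or account from “datasetArn”. This Python sketch parses the example output shown earlier and splits the ARN using the general AWS ARN layout (arn:partition:service:region:account-id:resource); the split_arn helper is hypothetical.

```python
import json

# Output from the example above.
output = json.loads("""{
    "datasetName": "mydataset",
    "retentionPeriod": {"unlimited": true},
    "datasetArn": "arn:aws:iotanalytics:us-west-2:123456789012:dataset/mydataset"
}""")


def split_arn(arn):
    """Split an ARN into its fields; the resource part may itself contain colons."""
    _, partition, service, region, account, resource = arn.split(":", 5)
    return {"partition": partition, "service": service, "region": region,
            "account": account, "resource": resource}


parts = split_arn(output["datasetArn"])
```

For the example output, parts["region"] is "us-west-2" and parts["resource"] is "dataset/mydataset", whose final segment matches “datasetName”.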