[ aws . databrew ]

create-dataset

Description

Creates a new DataBrew dataset.

See also: AWS API Documentation

See ‘aws help’ for descriptions of global parameters.

Synopsis

  create-dataset
  --name <value>
  [--format <value>]
  [--format-options <value>]
  --input <value>
  [--tags <value>]
  [--cli-input-json | --cli-input-yaml]
  [--generate-cli-skeleton <value>]

Options

--name (string)

The name of the dataset to be created. Valid characters are alphanumeric (A-Z, a-z, 0-9), hyphen (-), period (.), and space.

--format (string)

Specifies the file format of a dataset created from an S3 file or folder.

Possible values:

  • CSV

  • JSON

  • PARQUET

  • EXCEL

--format-options (structure)

Options that define the structure of CSV, Excel, or JSON input.

Json -> (structure)

Options that define how JSON input is to be interpreted by DataBrew.

MultiLine -> (boolean)

A value that specifies whether JSON input contains embedded newline characters.

Excel -> (structure)

Options that define how Excel input is to be interpreted by DataBrew.

SheetNames -> (list)

Specifies one or more named sheets in the Excel file to include in the dataset.

(string)

SheetIndexes -> (list)

Specifies one or more sheet numbers in the Excel file to include in the dataset.

(integer)

HeaderRow -> (boolean)

A variable that specifies whether the first row in the file will be parsed as the header. If false, column names will be auto-generated.

Csv -> (structure)

Options that define how CSV input is to be interpreted by DataBrew.

Delimiter -> (string)

A single character that specifies the delimiter being used in the CSV file.

HeaderRow -> (boolean)

A variable that specifies whether the first row in the file will be parsed as the header. If false, column names will be auto-generated.

Shorthand Syntax:

Json={MultiLine=boolean},Excel={SheetNames=[string,string],SheetIndexes=[integer,integer],HeaderRow=boolean},Csv={Delimiter=string,HeaderRow=boolean}

JSON Syntax:

{
  "Json": {
    "MultiLine": true|false
  },
  "Excel": {
    "SheetNames": ["string", ...],
    "SheetIndexes": [integer, ...],
    "HeaderRow": true|false
  },
  "Csv": {
    "Delimiter": "string",
    "HeaderRow": true|false
  }
}
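
As an illustration of the JSON syntax above, here is a minimal sketch that stores CSV format options in a file and passes them with the AWS CLI file:// prefix. The file name, dataset name, bucket, and key are hypothetical placeholders, and the command is wrapped in a function so that nothing is sent until you call it.

```shell
# Hypothetical file name; the keys follow the JSON syntax for --format-options.
cat > format-options.json <<'EOF'
{
  "Csv": {
    "Delimiter": ";",
    "HeaderRow": true
  }
}
EOF

# Wrapped in a function so no request is sent until it is called.
# The dataset name, bucket, and key are placeholders.
create_csv_dataset() {
  aws databrew create-dataset \
    --name sales-csv \
    --format CSV \
    --format-options file://format-options.json \
    --input 'S3InputDefinition={Bucket=amzn-s3-demo-bucket,Key=sales/2020.csv}'
}
```

Call create_csv_dataset once credentials are configured. The shorthand value for --input is single-quoted so that the shell does not brace-expand it.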

--input (structure)

Information on how DataBrew can find data, in either the AWS Glue Data Catalog or Amazon S3.

S3InputDefinition -> (structure)

The Amazon S3 location where the data is stored.

Bucket -> (string)

The S3 bucket name.

Key -> (string)

The unique name of the object in the bucket.

DataCatalogInputDefinition -> (structure)

The AWS Glue Data Catalog parameters for the data.

CatalogId -> (string)

The unique identifier of the AWS account that holds the Data Catalog that stores the data.

DatabaseName -> (string)

The name of a database in the Data Catalog.

TableName -> (string)

The name of a database table in the Data Catalog. This table corresponds to a DataBrew dataset.

TempDirectory -> (structure)

An Amazon S3 location that AWS Glue Data Catalog can use as a temporary directory.

Bucket -> (string)

The S3 bucket name.

Key -> (string)

The unique name of the object in the bucket.

Shorthand Syntax:

S3InputDefinition={Bucket=string,Key=string},DataCatalogInputDefinition={CatalogId=string,DatabaseName=string,TableName=string,TempDirectory={Bucket=string,Key=string}}

JSON Syntax:

{
  "S3InputDefinition": {
    "Bucket": "string",
    "Key": "string"
  },
  "DataCatalogInputDefinition": {
    "CatalogId": "string",
    "DatabaseName": "string",
    "TableName": "string",
    "TempDirectory": {
      "Bucket": "string",
      "Key": "string"
    }
  }
}
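
The following sketch shows one way to point --input at an AWS Glue Data Catalog table instead of an S3 object. It supplies only DataCatalogInputDefinition. The account ID, database, table, bucket, and dataset name are all hypothetical placeholders.

```shell
# Hypothetical file name and values; the keys follow the JSON syntax for --input.
cat > input.json <<'EOF'
{
  "DataCatalogInputDefinition": {
    "CatalogId": "111122223333",
    "DatabaseName": "sales_db",
    "TableName": "orders",
    "TempDirectory": {
      "Bucket": "amzn-s3-demo-bucket",
      "Key": "databrew-temp/"
    }
  }
}
EOF

# Wrapped in a function so no request is sent until it is called.
create_catalog_dataset() {
  aws databrew create-dataset \
    --name orders-from-catalog \
    --input file://input.json
}
```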

--tags (map)

Metadata tags to apply to this dataset.

key -> (string)

value -> (string)

Shorthand Syntax:

KeyName1=string,KeyName2=string

JSON Syntax:

{
  "string": "string",
  ...
}

--cli-input-json | --cli-input-yaml (string)

Reads arguments from the JSON string provided. The JSON string follows the format provided by --generate-cli-skeleton. If other arguments are provided on the command line, those values will override the JSON-provided values. It is not possible to pass arbitrary binary values using a JSON-provided value as the string will be taken literally. This may not be specified along with --cli-input-yaml.

--generate-cli-skeleton (string)

Prints a JSON skeleton to standard output without sending an API request. If provided with no value or the value input, prints a sample input JSON that can be used as an argument for --cli-input-json. Similarly, if provided yaml-input it will print a sample input YAML that can be used with --cli-input-yaml. If provided with the value output, it validates the command inputs and returns a sample output JSON for that command.
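
A typical way to combine these two options is to save the skeleton, fill it in, and pass the file back. The sketch below writes an already filled-in request file by hand. Its top-level keys (Name, Format, FormatOptions, Input, Tags) are assumed to match what --generate-cli-skeleton input prints for this command, and every value is a hypothetical placeholder.

```shell
# A filled-in request file; key names are assumed to match the skeleton,
# and all values are placeholders.
cat > create-dataset.json <<'EOF'
{
  "Name": "sales-json",
  "Format": "JSON",
  "FormatOptions": {
    "Json": { "MultiLine": true }
  },
  "Input": {
    "S3InputDefinition": {
      "Bucket": "amzn-s3-demo-bucket",
      "Key": "sales/2020.json"
    }
  },
  "Tags": { "Project": "demo" }
}
EOF

# Wrapped in a function so no request is sent until it is called.
# A --name given on the command line would override "Name" from the file.
create_from_file() {
  aws databrew create-dataset --cli-input-json file://create-dataset.json
}
```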

Output

Name -> (string)

The name of the dataset that you created.
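
Because the output contains a single Name field, a script can capture it directly. The sketch below uses the global --query and --output options to print only that field. The dataset name, bucket, key, and tags are hypothetical placeholders.

```shell
# Creates the dataset and prints only the returned Name field.
# Wrapped in a function so no request is sent until it is called.
create_and_print_name() {
  aws databrew create-dataset \
    --name sales-parquet \
    --format PARQUET \
    --input 'S3InputDefinition={Bucket=amzn-s3-demo-bucket,Key=sales/2020.parquet}' \
    --tags Project=demo,Stage=test \
    --query Name \
    --output text
}
```

For example, dataset_name=$(create_and_print_name) stores the returned name in a shell variable for later commands.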