Creates a new DataBrew dataset.
See also: AWS API Documentation
See ‘aws help’ for descriptions of global parameters.
create-dataset
--name <value>
[--format <value>]
[--format-options <value>]
--input <value>
[--tags <value>]
[--cli-input-json | --cli-input-yaml]
[--generate-cli-skeleton <value>]
--name
(string)
The name of the dataset to be created. Valid characters are alphanumeric (A-Z, a-z, 0-9), hyphen (-), period (.), and space.
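The character rule above can be checked locally before calling the CLI. The following is a minimal sketch, not part of the AWS CLI: the helper name `is_valid_dataset_name` is hypothetical, and it checks only the character set listed above. Any length limit the service enforces is not covered.

```python
import re

# Character set taken from the --name description above:
# A-Z, a-z, 0-9, hyphen, period, and space. Length limits are not checked.
_NAME_RE = re.compile(r"[A-Za-z0-9.\- ]+")


def is_valid_dataset_name(name):
    """Return True if every character of name is in the documented set."""
    return _NAME_RE.fullmatch(name) is not None


print(is_valid_dataset_name("sales-data 2024.v1"))
print(is_valid_dataset_name("sales_data"))  # underscore is not in the documented set
```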
--format
(string)
Specifies the file format of a dataset created from an S3 file or folder.
Possible values:
CSV
JSON
PARQUET
EXCEL
--format-options
(structure)
Options that define the structure of CSV, Excel, or JSON input.
Json -> (structure)
Options that define how JSON input is to be interpreted by DataBrew.
MultiLine -> (boolean)
A value that specifies whether JSON input contains embedded newline characters.
Excel -> (structure)
Options that define how Excel input is to be interpreted by DataBrew.
SheetNames -> (list)
Specifies one or more named sheets in the Excel file, which will be included in the dataset.
(string)
SheetIndexes -> (list)
Specifies one or more sheet numbers in the Excel file, which will be included in the dataset.
(integer)
HeaderRow -> (boolean)
A variable that specifies whether the first row in the file will be parsed as the header. If false, column names will be auto-generated.
Csv -> (structure)
Options that define how CSV input is to be interpreted by DataBrew.
Delimiter -> (string)
A single character that specifies the delimiter used in the CSV file.
HeaderRow -> (boolean)
A variable that specifies whether the first row in the file will be parsed as the header. If false, column names will be auto-generated.
Shorthand Syntax:
Json={MultiLine=boolean},Excel={SheetNames=[string,string],SheetIndexes=[integer,integer],HeaderRow=boolean},Csv={Delimiter=string,HeaderRow=boolean}
JSON Syntax:
{
"Json": {
"MultiLine": true|false
},
"Excel": {
"SheetNames": ["string", ...],
"SheetIndexes": [integer, ...],
"HeaderRow": true|false
},
"Csv": {
"Delimiter": "string",
"HeaderRow": true|false
}
}
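As an illustration of the JSON syntax above, the following sketch builds a `--format-options` value for CSV input in Python and prints it quoted for a POSIX shell. The helper name `csv_format_options` and the semicolon delimiter are assumptions made for the example. The single-character check mirrors the Delimiter description and is done locally, not by the CLI.

```python
import json
import shlex


def csv_format_options(delimiter=",", header_row=True):
    """Return the --format-options structure for CSV input (hypothetical helper)."""
    if len(delimiter) != 1:
        raise ValueError("Delimiter must be a single character")
    return {"Csv": {"Delimiter": delimiter, "HeaderRow": header_row}}


options = csv_format_options(delimiter=";")

# json.dumps writes Python's True as lowercase true, matching the JSON syntax.
arg = json.dumps(options)

# shlex.quote makes the JSON safe to paste into a POSIX shell command line.
print("--format-options", shlex.quote(arg))
```

Building the value with `json.dumps` avoids hand-escaping quotes, which is the usual source of errors when JSON is passed on the command line.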
--input
(structure)
Information on how DataBrew can find data, in either the AWS Glue Data Catalog or Amazon S3.
S3InputDefinition -> (structure)
The Amazon S3 location where the data is stored.
Bucket -> (string)
The S3 bucket name.
Key -> (string)
The unique name of the object in the bucket.
DataCatalogInputDefinition -> (structure)
The AWS Glue Data Catalog parameters for the data.
CatalogId -> (string)
The unique identifier of the AWS account that holds the Data Catalog that stores the data.
DatabaseName -> (string)
The name of a database in the Data Catalog.
TableName -> (string)
The name of a database table in the Data Catalog. This table corresponds to a DataBrew dataset.
TempDirectory -> (structure)
An Amazon S3 location that AWS Glue Data Catalog can use as a temporary directory.
Bucket -> (string)
The S3 bucket name.
Key -> (string)
The unique name of the object in the bucket.
Shorthand Syntax:
S3InputDefinition={Bucket=string,Key=string},DataCatalogInputDefinition={CatalogId=string,DatabaseName=string,TableName=string,TempDirectory={Bucket=string,Key=string}}
JSON Syntax:
{
"S3InputDefinition": {
"Bucket": "string",
"Key": "string"
},
"DataCatalogInputDefinition": {
"CatalogId": "string",
"DatabaseName": "string",
"TableName": "string",
"TempDirectory": {
"Bucket": "string",
"Key": "string"
}
}
}
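The sketch below shows one way the required `--name` and `--input` options might be assembled for a dataset stored in Amazon S3. It assumes the command is invoked as `aws databrew create-dataset`. The helper `create_dataset_argv`, the bucket `example-bucket`, and the key `raw/sales.csv` are placeholders invented for the example. The command is only printed, not executed.

```python
import json
import shlex


def create_dataset_argv(name, bucket, key, fmt=None):
    """Build the argument list for create-dataset with an S3 input (hypothetical helper)."""
    input_def = {"S3InputDefinition": {"Bucket": bucket, "Key": key}}
    argv = [
        "aws", "databrew", "create-dataset",
        "--name", name,
        "--input", json.dumps(input_def),
    ]
    if fmt is not None:
        # --format is optional; documented values are CSV, JSON, PARQUET, EXCEL.
        argv += ["--format", fmt]
    return argv


argv = create_dataset_argv("sales-data", "example-bucket", "raw/sales.csv", fmt="CSV")

# Print a shell-safe command line. To run it instead, pass argv to subprocess.run.
print(shlex.join(argv))
```

Keeping the arguments as a list means no shell quoting is needed if the command is later run with `subprocess.run(argv, check=True)`.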
--tags
(map)
Metadata tags to apply to this dataset.
key -> (string)
value -> (string)
Shorthand Syntax:
KeyName1=string,KeyName2=string
JSON Syntax:
{"string": "string"
...}
--cli-input-json | --cli-input-yaml
(string)
Reads arguments from the JSON string provided. The JSON string follows the format provided by --generate-cli-skeleton. If other arguments are provided on the command line, those values will override the JSON-provided values. It is not possible to pass arbitrary binary values using a JSON-provided value as the string will be taken literally. This may not be specified along with --cli-input-yaml.
--generate-cli-skeleton
(string)
Prints a JSON skeleton to standard output without sending an API request. If provided with no value or the value input, prints a sample input JSON that can be used as an argument for --cli-input-json. Similarly, if provided yaml-input it will print a sample input YAML that can be used with --cli-input-yaml. If provided with the value output, it validates the command inputs and returns a sample output JSON for that command.
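For longer requests it can be easier to keep all arguments in one file and pass it with `--cli-input-json`. The sketch below writes such a file and prints the matching command. The top-level keys (`Name`, `Format`, `FormatOptions`, `Input`, `Tags`) are assumed to match what `--generate-cli-skeleton` prints, so compare against the generated skeleton before relying on them. All values are placeholders, and the `file://` prefix is built for POSIX-style paths.

```python
import json
import tempfile
from pathlib import Path

# Assumed request shape; verify against `--generate-cli-skeleton` output.
request = {
    "Name": "sales-data",
    "Format": "CSV",
    "FormatOptions": {"Csv": {"Delimiter": ",", "HeaderRow": True}},
    "Input": {
        "S3InputDefinition": {"Bucket": "example-bucket", "Key": "raw/sales.csv"}
    },
    "Tags": {"team": "analytics"},
}

# Write the request to a temporary directory so the example leaves no files behind
# in the working directory.
path = Path(tempfile.mkdtemp()) / "create-dataset.json"
path.write_text(json.dumps(request, indent=2))

# The command is printed rather than executed.
argv = ["aws", "databrew", "create-dataset", "--cli-input-json", f"file://{path}"]
print(" ".join(argv))
```

Any option also given on the command line overrides the value in the file, as described for `--cli-input-json` above.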