[ aws . glue ]

create-table

Description

Creates a new table definition in the Data Catalog.

See also: AWS API Documentation

See ‘aws help’ for descriptions of global parameters.

Synopsis

  create-table
[--catalog-id <value>]
--database-name <value>
--table-input <value>
[--partition-indexes <value>]
[--cli-input-json | --cli-input-yaml]
[--generate-cli-skeleton <value>]

Options

--catalog-id (string)

The ID of the Data Catalog in which to create the Table . If none is supplied, the Amazon Web Services account ID is used by default.

--database-name (string)

The catalog database in which to create the new table. For Hive compatibility, this name is entirely lowercase.

--table-input (structure)

The TableInput object that defines the metadata table to create in the catalog.

Name -> (string)

The table name. For Hive compatibility, this is folded to lowercase when it is stored.

Description -> (string)

A description of the table.

Owner -> (string)

The table owner.

LastAccessTime -> (timestamp)

The last time that the table was accessed.

LastAnalyzedTime -> (timestamp)

The last time that column statistics were computed for this table.

Retention -> (integer)

The retention time for this table.

StorageDescriptor -> (structure)

A storage descriptor containing information about the physical storage of this table.

Columns -> (list)

A list of the Columns in the table.

(structure)

A column in a Table .

Name -> (string)

The name of the Column .

Type -> (string)

The data type of the Column .

Comment -> (string)

A free-form text comment.

Parameters -> (map)

These key-value pairs define properties associated with the column.

key -> (string)

value -> (string)

Location -> (string)

The physical location of the table. By default, this takes the form of the warehouse location, followed by the database location in the warehouse, followed by the table name.

InputFormat -> (string)

The input format: SequenceFileInputFormat (binary), or TextInputFormat , or a custom format.

OutputFormat -> (string)

The output format: SequenceFileOutputFormat (binary), or IgnoreKeyTextOutputFormat , or a custom format.

Compressed -> (boolean)

True if the data in the table is compressed, or False if not.

NumberOfBuckets -> (integer)

Must be specified if the table contains any dimension columns.

SerdeInfo -> (structure)

The serialization/deserialization (SerDe) information.

Name -> (string)

Name of the SerDe.

SerializationLibrary -> (string)

Usually the class that implements the SerDe. An example is org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe .

Parameters -> (map)

These key-value pairs define initialization parameters for the SerDe.

key -> (string)

value -> (string)

BucketColumns -> (list)

A list of reducer grouping columns, clustering columns, and bucketing columns in the table.

(string)

SortColumns -> (list)

A list specifying the sort order of each bucket in the table.

(structure)

Specifies the sort order of a sorted column.

Column -> (string)

The name of the column.

SortOrder -> (integer)

Indicates that the column is sorted in ascending order (== 1 ), or in descending order (==0 ).

Parameters -> (map)

The user-supplied properties in key-value form.

key -> (string)

value -> (string)

SkewedInfo -> (structure)

The information about values that appear frequently in a column (skewed values).

SkewedColumnNames -> (list)

A list of names of columns that contain skewed values.

(string)

SkewedColumnValues -> (list)

A list of values that appear so frequently as to be considered skewed.

(string)

SkewedColumnValueLocationMaps -> (map)

A mapping of skewed values to the columns that contain them.

key -> (string)

value -> (string)

StoredAsSubDirectories -> (boolean)

True if the table data is stored in subdirectories, or False if not.

SchemaReference -> (structure)

An object that references a schema stored in the Glue Schema Registry.

When creating a table, you can pass an empty list of columns for the schema, and instead use a schema reference.

SchemaId -> (structure)

A structure that contains schema identity fields. Either this or the SchemaVersionId has to be provided.

SchemaArn -> (string)

The Amazon Resource Name (ARN) of the schema. One of SchemaArn or SchemaName has to be provided.

SchemaName -> (string)

The name of the schema. One of SchemaArn or SchemaName has to be provided.

RegistryName -> (string)

The name of the schema registry that contains the schema.

SchemaVersionId -> (string)

The unique ID assigned to a version of the schema. Either this or the SchemaId has to be provided.

SchemaVersionNumber -> (long)

The version number of the schema.

PartitionKeys -> (list)

A list of columns by which the table is partitioned. Only primitive types are supported as partition keys.

When you create a table used by Amazon Athena, and you do not specify any partitionKeys , you must at least set the value of partitionKeys to an empty list. For example:

"PartitionKeys": []

(structure)

A column in a Table .

Name -> (string)

The name of the Column .

Type -> (string)

The data type of the Column .

Comment -> (string)

A free-form text comment.

Parameters -> (map)

These key-value pairs define properties associated with the column.

key -> (string)

value -> (string)

ViewOriginalText -> (string)

If the table is a view, the original text of the view; otherwise null .

ViewExpandedText -> (string)

If the table is a view, the expanded text of the view; otherwise null .

TableType -> (string)

The type of this table (EXTERNAL_TABLE , VIRTUAL_VIEW , etc.).

Parameters -> (map)

These key-value pairs define properties associated with the table.

key -> (string)

value -> (string)

TargetTable -> (structure)

A TableIdentifier structure that describes a target table for resource linking.

CatalogId -> (string)

The ID of the Data Catalog in which the table resides.

DatabaseName -> (string)

The name of the catalog database that contains the target table.

Name -> (string)

The name of the target table.

JSON Syntax:

{
  "Name": "string",
  "Description": "string",
  "Owner": "string",
  "LastAccessTime": timestamp,
  "LastAnalyzedTime": timestamp,
  "Retention": integer,
  "StorageDescriptor": {
    "Columns": [
      {
        "Name": "string",
        "Type": "string",
        "Comment": "string",
        "Parameters": {"string": "string"
          ...}
      }
      ...
    ],
    "Location": "string",
    "InputFormat": "string",
    "OutputFormat": "string",
    "Compressed": true|false,
    "NumberOfBuckets": integer,
    "SerdeInfo": {
      "Name": "string",
      "SerializationLibrary": "string",
      "Parameters": {"string": "string"
        ...}
    },
    "BucketColumns": ["string", ...],
    "SortColumns": [
      {
        "Column": "string",
        "SortOrder": integer
      }
      ...
    ],
    "Parameters": {"string": "string"
      ...},
    "SkewedInfo": {
      "SkewedColumnNames": ["string", ...],
      "SkewedColumnValues": ["string", ...],
      "SkewedColumnValueLocationMaps": {"string": "string"
        ...}
    },
    "StoredAsSubDirectories": true|false,
    "SchemaReference": {
      "SchemaId": {
        "SchemaArn": "string",
        "SchemaName": "string",
        "RegistryName": "string"
      },
      "SchemaVersionId": "string",
      "SchemaVersionNumber": long
    }
  },
  "PartitionKeys": [
    {
      "Name": "string",
      "Type": "string",
      "Comment": "string",
      "Parameters": {"string": "string"
        ...}
    }
    ...
  ],
  "ViewOriginalText": "string",
  "ViewExpandedText": "string",
  "TableType": "string",
  "Parameters": {"string": "string"
    ...},
  "TargetTable": {
    "CatalogId": "string",
    "DatabaseName": "string",
    "Name": "string"
  }
}

--partition-indexes (list)

A list of partition indexes, PartitionIndex structures, to create in the table.

(structure)

A structure for a partition index.

Keys -> (list)

The keys for the partition index.

(string)

IndexName -> (string)

The name of the partition index.

Shorthand Syntax:

Keys=string,string,IndexName=string ...

JSON Syntax:

[
  {
    "Keys": ["string", ...],
    "IndexName": "string"
  }
  ...
]

--cli-input-json | --cli-input-yaml (string) Reads arguments from the JSON string provided. The JSON string follows the format provided by --generate-cli-skeleton. If other arguments are provided on the command line, those values will override the JSON-provided values. It is not possible to pass arbitrary binary values using a JSON-provided value as the string will be taken literally. This may not be specified along with --cli-input-yaml.

--generate-cli-skeleton (string) Prints a JSON skeleton to standard output without sending an API request. If provided with no value or the value input, prints a sample input JSON that can be used as an argument for --cli-input-json. Similarly, if provided yaml-input it will print a sample input YAML that can be used with --cli-input-yaml. If provided with the value output, it validates the command inputs and returns a sample output JSON for that command.

See ‘aws help’ for descriptions of global parameters.

Examples

Example 1: To create a table for a Kinesis data stream

The following create-table example creates a table in the AWS Glue Data Catalog that describes a Kinesis data stream.

aws glue create-table \
    --database-name tempdb \
    --table-input  '{"Name":"test-kinesis-input", "StorageDescriptor":{ \
        "Columns":[ \
            {"Name":"sensorid", "Type":"int"}, \
            {"Name":"currenttemperature", "Type":"int"}, \
            {"Name":"status", "Type":"string"}
        ], \
        "Location":"my-testing-stream", \
        "Parameters":{ \
            "typeOfData":"kinesis","streamName":"my-testing-stream", \
            "kinesisUrl":"https://kinesis.us-east-1.amazonaws.com" \
        }, \
        "SerdeInfo":{ \
            "SerializationLibrary":"org.openx.data.jsonserde.JsonSerDe"} \
        }, \
        "Parameters":{ \
            "classification":"json"} \
        }' \
    --profile my-profile \
    --endpoint https://glue.us-east-1.amazonaws.com

This command produces no output.

For more information, see Defining Tables in the AWS Glue Data Catalog in the AWS Glue Developer Guide.

Example 2: To create a table for a Kafka data store

The following create-table example creates a table in the AWS Glue Data Catalog that describes a Kafka data store.

aws glue create-table \
    --database-name tempdb \
    --table-input  '{"Name":"test-kafka-input", "StorageDescriptor":{ \
        "Columns":[ \
            {"Name":"sensorid", "Type":"int"}, \
            {"Name":"currenttemperature", "Type":"int"}, \
            {"Name":"status", "Type":"string"}
        ], \
        "Location":"glue-topic", \
        "Parameters":{ \
            "typeOfData":"kafka","topicName":"glue-topic", \
            "connectionName":"my-kafka-connection"
        }, \
        "SerdeInfo":{ \
            "SerializationLibrary":"org.apache.hadoop.hive.serde2.OpenCSVSerde"} \
        }, \
        "Parameters":{ \
            "separatorChar":","} \
        }' \
    --profile my-profile \
    --endpoint https://glue.us-east-1.amazonaws.com

This command produces no output.

For more information, see Defining Tables in the AWS Glue Data Catalog in the AWS Glue Developer Guide.

Example 3: To create a table for a AWS S3 data store

The following create-table example creates a table in the AWS Glue Data Catalog that describes a AWS Simple Storage Service (AWS S3) data store.

aws glue create-table \
    --database-name tempdb \
    --table-input  '{"Name":"s3-output", "StorageDescriptor":{ \
        "Columns":[ \
            {"Name":"s1", "Type":"string"}, \
            {"Name":"s2", "Type":"int"}, \
            {"Name":"s3", "Type":"string"}
        ], \
        "Location":"s3://bucket-path/"}, \
        "SerdeInfo":{ \
            "SerializationLibrary":"org.openx.data.jsonserde.JsonSerDe"}, \
        "Parameters":{ \
            "classification":"json"} \
        }' \
    --profile my-profile \
    --endpoint https://glue.us-east-1.amazonaws.com

This command produces no output.

For more information, see Defining Tables in the AWS Glue Data Catalog in the AWS Glue Developer Guide.

Output

None