Creates a new table definition in the Data Catalog.
See also: AWS API Documentation
See ‘aws help’ for descriptions of global parameters.
create-table
[--catalog-id <value>]
--database-name <value>
--table-input <value>
[--partition-indexes <value>]
[--cli-input-json | --cli-input-yaml]
[--generate-cli-skeleton <value>]
--catalog-id
(string)
The ID of the Data Catalog in which to create the Table. If none is supplied, the Amazon Web Services account ID is used by default.
--database-name
(string)
The catalog database in which to create the new table. For Hive compatibility, this name is entirely lowercase.
--table-input
(structure)
The TableInput object that defines the metadata table to create in the catalog.
Name -> (string)
The table name. For Hive compatibility, this is folded to lowercase when it is stored.
Description -> (string)
A description of the table.
Owner -> (string)
The table owner.
LastAccessTime -> (timestamp)
The last time that the table was accessed.
LastAnalyzedTime -> (timestamp)
The last time that column statistics were computed for this table.
Retention -> (integer)
The retention time for this table.
StorageDescriptor -> (structure)
A storage descriptor containing information about the physical storage of this table.
Columns -> (list)
A list of the Columns in the table.
(structure)
A column in a Table.
Name -> (string)
The name of the Column.
Type -> (string)
The data type of the Column.
Comment -> (string)
A free-form text comment.
Parameters -> (map)
These key-value pairs define properties associated with the column.
key -> (string)
value -> (string)
Location -> (string)
The physical location of the table. By default, this takes the form of the warehouse location, followed by the database location in the warehouse, followed by the table name.
InputFormat -> (string)
The input format: SequenceFileInputFormat (binary), or TextInputFormat, or a custom format.
OutputFormat -> (string)
The output format: SequenceFileOutputFormat (binary), or IgnoreKeyTextOutputFormat, or a custom format.
Compressed -> (boolean)
True if the data in the table is compressed, or False if not.
NumberOfBuckets -> (integer)
Must be specified if the table contains any dimension columns.
SerdeInfo -> (structure)
The serialization/deserialization (SerDe) information.
Name -> (string)
Name of the SerDe.
SerializationLibrary -> (string)
Usually the class that implements the SerDe. An example is org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe.
Parameters -> (map)
These key-value pairs define initialization parameters for the SerDe.
key -> (string)
value -> (string)
BucketColumns -> (list)
A list of reducer grouping columns, clustering columns, and bucketing columns in the table.
(string)
SortColumns -> (list)
A list specifying the sort order of each bucket in the table.
(structure)
Specifies the sort order of a sorted column.
Column -> (string)
The name of the column.
SortOrder -> (integer)
Indicates that the column is sorted in ascending order (== 1), or in descending order (== 0).
Parameters -> (map)
The user-supplied properties in key-value form.
key -> (string)
value -> (string)
SkewedInfo -> (structure)
The information about values that appear frequently in a column (skewed values).
SkewedColumnNames -> (list)
A list of names of columns that contain skewed values.
(string)
SkewedColumnValues -> (list)
A list of values that appear so frequently as to be considered skewed.
(string)
SkewedColumnValueLocationMaps -> (map)
A mapping of skewed values to the columns that contain them.
key -> (string)
value -> (string)
StoredAsSubDirectories -> (boolean)
True if the table data is stored in subdirectories, or False if not.
SchemaReference -> (structure)
An object that references a schema stored in the Glue Schema Registry.
When creating a table, you can pass an empty list of columns for the schema, and instead use a schema reference.
SchemaId -> (structure)
A structure that contains schema identity fields. Either this or the SchemaVersionId has to be provided.
SchemaArn -> (string)
The Amazon Resource Name (ARN) of the schema. One of SchemaArn or SchemaName has to be provided.
SchemaName -> (string)
The name of the schema. One of SchemaArn or SchemaName has to be provided.
RegistryName -> (string)
The name of the schema registry that contains the schema.
SchemaVersionId -> (string)
The unique ID assigned to a version of the schema. Either this or the SchemaId has to be provided.
SchemaVersionNumber -> (long)
The version number of the schema.
PartitionKeys -> (list)
A list of columns by which the table is partitioned. Only primitive types are supported as partition keys.
When you create a table used by Amazon Athena, and you do not specify any partitionKeys, you must at least set the value of partitionKeys to an empty list. For example:
"PartitionKeys": []
(structure)
A column in a Table.
Name -> (string)
The name of the Column.
Type -> (string)
The data type of the Column.
Comment -> (string)
A free-form text comment.
Parameters -> (map)
These key-value pairs define properties associated with the column.
key -> (string)
value -> (string)
ViewOriginalText -> (string)
If the table is a view, the original text of the view; otherwise null.
ViewExpandedText -> (string)
If the table is a view, the expanded text of the view; otherwise null.
TableType -> (string)
The type of this table (EXTERNAL_TABLE, VIRTUAL_VIEW, etc.).
Parameters -> (map)
These key-value pairs define properties associated with the table.
key -> (string)
value -> (string)
TargetTable -> (structure)
A TableIdentifier structure that describes a target table for resource linking.
CatalogId -> (string)
The ID of the Data Catalog in which the table resides.
DatabaseName -> (string)
The name of the catalog database that contains the target table.
Name -> (string)
The name of the target table.
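The either/or rules stated above for SchemaReference can be checked locally before the command is run. The following is a minimal Python sketch under stated assumptions: the function name check_schema_reference is hypothetical, and the checks mirror only the wording of this section (at least one of SchemaId or SchemaVersionId, and at least one of SchemaArn or SchemaName inside SchemaId). It does not reproduce whatever validation the service itself performs.

```python
def check_schema_reference(ref):
    """Return a list of problems found in a SchemaReference dict (empty if none)."""
    problems = []
    schema_id = ref.get("SchemaId")
    version_id = ref.get("SchemaVersionId")

    # "Either this or the SchemaVersionId has to be provided."
    if schema_id is None and version_id is None:
        problems.append("provide SchemaId or SchemaVersionId")

    # "One of SchemaArn or SchemaName has to be provided."
    if schema_id is not None:
        if not (schema_id.get("SchemaArn") or schema_id.get("SchemaName")):
            problems.append("SchemaId needs SchemaArn or SchemaName")

    return problems


if __name__ == "__main__":
    # Placeholder schema and registry names, for illustration only.
    reference = {
        "SchemaId": {"SchemaName": "orders", "RegistryName": "my-registry"},
        "SchemaVersionNumber": 1,
    }
    print(check_schema_reference(reference))
```

An empty list means the reference satisfies the two rules quoted in the comments; any other result lists what is missing.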
JSON Syntax:
{
  "Name": "string",
  "Description": "string",
  "Owner": "string",
  "LastAccessTime": timestamp,
  "LastAnalyzedTime": timestamp,
  "Retention": integer,
  "StorageDescriptor": {
    "Columns": [
      {
        "Name": "string",
        "Type": "string",
        "Comment": "string",
        "Parameters": {"string": "string"
          ...}
      }
      ...
    ],
    "Location": "string",
    "InputFormat": "string",
    "OutputFormat": "string",
    "Compressed": true|false,
    "NumberOfBuckets": integer,
    "SerdeInfo": {
      "Name": "string",
      "SerializationLibrary": "string",
      "Parameters": {"string": "string"
        ...}
    },
    "BucketColumns": ["string", ...],
    "SortColumns": [
      {
        "Column": "string",
        "SortOrder": integer
      }
      ...
    ],
    "Parameters": {"string": "string"
      ...},
    "SkewedInfo": {
      "SkewedColumnNames": ["string", ...],
      "SkewedColumnValues": ["string", ...],
      "SkewedColumnValueLocationMaps": {"string": "string"
        ...}
    },
    "StoredAsSubDirectories": true|false,
    "SchemaReference": {
      "SchemaId": {
        "SchemaArn": "string",
        "SchemaName": "string",
        "RegistryName": "string"
      },
      "SchemaVersionId": "string",
      "SchemaVersionNumber": long
    }
  },
  "PartitionKeys": [
    {
      "Name": "string",
      "Type": "string",
      "Comment": "string",
      "Parameters": {"string": "string"
        ...}
    }
    ...
  ],
  "ViewOriginalText": "string",
  "ViewExpandedText": "string",
  "TableType": "string",
  "Parameters": {"string": "string"
    ...},
  "TargetTable": {
    "CatalogId": "string",
    "DatabaseName": "string",
    "Name": "string"
  }
}
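Writing the TableInput JSON by hand is error-prone, so it can help to generate it. The following Python sketch assembles a minimal TableInput using only members listed in the syntax above. The helper name build_table_input, the table and column names, and the S3 location are hypothetical placeholders. Lowercasing the name on the client is a choice made here so that the generated document matches what the section says is stored.

```python
import json


def build_table_input(name, columns, location, partition_keys=None):
    """Assemble a minimal TableInput dict from (column_name, type) pairs."""
    return {
        # The table name is folded to lowercase when stored, so do it up front.
        "Name": name.lower(),
        "TableType": "EXTERNAL_TABLE",
        "StorageDescriptor": {
            "Columns": [{"Name": n, "Type": t} for n, t in columns],
            "Location": location,
        },
        # Always emit PartitionKeys, even when empty, as the Athena note advises.
        "PartitionKeys": [{"Name": n, "Type": t} for n, t in (partition_keys or [])],
    }


if __name__ == "__main__":
    table_input = build_table_input(
        "Sales_Events",
        [("event_id", "string"), ("amount", "double")],
        "s3://amzn-s3-demo-bucket/sales/",  # placeholder location
    )
    print(json.dumps(table_input, indent=2))
```

The printed JSON can be saved to a file and passed as --table-input file://table-input.json, which avoids shell quoting of the JSON document.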
--partition-indexes
(list)
A list of partition indexes, PartitionIndex structures, to create in the table.
(structure)
A structure for a partition index.
Keys -> (list)
The keys for the partition index.
(string)
IndexName -> (string)
The name of the partition index.
Shorthand Syntax:
Keys=string,string,IndexName=string ...
JSON Syntax:
[
  {
    "Keys": ["string", ...],
    "IndexName": "string"
  }
  ...
]
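The two syntaxes above describe the same data. As an illustration, the following Python sketch renders a list of PartitionIndex structures in both forms. The helper name to_shorthand and the index and key names are hypothetical, and the sketch assumes key and index names contain no spaces or commas, which the shorthand form would not be able to express without extra quoting.

```python
import json


def to_shorthand(indexes):
    """Render PartitionIndex dicts in the form Keys=a,b,IndexName=name ..."""
    parts = []
    for index in indexes:
        keys = ",".join(index["Keys"])
        parts.append(f"Keys={keys},IndexName={index['IndexName']}")
    # Multiple structures are separated by spaces on the command line.
    return " ".join(parts)


if __name__ == "__main__":
    indexes = [{"Keys": ["year", "month"], "IndexName": "by_month"}]
    print(to_shorthand(indexes))  # Keys=year,month,IndexName=by_month
    print(json.dumps(indexes))    # JSON form of the same list
```

The JSON form is usually the safer choice when the values come from user input, because it does not depend on the shorthand separators.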
--cli-input-json | --cli-input-yaml
(string)
Reads arguments from the JSON string provided. The JSON string follows the format provided by --generate-cli-skeleton. If other arguments are provided on the command line, those values will override the JSON-provided values. It is not possible to pass arbitrary binary values using a JSON-provided value as the string will be taken literally. This may not be specified along with --cli-input-yaml.
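As an illustration of the --cli-input-json option, the following Python sketch builds a single JSON document that carries all of the command's arguments. The helper name build_cli_input is hypothetical. The top-level key names (CatalogId, DatabaseName, TableInput, PartitionIndexes) are assumed to be the CamelCase forms of the options and should be confirmed against the output of --generate-cli-skeleton before use.

```python
import json


def build_cli_input(database_name, table_input, catalog_id=None, partition_indexes=None):
    """Build a document for --cli-input-json; optional members are omitted when unset."""
    doc = {"DatabaseName": database_name, "TableInput": table_input}
    if catalog_id is not None:
        doc["CatalogId"] = catalog_id
    if partition_indexes:
        doc["PartitionIndexes"] = partition_indexes
    return doc


if __name__ == "__main__":
    doc = build_cli_input(
        "tempdb",
        {"Name": "example_table", "PartitionKeys": []},  # placeholder TableInput
    )
    print(json.dumps(doc, indent=2))
```

Saved to a file, the document can be supplied as aws glue create-table --cli-input-json file://create-table.json. Any option also given on the command line overrides the corresponding value in the file, as described above.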
--generate-cli-skeleton
(string)
Prints a JSON skeleton to standard output without sending an API request. If provided with no value or the value input, prints a sample input JSON that can be used as an argument for --cli-input-json. Similarly, if provided yaml-input it will print a sample input YAML that can be used with --cli-input-yaml. If provided with the value output, it validates the command inputs and returns a sample output JSON for that command.
Example 1: To create a table for a Kinesis data stream
The following create-table example creates a table in the AWS Glue Data Catalog that describes a Kinesis data stream.
aws glue create-table \
    --database-name tempdb \
    --table-input '{
        "Name":"test-kinesis-input",
        "StorageDescriptor":{
            "Columns":[
                {"Name":"sensorid", "Type":"int"},
                {"Name":"currenttemperature", "Type":"int"},
                {"Name":"status", "Type":"string"}
            ],
            "Location":"my-testing-stream",
            "Parameters":{
                "typeOfData":"kinesis",
                "streamName":"my-testing-stream",
                "kinesisUrl":"https://kinesis.us-east-1.amazonaws.com"
            },
            "SerdeInfo":{
                "SerializationLibrary":"org.openx.data.jsonserde.JsonSerDe"
            }
        },
        "Parameters":{"classification":"json"}
    }' \
    --profile my-profile \
    --endpoint-url https://glue.us-east-1.amazonaws.com
This command produces no output.
For more information, see Defining Tables in the AWS Glue Data Catalog in the AWS Glue Developer Guide.
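Quoting a JSON document inside a shell command is easy to get wrong. As one way to avoid that, the following Python sketch builds the table definition from Example 1 as a dict and renders a shell-safe command line with the standard library. The function name render_command is hypothetical, and the sketch only prints the command; it does not run it.

```python
import json
import shlex

# Table definition taken from Example 1 above.
KINESIS_TABLE_INPUT = {
    "Name": "test-kinesis-input",
    "StorageDescriptor": {
        "Columns": [
            {"Name": "sensorid", "Type": "int"},
            {"Name": "currenttemperature", "Type": "int"},
            {"Name": "status", "Type": "string"},
        ],
        "Location": "my-testing-stream",
        "Parameters": {
            "typeOfData": "kinesis",
            "streamName": "my-testing-stream",
            "kinesisUrl": "https://kinesis.us-east-1.amazonaws.com",
        },
        "SerdeInfo": {
            "SerializationLibrary": "org.openx.data.jsonserde.JsonSerDe"
        },
    },
    "Parameters": {"classification": "json"},
}


def render_command(database_name, table_input):
    """Return an aws glue create-table command line with POSIX-shell-safe quoting."""
    payload = json.dumps(table_input, separators=(",", ":"))
    return " ".join([
        "aws", "glue", "create-table",
        "--database-name", shlex.quote(database_name),
        "--table-input", shlex.quote(payload),
    ])


if __name__ == "__main__":
    print(render_command("tempdb", KINESIS_TABLE_INPUT))
```

Because the JSON is serialized by json.dumps and quoted by shlex.quote, the nesting of StorageDescriptor and SerdeInfo is fixed by the dict structure rather than by hand-placed braces.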
Example 2: To create a table for a Kafka data store
The following create-table example creates a table in the AWS Glue Data Catalog that describes a Kafka data store.
aws glue create-table \
    --database-name tempdb \
    --table-input '{
        "Name":"test-kafka-input",
        "StorageDescriptor":{
            "Columns":[
                {"Name":"sensorid", "Type":"int"},
                {"Name":"currenttemperature", "Type":"int"},
                {"Name":"status", "Type":"string"}
            ],
            "Location":"glue-topic",
            "Parameters":{
                "typeOfData":"kafka",
                "topicName":"glue-topic",
                "connectionName":"my-kafka-connection"
            },
            "SerdeInfo":{
                "SerializationLibrary":"org.apache.hadoop.hive.serde2.OpenCSVSerde"
            }
        },
        "Parameters":{"separatorChar":","}
    }' \
    --profile my-profile \
    --endpoint-url https://glue.us-east-1.amazonaws.com
This command produces no output.
For more information, see Defining Tables in the AWS Glue Data Catalog in the AWS Glue Developer Guide.
Example 3: To create a table for an Amazon S3 data store
The following create-table example creates a table in the AWS Glue Data Catalog that describes an Amazon Simple Storage Service (Amazon S3) data store.
aws glue create-table \
    --database-name tempdb \
    --table-input '{
        "Name":"s3-output",
        "StorageDescriptor":{
            "Columns":[
                {"Name":"s1", "Type":"string"},
                {"Name":"s2", "Type":"int"},
                {"Name":"s3", "Type":"string"}
            ],
            "Location":"s3://bucket-path/",
            "SerdeInfo":{
                "SerializationLibrary":"org.openx.data.jsonserde.JsonSerDe"
            }
        },
        "Parameters":{"classification":"json"}
    }' \
    --profile my-profile \
    --endpoint-url https://glue.us-east-1.amazonaws.com
This command produces no output.
For more information, see Defining Tables in the AWS Glue Data Catalog in the AWS Glue Developer Guide.
Output: None