Updates a crawler. If a crawler is running, you must stop it using StopCrawler
before updating it.
See also: AWS API Documentation
See ‘aws help’ for descriptions of global parameters.
update-crawler
--name <value>
[--role <value>]
[--database-name <value>]
[--description <value>]
[--targets <value>]
[--schedule <value>]
[--classifiers <value>]
[--table-prefix <value>]
[--schema-change-policy <value>]
[--recrawl-policy <value>]
[--lineage-configuration <value>]
[--configuration <value>]
[--crawler-security-configuration <value>]
[--cli-input-json | --cli-input-yaml]
[--generate-cli-skeleton <value>]
--name
(string)
Name of the new crawler.
--role
(string)
The IAM role or Amazon Resource Name (ARN) of an IAM role that is used by the new crawler to access customer resources.
--database-name
(string)
The Glue database where results are stored, such as:
arn:aws:daylight:us-east-1::database/sometable/*
.
--description
(string)
A description of the new crawler.
--targets
(structure)
A list of targets to crawl.
S3Targets -> (list)
Specifies Amazon Simple Storage Service (Amazon S3) targets.
(structure)
Specifies a data store in Amazon Simple Storage Service (Amazon S3).
Path -> (string)
The path to the Amazon S3 target.
Exclusions -> (list)
A list of glob patterns used to exclude from the crawl. For more information, see Catalog Tables with a Crawler .
(string)
ConnectionName -> (string)
The name of a connection which allows a job or crawler to access data in Amazon S3 within an Amazon Virtual Private Cloud environment (Amazon VPC).
SampleSize -> (integer)
Sets the number of files in each leaf folder to be crawled when crawling sample files in a dataset. If not set, all the files are crawled. A valid value is an integer between 1 and 249.
EventQueueArn -> (string)
A valid Amazon SQS ARN. For example,
arn:aws:sqs:region:account:sqs
.DlqEventQueueArn -> (string)
A valid Amazon dead-letter SQS ARN. For example,
arn:aws:sqs:region:account:deadLetterQueue
.JdbcTargets -> (list)
Specifies JDBC targets.
(structure)
Specifies a JDBC data store to crawl.
ConnectionName -> (string)
The name of the connection to use to connect to the JDBC target.
Path -> (string)
The path of the JDBC target.
Exclusions -> (list)
A list of glob patterns used to exclude from the crawl. For more information, see Catalog Tables with a Crawler .
(string)
MongoDBTargets -> (list)
Specifies Amazon DocumentDB or MongoDB targets.
(structure)
Specifies an Amazon DocumentDB or MongoDB data store to crawl.
ConnectionName -> (string)
The name of the connection to use to connect to the Amazon DocumentDB or MongoDB target.
Path -> (string)
The path of the Amazon DocumentDB or MongoDB target (database/collection).
ScanAll -> (boolean)
Indicates whether to scan all the records, or to sample rows from the table. Scanning all the records can take a long time when the table is not a high throughput table.
A value of
true
means to scan all records, while a value offalse
means to sample the records. If no value is specified, the value defaults totrue
.DynamoDBTargets -> (list)
Specifies Amazon DynamoDB targets.
(structure)
Specifies an Amazon DynamoDB table to crawl.
Path -> (string)
The name of the DynamoDB table to crawl.
scanAll -> (boolean)
Indicates whether to scan all the records, or to sample rows from the table. Scanning all the records can take a long time when the table is not a high throughput table.
A value of
true
means to scan all records, while a value offalse
means to sample the records. If no value is specified, the value defaults totrue
.scanRate -> (double)
The percentage of the configured read capacity units to use by the Glue crawler. Read capacity units is a term defined by DynamoDB, and is a numeric value that acts as rate limiter for the number of reads that can be performed on that table per second.
The valid values are null or a value between 0.1 to 1.5. A null value is used when user does not provide a value, and defaults to 0.5 of the configured Read Capacity Unit (for provisioned tables), or 0.25 of the max configured Read Capacity Unit (for tables using on-demand mode).
CatalogTargets -> (list)
Specifies Glue Data Catalog targets.
(structure)
Specifies an Glue Data Catalog target.
DatabaseName -> (string)
The name of the database to be synchronized.
Tables -> (list)
A list of the tables to be synchronized.
(string)
JSON Syntax:
{
"S3Targets": [
{
"Path": "string",
"Exclusions": ["string", ...],
"ConnectionName": "string",
"SampleSize": integer,
"EventQueueArn": "string",
"DlqEventQueueArn": "string"
}
...
],
"JdbcTargets": [
{
"ConnectionName": "string",
"Path": "string",
"Exclusions": ["string", ...]
}
...
],
"MongoDBTargets": [
{
"ConnectionName": "string",
"Path": "string",
"ScanAll": true|false
}
...
],
"DynamoDBTargets": [
{
"Path": "string",
"scanAll": true|false,
"scanRate": double
}
...
],
"CatalogTargets": [
{
"DatabaseName": "string",
"Tables": ["string", ...]
}
...
]
}
--schedule
(string)
A
cron
expression used to specify the schedule (see Time-Based Schedules for Jobs and Crawlers . For example, to run something every day at 12:15 UTC, you would specify:cron(15 12 * * ? *)
.
--classifiers
(list)
A list of custom classifiers that the user has registered. By default, all built-in classifiers are included in a crawl, but these custom classifiers always override the default classifiers for a given classification.
(string)
Syntax:
"string" "string" ...
--table-prefix
(string)
The table prefix used for catalog tables that are created.
--schema-change-policy
(structure)
The policy for the crawler’s update and deletion behavior.
UpdateBehavior -> (string)
The update behavior when the crawler finds a changed schema.
DeleteBehavior -> (string)
The deletion behavior when the crawler finds a deleted object.
Shorthand Syntax:
UpdateBehavior=string,DeleteBehavior=string
JSON Syntax:
{
"UpdateBehavior": "LOG"|"UPDATE_IN_DATABASE",
"DeleteBehavior": "LOG"|"DELETE_FROM_DATABASE"|"DEPRECATE_IN_DATABASE"
}
--recrawl-policy
(structure)
A policy that specifies whether to crawl the entire dataset again, or to crawl only folders that were added since the last crawler run.
RecrawlBehavior -> (string)
Specifies whether to crawl the entire dataset again or to crawl only folders that were added since the last crawler run.
A value of
CRAWL_EVERYTHING
specifies crawling the entire dataset again.A value of
CRAWL_NEW_FOLDERS_ONLY
specifies crawling only folders that were added since the last crawler run.A value of
CRAWL_EVENT_MODE
specifies crawling only the changes identified by Amazon S3 events.
Shorthand Syntax:
RecrawlBehavior=string
JSON Syntax:
{
"RecrawlBehavior": "CRAWL_EVERYTHING"|"CRAWL_NEW_FOLDERS_ONLY"|"CRAWL_EVENT_MODE"
}
--lineage-configuration
(structure)
Specifies data lineage configuration settings for the crawler.
CrawlerLineageSettings -> (string)
Specifies whether data lineage is enabled for the crawler. Valid values are:
ENABLE: enables data lineage for the crawler
DISABLE: disables data lineage for the crawler
Shorthand Syntax:
CrawlerLineageSettings=string
JSON Syntax:
{
"CrawlerLineageSettings": "ENABLE"|"DISABLE"
}
--configuration
(string)
Crawler configuration information. This versioned JSON string allows users to specify aspects of a crawler’s behavior. For more information, see Configuring a Crawler .
--crawler-security-configuration
(string)
The name of the
SecurityConfiguration
structure to be used by this crawler.
--cli-input-json
| --cli-input-yaml
(string)
Reads arguments from the JSON string provided. The JSON string follows the format provided by --generate-cli-skeleton
. If other arguments are provided on the command line, those values will override the JSON-provided values. It is not possible to pass arbitrary binary values using a JSON-provided value as the string will be taken literally. This may not be specified along with --cli-input-yaml
.
--generate-cli-skeleton
(string)
Prints a JSON skeleton to standard output without sending an API request. If provided with no value or the value input
, prints a sample input JSON that can be used as an argument for --cli-input-json
. Similarly, if provided yaml-input
it will print a sample input YAML that can be used with --cli-input-yaml
. If provided with the value output
, it validates the command inputs and returns a sample output JSON for that command.
See ‘aws help’ for descriptions of global parameters.
None