[ aws . glue ]

update-crawler

Description

Updates a crawler. If a crawler is running, you must stop it using StopCrawler before updating it.

See also: AWS API Documentation

See ‘aws help’ for descriptions of global parameters.

Synopsis

  update-crawler
--name <value>
[--role <value>]
[--database-name <value>]
[--description <value>]
[--targets <value>]
[--schedule <value>]
[--classifiers <value>]
[--table-prefix <value>]
[--schema-change-policy <value>]
[--recrawl-policy <value>]
[--lineage-configuration <value>]
[--lake-formation-configuration <value>]
[--configuration <value>]
[--crawler-security-configuration <value>]
[--cli-input-json | --cli-input-yaml]
[--generate-cli-skeleton <value>]

Options

--name (string)

Name of the new crawler.

--role (string)

The IAM role or Amazon Resource Name (ARN) of an IAM role that is used by the new crawler to access customer resources.

--database-name (string)

The Glue database where results are stored, such as: arn:aws:daylight:us-east-1::database/sometable/* .

--description (string)

A description of the new crawler.

--targets (structure)

A list of targets to crawl.

S3Targets -> (list)

Specifies Amazon Simple Storage Service (Amazon S3) targets.

(structure)

Specifies a data store in Amazon Simple Storage Service (Amazon S3).

Path -> (string)

The path to the Amazon S3 target.

Exclusions -> (list)

A list of glob patterns used to exclude from the crawl. For more information, see Catalog Tables with a Crawler .

(string)

ConnectionName -> (string)

The name of a connection which allows a job or crawler to access data in Amazon S3 within an Amazon Virtual Private Cloud environment (Amazon VPC).

SampleSize -> (integer)

Sets the number of files in each leaf folder to be crawled when crawling sample files in a dataset. If not set, all the files are crawled. A valid value is an integer between 1 and 249.

EventQueueArn -> (string)

A valid Amazon SQS ARN. For example, arn:aws:sqs:region:account:sqs .

DlqEventQueueArn -> (string)

A valid Amazon dead-letter SQS ARN. For example, arn:aws:sqs:region:account:deadLetterQueue .

JdbcTargets -> (list)

Specifies JDBC targets.

(structure)

Specifies a JDBC data store to crawl.

ConnectionName -> (string)

The name of the connection to use to connect to the JDBC target.

Path -> (string)

The path of the JDBC target.

Exclusions -> (list)

A list of glob patterns used to exclude from the crawl. For more information, see Catalog Tables with a Crawler .

(string)

MongoDBTargets -> (list)

Specifies Amazon DocumentDB or MongoDB targets.

(structure)

Specifies an Amazon DocumentDB or MongoDB data store to crawl.

ConnectionName -> (string)

The name of the connection to use to connect to the Amazon DocumentDB or MongoDB target.

Path -> (string)

The path of the Amazon DocumentDB or MongoDB target (database/collection).

ScanAll -> (boolean)

Indicates whether to scan all the records, or to sample rows from the table. Scanning all the records can take a long time when the table is not a high throughput table.

A value of true means to scan all records, while a value of false means to sample the records. If no value is specified, the value defaults to true .

DynamoDBTargets -> (list)

Specifies Amazon DynamoDB targets.

(structure)

Specifies an Amazon DynamoDB table to crawl.

Path -> (string)

The name of the DynamoDB table to crawl.

scanAll -> (boolean)

Indicates whether to scan all the records, or to sample rows from the table. Scanning all the records can take a long time when the table is not a high throughput table.

A value of true means to scan all records, while a value of false means to sample the records. If no value is specified, the value defaults to true .

scanRate -> (double)

The percentage of the configured read capacity units to use by the Glue crawler. Read capacity units is a term defined by DynamoDB, and is a numeric value that acts as rate limiter for the number of reads that can be performed on that table per second.

The valid values are null or a value between 0.1 to 1.5. A null value is used when user does not provide a value, and defaults to 0.5 of the configured Read Capacity Unit (for provisioned tables), or 0.25 of the max configured Read Capacity Unit (for tables using on-demand mode).

CatalogTargets -> (list)

Specifies Glue Data Catalog targets.

(structure)

Specifies an Glue Data Catalog target.

DatabaseName -> (string)

The name of the database to be synchronized.

Tables -> (list)

A list of the tables to be synchronized.

(string)

ConnectionName -> (string)

The name of the connection for an Amazon S3-backed Data Catalog table to be a target of the crawl when using a Catalog connection type paired with a NETWORK Connection type.

DeltaTargets -> (list)

Specifies Delta data store targets.

(structure)

Specifies a Delta data store to crawl one or more Delta tables.

DeltaTables -> (list)

A list of the Amazon S3 paths to the Delta tables.

(string)

ConnectionName -> (string)

The name of the connection to use to connect to the Delta table target.

WriteManifest -> (boolean)

Specifies whether to write the manifest files to the Delta table path.

JSON Syntax:

{
  "S3Targets": [
    {
      "Path": "string",
      "Exclusions": ["string", ...],
      "ConnectionName": "string",
      "SampleSize": integer,
      "EventQueueArn": "string",
      "DlqEventQueueArn": "string"
    }
    ...
  ],
  "JdbcTargets": [
    {
      "ConnectionName": "string",
      "Path": "string",
      "Exclusions": ["string", ...]
    }
    ...
  ],
  "MongoDBTargets": [
    {
      "ConnectionName": "string",
      "Path": "string",
      "ScanAll": true|false
    }
    ...
  ],
  "DynamoDBTargets": [
    {
      "Path": "string",
      "scanAll": true|false,
      "scanRate": double
    }
    ...
  ],
  "CatalogTargets": [
    {
      "DatabaseName": "string",
      "Tables": ["string", ...],
      "ConnectionName": "string"
    }
    ...
  ],
  "DeltaTargets": [
    {
      "DeltaTables": ["string", ...],
      "ConnectionName": "string",
      "WriteManifest": true|false
    }
    ...
  ]
}

--schedule (string)

A cron expression used to specify the schedule (see Time-Based Schedules for Jobs and Crawlers . For example, to run something every day at 12:15 UTC, you would specify: cron(15 12 * * ? *) .

--classifiers (list)

A list of custom classifiers that the user has registered. By default, all built-in classifiers are included in a crawl, but these custom classifiers always override the default classifiers for a given classification.

(string)

Syntax:

"string" "string" ...

--table-prefix (string)

The table prefix used for catalog tables that are created.

--schema-change-policy (structure)

The policy for the crawler’s update and deletion behavior.

UpdateBehavior -> (string)

The update behavior when the crawler finds a changed schema.

DeleteBehavior -> (string)

The deletion behavior when the crawler finds a deleted object.

Shorthand Syntax:

UpdateBehavior=string,DeleteBehavior=string

JSON Syntax:

{
  "UpdateBehavior": "LOG"|"UPDATE_IN_DATABASE",
  "DeleteBehavior": "LOG"|"DELETE_FROM_DATABASE"|"DEPRECATE_IN_DATABASE"
}

--recrawl-policy (structure)

A policy that specifies whether to crawl the entire dataset again, or to crawl only folders that were added since the last crawler run.

RecrawlBehavior -> (string)

Specifies whether to crawl the entire dataset again or to crawl only folders that were added since the last crawler run.

A value of CRAWL_EVERYTHING specifies crawling the entire dataset again.

A value of CRAWL_NEW_FOLDERS_ONLY specifies crawling only folders that were added since the last crawler run.

A value of CRAWL_EVENT_MODE specifies crawling only the changes identified by Amazon S3 events.

Shorthand Syntax:

RecrawlBehavior=string

JSON Syntax:

{
  "RecrawlBehavior": "CRAWL_EVERYTHING"|"CRAWL_NEW_FOLDERS_ONLY"|"CRAWL_EVENT_MODE"
}

--lineage-configuration (structure)

Specifies data lineage configuration settings for the crawler.

CrawlerLineageSettings -> (string)

Specifies whether data lineage is enabled for the crawler. Valid values are:

  • ENABLE: enables data lineage for the crawler

  • DISABLE: disables data lineage for the crawler

Shorthand Syntax:

CrawlerLineageSettings=string

JSON Syntax:

{
  "CrawlerLineageSettings": "ENABLE"|"DISABLE"
}

--lake-formation-configuration (structure)

Specifies Lake Formation configuration settings for the crawler.

UseLakeFormationCredentials -> (boolean)

Specifies whether to use Lake Formation credentials for the crawler instead of the IAM role credentials.

AccountId -> (string)

Required for cross account crawls. For same account crawls as the target data, this can be left as null.

Shorthand Syntax:

UseLakeFormationCredentials=boolean,AccountId=string

JSON Syntax:

{
  "UseLakeFormationCredentials": true|false,
  "AccountId": "string"
}

--configuration (string)

Crawler configuration information. This versioned JSON string allows users to specify aspects of a crawler’s behavior. For more information, see Configuring a Crawler .

--crawler-security-configuration (string)

The name of the SecurityConfiguration structure to be used by this crawler.

--cli-input-json | --cli-input-yaml (string) Reads arguments from the JSON string provided. The JSON string follows the format provided by --generate-cli-skeleton. If other arguments are provided on the command line, those values will override the JSON-provided values. It is not possible to pass arbitrary binary values using a JSON-provided value as the string will be taken literally. This may not be specified along with --cli-input-yaml.

--generate-cli-skeleton (string) Prints a JSON skeleton to standard output without sending an API request. If provided with no value or the value input, prints a sample input JSON that can be used as an argument for --cli-input-json. Similarly, if provided yaml-input it will print a sample input YAML that can be used with --cli-input-yaml. If provided with the value output, it validates the command inputs and returns a sample output JSON for that command.

See ‘aws help’ for descriptions of global parameters.

Output

None