create-ruleset¶

Description¶

Creates a new ruleset that can be used in a profile job to validate the data quality of a dataset.

Synopsis¶

  create-ruleset
--name <value>
[--description <value>]
--target-arn <value>
--rules <value>
[--tags <value>]
[--cli-input-json | --cli-input-yaml]
[--generate-cli-skeleton <value>]

Options¶

--name (string)

The name of the ruleset to be created. Valid characters are alphanumeric (A-Z, a-z, 0-9), hyphen (-), period (.), and space.

--description (string)

The description of the ruleset.

--target-arn (string)

The Amazon Resource Name (ARN) of a resource (dataset) that the ruleset is associated with.

--rules (list)

A list of rules that are defined with the ruleset. A rule includes one or more checks to be validated on a DataBrew dataset.

(structure)

Represents a single data quality requirement that should be validated in the scope of this dataset.

Name -> (string)

The name of the rule.

Disabled -> (boolean)

A value that specifies whether the rule is disabled. Once a rule is disabled, a profile job will not validate it during a job run. Default value is false.

CheckExpression -> (string)

The expression which includes column references, condition names followed by variable references, possibly grouped and combined with other conditions. For example, (:col1 starts_with :prefix1 or :col1 starts_with :prefix2) and (:col1 ends_with :suffix1 or :col1 ends_with :suffix2) . Column and value references are substitution variables that should start with the ‘:’ symbol. Depending on the context, substitution variables’ values can be either an actual value or a column name. These values are defined in the SubstitutionMap. If a CheckExpression starts with a column reference, then ColumnSelectors in the rule should be null. If ColumnSelectors has been defined, then there should be no columnn reference in the left side of a condition, for example, is_between :val1 and :val2 .

For more information, see Available checks

SubstitutionMap -> (map)

The map of substitution variable names to their values used in a check expression. Variable names should start with a ‘:’ (colon). Variable values can either be actual values or column names. To differentiate between the two, column names should be enclosed in backticks, for example, ":col1": "`Column A`".

key -> (string)

value -> (string)

Threshold -> (structure)

The threshold used with a non-aggregate check expression. Non-aggregate check expressions will be applied to each row in a specific column, and the threshold will be used to determine whether the validation succeeds.

Value -> (double)

The value of a threshold.

Type -> (string)

The type of a threshold. Used for comparison of an actual count of rows that satisfy the rule to the threshold value.

Unit -> (string)

Unit of threshold value. Can be either a COUNT or PERCENTAGE of the full sample size used for validation.

ColumnSelectors -> (list)

List of column selectors. Selectors can be used to select columns using a name or regular expression from the dataset. Rule will be applied to selected columns.

(structure)

Selector of a column from a dataset for profile job configuration. One selector includes either a column name or a regular expression.

Regex -> (string)

A regular expression for selecting a column from a dataset.

Name -> (string)

The name of a column from a dataset.

Shorthand Syntax:

Name=string,Disabled=boolean,CheckExpression=string,SubstitutionMap={KeyName1=string,KeyName2=string},Threshold={Value=double,Type=string,Unit=string},ColumnSelectors=[{Regex=string,Name=string},{Regex=string,Name=string}] ...

JSON Syntax:

[
  {
    "Name": "string",
    "Disabled": true|false,
    "CheckExpression": "string",
    "SubstitutionMap": {"string": "string"
      ...},
    "Threshold": {
      "Value": double,
      "Type": "GREATER_THAN_OR_EQUAL"|"LESS_THAN_OR_EQUAL"|"GREATER_THAN"|"LESS_THAN",
      "Unit": "COUNT"|"PERCENTAGE"
    },
    "ColumnSelectors": [
      {
        "Regex": "string",
        "Name": "string"
      }
      ...
    ]
  }
  ...
]

--tags (map)

Metadata tags to apply to the ruleset.

key -> (string)

value -> (string)

Shorthand Syntax:

KeyName1=string,KeyName2=string

JSON Syntax:

{"string": "string"
  ...}

--cli-input-json | --cli-input-yaml (string) Reads arguments from the JSON string provided. The JSON string follows the format provided by --generate-cli-skeleton. If other arguments are provided on the command line, those values will override the JSON-provided values. It is not possible to pass arbitrary binary values using a JSON-provided value as the string will be taken literally. This may not be specified along with --cli-input-yaml.

--generate-cli-skeleton (string) Prints a JSON skeleton to standard output without sending an API request. If provided with no value or the value input, prints a sample input JSON that can be used as an argument for --cli-input-json. Similarly, if provided yaml-input it will print a sample input YAML that can be used with --cli-input-yaml. If provided with the value output, it validates the command inputs and returns a sample output JSON for that command.

See ‘aws help’ for descriptions of global parameters.

Output¶

Name -> (string)

The unique name of the created ruleset.

Table of Contents

Feedback

User Guide

create-ruleset¶

Description¶

Synopsis¶

Options¶

Output¶