[ aws . transcribe ]
Starts an asynchronous job to transcribe speech to text.
See also: AWS API Documentation
See ‘aws help’ for descriptions of global parameters.
  start-transcription-job
--transcription-job-name <value>
[--language-code <value>]
[--media-sample-rate-hertz <value>]
[--media-format <value>]
--media <value>
[--output-bucket-name <value>]
[--output-key <value>]
[--output-encryption-kms-key-id <value>]
[--settings <value>]
[--model-settings <value>]
[--job-execution-settings <value>]
[--content-redaction <value>]
[--identify-language | --no-identify-language]
[--language-options <value>]
[--cli-input-json | --cli-input-yaml]
[--generate-cli-skeleton <value>]
--transcription-job-name (string)
The name of the job. You can’t use the strings “.” or “..” by themselves as the job name. The name must also be unique within an AWS account. If you try to create a transcription job with the same name as a previous transcription job, you get a ConflictException error.
--language-code (string)
The language code for the language used in the input media file.
Possible values:
af-ZA
ar-AE
ar-SA
cy-GB
da-DK
de-CH
de-DE
en-AB
en-AU
en-GB
en-IE
en-IN
en-US
en-WL
es-ES
es-US
fa-IR
fr-CA
fr-FR
ga-IE
gd-GB
he-IL
hi-IN
id-ID
it-IT
ja-JP
ko-KR
ms-MY
nl-NL
pt-BR
pt-PT
ru-RU
ta-IN
te-IN
tr-TR
zh-CN
--media-sample-rate-hertz (integer)
The sample rate, in Hertz, of the audio track in the input media file.
If you do not specify the media sample rate, Amazon Transcribe determines the sample rate. If you specify the sample rate, it must match the sample rate detected by Amazon Transcribe. In most cases, you should leave the MediaSampleRateHertz field blank and let Amazon Transcribe determine the sample rate.
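The guidance above can be sketched as a small helper that omits the field unless a rate is given. This is only an illustration: the function name is hypothetical, and the 8000 to 48000 Hertz bounds are taken from the FailureReason description in the output section of this page.

```python
from typing import Optional

MIN_SAMPLE_RATE_HZ = 8000   # lower bound quoted in the FailureReason description
MAX_SAMPLE_RATE_HZ = 48000  # upper bound quoted in the FailureReason description


def sample_rate_fragment(sample_rate_hz: Optional[int] = None) -> dict:
    """Return the request fragment for the sample rate (hypothetical helper)."""
    if sample_rate_hz is None:
        # Recommended case: leave the field out and let Amazon Transcribe detect it.
        return {}
    if not MIN_SAMPLE_RATE_HZ <= sample_rate_hz <= MAX_SAMPLE_RATE_HZ:
        raise ValueError(f"{sample_rate_hz} Hz is outside 8000-48000 Hz")
    return {"MediaSampleRateHertz": sample_rate_hz}
```

A local range check like this cannot tell whether the value matches the rate the service detects, so omitting the field remains the safer default.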
--media-format (string)
The format of the input media file.
Possible values:
mp3
mp4
wav
flac
ogg
amr
webm
--media (structure)
An object that describes the input media for a transcription job.
MediaFileUri -> (string)
The S3 object location of the input media file. The URI must be in the same region as the API endpoint that you are calling. The general form is:
For example:
For more information about S3 object names, see Object Keys in the Amazon S3 Developer Guide.
Shorthand Syntax:
MediaFileUri=string
JSON Syntax:
{
  "MediaFileUri": "string"
}
--output-bucket-name (string)
The location where the transcription is stored.
If you set the OutputBucketName, Amazon Transcribe puts the transcript in the specified S3 bucket. When you call the GetTranscriptionJob operation, the operation returns this location in the TranscriptFileUri field. If you enable content redaction, the redacted transcript appears in RedactedTranscriptFileUri. If you enable content redaction and choose to output an unredacted transcript, that transcript’s location still appears in the TranscriptFileUri field. The S3 bucket must have permissions that allow Amazon Transcribe to put files in the bucket. For more information, see Permissions Required for IAM User Roles.
You can specify an AWS Key Management Service (KMS) key to encrypt the output of your transcription using the OutputEncryptionKMSKeyId parameter. If you don’t specify a KMS key, Amazon Transcribe uses the default Amazon S3 key for server-side encryption of transcripts that are placed in your S3 bucket.
If you don’t set the OutputBucketName, Amazon Transcribe generates a pre-signed URL, a shareable URL that provides secure access to your transcription, and returns it in the TranscriptFileUri field. Use this URL to download the transcription.
--output-key (string)
You can specify a location in an Amazon S3 bucket to store the output of your transcription job.
If you don’t specify an output key, Amazon Transcribe stores the output of your transcription job in the Amazon S3 bucket you specified. By default, the object key is “your-transcription-job-name.json”.
You can use output keys to specify the Amazon S3 prefix and file name of the transcription output. For example, specifying the Amazon S3 prefix, “folder1/folder2/”, as an output key would lead to the output being stored as “folder1/folder2/your-transcription-job-name.json”. If you specify “my-other-job-name.json” as the output key, the object key is changed to “my-other-job-name.json”. You can use an output key to change both the prefix and the file name, for example “folder/my-other-job-name.json”.
If you specify an output key, you must also specify an S3 bucket in the OutputBucketName parameter.
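The output key rules above can be read as a small decision procedure. The sketch below is a local illustration only: the function name is hypothetical, and it assumes a value ending in "/" is meant as a prefix, as in the “folder1/folder2/” example.

```python
def resolve_object_key(job_name: str, output_key: str = "") -> str:
    """Predict the S3 object key of the transcript (hypothetical helper)."""
    if not output_key:
        # No output key: the default is "<job name>.json".
        return f"{job_name}.json"
    if output_key.endswith("/"):
        # Prefix only (assumed from the trailing slash): default file name is appended.
        return f"{output_key}{job_name}.json"
    # A file name, with or without a prefix, replaces the default key entirely.
    return output_key
```

For example, `resolve_object_key("your-transcription-job-name", "folder1/folder2/")` gives `folder1/folder2/your-transcription-job-name.json`, matching the description above.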
--output-encryption-kms-key-id (string)
The Amazon Resource Name (ARN) of the AWS Key Management Service (KMS) key used to encrypt the output of the transcription job. The user calling the StartTranscriptionJob operation must have permission to use the specified KMS key.
You can use either of the following to identify a KMS key in the current account:
KMS Key ID: “1234abcd-12ab-34cd-56ef-1234567890ab”
KMS Key Alias: “alias/ExampleAlias”
You can use either of the following to identify a KMS key in the current account or another account:
Amazon Resource Name (ARN) of a KMS Key: “arn:aws:kms:region:account ID:key/1234abcd-12ab-34cd-56ef-1234567890ab”
ARN of a KMS Key Alias: “arn:aws:kms:region:account ID:alias/ExampleAlias”
If you don’t specify an encryption key, the output of the transcription job is encrypted with the default Amazon S3 key (SSE-S3).
If you specify a KMS key to encrypt your output, you must also specify an output location in the OutputBucketName parameter.
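The four identifier forms listed above can be told apart by their shape. The following sketch is an assumption-laden illustration (the function name and the returned labels are hypothetical) and does not check that the key exists or that the caller may use it.

```python
def classify_kms_identifier(value: str) -> str:
    """Label a KMS key identifier by its form (hypothetical helper)."""
    if value.startswith("arn:"):
        # An ARN has six colon-separated parts; the last is "key/<id>" or "alias/<name>".
        resource = value.split(":", 5)[-1]
        return "alias ARN" if resource.startswith("alias/") else "key ARN"
    # Without the ARN prefix, only keys in the current account can be identified.
    return "alias" if value.startswith("alias/") else "key ID"


def usable_across_accounts(value: str) -> bool:
    """Per the description above, only the ARN forms can point at another account."""
    return classify_kms_identifier(value).endswith("ARN")
```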
--settings (structure)
A Settings object that provides optional settings for a transcription job.
VocabularyName -> (string)
The name of a vocabulary to use when processing the transcription job.
ShowSpeakerLabels -> (boolean)
Determines whether the transcription job uses speaker recognition to identify different speakers in the input audio. Speaker recognition labels individual speakers in the audio file. If you set the ShowSpeakerLabels field to true, you must also set the maximum number of speaker labels in the MaxSpeakerLabels field.
You can’t set both ShowSpeakerLabels and ChannelIdentification in the same request. If you set both, your request returns a BadRequestException.
MaxSpeakerLabels -> (integer)
The maximum number of speakers to identify in the input audio. If there are more speakers in the audio than this number, multiple speakers are identified as a single speaker. If you specify the MaxSpeakerLabels field, you must set the ShowSpeakerLabels field to true.
ChannelIdentification -> (boolean)
Instructs Amazon Transcribe to process each audio channel separately and then merge the transcription output of each channel into a single transcription.
Amazon Transcribe also produces a transcription of each item detected on an audio channel, including the start time and end time of the item and alternative transcriptions of the item including the confidence that Amazon Transcribe has in the transcription.
You can’t set both ShowSpeakerLabels and ChannelIdentification in the same request. If you set both, your request returns a BadRequestException.
ShowAlternatives -> (boolean)
Determines whether the transcription contains alternative transcriptions. If you set the ShowAlternatives field to true, you must also set the maximum number of alternatives to return in the MaxAlternatives field.
MaxAlternatives -> (integer)
The number of alternative transcriptions that the service should return. If you specify the MaxAlternatives field, you must set the ShowAlternatives field to true.
VocabularyFilterName -> (string)
The name of the vocabulary filter to use when transcribing the audio. The filter that you specify must have the same language code as the transcription job.
VocabularyFilterMethod -> (string)
Shorthand Syntax:
VocabularyName=string,ShowSpeakerLabels=boolean,MaxSpeakerLabels=integer,ChannelIdentification=boolean,ShowAlternatives=boolean,MaxAlternatives=integer,VocabularyFilterName=string,VocabularyFilterMethod=string
JSON Syntax:
{
  "VocabularyName": "string",
  "ShowSpeakerLabels": true|false,
  "MaxSpeakerLabels": integer,
  "ChannelIdentification": true|false,
  "ShowAlternatives": true|false,
  "MaxAlternatives": integer,
  "VocabularyFilterName": "string",
  "VocabularyFilterMethod": "remove"|"mask"
}
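The dependencies between the Settings fields described above can be checked locally before a request is sent. The sketch below is one possible reading, not the service's own validation: the function name is hypothetical, and "setting both" ShowSpeakerLabels and ChannelIdentification is assumed to mean both are true.

```python
def settings_problems(settings: dict) -> list:
    """List violations of the documented Settings rules (hypothetical helper)."""
    problems = []
    show_speakers = settings.get("ShowSpeakerLabels", False)
    show_alternatives = settings.get("ShowAlternatives", False)

    if show_speakers and "MaxSpeakerLabels" not in settings:
        problems.append("ShowSpeakerLabels requires MaxSpeakerLabels")
    if "MaxSpeakerLabels" in settings and not show_speakers:
        problems.append("MaxSpeakerLabels requires ShowSpeakerLabels=true")
    # Assumption: the conflict applies when both flags are true.
    if show_speakers and settings.get("ChannelIdentification", False):
        problems.append("ShowSpeakerLabels conflicts with ChannelIdentification")
    if show_alternatives and "MaxAlternatives" not in settings:
        problems.append("ShowAlternatives requires MaxAlternatives")
    if "MaxAlternatives" in settings and not show_alternatives:
        problems.append("MaxAlternatives requires ShowAlternatives=true")
    return problems
```

An empty list means none of the documented rules is violated; the service may still reject the request for other reasons.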
--model-settings (structure)
Choose the custom language model you use for your transcription job in this parameter.
LanguageModelName -> (string)
The name of your custom language model.
Shorthand Syntax:
LanguageModelName=string
JSON Syntax:
{
  "LanguageModelName": "string"
}
--job-execution-settings (structure)
Provides information about how a transcription job is executed. Use this field to indicate that the job can be queued for deferred execution if the concurrency limit is reached and there are no slots available to immediately run the job.
AllowDeferredExecution -> (boolean)
Indicates whether a job should be queued by Amazon Transcribe when the concurrent execution limit is exceeded. When the AllowDeferredExecution field is true, jobs are queued and executed when the number of executing jobs falls below the concurrent execution limit. If the field is false, Amazon Transcribe returns a LimitExceededException exception.
If you specify the AllowDeferredExecution field, you must specify the DataAccessRoleArn field.
DataAccessRoleArn -> (string)
The Amazon Resource Name (ARN) of a role that has access to the S3 bucket that contains the input files. Amazon Transcribe assumes this role to read queued media files. If you have specified an output S3 bucket for the transcription results, this role should have access to the output bucket as well.
If you specify the AllowDeferredExecution field, you must specify the DataAccessRoleArn field.
Shorthand Syntax:
AllowDeferredExecution=boolean,DataAccessRoleArn=string
JSON Syntax:
{
  "AllowDeferredExecution": true|false,
  "DataAccessRoleArn": "string"
}
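As a minimal sketch of the rule above, the helper below builds the JobExecutionSettings structure and refuses to include AllowDeferredExecution without a role ARN. The function name is hypothetical and any role ARN passed to it is a placeholder.

```python
from typing import Optional


def build_job_execution_settings(
    allow_deferred: Optional[bool] = None,
    data_access_role_arn: Optional[str] = None,
) -> dict:
    """Build the JobExecutionSettings structure (hypothetical helper)."""
    settings = {}
    if allow_deferred is not None:
        # Documented rule: AllowDeferredExecution must come with DataAccessRoleArn.
        if not data_access_role_arn:
            raise ValueError("AllowDeferredExecution requires DataAccessRoleArn")
        settings["AllowDeferredExecution"] = allow_deferred
    if data_access_role_arn:
        settings["DataAccessRoleArn"] = data_access_role_arn
    return settings
```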
--content-redaction (structure)
An object that contains the request parameters for content redaction.
RedactionType -> (string)
Request parameter that defines the entities to be redacted. The only accepted value is PII.
RedactionOutput -> (string)
The output transcript file stored in either the default S3 bucket or in a bucket you specify.
When you choose redacted, Amazon Transcribe outputs only the redacted transcript.
When you choose redacted_and_unredacted, Amazon Transcribe outputs both the redacted and unredacted transcripts.
Shorthand Syntax:
RedactionType=string,RedactionOutput=string
JSON Syntax:
{
  "RedactionType": "PII",
  "RedactionOutput": "redacted"|"redacted_and_unredacted"
}
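Combining the RedactionOutput values above with the --output-bucket-name description, the response fields that should carry a transcript location can be predicted. This is an interpretive sketch with a hypothetical function name, not service behavior that has been verified here.

```python
from typing import Optional


def expected_transcript_fields(redaction_output: Optional[str] = None) -> list:
    """Name the Transcript fields expected to be populated (hypothetical helper)."""
    if redaction_output is None:
        # No content redaction requested: only the plain transcript is produced.
        return ["TranscriptFileUri"]
    if redaction_output == "redacted":
        return ["RedactedTranscriptFileUri"]
    if redaction_output == "redacted_and_unredacted":
        return ["RedactedTranscriptFileUri", "TranscriptFileUri"]
    raise ValueError(f"unsupported RedactionOutput: {redaction_output!r}")
```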
--identify-language | --no-identify-language (boolean)
Set this field to true to enable automatic language identification. Automatic language identification is disabled by default. You receive a BadRequestException error if you also enter a value for LanguageCode.
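Because language identification and an explicit language code exclude each other, a request builder can pick exactly one of the two fields. The following is a rough sketch with a hypothetical function name; it mirrors the rule above locally rather than reproducing the service's error.

```python
from typing import Optional


def language_fragment(language_code: Optional[str] = None,
                      identify_language: bool = False) -> dict:
    """Return either LanguageCode or IdentifyLanguage, never both (hypothetical helper)."""
    if identify_language and language_code:
        # The service is documented to answer this combination with BadRequestException.
        raise ValueError("use either LanguageCode or IdentifyLanguage, not both")
    if identify_language:
        return {"IdentifyLanguage": True}
    if language_code:
        return {"LanguageCode": language_code}
    # Neither was given; this sketch leaves that decision to the caller.
    return {}
```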
--language-options (list)
An object containing a list of languages that might be present in your collection of audio files. Automatic language identification chooses a language that best matches the source audio from that list.
(string)
Syntax:
"string" "string" ...
Where valid values are:
  af-ZA
  ar-AE
  ar-SA
  cy-GB
  da-DK
  de-CH
  de-DE
  en-AB
  en-AU
  en-GB
  en-IE
  en-IN
  en-US
  en-WL
  es-ES
  es-US
  fa-IR
  fr-CA
  fr-FR
  ga-IE
  gd-GB
  he-IL
  hi-IN
  id-ID
  it-IT
  ja-JP
  ko-KR
  ms-MY
  nl-NL
  pt-BR
  pt-PT
  ru-RU
  ta-IN
  te-IN
  tr-TR
  zh-CN
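A list of language options can be checked against the valid values above before it is passed to the command. The sketch below simply copies that list; the constant and function names are hypothetical, and the list may be out of date relative to the service.

```python
# Valid values copied from the list above.
SUPPORTED_LANGUAGE_CODES = frozenset("""
af-ZA ar-AE ar-SA cy-GB da-DK de-CH de-DE en-AB en-AU en-GB en-IE en-IN
en-US en-WL es-ES es-US fa-IR fr-CA fr-FR ga-IE gd-GB he-IL hi-IN id-ID
it-IT ja-JP ko-KR ms-MY nl-NL pt-BR pt-PT ru-RU ta-IN te-IN tr-TR zh-CN
""".split())


def unsupported_language_options(options: list) -> list:
    """Return the entries that are not in the documented list, in input order."""
    return [code for code in options if code not in SUPPORTED_LANGUAGE_CODES]
```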
--cli-input-json | --cli-input-yaml (string)
Reads arguments from the JSON string provided. The JSON string follows the format provided by --generate-cli-skeleton. If other arguments are provided on the command line, those values will override the JSON-provided values. It is not possible to pass arbitrary binary values using a JSON-provided value as the string will be taken literally. This may not be specified along with --cli-input-yaml.
--generate-cli-skeleton (string)
Prints a JSON skeleton to standard output without sending an API request. If provided with no value or the value input, prints a sample input JSON that can be used as an argument for --cli-input-json. Similarly, if provided yaml-input it will print a sample input YAML that can be used with --cli-input-yaml. If provided with the value output, it validates the command inputs and returns a sample output JSON for that command.
Example 1: To transcribe an audio file
The following start-transcription-job example transcribes your audio file.
aws transcribe start-transcription-job \
    --cli-input-json file://myfile.json
Contents of myfile.json:
{
    "TranscriptionJobName": "cli-simple-transcription-job",
    "LanguageCode": "the-language-of-your-transcription-job",
    "Media": {
        "MediaFileUri": "s3://DOC-EXAMPLE-BUCKET/Amazon-S3-prefix/your-media-file-name.file-extension"
    }
}
For more information, see Getting Started (AWS Command Line Interface) in the Amazon Transcribe Developer Guide.
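The input file for Example 1 can also be generated by a script. The sketch below writes the same three fields and assembles, without running, the matching command; the function name is hypothetical, "en-US" is just one value from the language code list, and the media URI is the placeholder used in this page.

```python
import json
import shlex


def write_job_request(path, job_name, language_code, media_file_uri):
    """Write a minimal request file and return the CLI command as a list."""
    request = {
        "TranscriptionJobName": job_name,
        "LanguageCode": language_code,
        "Media": {"MediaFileUri": media_file_uri},
    }
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(request, handle, indent=4)
    # The command is only assembled here, not executed.
    return [
        "aws", "transcribe", "start-transcription-job",
        "--cli-input-json", f"file://{path}",
    ]


if __name__ == "__main__":
    command = write_job_request(
        "myfile.json",
        "cli-simple-transcription-job",
        "en-US",
        "s3://DOC-EXAMPLE-BUCKET/Amazon-S3-prefix/your-media-file-name.file-extension",
    )
    print(shlex.join(command))
```

Returning the command as a list keeps quoting out of the picture until it is printed or handed to a process launcher.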
Example 2: To transcribe a multi-channel audio file
The following start-transcription-job example transcribes your multi-channel audio file.
aws transcribe start-transcription-job \
    --cli-input-json file://mysecondfile.json
Contents of mysecondfile.json:
{
    "TranscriptionJobName": "cli-channelid-job",
    "LanguageCode": "the-language-of-your-transcription-job",
    "Media": {
        "MediaFileUri": "s3://DOC-EXAMPLE-BUCKET/Amazon-S3-prefix/your-media-file-name.file-extension"
    },
    "Settings":{
        "ChannelIdentification":true
    }
}
Output:
{
    "TranscriptionJob": {
        "TranscriptionJobName": "cli-channelid-job",
        "TranscriptionJobStatus": "IN_PROGRESS",
        "LanguageCode": "the-language-of-your-transcription-job",
        "Media": {
            "MediaFileUri": "s3://DOC-EXAMPLE-BUCKET/Amazon-S3-prefix/your-media-file-name.file-extension"
        },
        "StartTime": "2020-09-17T16:07:56.817000+00:00",
        "CreationTime": "2020-09-17T16:07:56.784000+00:00",
        "Settings": {
            "ChannelIdentification": true
        }
    }
}
For more information, see Transcribing Multi-Channel Audio in the Amazon Transcribe Developer Guide.
Example 3: To transcribe an audio file and identify the different speakers
The following start-transcription-job example transcribes your audio file and identifies the speakers in the transcription output.
aws transcribe start-transcription-job \
    --cli-input-json file://mythirdfile.json
Contents of mythirdfile.json:
{
    "TranscriptionJobName": "cli-speakerid-job",
    "LanguageCode": "the-language-of-your-transcription-job",
    "Media": {
        "MediaFileUri": "s3://DOC-EXAMPLE-BUCKET/Amazon-S3-prefix/your-media-file-name.file-extension"
    },
    "Settings":{
        "ShowSpeakerLabels": true,
        "MaxSpeakerLabels": 2
    }
}
Output:
{
    "TranscriptionJob": {
        "TranscriptionJobName": "cli-speakerid-job",
        "TranscriptionJobStatus": "IN_PROGRESS",
        "LanguageCode": "the-language-of-your-transcription-job",
        "Media": {
            "MediaFileUri": "s3://DOC-EXAMPLE-BUCKET/Amazon-S3-prefix/your-media-file-name.file-extension"
        },
        "StartTime": "2020-09-17T16:22:59.696000+00:00",
        "CreationTime": "2020-09-17T16:22:59.676000+00:00",
        "Settings": {
            "ShowSpeakerLabels": true,
            "MaxSpeakerLabels": 2
        }
    }
}
For more information, see Identifying Speakers in the Amazon Transcribe Developer Guide.
Example 4: To transcribe an audio file and mask any unwanted words in the transcription output
The following start-transcription-job example transcribes your audio file and uses a vocabulary filter you’ve previously created to mask any unwanted words.
aws transcribe start-transcription-job \
    --cli-input-json file://myfourthfile.json
Contents of myfourthfile.json:
{
    "TranscriptionJobName": "cli-filter-mask-job",
    "LanguageCode": "the-language-of-your-transcription-job",
    "Media": {
        "MediaFileUri": "s3://DOC-EXAMPLE-BUCKET/Amazon-S3-prefix/your-media-file-name.file-extension"
    },
    "Settings":{
        "VocabularyFilterName": "your-vocabulary-filter",
        "VocabularyFilterMethod": "mask"
    }
}
Output:
{
    "TranscriptionJob": {
        "TranscriptionJobName": "cli-filter-mask-job",
        "TranscriptionJobStatus": "IN_PROGRESS",
        "LanguageCode": "the-language-of-your-transcription-job",
        "Media": {
            "MediaFileUri": "s3://DOC-EXAMPLE-BUCKET/Amazon-S3-prefix/your-media-file-name.file-extension"
        },
        "StartTime": "2020-09-18T16:36:18.568000+00:00",
        "CreationTime": "2020-09-18T16:36:18.547000+00:00",
        "Settings": {
            "VocabularyFilterName": "your-vocabulary-filter",
            "VocabularyFilterMethod": "mask"
        }
    }
}
For more information, see Filtering Transcriptions in the Amazon Transcribe Developer Guide.
Example 5: To transcribe an audio file and remove any unwanted words in the transcription output
The following start-transcription-job example transcribes your audio file and uses a vocabulary filter you’ve previously created to remove any unwanted words.
aws transcribe start-transcription-job \
    --cli-input-json file://myfifthfile.json
Contents of myfifthfile.json:
{
    "TranscriptionJobName": "cli-filter-remove-job",
    "LanguageCode": "the-language-of-your-transcription-job",
    "Media": {
        "MediaFileUri": "s3://DOC-EXAMPLE-BUCKET/Amazon-S3-prefix/your-media-file-name.file-extension"
    },
    "Settings":{
        "VocabularyFilterName": "your-vocabulary-filter",
        "VocabularyFilterMethod": "remove"
    }
}
Output:
{
    "TranscriptionJob": {
        "TranscriptionJobName": "cli-filter-remove-job",
        "TranscriptionJobStatus": "IN_PROGRESS",
        "LanguageCode": "the-language-of-your-transcription-job",
        "Media": {
            "MediaFileUri": "s3://DOC-EXAMPLE-BUCKET/Amazon-S3-prefix/your-media-file-name.file-extension"
        },
        "StartTime": "2020-09-18T16:36:18.568000+00:00",
        "CreationTime": "2020-09-18T16:36:18.547000+00:00",
        "Settings": {
            "VocabularyFilterName": "your-vocabulary-filter",
            "VocabularyFilterMethod": "remove"
        }
    }
}
For more information, see Filtering Transcriptions in the Amazon Transcribe Developer Guide.
Example 6: To transcribe an audio file with increased accuracy using a custom vocabulary
The following start-transcription-job example transcribes your audio file and uses a custom vocabulary you’ve previously created to increase transcription accuracy.
aws transcribe start-transcription-job \
    --cli-input-json file://mysixthfile.json
Contents of mysixthfile.json:
{
    "TranscriptionJobName": "cli-vocab-job",
    "LanguageCode": "the-language-of-your-transcription-job",
    "Media": {
        "MediaFileUri": "s3://DOC-EXAMPLE-BUCKET/Amazon-S3-prefix/your-media-file-name.file-extension"
    },
    "Settings":{
        "VocabularyName": "your-vocabulary"
    }
}
Output:
{
    "TranscriptionJob": {
        "TranscriptionJobName": "cli-vocab-job",
        "TranscriptionJobStatus": "IN_PROGRESS",
        "LanguageCode": "the-language-of-your-transcription-job",
        "Media": {
            "MediaFileUri": "s3://DOC-EXAMPLE-BUCKET/Amazon-S3-prefix/your-media-file-name.file-extension"
        },
        "StartTime": "2020-09-18T16:36:18.568000+00:00",
        "CreationTime": "2020-09-18T16:36:18.547000+00:00",
        "Settings": {
            "VocabularyName": "your-vocabulary"
        }
    }
}
For more information, see Filtering Transcriptions in the Amazon Transcribe Developer Guide.
Example 7: To identify the language of an audio file and transcribe it
The following start-transcription-job example identifies the language of your audio file and transcribes it.
aws transcribe start-transcription-job \
    --cli-input-json file://myseventhfile.json
Contents of myseventhfile.json:
{
    "TranscriptionJobName": "cli-identify-language-transcription-job",
    "IdentifyLanguage": true,
    "Media": {
        "MediaFileUri": "s3://DOC-EXAMPLE-BUCKET/Amazon-S3-prefix/your-media-file-name.file-extension"
    }
}
Output:
{
    "TranscriptionJob": {
        "TranscriptionJobName": "cli-identify-language-transcription-job",
        "TranscriptionJobStatus": "IN_PROGRESS",
        "Media": {
            "MediaFileUri": "s3://DOC-EXAMPLE-BUCKET/Amazon-S3-prefix/your-media-file-name.file-extension"
        },
        "StartTime": "2020-09-18T22:27:23.970000+00:00",
        "CreationTime": "2020-09-18T22:27:23.948000+00:00",
        "IdentifyLanguage": true
    }
}
For more information, see Identifying the Language in the Amazon Transcribe Developer Guide.
Example 8: To transcribe an audio file with personally identifiable information redacted
The following start-transcription-job example transcribes your audio file and redacts any personally identifiable information in the transcription output.
aws transcribe start-transcription-job \
    --cli-input-json file://myeighthfile.json
Contents of myeighthfile.json:
{
    "TranscriptionJobName": "cli-redaction-job",
    "LanguageCode": "language-code",
    "Media": {
        "MediaFileUri": "s3://Amazon-S3-Prefix/your-media-file.file-extension"
    },
    "ContentRedaction": {
        "RedactionOutput":"redacted",
        "RedactionType":"PII"
    }
}
Output:
{
    "TranscriptionJob": {
        "TranscriptionJobName": "cli-redaction-job",
        "TranscriptionJobStatus": "IN_PROGRESS",
        "LanguageCode": "language-code",
        "Media": {
            "MediaFileUri": "s3://Amazon-S3-Prefix/your-media-file.file-extension"
        },
        "StartTime": "2020-09-25T23:49:13.195000+00:00",
        "CreationTime": "2020-09-25T23:49:13.176000+00:00",
        "ContentRedaction": {
            "RedactionType": "PII",
            "RedactionOutput": "redacted"
        }
    }
}
For more information, see Automatic Content Redaction in the Amazon Transcribe Developer Guide.
Example 9: To generate a transcript with personally identifiable information (PII) redacted and an unredacted transcript
The following start-transcription-job example generates two transcriptions of your audio file, one with the personally identifiable information redacted, and the other without any redactions.
aws transcribe start-transcription-job \
    --cli-input-json file://myninthfile.json
Contents of myninthfile.json:
{
    "TranscriptionJobName": "cli-redaction-job-with-unredacted-transcript",
    "LanguageCode": "language-code",
    "Media": {
        "MediaFileUri": "s3://Amazon-S3-Prefix/your-media-file.file-extension"
    },
    "ContentRedaction": {
        "RedactionOutput":"redacted_and_unredacted",
        "RedactionType":"PII"
    }
}
Output:
{
    "TranscriptionJob": {
        "TranscriptionJobName": "cli-redaction-job-with-unredacted-transcript",
        "TranscriptionJobStatus": "IN_PROGRESS",
        "LanguageCode": "language-code",
        "Media": {
            "MediaFileUri": "s3://Amazon-S3-Prefix/your-media-file.file-extension"
        },
        "StartTime": "2020-09-25T23:59:47.677000+00:00",
        "CreationTime": "2020-09-25T23:59:47.653000+00:00",
        "ContentRedaction": {
            "RedactionType": "PII",
            "RedactionOutput": "redacted_and_unredacted"
        }
    }
}
For more information, see Automatic Content Redaction in the Amazon Transcribe Developer Guide.
Example 10: To use a custom language model you’ve previously created to transcribe an audio file
The following start-transcription-job example transcribes your audio file with a custom language model you’ve previously created.
aws transcribe start-transcription-job \
    --cli-input-json file://mytenthfile.json
Contents of mytenthfile.json:
{
    "TranscriptionJobName": "cli-clm-2-job-1",
    "LanguageCode": "language-code",
    "Media": {
        "MediaFileUri": "s3://DOC-EXAMPLE-BUCKET/your-audio-file.file-extension"
    },
    "ModelSettings": {
        "LanguageModelName":"cli-clm-2"
    }
}
Output:
{
    "TranscriptionJob": {
        "TranscriptionJobName": "cli-clm-2-job-1",
        "TranscriptionJobStatus": "IN_PROGRESS",
        "LanguageCode": "language-code",
        "Media": {
            "MediaFileUri": "s3://DOC-EXAMPLE-BUCKET/your-audio-file.file-extension"
        },
        "StartTime": "2020-09-28T17:56:01.835000+00:00",
        "CreationTime": "2020-09-28T17:56:01.801000+00:00",
        "ModelSettings": {
            "LanguageModelName": "cli-clm-2"
        }
    }
}
For more information, see Improving Domain-Specific Transcription Accuracy with Custom Language Models in the Amazon Transcribe Developer Guide.
TranscriptionJob -> (structure)
An object containing details of the asynchronous transcription job.
TranscriptionJobName -> (string)
The name of the transcription job.
TranscriptionJobStatus -> (string)
The status of the transcription job.
LanguageCode -> (string)
The language code for the input speech.
MediaSampleRateHertz -> (integer)
The sample rate, in Hertz, of the audio track in the input media file.
MediaFormat -> (string)
The format of the input media file.
Media -> (structure)
An object that describes the input media for the transcription job.
MediaFileUri -> (string)
The S3 object location of the input media file. The URI must be in the same region as the API endpoint that you are calling. The general form is:
For example:
For more information about S3 object names, see Object Keys in the Amazon S3 Developer Guide.
Transcript -> (structure)
An object that describes the output of the transcription job.
TranscriptFileUri -> (string)
The S3 object location of the transcript.
Use this URI to access the transcript. If you specified an S3 bucket in the OutputBucketName field when you created the job, this is the URI of that bucket. If you chose to store the transcript in Amazon Transcribe, this is a shareable URL that provides secure access to that location.
RedactedTranscriptFileUri -> (string)
The S3 object location of the redacted transcript.
Use this URI to access the redacted transcript. If you specified an S3 bucket in the OutputBucketName field when you created the job, this is the URI of that bucket. If you chose to store the transcript in Amazon Transcribe, this is a shareable URL that provides secure access to that location.
StartTime -> (timestamp)
A timestamp that shows when the job started processing.
CreationTime -> (timestamp)
A timestamp that shows when the job was created.
CompletionTime -> (timestamp)
A timestamp that shows when the job was completed.
FailureReason -> (string)
If the TranscriptionJobStatus field is FAILED, this field contains information about why the job failed.
The FailureReason field can contain one of the following values:
Unsupported media format - The media format specified in the MediaFormat field of the request isn’t valid. See the description of the MediaFormat field for a list of valid values.
The media format provided does not match the detected media format - The media format of the audio file doesn’t match the format specified in the MediaFormat field in the request. Check the media format of your media file and make sure that the two values match.
Invalid sample rate for audio file - The sample rate specified in the MediaSampleRateHertz field of the request isn’t valid. The sample rate must be between 8000 and 48000 Hertz.
The sample rate provided does not match the detected sample rate - The sample rate in the audio file doesn’t match the sample rate specified in the MediaSampleRateHertz field in the request. Check the sample rate of your media file and make sure that the two values match.
Invalid file size: file size too large - The size of your audio file is larger than Amazon Transcribe can process. For more information, see Limits in the Amazon Transcribe Developer Guide.
Invalid number of channels: number of channels too large - Your audio contains more channels than Amazon Transcribe is configured to process. To request additional channels, see Amazon Transcribe Limits in the Amazon Web Services General Reference.
Settings -> (structure)
Optional settings for the transcription job. Use these settings to turn on speaker recognition, to set the maximum number of speakers that should be identified and to specify a custom vocabulary to use when processing the transcription job.
VocabularyName -> (string)
The name of a vocabulary to use when processing the transcription job.
ShowSpeakerLabels -> (boolean)
Determines whether the transcription job uses speaker recognition to identify different speakers in the input audio. Speaker recognition labels individual speakers in the audio file. If you set the ShowSpeakerLabels field to true, you must also set the maximum number of speaker labels in the MaxSpeakerLabels field.
You can’t set both ShowSpeakerLabels and ChannelIdentification in the same request. If you set both, your request returns a BadRequestException.
MaxSpeakerLabels -> (integer)
The maximum number of speakers to identify in the input audio. If there are more speakers in the audio than this number, multiple speakers are identified as a single speaker. If you specify the MaxSpeakerLabels field, you must set the ShowSpeakerLabels field to true.
ChannelIdentification -> (boolean)
Instructs Amazon Transcribe to process each audio channel separately and then merge the transcription output of each channel into a single transcription.
Amazon Transcribe also produces a transcription of each item detected on an audio channel, including the start time and end time of the item and alternative transcriptions of the item including the confidence that Amazon Transcribe has in the transcription.
You can’t set both ShowSpeakerLabels and ChannelIdentification in the same request. If you set both, your request returns a BadRequestException.
ShowAlternatives -> (boolean)
Determines whether the transcription contains alternative transcriptions. If you set the ShowAlternatives field to true, you must also set the maximum number of alternatives to return in the MaxAlternatives field.
MaxAlternatives -> (integer)
The number of alternative transcriptions that the service should return. If you specify the MaxAlternatives field, you must set the ShowAlternatives field to true.
VocabularyFilterName -> (string)
The name of the vocabulary filter to use when transcribing the audio. The filter that you specify must have the same language code as the transcription job.
VocabularyFilterMethod -> (string)
ModelSettings -> (structure)
An object containing the details of your custom language model.
LanguageModelName -> (string)
The name of your custom language model.
JobExecutionSettings -> (structure)
Provides information about how a transcription job is executed.
AllowDeferredExecution -> (boolean)
Indicates whether a job should be queued by Amazon Transcribe when the concurrent execution limit is exceeded. When the AllowDeferredExecution field is true, jobs are queued and executed when the number of executing jobs falls below the concurrent execution limit. If the field is false, Amazon Transcribe returns a LimitExceededException exception.
If you specify the AllowDeferredExecution field, you must specify the DataAccessRoleArn field.
DataAccessRoleArn -> (string)
The Amazon Resource Name (ARN) of a role that has access to the S3 bucket that contains the input files. Amazon Transcribe assumes this role to read queued media files. If you have specified an output S3 bucket for the transcription results, this role should have access to the output bucket as well.
If you specify the AllowDeferredExecution field, you must specify the DataAccessRoleArn field.
ContentRedaction -> (structure)
An object that describes content redaction settings for the transcription job.
RedactionType -> (string)
Request parameter that defines the entities to be redacted. The only accepted value is PII.
RedactionOutput -> (string)
The output transcript file stored in either the default S3 bucket or in a bucket you specify.
When you choose redacted, Amazon Transcribe outputs only the redacted transcript.
When you choose redacted_and_unredacted, Amazon Transcribe outputs both the redacted and unredacted transcripts.
IdentifyLanguage -> (boolean)
A value that shows if automatic language identification was enabled for a transcription job.
LanguageOptions -> (list)
An object that shows the optional array of languages inputted for transcription jobs with automatic language identification enabled.
(string)
IdentifiedLanguageScore -> (float)
A value between zero and one that Amazon Transcribe assigned to the language that it identified in the source audio. Larger values indicate that Amazon Transcribe has higher confidence in the language it identified.
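The output fields described above can be read from the JSON that the command prints. The sketch below parses a trimmed copy of the Example 2 output and reports the status and the gap between CreationTime and StartTime. The function name is hypothetical, and it assumes timestamps arrive as ISO 8601 strings, as they do in the examples on this page.

```python
import json
from datetime import datetime

# Trimmed copy of the Example 2 output shown earlier on this page.
SAMPLE_RESPONSE = """
{
    "TranscriptionJob": {
        "TranscriptionJobName": "cli-channelid-job",
        "TranscriptionJobStatus": "IN_PROGRESS",
        "StartTime": "2020-09-17T16:07:56.817000+00:00",
        "CreationTime": "2020-09-17T16:07:56.784000+00:00"
    }
}
"""


def summarize_job(response_text: str) -> dict:
    """Extract a few TranscriptionJob fields from the CLI's JSON output."""
    job = json.loads(response_text)["TranscriptionJob"]
    created = datetime.fromisoformat(job["CreationTime"])
    # Assumption: StartTime can be missing, so it is treated as optional here.
    started = datetime.fromisoformat(job["StartTime"]) if "StartTime" in job else None
    return {
        "name": job["TranscriptionJobName"],
        "status": job["TranscriptionJobStatus"],
        "failure_reason": job.get("FailureReason"),
        "seconds_before_start": (started - created).total_seconds() if started else None,
    }


if __name__ == "__main__":
    print(summarize_job(SAMPLE_RESPONSE))
```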