This plugin provides two sets of actions, one synchronous and one asynchronous:
- SYNTHESIZE SPEECH (SYNC): Synchronous request mode, suited to small files and time-sensitive applications.
- START & GET SYNTHESIZE SPEECH (ASYNC): Asynchronous request mode, suited to large files and time-insensitive applications. It requires an AWS S3 bucket to store the output file.
0️⃣a : AUTOMATED CONFIGURATION FOR SYNC & ASYNC
=============================================
If you do not have AWS S3 configured yet, the configuration steps can be automatically performed by using this deployment template:
https://console.aws.amazon.com/cloudformation/home?#/stacks/create/review?stackName=BubbleS3&param_BucketName=BucketNameOfYourChoice&templateURL=https://bubble-resources.s3.amazonaws.com/deployment-assets/CloudFormation-AWSS3Plugin.yaml
The parameter values required to configure your AWS S3 plugin (we suggest "AWS S3 DROPZONE & SQS UTILITIES") are listed in the "OUTPUT" tab of the created stack.
The steps from 0) to 3) b) of START & GET SYNTHESIZE SPEECH (ASYNC) can be automatically performed by using this deployment template:
https://console.aws.amazon.com/cloudformation/home?#/stacks/create/review?stackName=BubblePolly&templateURL=https://bubble-resources.s3.amazonaws.com/deployment-assets/CloudFormation-AWSPollyAsync.yaml
You will find the required parameter values used across the plugin in the "OUTPUT" tab of the created stack.
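The AWS S3 deployment link above carries the bucket name as a query parameter (param_BucketName). As a minimal sketch, the link could be assembled for a bucket name of your choice as follows. The function name build_s3_stack_link and the example bucket name are illustrative assumptions, not part of the plugin; the base URL, stack name and template URL are copied from the link above.

```python
from urllib.parse import quote

# Both constants are copied from the AWS S3 deployment link in this section.
CONSOLE_BASE = (
    "https://console.aws.amazon.com/cloudformation/home"
    "?#/stacks/create/review"
)
S3_TEMPLATE_URL = (
    "https://bubble-resources.s3.amazonaws.com/"
    "deployment-assets/CloudFormation-AWSS3Plugin.yaml"
)


def build_s3_stack_link(bucket_name: str, stack_name: str = "BubbleS3") -> str:
    """Return the deployment link with the bucket name filled in.

    Hypothetical helper, for illustration only.
    """
    if not bucket_name:
        raise ValueError("bucket_name must not be empty")
    return (
        f"{CONSOLE_BASE}?stackName={quote(stack_name, safe='')}"
        f"&param_BucketName={quote(bucket_name, safe='')}"
        f"&templateURL={S3_TEMPLATE_URL}"
    )


if __name__ == "__main__":
    # "my-polly-output" is a placeholder bucket name.
    print(build_s3_stack_link("my-polly-output"))
```

Open the printed link in a browser where you are signed in to the AWS console, then review the stack before creating it.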
0️⃣b : AUTOMATED CONFIGURATION FOR SYNC ONLY
============================================
The steps from 0) to 1) of GET LIST OF VOICES & SYNTHESIZE SPEECH (SYNC) can be automatically performed by using this deployment template:
https://console.aws.amazon.com/cloudformation/home?#/stacks/create/review?stackName=BubblePollySyncOnly&templateURL=https://bubble-resources.s3.amazonaws.com/deployment-assets/CloudFormation-AWSPollySyncOnly.yaml
You will find the required parameter values used across the plugin in the "OUTPUT" tab of the created stack.
1️⃣ : AWS POLLY - TEXT TO SPEECH (FRONT-END)
=======================================
📋ELEMENT DESCRIPTION
--------------------------------
The AWS POLLY - TEXT TO SPEECH (FRONT-END) element provides the SYNTHESIZE SPEECH (SYNC) action to create audio from text. The front-end element is suited to applications where responsiveness matters, such as mobile applications.
🔧 STEP-BY-STEP SETUP
--------------------------------
ℹ️ The steps from 0) to 1) can be automatically performed by using this deployment template:
https://console.aws.amazon.com/cloudformation/home?#/stacks/create/review?stackName=BubblePollySyncOnly&templateURL=https://bubble-resources.s3.amazonaws.com/deployment-assets/CloudFormation-AWSPollySyncOnly.yaml
You will find the required parameter values used across the plugin in the "OUTPUT" tab of the created stack.
0) Sign up for AWS POLLY by following this link:
https://console.aws.amazon.com/polly/home?p=ply&cp=bn&ad=c
1) Create your AWS POLLY ACCESS KEY & ACCESS KEY SECRET, then attach the AWS POLLY READ-ONLY or FULL ACCESS policy to these credentials:
https://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html#access-keys-and-secret-access-keys
2) Register on plugins.wiseable.io. Create a new Credential that associates your BUBBLE APP URL with your AWS POLLY ACCESS KEY & ACCESS KEY SECRET.
The registration service will generate your PUBLIC ACCESS KEY. This key serves as a secure proxy for your real API key. It allows your application to communicate with the service without exposing your real API key. Since this PUBLIC ACCESS KEY is explicitly tied to your registered BUBBLE APP URL, it can only be used from that domain, ensuring that even if the key is publicly visible, it remains safe and cannot be misused by unauthorized sources.
3) In the Plugin Settings, enter the following:
- PUBLIC ACCESS KEY (generated from plugins.wiseable.io)
- AWS SERVICE ENDPOINT REGION (if not provided, default endpoint is "us-east-1").
4) Add the AWS POLLY - TEXT TO SPEECH (FRONT-END) element to the page. Select the RESULT DATA TYPE as "RESULT (POLLY)".
5) Integrate the logic into your application using the following element's states and actions:
FIELDS:
- RESULT DATA TYPE: Returned type, must always be set to "RESULT (POLLY)".
EVENTS:
- SUCCESS: Event triggered upon success
- ERROR: Event triggered upon error
EXPOSED STATES:
Use any element able to show/process the data of interest (such as a Group with an audio player) stored within the result of the following states:
- RESULTS: Populated upon SUCCESS event. Returns any generated audio or speech marks.
- ERROR MESSAGE: Populated upon ERROR event.
- IS PROCESSING: Set to true when processing is in progress, false otherwise.
- REQUESTED ACTION: The latest requested action.
ELEMENT ACTIONS - TRIGGERED IN WORKFLOW:
- SYNTHESIZE SPEECH (SYNC) (FRONT-END): Returns the audio datastream from text provided as input.
Input Fields:
- TEXT: Input text or SSML to synthesize.
- VOICE ID: Voice ID to use for the synthesis. This parameter is available in the output of GET LIST OF VOICES action.
- ENGINE: Specifies the engine (standard or neural) for AWS POLLY to use when processing input text for speech synthesis.
- SAMPLE RATE: The audio frequency specified in Hz. See documentation for valid values.
- OUTPUT FORMAT: The format in which the returned output will be encoded. Valid values are: mp3 | ogg_vorbis | pcm | json
2️⃣ : GET LIST OF VOICES
=======================================
📋 ACTION DESCRIPTION
--------------------------------
GET LIST OF VOICES lists the available voices in AWS POLLY.
🔧 STEP-BY-STEP SETUP
--------------------------------
ℹ️ Remember that steps 0) to 1) can be automatically performed using the deployment template mentioned in section 0b.
0) Sign up for AWS POLLY by following this link:
https://console.aws.amazon.com/polly/home?p=ply&cp=bn&ad=c
1) Create your AWS POLLY ACCESS KEY & ACCESS KEY SECRET and attach the AWS POLLY READ ONLY or FULL ACCESS POLICY, which covers the DESCRIBEVOICES API:
https://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html#access-keys-and-secret-access-keys
2) In the PLUGIN SETTINGS, enter your AWS POLLY ACCESS KEY, ACCESS KEY SECRET and AWS SERVICE ENDPOINT REGION (if not provided, the default endpoint is "us-east-1").
3) Set up the "GET LIST OF VOICES" action in the workflow.
Input Fields:
- ENGINE: Specifies the engine (standard or neural) used by Amazon Polly when processing input text for speech synthesis.
- LANGUAGE CODE: The language identification tag (ISO 639 code for the language name-ISO 3166 country code) for filtering the list of voices returned. If you don't specify this optional parameter, all available voices are returned.
- RESULT DATA TYPE: Returned type, must always be set to "RESULT (POLLY)".
Output Fields:
- RESULTS: Returns a list of voices with their properties, such as gender, id, and voice name.
3️⃣ : SYNTHESIZE SPEECH (SYNC) (BACK-END)
=======================================
📋 ACTION DESCRIPTION
--------------------------------
SYNTHESIZE SPEECH (SYNC) returns the result datastream containing audio or speech marks from a text provided as input in synchronous mode.
The input text or SSML is limited to 6000 characters in total, of which no more than 3000 may be billed characters (SSML tags are not counted as billed characters). The output stream (synthesis) is limited to 10 minutes. Once this limit is reached, any remaining speech is cut off.
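When a plain-text input may exceed the synchronous limit, one option is to split it into several requests. The sketch below is only an illustration under stated assumptions: it counts total characters (billed characters cannot be computed reliably on the client side), it targets plain text only since splitting SSML this way would break its tags, and the name split_for_sync is hypothetical.

```python
import re
from typing import List

MAX_TOTAL_CHARS = 6000  # total-character limit stated above for SYNC mode


def split_for_sync(text: str, limit: int = MAX_TOTAL_CHARS) -> List[str]:
    """Split plain text into chunks of at most `limit` characters.

    Chunks end after sentence punctuation (. ! ?) where possible.
    A single sentence longer than `limit` is cut at the limit.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        # Hard-split any sentence that cannot fit in one chunk.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            # The sentence does not fit: close the current chunk.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be passed as the TEXT input of a separate SYNTHESIZE SPEECH (SYNC) call. For long inputs, the asynchronous actions are usually the simpler choice.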
🔧 STEP-BY-STEP SETUP
--------------------------------
ℹ️ Remember that steps 0) to 1) can be automatically performed using the deployment template mentioned in section 0b.
1) Attach the AWS POLLY FULL ACCESS POLICY, which covers the SYNTHESIZESPEECH API, to your previously created AWS POLLY ACCESS KEY & ACCESS KEY SECRET.
2) Use any visual element that returns a text or SSML string as the input of the plugin action.
3) Set up the "SYNTHESIZE SPEECH (SYNC) (BACK-END)" action in the workflow.
Input Fields:
- RESULT DATA TYPE: Returned type, must always be set to "RESULT (POLLY)".
- TEXT: Input text or SSML to synthesize.
- VOICE ID: Voice ID to use for the synthesis. This parameter is available in the output of GET LIST OF VOICES action.
- ENGINE: Specifies the engine (standard or neural) for AWS POLLY to use when processing input text for speech synthesis. Using a voice that is not supported for the engine selected will result in an error.
- SAMPLE RATE: The audio frequency specified in Hz. The valid values for mp3 and ogg_vorbis are "8000", "16000", "22050", and "24000". The default value for standard voices is "22050". The default value for neural voices is "24000". Valid values for pcm are "8000" and "16000", default value is "16000". Using a sample rate that is not supported will result in an error.
- OUTPUT FORMAT: The format in which the returned output will be encoded. For audio stream, when pcm is used, the content returned is audio/pcm in a signed 16-bit, 1 channel (mono), little-endian format. For speech marks, this will be json. Valid values are: mp3 | ogg_vorbis | pcm | json
Output Fields:
- RESULTS: Outputs the task status, additional status metadata and the generated audio file or speechmarks along with their metadata when completed. Valid values are: scheduled | inProgress | completed | failed.
4) Use any visual element supporting base64-encoded datastream to read the audio.
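The SAMPLE RATE rules listed in step 3 depend on both OUTPUT FORMAT and ENGINE. As a minimal sketch, they could be checked before calling the action as shown below. The function name resolve_sample_rate is a hypothetical helper, and because no sample rates are listed above for the json format, the sketch passes the value through unchanged in that case.

```python
from typing import Optional

OUTPUT_FORMATS = {"mp3", "ogg_vorbis", "pcm", "json"}
ENGINES = {"standard", "neural"}
AUDIO_RATES = {"8000", "16000", "22050", "24000"}  # mp3 and ogg_vorbis
PCM_RATES = {"8000", "16000"}


def resolve_sample_rate(
    output_format: str, engine: str, sample_rate: Optional[str] = None
) -> Optional[str]:
    """Validate a sample rate, or return the documented default if omitted."""
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"unsupported output format: {output_format!r}")
    if engine not in ENGINES:
        raise ValueError(f"unsupported engine: {engine!r}")
    if output_format == "json":
        # Assumption: no rates are documented above for speech marks,
        # so the value is returned as given.
        return sample_rate
    if output_format == "pcm":
        allowed, default = PCM_RATES, "16000"
    else:
        allowed = AUDIO_RATES
        default = "24000" if engine == "neural" else "22050"
    if sample_rate is None:
        return default
    if sample_rate not in allowed:
        raise ValueError(
            f"sample rate {sample_rate!r} is not valid for {output_format}"
        )
    return sample_rate
```

For example, resolve_sample_rate("pcm", "neural", "22050") raises a ValueError, which mirrors the error the service would return for an unsupported rate.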
4️⃣ : START & GET SYNTHESIZE SPEECH (ASYNC)
========================================
📋 ACTION DESCRIPTION
--------------------------------
START & GET SYNTHESIZE SPEECH (ASYNC) returns the audio datastream from a text provided as input in asynchronous mode.
In asynchronous mode, the limit for the input text can be up to 100,000 billed characters (200,000 total characters). SSML tags are not counted as billed characters.
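The two limits suggest a simple rule for picking a mode from the input length. The sketch below is an illustration only: it compares the total character count against the 6000 (SYNC) and 200,000 (ASYNC) totals quoted in this document, does not attempt to count billed characters, and uses a hypothetical function name.

```python
SYNC_MAX_TOTAL = 6000      # total characters, SYNC mode
ASYNC_MAX_TOTAL = 200000   # total characters, ASYNC mode


def choose_mode(text: str) -> str:
    """Return "sync" or "async" depending on the input length."""
    length = len(text)
    if length == 0:
        raise ValueError("text must not be empty")
    if length <= SYNC_MAX_TOTAL:
        return "sync"
    if length <= ASYNC_MAX_TOTAL:
        return "async"
    raise ValueError(
        f"text has {length} characters, above the {ASYNC_MAX_TOTAL} limit"
    )
```

An input that passes this check can still be rejected by the service if its billed character count is too high, so error handling remains necessary.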
To interact with AWS S3, it is highly recommended to use this plugin in conjunction with our "AWS S3 & SQS UTILITIES" plugin, which you can find here:
https://bubble.io/plugin/aws-s3--sqs-utilities-1615057147611x666191530957733900
🔧 STEP-BY-STEP SETUP
--------------------------------
ℹ️ Remember that steps 0) to 3) b) can be automatically performed using the deployment template mentioned in section 0a.
1) Attach the AWS POLLY FULL ACCESS POLICY, which covers the STARTSPEECHSYNTHESISTASK and GETSPEECHSYNTHESISTASK APIs, to your previously created AWS POLLY ACCESS KEY & ACCESS KEY SECRET, along with the AWS S3 FULL ACCESS POLICY on one of your buckets to store the output file, following these instructions:
https://docs.aws.amazon.com/polly/latest/dg/asynchronous-iam.html
2) Use any visual element that returns a text or SSML string as the input of the plugin action.
3) Set up the "START SYNTHESIZE SPEECH TASK (ASYNC)" action in the workflow:
Input Fields:
- TEXT: Input text or SSML to synthesize.
- VOICE ID: Voice ID to use for the synthesis. This parameter is available in the output of GET LIST OF VOICES action.
- ENGINE: Specifies the engine (standard or neural) for AWS POLLY to use when processing input text for speech synthesis. Using a voice that is not supported for the engine selected will result in an error.
- SAMPLE RATE: The audio frequency specified in Hz. The valid values for mp3 and ogg_vorbis are "8000", "16000", "22050", and "24000". The default value for standard voices is "22050". The default value for neural voices is "24000". Valid values for pcm are "8000" and "16000", default value is "16000". Using a sample rate that is not supported will result in an error.
- AWS S3 BUCKET NAME: Name of the AWS S3 bucket to which the output file will be saved.
- AWS S3 KEY FILE PREFIX: AWS S3 key prefix for the output file.
- OUTPUT FORMAT: The format in which the returned output will be encoded. For audio stream, when pcm is used, the content returned is audio/pcm in a signed 16-bit, 1 channel (mono), little-endian format. For speech marks, this will be json. Valid values are: mp3 | ogg_vorbis | pcm | json
Output Fields:
- TASK ID: The AWS POLLY generated identifier for a speech synthesis task, used as input for "GET SPEEECH SYNTHESIZE TASK STATUS" action.
4) Set up the action "GET SPEEECH SYNTHESIZE TASK STATUS" in a recurring workflow ('Do every x seconds'), to poll the job completion status on a regular basis. Configure the required actions to run on completed task status. Optionally, use any Visual Element to act on the task status.
Input Fields:
- TASK ID: The AWS POLLY speech synthesis task identifier to poll, retrieved from the "START SYNTHESIZE SPEECH TASK (ASYNC)" action.
- RESULT DATA TYPE: Returned type, must always be set to "RESULT (POLLY)".
Output Fields:
- RESULTS: Outputs the task status, additional status metadata and the generated audio file metadata when completed. Valid values are: scheduled | inProgress | completed | failed.
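The polling logic of step 4 can be sketched outside Bubble as a loop that stops on a terminal status. This is an illustration of the pattern, not the plugin's implementation: get_status stands for whatever call returns the current task status, and the interval and maximum number of polls are arbitrary assumptions.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"completed", "failed"}
KNOWN_STATUSES = {"scheduled", "inProgress"} | TERMINAL_STATUSES


def wait_for_task(
    get_status: Callable[[], str],
    interval_s: float = 5.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until the task is completed or failed.

    Returns the terminal status, or raises TimeoutError after max_polls.
    """
    for attempt in range(max_polls):
        status = get_status()
        if status not in KNOWN_STATUSES:
            raise ValueError(f"unexpected task status: {status!r}")
        if status in TERMINAL_STATUSES:
            return status
        if attempt < max_polls - 1:
            sleep(interval_s)  # wait before the next poll
    raise TimeoutError(f"task still running after {max_polls} polls")
```

In a Bubble app, the recurring workflow plays the role of the loop: stop it when the status is completed or failed, and only then run the actions that depend on the output file.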
5) Install the "AWS S3 & SQS UTILITIES" plugin and set up its "GET FILE BASE64-DATAURI FROM S3" action to retrieve the file datastream from the specified bucket once the task is confirmed as completed.
Input Fields:
- BUCKET NAME: Bucket Name from which the file will be retrieved.
- FILE NAME: Path & File Name to retrieve. The file must be less than 4.5 megabytes. The format must be [path/]filename.ext. It must be extracted from the RESULT, field OUTPUTURI of "GET SPEEECH SYNTHESIZE TASK STATUS" action.
Example 1: path1/path2/filename.ext.
Example 2: filename.ext if the file is at the root of the bucket.
Output Fields:
- URL: Returns the URL of the file in Amazon S3 virtual-hosted-style format. Format is
https://bucket-name.s3.Region.amazonaws.com/key-name. Use this URL to retrieve the file, provided your bucket's permissions allow GetObject access from the Internet.
- BASE64 DATAURI: Returns the base64-encoded file data.
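The FILE NAME input expects [path/]filename.ext, while OUTPUTURI is a full URL. A minimal sketch of the extraction follows. It assumes OUTPUTURI is an S3 URL in either path style (bucket name as the first path segment) or virtual-hosted style (bucket name as the first host label); check the value returned for your own task before relying on it. The function name and the example URLs are hypothetical.

```python
from urllib.parse import unquote, urlparse


def s3_key_from_output_uri(output_uri: str, bucket: str) -> str:
    """Extract "[path/]filename.ext" from an S3 URL for the given bucket."""
    parsed = urlparse(output_uri)
    path = unquote(parsed.path).lstrip("/")
    if parsed.netloc.startswith(bucket + "."):
        # Virtual-hosted style: https://bucket.s3.Region.amazonaws.com/key
        key = path
    elif path.startswith(bucket + "/"):
        # Path style: https://s3.Region.amazonaws.com/bucket/key
        key = path[len(bucket) + 1:]
    else:
        raise ValueError(f"bucket {bucket!r} not found in {output_uri!r}")
    if not key:
        raise ValueError("the URL does not contain a file name")
    return key


if __name__ == "__main__":
    # Placeholder URL, for illustration only.
    uri = "https://s3.us-east-1.amazonaws.com/my-bucket/audio/speech.mp3"
    print(s3_key_from_output_uri(uri, "my-bucket"))
```

The returned value corresponds to Example 1 or Example 2 above, depending on whether a key prefix was set when starting the task.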
6) Use any visual element supporting base64-encoded datastream to read the audio.
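If the audio needs to be processed outside a visual element, the BASE64 DATAURI value can be decoded back to bytes. The sketch below assumes the value follows the standard data URI layout data:<mime type>;base64,<payload>; the exact string returned by the plugin should be checked first, and the function name is hypothetical.

```python
import base64
from typing import Tuple


def decode_data_uri(data_uri: str) -> Tuple[str, bytes]:
    """Split a base64 data URI into its MIME type and decoded bytes."""
    header, separator, payload = data_uri.partition(",")
    if (
        not separator
        or not header.startswith("data:")
        or not header.endswith(";base64")
    ):
        raise ValueError("not a base64 data URI")
    mime_type = header[len("data:"):-len(";base64")]
    # validate=True rejects characters outside the base64 alphabet.
    return mime_type, base64.b64decode(payload, validate=True)


if __name__ == "__main__":
    # "SUQz" is the base64 encoding of the three bytes b"ID3".
    mime, audio = decode_data_uri("data:audio/mpeg;base64,SUQz")
    print(mime, len(audio))
```

The decoded bytes can then be written to a file or passed to an audio library of your choice.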
🔍IMPLEMENTATION EXAMPLE
======================
Feel free to browse the app editor in the Service URL for an implementation example.
ℹ️ ADDITIONAL INFORMATION
======================
> AWS POLLY Supported Languages:
https://docs.aws.amazon.com/polly/latest/dg/SupportedLanguage.html
> AWS POLLY Supported SSML Tags:
https://docs.aws.amazon.com/polly/latest/dg/supportedtags.html
> AWS POLLY synthesis task metadata:
https://docs.aws.amazon.com/polly/latest/dg/API_SynthesisTask.html
> AWS POLLY service limits:
https://docs.aws.amazon.com/polly/latest/dg/limits.html
> AWS services availability per region:
https://aws.amazon.com/about-aws/global-infrastructure/regional-product-services/
> AWS Service endpoints list:
https://docs.aws.amazon.com/general/latest/gr/rande.html
⚠️TROUBLESHOOTING
================
Any plugin-related error is posted to the Logs tab, "Server logs" section, of your App Editor.
Make sure "Plugin server side logging" and "Plugin client side logging" are selected in "Show Advanced".
For front-end actions, you can also open your browser's developer console (F12 or Ctrl+Shift+I in most browsers) to view detailed error messages and logs.
Always check the ERROR MESSAGE state of the element and implement error handling using the ERROR event to provide a better user experience.
> Server Logs Details:
https://manual.bubble.io/core-resources/bubbles-interface/logs-tab#server-logs
⚡PERFORMANCE CONSIDERATIONS
===========================
GENERAL
-------------
For back-end actions, the maximum processing duration is capped at 30 seconds as per Bubble.io design. This time limitation does not apply to front-end actions.
⏱️BACK-END ACTION START DELAY
-----------------------------------------------
Each time a server-side action is called, Bubble initializes a small virtual machine to execute the action. If the same action is called shortly after, the caching mechanism kicks in, resulting in faster execution on subsequent calls.
A useful workaround is to fire a dummy execution at page load, which pre-warms the Bubble engine for the next few minutes, reducing the impact of cold starts for your users.
❓QUESTIONS?
===========
Contact us at
[email protected] for any additional feature you would require or support questions.