1️⃣: GET LIST OF VOICES
====================
📋 ACTION DESCRIPTION
--------------------------------
GET LIST OF VOICES returns the available voices, optionally filtered by a language identification tag (BCP-47 code).
🔧 STEP-BY-STEP SETUP
--------------------------------
ℹ️ Steps 0) and 1) can be performed automatically: log in to your Google Cloud Console, open the Cloud Shell (top-right corner of the page), paste the following command and press Enter:
wget -q https://storage.googleapis.com/bubblegcpdemo/demo-assets/wiseable-gcp-texttospeech.py && python3 wiseable-gcp-texttospeech.py
0) Set up a project from the Google Cloud Console:
https://cloud.google.com/text-to-speech/docs/libraries#setting_up_authentication
- Create or select a project
- Enable the TEXT-TO-SPEECH API for that project
- Create a service account
- Download a private key as JSON.
1) Open the private key JSON file with a text editor, copy/paste the following parameters from your file to the Plugin settings:
- CLIENT_EMAIL
- PROJECT_ID
- PRIVATE_KEY, including the -----BEGIN PRIVATE KEY-----\n prefix and \n-----END PRIVATE KEY-----\n suffix.
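As an illustration of step 1, the three values can also be pulled out of the key file programmatically. This is a minimal sketch, not part of the plugin: the function name extract_plugin_settings is hypothetical, and it assumes the plugin expects PRIVATE_KEY exactly as it appears in the file when opened in a text editor, that is, with literal \n sequences.

```python
import json


def extract_plugin_settings(key_json_text: str) -> dict:
    """Return the three plugin settings from a service account key file.

    Hypothetical helper: the field names (client_email, project_id,
    private_key) are those of a Google service account JSON key.
    """
    key = json.loads(key_json_text)
    required = ("client_email", "project_id", "private_key")
    missing = [name for name in required if name not in key]
    if missing:
        raise KeyError(f"key file is missing: {', '.join(missing)}")

    private_key = key["private_key"]
    if not private_key.startswith("-----BEGIN PRIVATE KEY-----"):
        raise ValueError("private_key does not start with the expected prefix")

    return {
        "CLIENT_EMAIL": key["client_email"],
        "PROJECT_ID": key["project_id"],
        # Assumption: re-escape newlines so the value matches the raw file
        # text (literal \n), which is what a copy/paste would produce.
        "PRIVATE_KEY": json.dumps(private_key)[1:-1],
    }
```

To use it, read the downloaded file (for example with open(path).read()) and pass the text to the function, then paste each returned value into the matching Plugin settings field.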
2) Set up the action "GET LIST OF VOICES" in the workflow.
Input Fields:
- LANGUAGE CODE: The language identification tag (BCP-47 code) for filtering the list of voices returned. If you don't specify this optional parameter, all available voices are returned.
- RESULT DATA TYPE: Returned type, must always be set to "RESULT (TEXT TO SPEECH)".
Output Fields:
- RESULTS: Returns a list of voices with their properties, such as gender, and voice name containing the engine.
3) Set up a visual element that supports a list, so the user can select the required voice property. Please refer to the demo for this specific implementation.
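For readers curious about what the optional LANGUAGE CODE filter does, the sketch below shows the equivalent call shape of the public Text-to-Speech REST API and how the returned voice properties might be reduced to a list for display. How the plugin itself performs the call is not documented here, so treat this as an assumption; the helper names and the sample response are illustrative only.

```python
from typing import Optional
from urllib.parse import urlencode

# Public REST endpoint for listing voices.
VOICES_ENDPOINT = "https://texttospeech.googleapis.com/v1/voices"


def build_voices_url(language_code: Optional[str] = None) -> str:
    """Build the list URL; without a language code, all voices are returned."""
    if not language_code:
        return VOICES_ENDPOINT
    return f"{VOICES_ENDPOINT}?{urlencode({'languageCode': language_code})}"


def summarize_voices(response: dict) -> list:
    """Keep the properties a voice picker typically shows."""
    return [
        {
            "name": voice["name"],  # the name contains the engine, e.g. Wavenet
            "gender": voice.get("ssmlGender", "SSML_VOICE_GENDER_UNSPECIFIED"),
            "languages": voice.get("languageCodes", []),
        }
        for voice in response.get("voices", [])
    ]


# Illustrative response fragment, not real output.
sample_response = {
    "voices": [
        {
            "languageCodes": ["en-US"],
            "name": "en-US-Wavenet-D",
            "ssmlGender": "MALE",
            "naturalSampleRateHertz": 24000,
        }
    ]
}
voice_rows = summarize_voices(sample_response)
```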
2️⃣: SYNTHESIZE SPEECH (BACK-END)
====================
📋 ACTION DESCRIPTION
--------------------------------
SYNTHESIZE SPEECH (BACK-END) converts plain text or SSML into synthesized speech in MP3 file format with the chosen audio profile.
🔧 STEP-BY-STEP SETUP
--------------------------------
ℹ️ If not already done, perform steps 0 and 1 from the GET LIST OF VOICES setup to configure your Google Cloud credentials.
1) Set up the action "SYNTHESIZE SPEECH (BACK-END)" in the workflow.
Input Fields:
- TEXT: Input text or SSML to synthesize. Maximum of 5000 characters total.
- NAME: The name of the voice. If not set, the service will choose a voice based on the other parameters such as LANGUAGE CODE and gender.
- GENDER: The preferred gender of the voice. If not set, the service will choose a voice based on the other parameters such as LANGUAGE CODE and NAME. Note that this is only a preference, not requirement; if a voice of the appropriate gender is not available, the synthesizer should substitute a voice with a different gender rather than failing the request.
- LANGUAGE CODE: The language identification tag (BCP-47 code) for voice synthesising.
- AUDIO PROFILE: Select the Audio Profile for the generated audio files. For more information, see:
https://cloud.google.com/text-to-speech/docs/audio-profiles#available_audio_profiles
- RESULT DATA TYPE: Returned type, must always be set to "RESULT (TEXT TO SPEECH)".
Output Fields:
- RESULTS: Returns the synthesised speech as an MP3 file, encoded as a base64 stream, with the chosen audio profile applied.
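To make the relationship between the input fields concrete, here is a hedged sketch of how they could map onto a request body for the public Text-to-Speech REST API (text:synthesize). The plugin's actual internals are not shown in this documentation, so the mapping is an assumption; the function name and the SSML detection rule (input starting with <speak) are illustrative.

```python
from typing import Optional

MAX_INPUT_CHARS = 5000  # limit stated for the TEXT field


def build_synthesize_request(
    text: str,
    language_code: str,
    name: Optional[str] = None,
    gender: Optional[str] = None,
    audio_profile: Optional[str] = None,
) -> dict:
    """Assemble a synthesize request body from the action's input fields."""
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"TEXT exceeds {MAX_INPUT_CHARS} characters")

    # Assumption: treat input that starts with <speak as SSML, else plain text.
    input_key = "ssml" if text.lstrip().startswith("<speak") else "text"

    voice = {"languageCode": language_code}
    if name:
        voice["name"] = name
    if gender:
        voice["ssmlGender"] = gender  # a preference, not a requirement

    audio_config = {"audioEncoding": "MP3"}
    if audio_profile:
        audio_config["effectsProfileId"] = [audio_profile]

    return {"input": {input_key: text}, "voice": voice, "audioConfig": audio_config}


request_body = build_synthesize_request(
    "Hello world",
    language_code="en-US",
    gender="FEMALE",
    audio_profile="headphone-class-device",
)
```

Leaving NAME and GENDER unset simply omits them from the voice object, which is what lets the service choose a voice from the remaining parameters.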
2) Set up an audio player element that accepts an MP3 base64 stream as a URI in its Dynamic Link, such as "Circle Music Player", then set the output of the previous action, which returns base64 file data, as the input of this element.
Please refer to the demo for this specific implementation.
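Outside Bubble, the same base64 result can be turned into a playable URI or saved as a file. The sketch below assumes RESULTS holds the bare base64 payload without a data: prefix, which this documentation does not confirm; the function names are hypothetical.

```python
import base64


def to_data_uri(mp3_base64: str) -> str:
    """Wrap a base64 MP3 payload in a data URI an audio player can load."""
    cleaned = "".join(mp3_base64.split())  # drop line breaks and spaces
    # Raises binascii.Error (a ValueError) if the payload is not valid base64.
    base64.b64decode(cleaned, validate=True)
    return "data:audio/mpeg;base64," + cleaned


def to_mp3_bytes(mp3_base64: str) -> bytes:
    """Decode the payload to raw bytes, e.g. to write them to speech.mp3."""
    return base64.b64decode("".join(mp3_base64.split()), validate=True)
```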
3️⃣: GOOGLE CLOUD - TEXT TO SPEECH (FRONT-END)
===========================================
📋 ELEMENT DESCRIPTION
--------------------------------
GOOGLE CLOUD - TEXT TO SPEECH (FRONT-END) converts text to speech directly on the client side. This element is suitable for applications where responsiveness matters, such as (but not limited to) mobile applications.
🔧 STEP-BY-STEP SETUP
--------------------------------
ℹ️ If not already done, perform steps 0 and 1 from the GET LIST OF VOICES setup to configure your Google Cloud credentials.
0) Register on plugins.wiseable.io. Create a new Credential which associates your BUBBLE APP URL, GOOGLE CLOUD PROJECT ID and SERVICE ACCOUNT CREDENTIALS.
The registration service will generate your PUBLIC ACCESS KEY. This key serves as a secure proxy for your real API key.
1) In the PLUGIN SETTINGS, enter your PUBLIC ACCESS KEY (used for this element only), PROJECT_ID, and the other required credentials.
2) Add the GOOGLE CLOUD - TEXT TO SPEECH (FRONT-END) element to the page on which the text-to-speech feature must be integrated. Select the RESULT DATA TYPE as "RESULT (TEXT TO SPEECH)".
3) Integrate the logic into your application using the element's fields, events, states and actions listed below:
FIELDS:
- RESULT DATA TYPE: Returned type, must always be set to "RESULT (TEXT TO SPEECH)".
EVENTS:
- SUCCESS: Event triggered upon success
- ERROR: Event triggered upon error
EXPOSED STATES:
Use any element able to show/process the data of interest stored within the result of the following states:
- RESULTS: Populated upon SUCCESS event. Returns the synthesised speech as an MP3 file, encoded as a base64 stream.
- ERROR MESSAGE: Populated upon ERROR event.
- IS PROCESSING: Set to true when processing is in progress, false otherwise.
ELEMENT ACTIONS - TRIGGERED IN WORKFLOW:
- SYNTHESIZE SPEECH (FRONT-END): Convert text or SSML to speech. Populates RESULTS state upon completion.
Input Fields:
- TEXT: Input text or SSML to synthesize. Maximum of 5000 characters total.
- VOICE NAME: The name of the voice. If not set, the service will choose a voice based on the other parameters.
- GENDER: The preferred gender of the voice. If not set, the service will choose a voice based on the other parameters.
- LANGUAGE CODE: The language identification tag (BCP-47 code) for voice synthesising.
- AUDIO PROFILE: Select the Audio Profile for the generated audio files.
🔍 IMPLEMENTATION EXAMPLE
======================
Feel free to browse the app editor in the Service URL for an implementation example.
ℹ️ ADDITIONAL INFORMATION
======================
> SSML Reference:
https://cloud.google.com/text-to-speech/docs/ssml
> Supported Audio Profiles:
https://cloud.google.com/text-to-speech/docs/audio-profiles
> Supported Phonemes:
https://cloud.google.com/text-to-speech/docs/phonemes
> Supported Voices & Languages:
https://cloud.google.com/text-to-speech/docs/voices
> GOOGLE TEXT-TO-SPEECH service limits:
https://cloud.google.com/text-to-speech/quotas
⚠️ TROUBLESHOOTING
================
Any plugin-related error is posted to the Logs tab, "Server logs" section, of your App Editor.
Make sure that "Plugin server side output" and "Plugin client side output" are selected in "Show Advanced".
For front-end actions, you can also open your browser's developer console (F12 or Ctrl+Shift+I in most browsers) to view detailed error messages and logs.
Always check the ERROR MESSAGE state of the element and implement error handling using the ERROR event to provide a better user experience.
> Server Logs Details:
https://manual.bubble.io/core-resources/bubbles-interface/logs-tab#server-logs
⚡ PERFORMANCE CONSIDERATIONS
===========================
⏱️ BACK-END ACTION START DELAY
-----------------------------------------------
Each time a server-side action is called, Bubble initializes a small virtual machine to execute the action. If the same action is called shortly after, the caching mechanism kicks in, resulting in faster execution on subsequent calls.
A useful workaround is to fire a dummy execution at page load, which pre-warms the Bubble engine for the next few minutes, reducing the impact of cold starts for your users.
⏳ PROCESSING TIME LIMITS
-----------------------------------------------
For back-end actions, the maximum processing duration is capped at 30 seconds as per Bubble.io design. This time limitation does not apply to front-end actions.
FRONT-END VS BACK-END PROCESSING
----------------------------------------------------
The front-end element is designed to support and optimize multiple formats and will automatically handle SSML validation and correction. The back-end action doesn't perform this optimization, so be careful with input format when using it.
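Since the back-end action does not correct SSML, it can help to check the input before sending it. The sketch below is one possible pre-check and is not the validation the front-end element performs: it only verifies that the input is well-formed XML with a speak root element. The function name check_ssml is hypothetical.

```python
import xml.etree.ElementTree as ET


def check_ssml(ssml: str) -> tuple:
    """Return (is_valid, message) for a basic SSML well-formedness check."""
    try:
        root = ET.fromstring(ssml)
    except ET.ParseError as exc:
        # Typical causes: an unclosed tag or an unescaped & character.
        return False, f"not well-formed XML: {exc}"
    if root.tag != "speak":
        return False, f"root element is <{root.tag}>, expected <speak>"
    return True, "ok"


ok_result = check_ssml('<speak>Hello <break time="500ms"/> world</speak>')
bad_result = check_ssml("<speak>Fish & chips</speak>")
```

A check like this catches structural mistakes only; it does not verify that every tag or attribute is one the service supports.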
❓ QUESTIONS?
===========
Contact us at [email protected] for any support question or additional feature you require.