0 : PLAYHT - TEXT-TO-SPEECH STREAMING ELEMENT
=============================================
ELEMENT DESCRIPTION
----------------------------------
PLAYHT - TEXT-TO-SPEECH STREAMING streams ultra-realistic generated speech audio directly in the browser, enabling real-time text-to-speech use cases.
STEP-BY-STEP SETUP
--------------------------------
0) Register on PlayHT and get your PLAY.HT USER ID and SECRET KEY at https://play.ht/studio/api-access.
1) Register on plugins.wiseable.io. Create a new Credential that associates your BUBBLE APP URL with your PLAYHT USER ID and SECRET KEY.
The registration service then generates your PUBLIC ACCESS KEY. This key acts as a secure proxy for your real API key: your application communicates with the service without ever exposing the real key. Because the PUBLIC ACCESS KEY is explicitly tied to your registered BUBBLE APP URL, it can only be used from that domain, so even if the key is publicly visible, it cannot be misused by unauthorized sources.
2) In the Plugin Settings, enter the PUBLIC ACCESS KEY generated in the previous step.
3) To let users select the voice used to generate speech, create a Dropdown element with the provided data type "GET VOICES LIST (PLAYHT)". As "CHOICES SOURCE", use the dynamic source "GET DATA FROM AN EXTERNAL API" and select the API "PLAYHT - GET VOICES LIST" as API PROVIDER. Filter the voices according to your use case and set "OPTION CAPTION" to the name of the voice.
4) Add an input element (for example a Multiline Input) to collect the text to convert to speech.
5) Add the PLAYHT - TEXT-TO-SPEECH STREAMING element to the page on which text-to-speech must be performed and configure its properties.
FIELDS :
- DISPLAY AUDIO CONTROLS : Display or hide audio controls.
6) Integrate the logic into your application using the following PLAYHT - TEXT-TO-SPEECH STREAMING events, states and actions:
EVENTS :
- ERROR : Event triggered when an error occurs.
- END OF STREAM : Event triggered when the stream has finished downloading.
- AUDIO FILE UPLOADED : Event triggered when the audio file saved with the SAVE AUDIO FILE action has been successfully uploaded.
EXPOSED STATES:
Use any element able to display or process the data of interest (such as a Group with a Text field) to read the following states of the TEXT-TO-SPEECH STREAMING element:
- SUPPORTED FORMATS : List of audio formats supported by the browser.
- ERROR : Error message, set when the ERROR event is triggered.
- PLAYER STATUS : Return the player status. Valid values are ready | playing | paused | stopped | ended.
- CURRENT PLAYER SEEK TIME : Return the current player seek time in seconds.
- TOTAL DURATION : Return the total duration of the audio in seconds.
- AUDIO FILE URL : Return the URL of the saved audio file, set when the AUDIO FILE UPLOADED event is triggered.
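As an illustration of how these states fit together, here is a minimal Python sketch of the logic a progress bar and a single play/pause button could use; in Bubble you would express the same logic with dynamic expressions and conditions on the element's states. The function names are hypothetical, and treating `ready`, `stopped` and `ended` as statuses where neither PAUSE AUDIO nor RESUME AUDIO applies is an assumption, not documented plugin behavior.

```python
from typing import Optional

# Values documented for the PLAYER STATUS state.
PLAYER_STATUSES = {"ready", "playing", "paused", "stopped", "ended"}


def playback_progress(seek_time: float, total_duration: float) -> float:
    """Fraction of the audio already played, clamped to [0, 1].

    seek_time      -> CURRENT PLAYER SEEK TIME (seconds)
    total_duration -> TOTAL DURATION (seconds)
    """
    if total_duration <= 0:
        return 0.0  # duration not known yet, e.g. stream just starting
    return min(max(seek_time / total_duration, 0.0), 1.0)


def play_pause_action(status: str) -> Optional[str]:
    """Hypothetical helper: element action a play/pause button could trigger."""
    if status not in PLAYER_STATUSES:
        raise ValueError(f"Unexpected PLAYER STATUS: {status!r}")
    if status == "playing":
        return "PAUSE AUDIO"
    if status == "paused":
        return "RESUME AUDIO"
    # ready / stopped / ended: assumed that no pause/resume action applies
    return None


if __name__ == "__main__":
    print(playback_progress(30, 120))    # 0.25
    print(play_pause_action("playing"))  # PAUSE AUDIO
```

Clamping keeps the progress value usable even if the seek time briefly reports a value past the known duration while the stream is still downloading.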
ELEMENT ACTIONS - TRIGGERED IN WORKFLOW:
- PAUSE AUDIO : Pause the audio stream.
- RESUME AUDIO : Resume the audio stream.
- SEEK AUDIO : Seek to a specific time in the audio stream.
Input Fields :
- SEEK TIME : Position to seek to in the audio stream, in seconds.
- GENERATE SPEECH : Generate speech from the input text and stream it.
Input Fields :
- VOICE : The unique ID for a PlayHT or Cloned Voice.
- VOICE ENGINE : The voice engine used to synthesize the voice. Valid values: Play3.0-mini | PlayHT2.0-turbo
- EMOTION : An emotion to be applied to the speech. Valid values: female_happy | female_sad | female_angry | female_fearful | female_disgust | female_surprised | male_happy | male_sad | male_angry | male_fearful | male_disgust | male_surprised
- INPUT : The text to convert to speech. From Play.HT: to ensure fair usage, this streaming endpoint is subject to stricter rate limits and also limits the size of the input text. Input text provided to the streaming endpoint may contain at most 20 sentences. A sentence is defined as a sequence of at least 35 characters separated by a punctuation character (., ? or !). Maximum text length is 2000 characters.
- SPEED : The speed of the generated audio, from 0.25 to 4.0. The default is 1.0.
- AUDIO FORMAT : The output audio format. Must be one of the values of the element's SUPPORTED FORMATS state.
- TITLE : Title of the media.
- ARTIST : Artist of the media.
- ALBUM : Album of the media.
- COVER ART : Cover art image of the media.
- SAVE AUDIO FILE : Save the latest audio output as a file. The AUDIO FILE UPLOADED event is triggered once the upload succeeds.
Input Fields :
- FILE NAME : File Name, without extension, of the audio file to save.
- PRIVATE : Set to yes to make this file private. In that case, ATTACHED TO must be provided to specify the thing the audio file is attached to.
- ATTACHED TO : Unique ID of the thing to attach the Audio File to.
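Because the streaming endpoint limits the size of the input text, it can help to check the GENERATE SPEECH inputs before triggering the action. The Python sketch below checks the documented limits: at most 2000 characters and 20 sentences, a speed between 0.25 and 4.0, and the listed voice engines and emotions. The sentence counter is only an approximation of PlayHT's definition (it merges fragments shorter than 35 characters into the next sentence), not PlayHT's actual algorithm, and all function names are hypothetical.

```python
from typing import Iterable, List, Optional

MAX_CHARS = 2000
MAX_SENTENCES = 20
MIN_SENTENCE_CHARS = 35
VOICE_ENGINES = {"Play3.0-mini", "PlayHT2.0-turbo"}
EMOTIONS = {
    f"{gender}_{feeling}"
    for gender in ("female", "male")
    for feeling in ("happy", "sad", "angry", "fearful", "disgust", "surprised")
}


def count_sentences(text: str) -> int:
    """Approximate the sentence count used for the 20-sentence limit.

    A sentence closes at '.', '?' or '!' once it holds at least
    MIN_SENTENCE_CHARS characters; shorter fragments are merged into the
    next one. Trailing text without final punctuation counts as one.
    """
    count = 0
    length = 0
    for ch in text.strip():
        length += 1
        if ch in ".?!" and length >= MIN_SENTENCE_CHARS:
            count += 1
            length = 0
    if length > 0:
        count += 1
    return count


def validate_generate_speech(
    text: str,
    voice_engine: str,
    speed: float = 1.0,
    emotion: Optional[str] = None,
    audio_format: Optional[str] = None,
    supported_formats: Iterable[str] = (),
) -> List[str]:
    """Return a list of problems; an empty list means the inputs look valid."""
    errors = []
    if not text.strip():
        errors.append("INPUT is empty")
    if len(text) > MAX_CHARS:
        errors.append(f"INPUT has {len(text)} characters (max {MAX_CHARS})")
    sentences = count_sentences(text)
    if sentences > MAX_SENTENCES:
        errors.append(f"INPUT has about {sentences} sentences (max {MAX_SENTENCES})")
    if voice_engine not in VOICE_ENGINES:
        errors.append(f"Unsupported VOICE ENGINE: {voice_engine}")
    if not 0.25 <= speed <= 4.0:
        errors.append("speed must be between 0.25 and 4.0")
    if emotion is not None and emotion not in EMOTIONS:
        errors.append(f"Unsupported EMOTION: {emotion}")
    if audio_format is not None and audio_format not in set(supported_formats):
        errors.append(f"AUDIO FORMAT {audio_format} is not in SUPPORTED FORMATS")
    return errors


if __name__ == "__main__":
    text = "Hello there, this is a short streaming test."
    print(validate_generate_speech(text, "Play3.0-mini"))  # []
```

Returning every problem at once, rather than stopping at the first, makes it easier to show a complete error message to the user in a single popup or text element.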
1 : GET PLAYHT VOICES LIST
=======================================
DATA API DESCRIPTION
--------------------------------
GET PLAYHT VOICES LIST retrieves the full list of stock PlayHT voices.
STEP-BY-STEP SETUP
--------------------------------
1) To let users select the voice used to generate speech, create a Dropdown element with the provided data type "GET VOICES LIST (PLAYHT)".
2) As "CHOICES SOURCE", use the dynamic source "GET DATA FROM AN EXTERNAL API" and select the API "PLAYHT - GET PLAYHT VOICES LIST" as API PROVIDER. Filter the voices according to your use case and set "OPTION CAPTION" to the name of the voice.
Output Fields: List of Voices, each voice containing the ID, NAME, SAMPLE, ACCENT, AGE, GENDER, LANGUAGE, LOUDNESS, STYLE, TEMPO, TEXTURE.
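In Bubble, filtering the voice list is usually done with the `:filtered` operator on the API result. The same selection logic is shown below as a minimal Python sketch, assuming each voice is a record with lowercase keys matching the output fields (`id`, `name`, `language`, `gender`, `accent`, ...). The sample voices are made up for illustration and are not real PlayHT data; `filter_voices` is a hypothetical helper.

```python
from typing import Dict, Iterable, List, Optional

Voice = Dict[str, str]

# Made-up sample records shaped like the documented output fields.
SAMPLE_VOICES: List[Voice] = [
    {"id": "voice-c", "name": "Carla", "language": "English (US)",
     "gender": "Female", "accent": "american"},
    {"id": "voice-b", "name": "Bruno", "language": "French",
     "gender": "male", "accent": "french"},
    {"id": "voice-a", "name": "Alice", "language": "English (US)",
     "gender": "female", "accent": "american"},
]


def filter_voices(
    voices: Iterable[Voice],
    language: Optional[str] = None,
    gender: Optional[str] = None,
    accent: Optional[str] = None,
) -> List[Voice]:
    """Keep voices matching every given criterion (case-insensitive),
    sorted by name so the dropdown options appear in a stable order."""

    def matches(value: Optional[str], wanted: Optional[str]) -> bool:
        # A criterion left as None does not filter anything.
        return wanted is None or (value or "").casefold() == wanted.casefold()

    selected = [
        v for v in voices
        if matches(v.get("language"), language)
        and matches(v.get("gender"), gender)
        and matches(v.get("accent"), accent)
    ]
    return sorted(selected, key=lambda v: v.get("name", "").casefold())


if __name__ == "__main__":
    for voice in filter_voices(SAMPLE_VOICES, language="english (us)", gender="female"):
        print(voice["id"], voice["name"])
```

Comparing case-insensitively guards against inconsistent capitalization in the returned attributes, as in the made-up "Female" value above.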
2 : CREATE VOICE CLONE
=======================================
ACTION DESCRIPTION
--------------------------------
CREATE VOICE CLONE creates an instant voice clone from the URL of a sample audio file.
The cloned voice is based on the characteristics of the provided audio file. The source audio file should last between 2 seconds and 1 hour. Any audio format is accepted, as long as the file size is between 5 KB and 50 MB.
STEP-BY-STEP SETUP
--------------------------------
1) Set up the "CREATE VOICE CLONE" action in the workflow.
Input Fields :
- URL : Protocol-relative URL (//server/path/file.ext), from a Bubble File Uploader or Bubble storage, of the audio file used as the source for the voice clone. The file should last between 2 seconds and 1 hour and be between 5 KB and 50 MB, in any audio format.
- VOICE NAME : The name for this new cloned voice.
Output Fields:
- ID : ID of the new cloned voice.
- NAME : Voice name.
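Checking the source file against these requirements before running CREATE VOICE CLONE can avoid a failed request. This minimal Python sketch assumes you already know the file's size in bytes and its duration in seconds (how you obtain them is outside this plugin), reads 5 KB and 50 MB as 1024-based units, and treats the bounds as inclusive; these are assumptions. It also shows how an absolute `https://` URL can be turned into the protocol-relative form the URL field expects, since files uploaded through Bubble already use the `//server/path/file.ext` form. Both function names are hypothetical.

```python
from typing import List
from urllib.parse import urlsplit

# Documented limits; KB/MB read as 1024-based units (assumption).
MIN_BYTES = 5 * 1024
MAX_BYTES = 50 * 1024 * 1024
MIN_SECONDS = 2
MAX_SECONDS = 60 * 60


def to_protocol_relative(url: str) -> str:
    """Turn https://server/path/file.ext into //server/path/file.ext."""
    if url.startswith("//"):
        return url  # already protocol-relative, as Bubble uploads are
    parts = urlsplit(url)
    if parts.scheme in ("http", "https") and parts.netloc:
        # Drop "https:" (scheme plus colon) and keep the "//..." remainder.
        return url[len(parts.scheme) + 1:]
    raise ValueError(f"Not an http(s) or protocol-relative URL: {url!r}")


def check_clone_source(size_bytes: int, duration_seconds: float) -> List[str]:
    """Return the requirement violations; an empty list means the file fits."""
    errors = []
    if not MIN_BYTES <= size_bytes <= MAX_BYTES:
        errors.append("file size must be between 5 KB and 50 MB")
    if not MIN_SECONDS <= duration_seconds <= MAX_SECONDS:
        errors.append("duration must be between 2 seconds and 1 hour")
    return errors


if __name__ == "__main__":
    print(to_protocol_relative("https://example-bucket.s3.amazonaws.com/sample.mp3"))
    print(check_clone_source(size_bytes=1_000_000, duration_seconds=30))  # []
```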
3 : GET CLONED VOICES LIST
=======================================
DATA API DESCRIPTION
--------------------------------
GET CLONED VOICES LIST gets a list of all cloned voices created by the user.
STEP-BY-STEP SETUP
--------------------------------
1) To let users select a cloned voice to generate speech, create a Dropdown element with the provided data type "GET CLONED VOICES LIST (PLAYHT)".
2) As "CHOICES SOURCE", use the dynamic source "GET DATA FROM AN EXTERNAL API" and select the API "PLAYHT - GET CLONED VOICES LIST" as API PROVIDER.
Output Fields: List of Voices, each voice containing the ID, NAME.
4 : DELETE CLONED VOICE
=======================================
ACTION DESCRIPTION
--------------------------------
DELETE CLONED VOICE deletes a cloned voice created by the user, identified by the provided VOICE ID.
STEP-BY-STEP SETUP
--------------------------------
1) Set up the "DELETE CLONED VOICE" action in the workflow.
Input Fields :
- VOICE ID : The ID of the cloned voice to be deleted.
Output Fields:
- MESSAGE : Result message returned by PlayHT.
IMPLEMENTATION EXAMPLE
======================
Feel free to browse the app editor at the Service URL for an implementation example.
TROUBLESHOOTING
================
Any plugin-related error is posted to the Logs tab, "Server logs" section, of your App Editor.
Make sure that "Plugin server side output" is selected in "Show Advanced".
> Server Logs Details:
https://manual.bubble.io/core-resources/bubbles-interface/logs-tab#server-logs
PERFORMANCE CONSIDERATIONS
===========================
N/A
QUESTIONS ?
===========
Contact us at [email protected] for any additional feature request or support question.