PlayHT - Text-to-Speech Streaming

Published January 2024
   •    Updated June 2025

Plugin details

This plugin uses PlayHT's Text-to-Speech capabilities to stream generated voice audio directly, enabling real-time text-to-speech use cases. It can also save the audio output to the Bubble database, including as private files.
PlayHT is a platform specializing in ultra-realistic voices for a range of applications. With a focus on naturalness and clarity, PlayHT's voice technology transforms written text into lifelike audio.

PlayHT boasts a diverse selection of ultra-realistic voices, allowing users to choose from a variety of tones, accents, and styles. These voices aim to closely mimic the nuances and intonations of human speech.

PlayHT also provides AI voice cloning. Clone high-quality voices that are 99% accurate to the real human voice.

No need for expensive equipment or complicated software. Perfect for content creators, podcasters, and businesses looking to add a personal touch to their audio projects.

This plugin also keeps your PlayHT keys hidden from prying eyes: it uses a dedicated public access key that works only from your app's domain name.

This plugin uses an external service to provide streaming capability.

Demo Link: https://playhttexttospeechdemo.bubbleapps.io/version-test

Editor Link: https://bubble.io/page?type=page&name=index&id=playhttexttospeechdemo-editor&tab=tabs-1

💡 Subscriptions are prorated. If you install and unsubscribe this plugin in one day to test it out, you'll only be charged 1/30th of the monthly subscription fee.

📖 Step-by-step instructions are in the "Instructions" section, and the Demo Editor is in the "Links" section of the Plugin Page.

Contact us at [email protected] for support questions or to request additional features.

$9

Per month

5.0 stars   •   1 ratings
9 installs  
This plugin does not collect or track your personal data.

Platform

Web & Native mobile

Contributor details

wise:able logo
wise:able
Joined 2020   •   122 Plugins

Instructions

0 : PLAYHT - TEXT-TO-SPEECH STREAMING ELEMENT
=======================================

ELEMENT DESCRIPTION
----------------------------------
 PLAYHT - TEXT-TO-SPEECH STREAMING provides ultra-realistic voice-generated audio streaming directly, enabling real-time text-to-speech use cases.

STEP-BY-STEP SETUP
--------------------------------
 0) Register on PlayHT and get your PLAYHT USER ID and SECRET KEY at https://play.ht/studio/api-access

 1) Register on plugins.wiseable.io. Create a new Credential which associates your BUBBLE APP URL and your PLAYHT USER ID and SECRET KEY.
  The registration service will generate your PUBLIC ACCESS KEY. This key serves as a secure proxy for your real API key. It allows your application to communicate with the service without exposing your real API key. Since this PUBLIC ACCESS KEY is explicitly tied to your registered BUBBLE APP URL, it can only be used from that domain, ensuring that even if the key is publicly visible, it remains safe and cannot be misused by unauthorized sources.

 2) In the Plugin Settings, enter your PUBLIC ACCESS KEY generated at the previous step.

 3) In order to select the voice to generate speech, create a dropdown element with the provided data type "GET VOICES LIST (PLAYHT)" and as "CHOICES SOURCES", use the dynamic source "GET DATA FROM AN EXTERNAL API" and select as API PROVIDER the API "PLAYHT - GET VOICES LIST". Filter those according to your use-case and select as "OPTION CAPTION" the name of the voice.

 4) Add an element supporting input text.

 5) Add the PLAYHT - TEXT-TO-SPEECH STREAMING to the page on which Text-to-Speech must be performed and configure its properties.

 FIELDS :
 - DISPLAY AUDIO CONTROLS : Display or hide audio controls.

 6) Integrate the logic into your application using the following PLAYHT - TEXT-TO-SPEECH STREAMING events, states and actions:

 EVENTS :
 - ERROR : Event triggered when an error occurs.
 - END OF STREAM : Event triggered when the stream has finished downloading.
 - AUDIO FILE UPLOADED : Event triggered when the audio file has been successfully uploaded by the SAVE AUDIO FILE action.


 EXPOSED STATES:
 Use any element able to show/process the data of interest (such as a Group with a Text field) stored within the result of the following states of the TEXT-TO-SPEECH STREAMING :
 - SUPPORTED FORMATS : List of audio formats supported by the browser.
 - ERROR : Error message upon Error event trigger.
 - PLAYER STATUS : Return the player status. Valid values are ready | playing | paused | stopped | ended
 - CURRENT PLAYER SEEK TIME : Return the current player seek time in seconds.
 - TOTAL DURATION : Return the total duration of the audio in seconds.
 - AUDIO FILE URL : Return the Audio File URL upon AUDIO FILE UPLOADED event.
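
As an illustration of how the PLAYER STATUS, CURRENT PLAYER SEEK TIME and TOTAL DURATION states can be combined into a progress display, here is a minimal Python sketch. In a Bubble app this would be done with dynamic expressions rather than Python; the function names are hypothetical and not part of the plugin.

```python
# Hypothetical helpers: they only mirror the documented state values.
VALID_STATUSES = {"ready", "playing", "paused", "stopped", "ended"}


def format_seconds(seconds: float) -> str:
    """Format a duration in seconds as M:SS."""
    total = max(0, int(seconds))
    minutes, secs = divmod(total, 60)
    return f"{minutes}:{secs:02d}"


def playback_progress(seek_time: float, total_duration: float) -> float:
    """Return playback progress as a percentage between 0 and 100."""
    if total_duration <= 0:
        return 0.0  # duration not known yet
    ratio = seek_time / total_duration
    return round(min(max(ratio, 0.0), 1.0) * 100, 1)


def progress_label(status: str, seek_time: float, total_duration: float) -> str:
    """Build a label such as 'playing 0:30 / 2:00 (25.0%)'."""
    if status not in VALID_STATUSES:
        raise ValueError(f"unexpected player status: {status!r}")
    percent = playback_progress(seek_time, total_duration)
    return (
        f"{status} {format_seconds(seek_time)} / "
        f"{format_seconds(total_duration)} ({percent}%)"
    )
```

The guard on a zero duration avoids a division error while the stream has not yet reported its length.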


 ELEMENT ACTIONS - TRIGGERED IN WORKFLOW:
   - PAUSE AUDIO : Pause the audio stream.
   - RESUME AUDIO : Resume the audio stream.
   - SEEK AUDIO : Seek to a specific time in the audio stream.
  Input Fields :
       - SEEK TIME : The time to seek to in the audio stream.
   - GENERATE SPEECH : Generate speech from input.
  Input Fields :
       - VOICE : The unique ID for a PlayHT or Cloned Voice.
       - VOICE ENGINE : The voice engine used to synthesize the voice. Valid values: Play3.0-mini | PlayHT2.0-turbo
       - EMOTION : An emotion to be applied to the speech. Valid values: female_happy | female_sad | female_angry | female_fearful | female_disgust | female_surprised | male_happy | male_sad | male_angry | male_fearful | male_disgust | male_surprised
       - INPUT : From Play.HT: To ensure fair usage, this streaming endpoint is subject to more strict rate-limits and also limits the text size it may take as input. Input text provided to the streaming endpoint may contain at most 20 sentences. A sentence is defined as a sequence of at least 35 characters separated by a punctuation character (., ? or !). Maximum text length is 2000 characters.
       - SPEED : The speed of the generated audio. Select a value from 0.25 to 4.0. 1.0 is the default.
       - AUDIO FORMAT : The format of the generated audio. Must be one of the values listed in the element's SUPPORTED FORMATS state.
       - TITLE : Title of the media.
       - ARTIST : Artist of the media.
       - ALBUM : Album of the media.
       - COVER ART : Cover art image of the media.
   - SAVE AUDIO FILE : Save the latest audio output.
  Input Fields :
       - FILE NAME : File Name, without extension, of the audio file to save.
       - PRIVATE : Set to yes to set this file to private. ATTACHED TO must be provided to specify the thing to attach this audio file to.
       - ATTACHED TO : Unique ID of the thing to attach the Audio File to.
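
The GENERATE SPEECH limits listed above (text length, sentence count, voice engine, emotion, speed range) can be checked before the action is triggered. The following Python sketch shows one way to express those checks. It is an illustration only: the function names are hypothetical, and the sentence counter is one possible reading of PlayHT's rule (segments of at least 35 characters delimited by ., ? or !), not PlayHT's own implementation.

```python
import re

MAX_CHARS = 2000
MAX_SENTENCES = 20
MIN_SENTENCE_CHARS = 35
VOICE_ENGINES = {"Play3.0-mini", "PlayHT2.0-turbo"}
EMOTIONS = {
    f"{gender}_{emotion}"
    for gender in ("female", "male")
    for emotion in ("happy", "sad", "angry", "fearful", "disgust", "surprised")
}


def count_sentences(text: str) -> int:
    """Count segments of at least 35 characters between ., ? or !.

    Assumption: shorter segments are simply not counted.
    """
    segments = re.split(r"[.?!]", text)
    return sum(1 for s in segments if len(s.strip()) >= MIN_SENTENCE_CHARS)


def validate_speech_request(text, voice_engine, speed=1.0, emotion=None):
    """Return a list of problems; an empty list means the inputs look valid."""
    problems = []
    if not text.strip():
        problems.append("input text is empty")
    if len(text) > MAX_CHARS:
        problems.append(f"input text exceeds {MAX_CHARS} characters")
    if count_sentences(text) > MAX_SENTENCES:
        problems.append(f"input text has more than {MAX_SENTENCES} sentences")
    if voice_engine not in VOICE_ENGINES:
        problems.append(f"unknown voice engine: {voice_engine}")
    if not 0.25 <= speed <= 4.0:
        problems.append("speed must be between 0.25 and 4.0")
    if emotion is not None and emotion not in EMOTIONS:
        problems.append(f"unknown emotion: {emotion}")
    return problems
```

Returning a list of problems instead of raising on the first one makes it easy to show every issue to the user at once.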

1 : GET PLAYHT VOICES LIST
=======================================

DATA API DESCRIPTION
--------------------------------
GET PLAYHT VOICES LIST gets the full list of stock PlayHT Voices.

STEP-BY-STEP SETUP
--------------------------------
 1) In order to select the voice to generate speech, create a dropdown element with the provided data type "GET VOICES LIST (PLAYHT)".
 2) As "CHOICES SOURCES", use the dynamic source "GET DATA FROM AN EXTERNAL API" and select as API PROVIDER the API "PLAYHT - GET PLAYHT VOICES LIST". Filter those according to your use-case and select as "OPTION CAPTION" the name of the voice.

   Output Fields: List of Voices, each voice containing the ID, NAME, SAMPLE, ACCENT, AGE, GENDER, LANGUAGE, LOUDNESS, STYLE, TEMPO, TEXTURE.
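
The "filter those according to your use-case" step can be pictured with a small Python sketch. In Bubble the filtering is done with the ":filtered" operator on the API result. The sample voices, the lowercase key names and the function names below are all made up for illustration; only the field names (ID, NAME, LANGUAGE, GENDER, ACCENT) come from the output fields listed above.

```python
# Hypothetical sample data: keys mirror the documented output fields,
# but every value here is invented for illustration.
SAMPLE_VOICES = [
    {"id": "voice-1", "name": "Voice One", "language": "English (US)",
     "gender": "female", "accent": "american"},
    {"id": "voice-2", "name": "Voice Two", "language": "English (GB)",
     "gender": "male", "accent": "british"},
    {"id": "voice-3", "name": "Voice Three", "language": "French",
     "gender": "female", "accent": "parisian"},
]


def filter_voices(voices, language=None, gender=None):
    """Keep voices whose language starts with `language` and whose gender
    equals `gender` (both case-insensitive). A None filter is ignored."""
    result = []
    for voice in voices:
        voice_language = voice.get("language", "").lower()
        if language and not voice_language.startswith(language.lower()):
            continue
        if gender and voice.get("gender", "").lower() != gender.lower():
            continue
        result.append(voice)
    return result


def option_caption(voice):
    """Build the text shown in the dropdown for one voice."""
    return f"{voice['name']} ({voice.get('accent', 'n/a')})"
```

For example, `filter_voices(SAMPLE_VOICES, language="english", gender="female")` keeps only the first sample voice, whose ID would then be passed as VOICE to GENERATE SPEECH.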
   

2 : CREATE VOICE CLONE
=======================================

ACTION DESCRIPTION
--------------------------------
CREATE VOICE CLONE creates an instant voice clone from the URL of a sample audio file.
The cloned voice will be based on the characteristics of the provided audio file. The audio file selected as the source for the voice clone should have a duration ranging from 2 seconds to 1 hour. It can be in any audio format, as long as its size is between 5 KB and 50 MB.

STEP-BY-STEP SETUP
--------------------------------
 1)  Set up the "CREATE VOICE CLONE" action in the workflow.

   Input Fields :
     - URL : Protocol-relative URL (//server/path/file.ext) from Bubble Uploader or Bubble Storage of the audio file selected as the source for the voice clone. The file should have a duration ranging from 2 seconds to 1 hour. It can be in any audio format, as long as its size is between 5 KB and 50 MB.
     - VOICE NAME : The name for this new cloned voice.

   Output Fields:
     - ID : ID of the new cloned voice.
     - NAME : Voice name.
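
The constraints on the sample file (protocol-relative URL, 2 seconds to 1 hour, 5 KB to 50 MB) can be checked before calling the action. Below is a minimal Python sketch of such a pre-check. The function names are hypothetical, and it assumes 1 KB = 1024 bytes, which the source does not specify.

```python
from urllib.parse import urlsplit, urlunsplit

MIN_BYTES = 5 * 1024            # 5 KB (assuming 1 KB = 1024 bytes)
MAX_BYTES = 50 * 1024 * 1024    # 50 MB
MIN_SECONDS = 2
MAX_SECONDS = 60 * 60           # 1 hour


def to_protocol_relative(url: str) -> str:
    """Turn 'https://server/path/file.ext' into '//server/path/file.ext'.

    URLs that are already protocol-relative are returned unchanged.
    """
    parts = urlsplit(url)
    if not parts.netloc:
        raise ValueError(f"URL has no host: {url!r}")
    # An empty scheme makes urlunsplit emit the '//host/path' form.
    return urlunsplit(("", parts.netloc, parts.path, parts.query, parts.fragment))


def check_clone_sample(size_bytes: int, duration_seconds: float) -> list:
    """Return a list of problems; an empty list means the sample is in range."""
    problems = []
    if not MIN_BYTES <= size_bytes <= MAX_BYTES:
        problems.append("file size must be between 5 KB and 50 MB")
    if not MIN_SECONDS <= duration_seconds <= MAX_SECONDS:
        problems.append("duration must be between 2 seconds and 1 hour")
    return problems
```

Files uploaded through Bubble are typically already exposed as protocol-relative URLs, so the conversion helper is mainly useful when the source URL comes from elsewhere.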

3 : GET CLONED VOICES LIST
=======================================

DATA API DESCRIPTION
--------------------------------
GET CLONED VOICES LIST gets a list of all cloned voices created by the user.

STEP-BY-STEP SETUP
--------------------------------
 1) In order to select the voice to generate speech, create a dropdown element with the provided data type "GET CLONED VOICES LIST (PLAYHT)".
 2) As "CHOICES SOURCES", use the dynamic source "GET DATA FROM AN EXTERNAL API" and select as API PROVIDER the API "PLAYHT - GET CLONED VOICES LIST".

   Output Fields: List of Voices, each voice containing the ID, NAME.

4 : DELETE CLONED VOICE
=======================================

ACTION DESCRIPTION
--------------------------------
DELETE CLONED VOICE deletes a cloned voice created by the user, identified by the provided VOICE ID.

STEP-BY-STEP SETUP
--------------------------------
 1)  Set up the "DELETE CLONED VOICE" action in the workflow.

   Input Fields :
     - VOICE ID : The ID of the cloned voice to be deleted.

   Output Fields:
     - MESSAGE : Operation results from PlayHT.

IMPLEMENTATION EXAMPLE
======================
Feel free to browse the app editor in the Service URL for an implementation example.

TROUBLESHOOTING
================
Any plugin-related error will be posted to the Logs tab, "Server logs" section of your App Editor.
 Make sure that "Plugin server side output" is selected in "Show Advanced".

> Server Logs Details: https://manual.bubble.io/core-resources/bubbles-interface/logs-tab#server-logs

PERFORMANCE CONSIDERATIONS
===========================
 N/A

QUESTIONS ?
===========
Contact us at [email protected] for support questions or to request additional features.

Types

This plugin can be found under the following types:
Api   •   Action   •   Element   •   Event

Categories

This plugin can be found under the following categories:
AI   •   Media   •   Visual Elements   •   Input Forms


Rating and reviews

Average rating (5.0)

State of the art TTS LLM straight to bubble, and it’s very well made!
January 23rd, 2024
Don’t waste your time setting up APIs or trying the other plugins. This is very well made and brings the best AI TTS there is to Bubble – in streaming mode, with almost instant playback. It’s simple, easy to set up and allows for both cloned and stock voices.