Gemini TTS Extension

Documentation for Text-to-Speech Functions

Developed by: Mr.Koder (AKA _Ahmed)

GenerateSingleSpeakerSpeech

This function generates speech from text using a single specified voice. Style instructions can be included directly in the promptForStyle text.

GenerateSingleSpeakerSpeech textToSpeak voiceName apiKey modelName promptForStyle

Parameters:

  • textToSpeak (Text): The plain text to convert to speech when promptForStyle is empty. Example: "Hello world."
  • voiceName (Text): The voice to use, chosen from the Gemini API's voice list (see the 'Available Voices' section below). Examples: "Kore", "Puck"
  • apiKey (Text): Your Google Gemini API key. Example: "AIzaSy..."
  • modelName (Text): The Gemini TTS model to use. Example: "gemini-2.5-flash-preview-tts" or "gemini-2.5-pro-preview-tts"
  • promptForStyle (Text): The full utterance, including any style instructions. If provided, textToSpeak is ignored. Example: "Say this very excitedly: This is amazing!"
Note on promptForStyle vs textToSpeak: If promptForStyle is NOT empty, its content is sent to the API and textToSpeak is ignored. The API interprets both the style instructions (e.g., "Speak calmly:") and the text that follows them.
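The extension's source is not shown here, so the following Python sketch only illustrates the documented rule (a non-empty promptForStyle replaces textToSpeak) and what a matching single-speaker request to the public Gemini REST generateContent endpoint could look like. The helper names resolve_utterance and build_single_speaker_request are hypothetical, and this is not a claim about how the extension builds its request internally.

```python
from typing import Dict, Tuple

# Public Gemini REST endpoint pattern; the model name is filled in per call.
GEMINI_URL = "https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent"


def resolve_utterance(text_to_speak: str, prompt_for_style: str) -> str:
    """Documented rule: a non-empty promptForStyle replaces textToSpeak."""
    return prompt_for_style if prompt_for_style else text_to_speak


def build_single_speaker_request(
    text_to_speak: str,
    voice_name: str,
    api_key: str,
    model_name: str,
    prompt_for_style: str = "",
) -> Tuple[str, Dict[str, str], dict]:
    """Return (url, headers, JSON body) for a single-voice TTS request (sketch)."""
    url = GEMINI_URL.format(model=model_name)
    # The key travels in a request header rather than inside the JSON body.
    headers = {"x-goog-api-key": api_key, "Content-Type": "application/json"}
    body = {
        "contents": [{"parts": [{"text": resolve_utterance(text_to_speak, prompt_for_style)}]}],
        "generationConfig": {
            "responseModalities": ["AUDIO"],  # ask for audio instead of text
            "speechConfig": {
                "voiceConfig": {"prebuiltVoiceConfig": {"voiceName": voice_name}}
            },
        },
    }
    return url, headers, body


url, headers, body = build_single_speaker_request(
    "Hello world.", "Kore", "YOUR_API_KEY", "gemini-2.5-flash-preview-tts",
    prompt_for_style="Say this very excitedly: This is amazing!",
)
print(body["contents"][0]["parts"][0]["text"])  # the styled prompt wins
```

Keeping the selection rule in its own function makes the "promptForStyle wins" behavior easy to test separately from the request layout.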

GenerateMultiSpeakerSpeech

Generates speech for a conversation between multiple speakers, each of whom can be assigned a different voice. The API currently supports a maximum of TWO distinct speakers for this function.

GenerateMultiSpeakerSpeech promptWithSpeakersAndText speakerConfigurations apiKey modelName

Parameters:

  • promptWithSpeakersAndText (Text): The full dialogue, with each line clearly labeled by speaker name. Style prompts can be embedded. Example: "Alice (Sounding curious): Hello Bob. Bob (Cheerfully): Hi Alice!"
  • speakerConfigurations (List of Dictionaries): A list configuring each speaker. Each dictionary must contain "speakerName" (matching a name in the prompt) and "voiceName". Maximum of 2 unique speakers; see the details below.
  • apiKey (Text): Your Google Gemini API key. Example: "AIzaSy..."
  • modelName (Text): The Gemini TTS model to use. Example: "gemini-2.5-flash-preview-tts" or "gemini-2.5-pro-preview-tts"

Details for speakerConfigurations:

This must be an App Inventor list where each item is a dictionary. Each dictionary defines one speaker:

  • Key: "speakerName" (Text) - Must exactly match a speaker name used in your promptWithSpeakersAndText (e.g., "Alice", "Bob"). Case-sensitive.
  • Key: "voiceName" (Text) - The Gemini API voice to use for this speaker (e.g., "Kore", "Puck").
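As an illustration, here is the same list of dictionaries written in Python, together with a hypothetical check_speaker_configurations helper (not part of the extension) that flags missing keys and speaker names that do not appear in the prompt. The name check is a case-sensitive whole-word regular expression, which only approximates how the API matches speakers.

```python
import re
from typing import Dict, List

REQUIRED_KEYS = ("speakerName", "voiceName")


def check_speaker_configurations(configs: List[Dict[str, str]], prompt: str) -> List[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    for index, config in enumerate(configs, start=1):
        missing = [key for key in REQUIRED_KEYS if key not in config]
        if missing:
            problems.append(f"item {index}: missing {', '.join(missing)}")
            continue
        name = config["speakerName"]
        # Case-sensitive whole-word search, mirroring the "must exactly match" rule.
        if not re.search(rf"\b{re.escape(name)}\b", prompt):
            problems.append(f"item {index}: speaker {name!r} not found in prompt")
    return problems


prompt = "Alice (Sounding curious): Hello Bob. Bob (Cheerfully): Hi Alice!"
speaker_configurations = [
    {"speakerName": "Alice", "voiceName": "Kore"},
    {"speakerName": "Bob", "voiceName": "Puck"},
]
print(check_speaker_configurations(speaker_configurations, prompt))  # []
```

A lowercase "alice" would be reported as missing here, matching the case-sensitivity note above.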

App Inventor Block Example:

[Image: "Create SpeakerConfigurations List", App Inventor blocks showing how to create the speakerConfigurations list of dictionaries.]
API Limitation: Currently, the Gemini API supports a maximum of TWO distinct speaker configurations for the multi_speaker_voice_config. If you provide configurations for more than two unique speakers, you will receive an API error.
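The two-speaker limit can be enforced before any request is sent. This Python sketch, built around a hypothetical build_multi_speaker_request helper, collapses duplicate speaker names, rejects more than two, and maps each dictionary onto the REST form of the multi-speaker config (multiSpeakerVoiceConfig.speakerVoiceConfigs, the camelCase counterpart of multi_speaker_voice_config). It is an assumption about how such a request can be built, not the extension's code.

```python
from typing import Dict, List, Tuple

GEMINI_URL = "https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent"
MAX_SPEAKERS = 2  # documented limit for multi-speaker output


def build_multi_speaker_request(
    prompt: str,
    speaker_configurations: List[Dict[str, str]],
    api_key: str,
    model_name: str,
) -> Tuple[str, Dict[str, str], dict]:
    """Return (url, headers, JSON body); raise ValueError above two unique speakers."""
    voices: Dict[str, str] = {}
    for config in speaker_configurations:
        # Keep the first voice seen for each speaker name.
        voices.setdefault(config["speakerName"], config["voiceName"])
    if len(voices) > MAX_SPEAKERS:
        raise ValueError(f"{len(voices)} unique speakers given; at most {MAX_SPEAKERS} are supported")
    speaker_voice_configs = [
        {"speaker": name, "voiceConfig": {"prebuiltVoiceConfig": {"voiceName": voice}}}
        for name, voice in voices.items()
    ]
    body = {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            "responseModalities": ["AUDIO"],
            "speechConfig": {
                "multiSpeakerVoiceConfig": {"speakerVoiceConfigs": speaker_voice_configs}
            },
        },
    }
    headers = {"x-goog-api-key": api_key, "Content-Type": "application/json"}
    return GEMINI_URL.format(model=model_name), headers, body


cast = [
    {"speakerName": "Alice", "voiceName": "Kore"},
    {"speakerName": "Bob", "voiceName": "Puck"},
    {"speakerName": "Carol", "voiceName": "Leda"},
]
try:
    build_multi_speaker_request("Alice: Hi. Bob: Hello. Carol: Hey.", cast,
                                "YOUR_API_KEY", "gemini-2.5-flash-preview-tts")
except ValueError as err:
    print(err)  # 3 unique speakers given; at most 2 are supported
```

Failing locally gives a clearer message than waiting for the API error described above.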

Handling Events

After calling either speech generation function, one of the following events will be triggered.

GotSpeechAudio

When Gemini.GotSpeechAudio audioBase64 mimeType savedFilePath rawApiResponse do ...

Triggered upon successful speech generation and file saving.

Parameters:
  • audioBase64 (Text): The generated audio data encoded as a Base64 string. This can be very large.
  • mimeType (Text): The actual MIME type of the audio data as reported by the Gemini API (e.g., "audio/wav", "audio/mpeg").
  • savedFilePath (Text): The absolute path of the audio file saved on the device (e.g., in the app-specific directory (ASD) or the cache). Pass this path to the App Inventor Player component.
  • rawApiResponse (Text): The full, raw JSON response from the Gemini API. Useful for debugging.
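The audio fields can also be recovered from rawApiResponse when debugging. In the Gemini REST response the audio normally sits under candidates, then content.parts, in an inlineData object carrying data (Base64) and mimeType. The Python sketch below walks that structure with a hypothetical extract_audio helper; the sample MIME type is illustrative only, since the value the API actually reports may differ.

```python
import base64
import json
from typing import Tuple


def extract_audio(raw_api_response: str) -> Tuple[str, str]:
    """Return (audio_base64, mime_type) from a generateContent JSON response."""
    response = json.loads(raw_api_response)
    for candidate in response.get("candidates", []):
        for part in candidate.get("content", {}).get("parts", []):
            inline = part.get("inlineData")
            if inline and "data" in inline:
                return inline["data"], inline.get("mimeType", "")
    raise ValueError("no inline audio data in response")


sample = json.dumps({"candidates": [{"content": {"parts": [{"inlineData": {
    "mimeType": "audio/L16;codec=pcm;rate=24000",  # illustrative value
    "data": base64.b64encode(b"\x00\x00\x10\x00").decode("ascii"),
}}]}}]})
audio_base64, mime_type = extract_audio(sample)
print(mime_type, len(base64.b64decode(audio_base64)))  # audio/L16;codec=pcm;rate=24000 4
```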

Playing the Audio

[Image: App Inventor blocks showing how to play the audio with the Player component after the GotSpeechAudio event.]

SpeechGenerationError

When Gemini.SpeechGenerationError errorMessage do ...

Triggered if an error occurs during the speech generation process (API error, network issue, file saving error, etc.).

Parameters:
  • errorMessage (Text): A message describing the error. It may include details returned by the Gemini API.

Handling an Error


When Gemini.SpeechGenerationError (errorMessage)
Do
  call Notifier1.ShowAlert
    notice = join ("TTS Error: ", [get errorMessage])
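Google APIs usually return failures as a JSON object of the form {"error": {"code": ..., "message": ..., "status": ...}}. A minimal Python sketch of turning such a body into a readable message might look like this; describe_api_error and its output format are hypothetical and do not reproduce the extension's actual errorMessage text. The sample body is illustrative.

```python
import json


def describe_api_error(http_status: int, body_text: str) -> str:
    """Build a readable message from a Google-style JSON error body (sketch)."""
    try:
        error = json.loads(body_text).get("error", {})
    except (ValueError, AttributeError):
        error = {}  # body was not JSON, or not a JSON object
    if not isinstance(error, dict):
        error = {}
    message = error.get("message")
    if message:
        return f"HTTP {http_status} {error.get('status', 'UNKNOWN')}: {message}"
    # Fall back to a truncated copy of whatever the server sent.
    return f"HTTP {http_status}: {body_text[:200]}"


body = ('{"error": {"code": 400, "message": "API key not valid. Please pass a valid API key.", '
        '"status": "INVALID_ARGUMENT"}}')
print(describe_api_error(400, body))  # HTTP 400 INVALID_ARGUMENT: API key not valid. Please pass a valid API key.
```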
                    

Basic Usage Flow

1. Prepare Inputs (Text, Voice, API Key)
2. Call a Gemini Function
3. The API Processes the Request
4. Event Triggered (GotSpeechAudio / SpeechGenerationError)
5. Use the Player to Play savedFilePath
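Steps 2 to 4 can be modelled in Python with two callbacks standing in for the extension's events. In this sketch, run_flow, the send transport, fake_send, and the on_audio/on_error callbacks are all hypothetical; the point is that every call ends in exactly one of the two outcomes. Saving the file and playing it (step 5) are left out.

```python
import json
from typing import Callable, Tuple

Send = Callable[[dict], Tuple[int, str]]  # hypothetical transport: body -> (HTTP status, text)


def run_flow(send: Send, body: dict,
             on_audio: Callable[[str, str], None],
             on_error: Callable[[str], None]) -> None:
    """Send the request, then fire exactly one of the two callbacks."""
    try:
        status, text = send(body)
    except OSError as exc:  # network failure
        on_error(f"Network error: {exc}")
        return
    if status != 200:
        on_error(f"HTTP {status}: {text}")
        return
    try:
        inline = json.loads(text)["candidates"][0]["content"]["parts"][0]["inlineData"]
        audio_base64, mime_type = inline["data"], inline["mimeType"]
    except (ValueError, KeyError, IndexError, TypeError) as exc:
        on_error(f"Unexpected response: {exc!r}")
        return
    on_audio(audio_base64, mime_type)  # the GotSpeechAudio stand-in


def fake_send(body: dict) -> Tuple[int, str]:
    """Stand-in for the real HTTP call, returning a canned successful response."""
    reply = {"candidates": [{"content": {"parts": [
        {"inlineData": {"data": "AAA=", "mimeType": "audio/wav"}}]}}]}
    return 200, json.dumps(reply)


run_flow(fake_send, {},
         lambda audio, mime: print("GotSpeechAudio", mime),
         lambda message: print("SpeechGenerationError", message))
```

Calling the error callback outside the try blocks keeps a failing handler from triggering a second, spurious error event.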

Important Notes & Best Practices

  • API Key: Keep your Gemini API Key secure. Avoid embedding it directly in AIA files that you share publicly.
  • Model Names: Use "gemini-2.5-flash-preview-tts" (faster, good for general use) or "gemini-2.5-pro-preview-tts" (potentially higher quality). These are "Preview" models, so their availability or features might change.
  • Output File: The extension attempts to create a playable .wav file. If the API returns MP3 data (mimeType will be "audio/mpeg"), the file will be saved as .mp3.
  • Permissions: Your App Inventor app will need INTERNET permission. No special file permissions are usually required for the extension to save to its private app-specific directory (ASD) or cache.
  • Error Messages: Pay close attention to the errorMessage from the SpeechGenerationError event. It often contains specific details from the Gemini API that can help you debug issues with your prompts or API key.
  • Rate Limits: Be mindful of Gemini API rate limits. Making too many requests in a short period might lead to temporary blocks.
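The Output File note implies the extension wraps the returned audio in a WAV container unless it is MP3. Assuming the API returns raw 16-bit mono PCM (to our knowledge, Google's own TTS sample writes it at 24 kHz), Python's standard wave module can add the header, and the sample rate can be read from a MIME type such as "audio/L16;codec=pcm;rate=24000" when one is present. The helpers pcm_rate, pcm_to_wav_bytes, and output_extension are hypothetical.

```python
import io
import wave


def pcm_rate(mime_type: str, default: int = 24000) -> int:
    """Read 'rate=NNNN' from a MIME type such as audio/L16;codec=pcm;rate=24000."""
    for piece in mime_type.split(";"):
        key, _, value = piece.strip().partition("=")
        if key == "rate" and value.isdigit():
            return int(value)
    return default


def pcm_to_wav_bytes(pcm: bytes, sample_rate: int = 24000,
                     channels: int = 1, sample_width: int = 2) -> bytes:
    """Prepend a standard WAV header to raw PCM bytes (samples are written unchanged)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)  # bytes per sample: 2 means 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buffer.getvalue()


def output_extension(mime_type: str) -> str:
    """MP3 data keeps .mp3; everything else is written as .wav, as the note describes."""
    return ".mp3" if mime_type.lower().startswith("audio/mpeg") else ".wav"


mime = "audio/L16;codec=pcm;rate=24000"  # illustrative value
wav_bytes = pcm_to_wav_bytes(b"\x00\x00" * 24000, sample_rate=pcm_rate(mime))
print(output_extension(mime), len(wav_bytes))  # .wav 48044
```

One second of silence at 24 kHz is 48,000 bytes of samples plus the 44-byte header that the wave module writes.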

Available Voices (voiceName parameter)

The Gemini TTS API supports a variety of voices. Use one of the following names for the voiceName parameter:

  • Zephyr
  • Puck
  • Charon
  • Kore
  • Fenrir
  • Leda
  • Orus
  • Aoede
  • Callirhoe
  • Autonoe
  • Enceladus
  • Iapetus
  • Umbriel
  • Algieba
  • Despina
  • Erinome
  • Algenib
  • Rasalgethi
  • Laomedeia
  • Achernar
  • Alnilam
  • Schedar
  • Gacrux
  • Pulcherrima
  • Achird
  • Zubenelgenubi
  • Vindemiatrix
  • Sadachbia
  • Sadaltager
  • Sulafar

You can test these voices in Google AI Studio.
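Because voiceName must be one of the names above, a quick local check can catch typos before any request is made. The set below copies the 30 names listed in this section and may fall behind if Google adds voices. suggest_voice is a hypothetical helper that uses difflib from the Python standard library to propose the closest listed name; whether the API itself tolerates case differences is not stated here, so the sketch returns the exact listed spelling.

```python
import difflib
from typing import Optional

# Copied from the Available Voices list in this document (30 names).
KNOWN_VOICES = frozenset({
    "Zephyr", "Puck", "Charon", "Kore", "Fenrir", "Leda", "Orus", "Aoede",
    "Callirhoe", "Autonoe", "Enceladus", "Iapetus", "Umbriel", "Algieba",
    "Despina", "Erinome", "Algenib", "Rasalgethi", "Laomedeia", "Achernar",
    "Alnilam", "Schedar", "Gacrux", "Pulcherrima", "Achird", "Zubenelgenubi",
    "Vindemiatrix", "Sadachbia", "Sadaltager", "Sulafar",
})


def suggest_voice(name: str) -> Optional[str]:
    """Return name if listed, else the closest listed name, else None."""
    if name in KNOWN_VOICES:
        return name
    by_lower = {voice.lower(): voice for voice in KNOWN_VOICES}
    if name.lower() in by_lower:  # only the capitalisation differs
        return by_lower[name.lower()]
    matches = difflib.get_close_matches(name, sorted(KNOWN_VOICES), n=1)
    return matches[0] if matches else None


for candidate in ("Kore", "kore", "Charn"):
    print(candidate, "->", suggest_voice(candidate))
```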