Gemini TTS Extension

Documentation for Text-to-Speech Functions

Developed by: Mr.Koder (AKA _Ahmed)

GenerateSingleSpeakerSpeech

This function generates speech from text using a single specified voice. You can provide style instructions directly within the prompt.

GenerateSingleSpeakerSpeech textToSpeak voiceName apiKey modelName promptForStyle

Parameters:

Parameter	Type	Description	Example / Notes
textToSpeak	Text	The plain text to convert to speech. Used only when `promptForStyle` is empty.	`"Hello world."`
voiceName	Text	The desired voice from the Gemini API's list. See 'Available Voices' section below.	`"Kore"`, `"Puck"`
apiKey	Text	Your Google Gemini API Key.	`"AIzaSy..."`
modelName	Text	The specific Gemini TTS model to use.	`"gemini-2.5-flash-preview-tts"` or `"gemini-2.5-pro-preview-tts"`
promptForStyle	Text	The full utterance including any style instructions. If provided, `textToSpeak` is ignored.	`"Say this very excitedly: This is amazing!"`

Note on promptForStyle vs textToSpeak: If promptForStyle is not empty, its content is sent to the API and textToSpeak is ignored. The API interprets any style instruction in that value (e.g., "Speak calmly:") and speaks the text that follows it.
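
The selection rule above can be sketched in a few lines. This is only an illustration, not the extension's actual code: `build_single_speaker_request` is a hypothetical helper, and the request shape assumes the camelCase JSON fields of the Gemini REST API (`responseModalities`, `speechConfig`, `prebuiltVoiceConfig`).

```python
import json


def build_single_speaker_request(text_to_speak, voice_name, prompt_for_style=""):
    """Build a request body for one voice (hypothetical helper, assumed shape)."""
    # A non-empty promptForStyle replaces textToSpeak entirely.
    utterance = prompt_for_style if prompt_for_style else text_to_speak
    return {
        "contents": [{"parts": [{"text": utterance}]}],
        "generationConfig": {
            "responseModalities": ["AUDIO"],
            "speechConfig": {
                "voiceConfig": {"prebuiltVoiceConfig": {"voiceName": voice_name}}
            },
        },
    }


if __name__ == "__main__":
    body = build_single_speaker_request(
        "Hello world.", "Kore", "Say this very excitedly: This is amazing!"
    )
    print(json.dumps(body, indent=2))
```

With an empty promptForStyle the same helper would send `"Hello world."` unchanged.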

GenerateMultiSpeakerSpeech

Generates speech for a conversation involving multiple speakers, each potentially with a different voice. The API currently supports exactly TWO distinct speakers for this function.

GenerateMultiSpeakerSpeech promptWithSpeakersAndText speakerConfigurations apiKey modelName

Parameters:

Parameter	Type	Description	Example / Notes
promptWithSpeakersAndText	Text	The full dialogue, clearly indicating speaker names. Style prompts can be embedded.	`"Alice (Sounding curious): Hello Bob. Bob (Cheerfully): Hi Alice!"`
speakerConfigurations	List of Dictionaries	A list configuring each speaker. Each dictionary must have `"speakerName"` (matching the prompt) and `"voiceName"`.	See details below. Max 2 unique speakers.
apiKey	Text	Your Google Gemini API Key.	`"AIzaSy..."`
modelName	Text	The specific Gemini TTS model to use.	`"gemini-2.5-flash-preview-tts"` or `"gemini-2.5-pro-preview-tts"`

Details for `speakerConfigurations`:

This must be an App Inventor list where each item is a dictionary. Each dictionary defines one speaker:

Key: "speakerName" (Text) - Must exactly match a speaker name used in your promptWithSpeakersAndText (e.g., "Alice", "Bob"). Case-sensitive.
Key: "voiceName" (Text) - The Gemini API voice to use for this speaker (e.g., "Kore", "Puck").

App Inventor Block Example:

Create SpeakerConfigurations List

[Image: App Inventor blocks showing how to create the speakerConfigurations list of dictionaries.]
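
For readers who cannot see the block image, the same structure can be written out as text. The sketch below uses Python purely as a notation for the list of dictionaries. The `to_speaker_voice_configs` helper is hypothetical and shows how such a list might map onto the Gemini REST fields `multiSpeakerVoiceConfig` and `speakerVoiceConfigs`; how the extension builds its request internally is an assumption here.

```python
# The speakerConfigurations value: a list with one dictionary per speaker.
speaker_configurations = [
    {"speakerName": "Alice", "voiceName": "Kore"},
    {"speakerName": "Bob", "voiceName": "Puck"},
]


def to_speaker_voice_configs(configurations):
    """Map the extension-style dictionaries to the assumed API field names."""
    return [
        {
            "speaker": item["speakerName"],
            "voiceConfig": {"prebuiltVoiceConfig": {"voiceName": item["voiceName"]}},
        }
        for item in configurations
    ]


speech_config = {
    "multiSpeakerVoiceConfig": {
        "speakerVoiceConfigs": to_speaker_voice_configs(speaker_configurations)
    }
}
```

In App Inventor the equivalent is a `make a list` block holding two `make a dictionary` blocks, each with the two key/value pairs shown.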

API Limitation: Currently, the Gemini API supports a maximum of TWO distinct speaker configurations for the multi_speaker_voice_config. If you provide configurations for more than two unique speakers, you will receive an API error.
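
Because a third speaker only surfaces as an API error, it can help to check the inputs before calling the function. The following is a minimal sketch of such a pre-check, not part of the extension: `validate_speaker_configurations` is a hypothetical helper, and the substring test is only an approximation of "the name is used in the prompt".

```python
def validate_speaker_configurations(prompt, speaker_configurations):
    """Return a list of problems; an empty list means the inputs look consistent."""
    problems = []
    names = []
    for index, config in enumerate(speaker_configurations):
        # Both keys are required and must be non-empty.
        for key in ("speakerName", "voiceName"):
            if not config.get(key):
                problems.append(f"item {index}: missing '{key}'")
        name = config.get("speakerName")
        if name:
            names.append(name)
            # Case-sensitive check, mirroring the exact-match rule.
            if name not in prompt:
                problems.append(f"item {index}: '{name}' not found in prompt")
    # The API accepts at most two distinct speakers.
    if len(set(names)) > 2:
        problems.append("more than two unique speakers")
    return problems


if __name__ == "__main__":
    prompt = "Alice (Sounding curious): Hello Bob. Bob (Cheerfully): Hi Alice!"
    configs = [
        {"speakerName": "Alice", "voiceName": "Kore"},
        {"speakerName": "Bob", "voiceName": "Puck"},
    ]
    print(validate_speaker_configurations(prompt, configs))
```

The same checks can be built in blocks with a `for each item in list` loop and the dictionary `get value for key` block.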

Handling Events

After calling either speech generation function, one of the following events will be triggered.

GotSpeechAudio

When Gemini.GotSpeechAudio audioBase64 mimeType savedFilePath rawApiResponse do ...

Triggered upon successful speech generation and file saving.

Parameter	Type	Description
audioBase64	Text	The generated audio data encoded as a Base64 string. Can be very large.
mimeType	Text	The actual MIME type of the audio data as reported by the Gemini API (e.g., `"audio/wav"`, `"audio/mpeg"`).
savedFilePath	Text	The absolute path to the audio file saved on the device (e.g., in ASD or cache). Use this path with the App Inventor `Player` component.
rawApiResponse	Text	The full, raw JSON response from the Gemini API. Useful for debugging.
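
If you need the audio outside the saved file (for example, to upload it elsewhere), audioBase64 and mimeType are enough to rebuild it. The sketch below is an illustration under assumptions: `extension_for`, `decode_audio`, and `save_audio` are hypothetical helpers, and the extension rule (".mp3" for "audio/mpeg", ".wav" otherwise) is taken from this document's notes, not from the extension's source.

```python
import base64
from pathlib import Path


def extension_for(mime_type):
    """Pick a file extension from the reported MIME type."""
    # Ignore any parameters after ';' and compare case-insensitively.
    base = mime_type.split(";")[0].strip().lower()
    return ".mp3" if base == "audio/mpeg" else ".wav"


def decode_audio(audio_base64):
    """Turn the Base64 text back into raw audio bytes."""
    return base64.b64decode(audio_base64)


def save_audio(audio_base64, mime_type, directory, stem="speech"):
    """Write the decoded bytes to directory and return the resulting path."""
    path = Path(directory) / (stem + extension_for(mime_type))
    path.write_bytes(decode_audio(audio_base64))
    return path
```

For playback inside the app, savedFilePath remains the simpler route, since the extension has already written the file.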

Playing the Audio

[Image: App Inventor blocks showing how to play audio using the Player component after the GotSpeechAudio event.]

SpeechGenerationError

When Gemini.SpeechGenerationError errorMessage do ...

Triggered if an error occurs during the speech generation process (API error, network issue, file saving error, etc.).

Parameter	Type	Description
errorMessage	Text	A message describing the error. This may include details from the Gemini API.

Handling an Error


When Gemini.SpeechGenerationError
  errorMessage = [get errorMessage]
Do
  Notifier1.ShowAlert (notice = "TTS Error: " + [get errorMessage])

Basic Usage Flow

1. Prepare Inputs (Text, Voice, API Key)
2. Call Gemini Function
3. API Processes
4. Event Triggered (GotSpeechAudio / SpeechGenerationError)
5. Use Player to Play savedFilePath
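
Step 4 of the flow can be pictured as a small dispatch on the raw response. This is a sketch under assumptions, not the extension's code: `dispatch_response` is a hypothetical helper, and it assumes the usual Gemini REST response layout (`candidates[0].content.parts[0].inlineData` on success, a top-level `error` object with a `message` on failure).

```python
import json


def dispatch_response(raw_api_response):
    """Return ("GotSpeechAudio", data, mime) or ("SpeechGenerationError", message)."""
    try:
        payload = json.loads(raw_api_response)
    except json.JSONDecodeError as exc:
        return ("SpeechGenerationError", f"Invalid JSON: {exc}")
    # API-level failures are reported in a top-level "error" object.
    if isinstance(payload, dict) and "error" in payload:
        message = payload["error"].get("message", "Unknown API error")
        return ("SpeechGenerationError", message)
    try:
        inline = payload["candidates"][0]["content"]["parts"][0]["inlineData"]
        return ("GotSpeechAudio", inline["data"], inline["mimeType"])
    except (KeyError, IndexError, TypeError):
        return ("SpeechGenerationError", "No audio data in response")
```

Reading rawApiResponse this way can also help when debugging an unexpected GotSpeechAudio result.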

Important Notes & Best Practices

API Key: Keep your Gemini API Key secure. Avoid embedding it directly in publicly shared AIA files.
Model Names: Use "gemini-2.5-flash-preview-tts" (faster, good for general use) or "gemini-2.5-pro-preview-tts" (potentially higher quality). These are "Preview" models, so their availability or features might change.
Output File: The extension attempts to create a playable .wav file. If the API returns MP3 data (mimeType will be "audio/mpeg"), the file will be saved as .mp3.
Permissions: Your App Inventor app will need INTERNET permission. No special file permissions are usually required for the extension to save to its private app-specific directory (ASD) or cache.
Error Messages: Pay close attention to the errorMessage from the SpeechGenerationError event. It often contains specific details from the Gemini API that can help you debug issues with your prompts or API key.
Rate Limits: Be mindful of Gemini API rate limits. Making too many requests in a short period might lead to temporary blocks.
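
On the Output File note: one common reason a "make it playable" step is needed is that a speech API may return raw PCM samples with no container header. The sketch below shows how such samples could be wrapped in a WAV container using Python's standard `wave` module. The format values (16-bit, mono, 24 kHz) are assumptions for illustration; check the reported mimeType for the real parameters. `pcm_to_wav_bytes` is a hypothetical helper, not the extension's implementation.

```python
import io
import wave


def pcm_to_wav_bytes(pcm, sample_rate=24000, channels=1, sample_width=2):
    """Wrap raw PCM samples in a WAV container and return the file bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)  # bytes per sample (2 = 16-bit)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
    # The header is finalized when the writer closes; the buffer stays open.
    return buffer.getvalue()


if __name__ == "__main__":
    silence = b"\x00\x00" * 24000  # one second of 16-bit mono silence
    print(len(pcm_to_wav_bytes(silence)), "bytes")
```

If the data is already WAV or MP3, no wrapping is needed and the bytes can be written as they are.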

Available Voices (`voiceName` parameter)

The Gemini TTS API supports a variety of voices. Use one of the following names for the voiceName parameter:

Zephyr
Puck
Charon
Kore
Fenrir
Leda
Orus
Aoede
Callirrhoe
Autonoe
Enceladus
Iapetus
Umbriel
Algieba
Despina
Erinome
Algenib
Rasalgethi
Laomedeia
Achernar
Alnilam
Schedar
Gacrux
Pulcherrima
Achird
Zubenelgenubi
Vindemiatrix
Sadachbia
Sadaltager
Sulafat

You can test these voices in Google AI Studio.
