🔗 Simon Willison: Gemini 3.1 Flash TTS
Google released Gemini 3.1 Flash TTS today, a new text-to-speech model that can be directed using prompts.
Oof! This looks like a pretty good TTS system. The example Simon gave was quite convincing. I had a play myself using Simon’s online tools, giving the model this prompt:
Say dynamically: “[surprised]Wow, impressive! [neutral] Although I do wonder if this is worth the two cents I paid. Would be nice if I chose the right word to say. [mocking] Impression? [laugh] So cute!”
(Some context: I wrote “impression” instead of “impressive” in my first test.)
Here’s the result (it’s a download link to avoid the post showing up on the podcast feed): gemini-speech.wav
Pretty decent.