I didn’t record narration for the previous post. It featured a dialogue and I needed a scene partner. So I tried generating the other half with AWS’s text-to-speech engine last night, and ah… yeah, it didn’t sound as good as I was hoping. I mean, the tech is getting better, but there’s still a way to go: that uncanny valley hasn’t been bridged yet.
This is probably the best version I was able to make. It uses AWS’s new-ish “Generative” voice model. There are only three voices of this kind available in AWS so far. I chose the US English male voice, since it spoke at a rate that, to my ears, came closest to what I’d consider natural:
I also tried the same exchange out with the “Neural” engine, which has been around for several years:
The Generative voice model is decent. It’s still not good enough to fool anyone into thinking I’m speaking with a real person, yet it’s a lot better than the Neural engine. There’s no mistaking that one: I’m clearly speaking with a computer.
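For anyone who wants to run the same comparison, here's a rough sketch of how the two engines could be called from Python with boto3. It isn't the exact setup I used: the voice name `Matthew`, the `us-east-1` region, the MP3 output and the helper names `polly_request` and `synthesize` are all assumptions made for illustration, and it assumes AWS credentials are already configured.

```python
def polly_request(text, engine, voice_id="Matthew"):
    """Build the keyword arguments for Polly's synthesize_speech call.

    "Matthew" is an assumed choice for the US English male voice.
    """
    if engine not in ("generative", "neural"):
        raise ValueError(f"unexpected engine: {engine}")
    return {
        "Engine": engine,
        "VoiceId": voice_id,
        "OutputFormat": "mp3",
        "Text": text,
    }


def synthesize(text, engine, out_path, region="us-east-1"):
    """Send one line of dialogue to Polly and save the audio to out_path.

    The region is an assumption; engine availability varies by region.
    """
    import boto3  # imported here so the request builder works without it

    client = boto3.client("polly", region_name=region)
    response = client.synthesize_speech(**polly_request(text, engine))
    with open(out_path, "wb") as audio_file:
        audio_file.write(response["AudioStream"].read())
```

Calling `synthesize(line, "generative", "line-generative.mp3")` and then again with `"neural"` should give two files of the same line to listen to side by side.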
So, no recorded dialogue, but it was still an interesting exercise. And it’s always a little fun playing around with AWS’s text-to-speech engine.