Tuesday, 8 July 2014

Adventures in voice synthesis, part 1

Drake M. @DMODP:
@fanzyflani Ugh. I hope you're writing this effort down and plan to make a tutorial on it. Super-curious.
I think that would be a good idea.

I'm working on a new game with a working title of "HeroGraze". The name isn't exactly the best, but at the time I came up with it, putting it in quotes yielded no results on Google, so there we go!

The idea: You're a girl who wants to be a superhero, and a song plays in the background with words saying what you're doing, what you need to do (worded as what you're going to do), and how awesome and brave and heroic you are.

Basically, I'm making a real "feel good" game.

The catch is I have to write a voice synthesiser. At the moment, what I have is 6 vowels being sung at about C4, fed through AutoTalent, and clipped into small loops.

And then I have these terrible consonant samples. These are played as one-shot samples.

I then use a custom mixer for sackit (my .it playroutine) which detects when a particular sample is played, sets the sample volume to 0, and steals the pitch and note-on status and stuff so it can mix the voice synth over the top.
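Here's a rough sketch of that interception idea, written in Python rather than against sackit itself. The channel fields (`sample`, `volume`, `freq`, `note_on`) and the `VOICE_SAMPLE` index are made up for illustration and are not sackit's real structures.

```python
# Hypothetical index of the placeholder sample that marks "sing here".
VOICE_SAMPLE = 1

def intercept(channels, voice_sample=VOICE_SAMPLE):
    """Mute channels playing the placeholder sample and capture their state.

    Each channel is a plain dict standing in for whatever the real
    playroutine keeps per channel (an assumption, not sackit's layout).
    """
    captured = []
    for ch in channels:
        if ch["sample"] == voice_sample:
            # Steal the pitch and note-on status for the voice synth.
            captured.append({"freq": ch["freq"], "note_on": ch["note_on"]})
            # Silence the placeholder so only the synth is heard.
            ch["volume"] = 0
    return captured

channels = [
    {"sample": 1, "volume": 64, "freq": 261.63, "note_on": True},
    {"sample": 5, "volume": 48, "freq": 440.0, "note_on": True},
]
voice_events = intercept(channels)
```

The captured events would then drive the voice synth, which gets mixed over the top of the normal output.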

Steps required

There are a few steps that need to be done to make this work. This paragraph is pretty useless, but I really feel like there should be some text here before I drop in a subheading.

Music

This is created with SchismTracker, and the samples are mostly created with SunVox. These fly out of my arse like a hot curry. I put annotations on the lyric line to point out which notes have to rhyme with which notes, at what point a phrase starts, and when to reset the rhyming pattern. These annotations aren't read yet, sadly.

Dictionaries

Lists of verbs and nouns and adjectives and whatnot. There's a bit of overlap with these. There are a lot of words to add.

Facts

This step is still being worked on. Facts will be stored in a tagged tree sort of structure.
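To give a feel for what a "tagged tree" could look like, here's a minimal Python sketch. The `Node` class, the tag names and the example fact are all assumptions for illustration, not the actual format used in the game.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One node of a tagged tree: a tag, an optional value, and children."""
    tag: str
    value: str = ""
    children: list = field(default_factory=list)

    def find(self, tag):
        """Return the first direct child with the given tag, or None."""
        for child in self.children:
            if child.tag == tag:
                return child
        return None

# Hypothetical fact: "you rescue the stranded cat".
fact = Node("fact", children=[
    Node("subject", "you"),
    Node("action", "rescue", [
        Node("object", "cat", [Node("adjective", "stranded")]),
    ]),
])

rescued = fact.find("action").find("object").value
```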

Bullshitter

This step hasn't been started on yet. The idea is that it can take facts and exaggerate things and can even conjure up complete untruths.

Phrase generator

This step is still being worked on. Basically, you give it facts and it spits out valid phrases in different forms.
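As a toy illustration of "facts in, phrases out", here's a Python sketch that renders one fact in two forms: what you're doing, and what you're going to do. The flat dict fact, the `VERBS` table and the form names are invented for this example and say nothing about how the real generator works.

```python
# Hypothetical verb dictionary: each verb lists the forms the templates need.
VERBS = {
    "rescue": {"base": "rescue", "gerund": "rescuing"},
    "fly": {"base": "fly", "gerund": "flying"},
}

def phrases(fact):
    """Return the same fact worded in several forms."""
    verb = VERBS[fact["verb"]]
    # Facts without an object (e.g. "fly") just drop the trailing part.
    obj = " " + fact["object"] if fact.get("object") else ""
    return {
        "doing": "you're " + verb["gerund"] + obj,
        "going_to": "you're going to " + verb["base"] + obj,
    }

forms = phrases({"verb": "rescue", "object": "the cat"})
```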

Pronouncer

But of course, you need to take these sentences and work out their pronunciations. Firstly, so the speech synth can actually do something. Secondly, so you can tell what rhymes with what.

This uses a dictionary to accept English words and spit out Lojban-esque syllables. No, Lojban isn't going to be used for the reasoning - we're trying to make the song not sound Lojbanic here!
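A minimal Python sketch of the lookup, plus one crude way to get a rhyme key out of it. The "heroically" entries are the ones listed in this post. The "bravely" entry, the six-letter vowel set and the idea of rhyming on the final vowel run plus trailing consonants are all assumptions.

```python
import re

PRONUNCIATIONS = {
    # Taken from the examples in this post.
    "heroically": ["'iro,ykyl,i", "'iro,yk,li", "'yro,ykal,i"],
    # Hypothetical entry, made up for the example.
    "bravely": ["breiv,li"],
}

def pronounce(sentence):
    """Look up the first listed pronunciation of each word.

    Unknown words raise KeyError; a real pronouncer would need a fallback.
    """
    return [PRONUNCIATIONS[word][0] for word in sentence.lower().split()]

def rhyme_key(pronunciation):
    """Final vowel run plus any trailing consonants (assumed rhyme rule)."""
    match = re.search(r"[aeiouy]+[^aeiouy,]*$", pronunciation)
    return match.group(0) if match else ""

spoken = pronounce("heroically bravely")
rhymes = rhyme_key(spoken[0]) == rhyme_key(spoken[1])
```

Having several pronunciations per word gives the later steps some wiggle room when fitting syllables to notes.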

Poeticiser

This step hasn't been started on yet. It gets stuff from the phrase generator and works out pronunciations, and then matches it up with the annotations and gets it to fit and rhyme.

I'll need a synonym dictionary for maximum effect.

Synthesiser

It accepts syllables starting with optional consonants, having one or two vowels in the middle (I'll need to raise this limit), and ending with optional consonants. For example, several possible pronunciations of "heroically":
  • 'iro,ykyl,i
  • 'iro,yk,li
  • 'yro,ykal,i
The leading quote mark (') is an "h" sound, and the "y" is the vowel you get when you let your mouth rest (kinda like an "uh" sound).
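Here's a Python sketch of splitting one of those pronunciations into (onset, vowels, coda) pieces. It assumes the vowels are a/e/i/o/u/y, that a comma is an explicit syllable break, and that consonants between two vowel runs belong to the following syllable. None of that is confirmed by the actual synth code.

```python
import re

VOWELS = "aeiouy"   # assumed vowel set
MAX_VOWELS = 2      # the current limit mentioned above

def split_syllables(word):
    """Split e.g. "'iro,yk,li" into (onset, vowels, coda) tuples."""
    syllables = []
    for chunk in word.split(","):
        first = len(syllables)  # index of this chunk's first syllable
        onset = ""
        # Alternating maximal runs of vowels and non-vowels.
        for run in re.findall(r"[aeiouy]+|[^aeiouy]+", chunk):
            if run[0] in VOWELS:
                if len(run) > MAX_VOWELS:
                    raise ValueError("too many vowels in a row: " + run)
                syllables.append([onset, run, ""])
                onset = ""
            else:
                onset = run
        if onset:
            # Leftover consonants close the chunk's last syllable.
            if len(syllables) == first:
                raise ValueError("no vowel in chunk: " + chunk)
            syllables[-1][2] = onset
    return [tuple(s) for s in syllables]

parts = split_syllables("'iro,yk,li")
```

With those assumptions, the "k" in "yk,li" stays at the end of its syllable, while in "ykyl" it starts the next one.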

Issues

The code for the phrase generator is a complete and utter mess and I'm struggling to get my head around it, so I'll need to do some rewriting of it. But I'm happy with the form the facts are being stored in.

I really should be treating "n" and "m" as vowels. This would mean that I would have to be able to handle more than two vowels in a row.

And of course, the best way to fix that is to move the vox synth control to the Lua side.

For the other consonants, while stuff like k/p/t would be best done as one-shots, stuff like c/f/s/'/x (note, "c" is a "sh" sound, and "x" is just plain weird!) would be best done with the vowel engine.

And of course, let's not forget that there are voiced and unvoiced consonants.  j/c, f/v, s/z for the vowel engine, and k/g, p/b, t/d for the consonant engine.
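The split described above could be written down as a small lookup table. This Python sketch only covers the letters mentioned in this post. The names `VOICED_PARTNER`, `SUSTAINED`, `ONE_SHOT` and `classify` are made up, and treating "j" as the voiced partner of "c" follows Lojban's sounds.

```python
# Unvoiced consonant -> its voiced partner, from the pairs listed above.
VOICED_PARTNER = {"c": "j", "f": "v", "s": "z", "k": "g", "p": "b", "t": "d"}

# Sustained sounds, best handled by the vowel engine.
SUSTAINED = set("cfs'x") | set("jvz")
# Short bursts, best played as one-shot samples.
ONE_SHOT = set("kpt") | set("gbd")

def classify(consonant):
    """Return (engine, voiced) for a consonant mentioned in the post."""
    if consonant in SUSTAINED:
        engine = "vowel"
    elif consonant in ONE_SHOT:
        engine = "one-shot"
    else:
        raise ValueError("consonant not covered here: " + consonant)
    voiced = consonant in VOICED_PARTNER.values()
    return engine, voiced

sh_sound = classify("c")
hard_g = classify("g")
```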

I could possibly change the samples and make a Karplus-Strong synth.
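For reference, the textbook Karplus-Strong loop is tiny: fill a delay line with noise, then keep averaging neighbouring samples as they go round. This Python sketch is the plain plucked-string version with assumed parameter values (44100 Hz, a decay of 0.996). How it would actually replace the voice samples is still an open question.

```python
import random
from collections import deque

def karplus_strong(frequency, duration, sample_rate=44100, decay=0.996, seed=0):
    """Generate a decaying plucked tone as a list of floats in [-1, 1]."""
    period = int(sample_rate / frequency)  # delay line length sets the pitch
    if period < 2:
        raise ValueError("frequency too high for this sample rate")
    rng = random.Random(seed)
    # Start with a burst of noise in the delay line.
    buf = deque(rng.uniform(-1.0, 1.0) for _ in range(period))
    out = []
    for _ in range(int(duration * sample_rate)):
        first = buf.popleft()
        out.append(first)
        # Average with the next sample (a gentle low-pass) and damp slightly.
        buf.append(decay * 0.5 * (first + buf[0]))
    return out

# Roughly C4, half a second.
samples = karplus_strong(261.63, 0.5)
```

The pitch is quantised to whole-sample delay lengths, so tuning gets coarser as the frequency goes up.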

My main foci at this stage, however, are the sentence generation step, and, well, the game itself.

At least I'm using Lua this time. Makes it much easier.
