For the iOS app, all I had to do was put the text together, choose a voice, set the pitch, rate and volume of the speech, and tell it to go. This worked very well up to a point, but the only way to get more control over the sound was to respell words as non-words that sounded better when spoken. I also couldn’t find a way to save the output to a file.
On OS X everything is different, and I have had to learn about each thing I wish to control. This starts with the window system, which is not a concern on iOS as the app owns the whole screen whilst running. Every part of creating and managing windows and views, and the controls they contain, is different and harder on OS X.
Having got a ramshackle outline working, I moved on to the speech synthesis libraries and functions. These are different again and much more sophisticated, but there is a fairly comprehensive set of new things to learn, at least in part, before they will do what I want.
What I am trying to produce is an OS X app that will speak text expressively, in a selection of voices, and allow the output to be recorded to a file, so that it can be added, say, to a presentation, which can then be sent as a movie file for someone else to run. There are, I suspect, many people like me who find listening to recordings of their own voice uncomfortable, which makes their performance when recording worse than it should be and always leaves the recordings feeling unsatisfactory. To get the expression, I have had to learn about phonemes and prosody and the ways they can be adjusted, so that the user won’t have to. Instead they can make some selections or move some controls on screen to get what they want, without learning a lot of esoteric terms and symbols just to get their task done.
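As a rough illustration of the goal (speak text in a chosen voice and record the result to a file), here is a minimal sketch that leans on the `say` command-line tool that ships with OS X. This is a stand-in technique, not the app's own code. The helper names `build_say_command` and `speak_to_file`, and the voice and file name in the usage note, are placeholders chosen for illustration. The first function only assembles the argument list, so it can be inspected without speaking anything.

```python
import subprocess


def build_say_command(text, voice=None, rate_wpm=None, out_path=None):
    """Assemble the argument list for the OS X `say` tool (hypothetical helper)."""
    if not text:
        raise ValueError("text must not be empty")
    args = ["say"]
    if voice:
        args += ["-v", voice]               # -v selects the voice by name
    if rate_wpm is not None:
        args += ["-r", str(int(rate_wpm))]  # -r sets the rate in words per minute
    if out_path:
        args += ["-o", out_path]            # -o writes the audio to a file
    args.append(text)
    return args


def speak_to_file(text, out_path, voice=None, rate_wpm=None):
    """Run `say` and record the speech to out_path. Only works on OS X."""
    subprocess.run(build_say_command(text, voice, rate_wpm, out_path), check=True)
```

Assuming a voice named Alex is installed, `speak_to_file("Hello there", "hello.aiff", voice="Alex", rate_wpm=180)` would write the speech to `hello.aiff` instead of playing it.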
I am using the built-in Apple voices (you can download quite a few beyond the standard installation) because they will almost certainly work best with the software I am writing. Some of the voices, but not all, allow fine control of the phonemes used (the component sounds of each word, smaller than syllables, with stress marked in the notation), including the flow of pitch across the word or phrase. This means you can, to a certain extent, make the voice sing.
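To give a feel for what controlling phonemes and pitch looks like, here is a minimal sketch that builds text in what I understand to be the TUNE input format of Apple's speech synthesizer: each phoneme is followed by a duration in milliseconds and a list of pitch points, written as hertz paired with a percentage of the way through the phoneme. The function names `tune_phoneme` and `tune_block` are hypothetical, and the exact syntax should be checked against Apple's documentation before relying on it.

```python
def tune_phoneme(symbol, duration_ms, pitch_points):
    """Format one phoneme with its duration and (hertz, percent) pitch points."""
    if duration_ms <= 0:
        raise ValueError("duration must be positive")
    if not pitch_points:
        raise ValueError("at least one pitch point is required")
    for hertz, percent in pitch_points:
        if hertz <= 0 or not 0 <= percent <= 100:
            raise ValueError("pitch must be positive and percent within 0..100")
    pitches = " ".join(f"{hertz:g}:{percent:g}" for hertz, percent in pitch_points)
    return f"{symbol} {{D {duration_ms:g}; P {pitches}}}"


def tune_block(phonemes):
    """Wrap phoneme lines in the commands that switch to TUNE input and back to text."""
    lines = ["[[inpt TUNE]]"]
    lines += [tune_phoneme(*phoneme) for phoneme in phonemes]
    lines.append("[[inpt TEXT]]")
    return "\n".join(lines)
```

For example, `tune_phoneme("AA", 120, [(176.9, 0), (171.4, 22), (161.7, 61)])` describes a vowel lasting 120 ms whose pitch falls from about 177 Hz to about 162 Hz. Stringing together phonemes with pitches chosen from a musical scale is what lets a voice approximate singing.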
I have now got everything else working well enough that I can start on the interface and mechanics to make this happen.