Vocabularium: known issues
This page lists some known bugs and issues with the current public release of
the Vocabularium vocabulary podcast software, along with
the status of any resolution where appropriate.
- Only Cepstral voices are supported under non-Windows operating systems.
The plan is to add support for other good-quality voices as and when users
show sufficient interest in them.
- English stress pattern occasionally implies incorrect morphology.
For example, at least with Cepstral Lawrence, a phrase such as the living room
tends to be given a stress pattern that
implies a meaning of "the room that is living" (Adjective + Noun), rather
than "the room for living" (Noun + Noun compound). Problems of this kind are to be expected, since we're
using a model that was probably trained on, and intended for, reading whole sentences
to read a list of isolated words and phrases.
Some work is underway to
compensate for this and "force" the speech engine to put stresses in more
appropriate places.
Update 31 May 2009: version 0.04 fixes some of these badly pronounced phrases.
- Prosody does not always convey correct information structure. When humans read
a list of vocabulary items aloud (or indeed when they speak in general), they tend to adjust
their speaking style dynamically to mark "new" vs "given" information (among other things),
and to introduce other cues that signal structure.
For example, if the item upstairs is followed by the word downstairs,
speakers will naturally tend to stress the first syllable of downstairs,
because that syllable contains the "new" information compared to the previous word.
The software currently includes some prosodic cues to signal the
"general structure" of the podcast when used with Cepstral voices.
But at present, cues are not added at the level of individual words or items, as in
this example.
Future improvements to the software will
(probably in this order):
- add prosodic cues in further cases where they are considered helpful;
- implement such prosodic cues with voices from other vendors.
Update 31 May 2009: version 0.04 improves the use of prosody to
convey information structure in some cases.
- A few other potentially fixable issues have been identified specifically
with the Cepstral Lawrence voice. For example, with this voice, the engine tends
to incorrectly insert an ao phoneme where oa is generally more
appropriate in British English (so, e.g. cot and caught become
indistinguishable, as in some US dialects).
Some manual fixes for this problem are already included
in the current release; fixes for other pronunciation problems will be introduced
over the course of future releases. Despite these occasional problems, the
Cepstral Lawrence voice is still highly recommended; it is the English voice
that has been most used in testing of the software, which is the main reason
why problems have been identified with this specific voice.
- Rendering is slower than expected under Windows XP with certain voices. Some workarounds
are being looked at for a future version.
Update 31 May 2009: This appears to have been a problem with a specific test system;
I have not had reports of it affecting other users.
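As a rough illustration of the "new" vs "given" distinction described in the prosody item above (upstairs followed by downstairs), the following Python sketch splits an item into the part that differs from the previous item and the part the two share. It is not how Vocabularium itself works: the function name split_new_given, the min_shared threshold and the purely spelling-based comparison are all assumptions made for this example.

```python
def split_new_given(previous: str, current: str, min_shared: int = 3):
    """Split `current` into (new_part, given_part) relative to `previous`.

    Hypothetical helper: the "given" part is the longest ending that the two
    items share in spelling. If that ending is shorter than `min_shared`
    characters, or covers the whole of `current`, the entire item is treated
    as new and the given part is empty.
    """
    # Count how many trailing characters the two items have in common.
    shared = 0
    limit = min(len(previous), len(current))
    while shared < limit and previous[-1 - shared] == current[-1 - shared]:
        shared += 1

    # Ignore accidental overlaps (e.g. a plural "s") and identical endings
    # that leave nothing new to stress.
    if shared < min_shared or shared == len(current):
        return current, ""

    return current[:-shared], current[-shared:]


new_part, given_part = split_new_given("upstairs", "downstairs")
print(new_part, given_part)
```

A comparison based on spelling only approximates syllables and morphemes, so a real implementation would probably need a pronunciation lexicon. How the "new" part is then stressed depends on the voice and the markup it accepts.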
The software made available from this page is copyright (c) Javamex UK 2009. All rights reserved.
All software is provided "as is" and installed and used at the user's risk.