Text-to-speech: Cepstral’s CEO speaks

Last week, text-to-speech company Cepstral announced they’d formed a partnership with virtual world IMVU to supply text-to-speech capability.


Text-to-speech has huge potential in the virtual world context and is a feature Second Life users have been wishing for. We caught up with Cepstral’s CEO, Craig Campbell, to get a little more detail:

Lowell: What was the impetus for the Cassiopeian / Cepstral partnership?

Craig: Cepstral has a great deal of experience in the traditional Text-to-Speech marketplace, such as Interactive Voice Response (IVR) units. You know, the systems that allow you to call in to your bank and check the balance in your checking account, that sort of thing. However, one of Cepstral’s strengths has been our ability to create lots of interesting voices, not just the same old “your account balance is….” type of voice. We realized that fast-growing consumer web applications might benefit from Text-to-Speech (TTS) if it were simple to use and the voices had personality, such as the ones we’ve created.

IMVU interested us as a very popular Virtual World. Furthermore, they have an established digital goods marketplace. We wanted to enable an application within IMVU, and Cassiopeian is one of the most popular developers in IMVU. Her products are very professional and have a great reputation among IMVU users. When we approached Cassiopeian, they were very interested in adding voice capabilities to IMVU avatars, and have been a pleasure to work with.

Lowell: Are there any plans for providing the same technology in other virtual worlds like Second Life?

Craig: We’re definitely evaluating where to deploy the technology next. We’re looking at other virtual worlds, as well as other online environments. For example, we’re testing a Facebook application – VoicePoke – that allows users to send interesting messages to each other. Through a mash-up with Google’s translation service, the widget can even speak the message in different languages. We think this creative approach to using speech in web applications has a lot of room to grow, be it real-time surrogate voices for virtual worlds, messaging on social networks, or media shifting activities such as converting blogs and web text to a mobile, eyes-free audio format.

Our tagline is “VoiceForge, We Make the Internet Talk™”. I share this to point out that we are not application developers, but rather the heart of speech on the Internet. We offer a free API that any virtual world or developer can incorporate. Our Software as a Service (SaaS) model means that developers needn’t ramp up any hardware, bandwidth or linguistic knowledge in order to quickly embed a large-scale TTS feature inside their applications.

Lowell: Linden Lab have stated they’re working on the ability for people to morph their voices on the fly – will Cepstral always be focused on text-to-speech or do you see opportunities in live voice morphing?

Craig: Voice morphing technology will be an option that some users will use, as an alternative to using their real world voice in chat. But if you look at why people don’t like using their real world voice, there are four reasons: it breaks the fantasy of the character, it removes anonymity, it rules out gender bending, and it is a hurdle for non-native English speakers who may be able to type better than they can speak. We think that Cepstral’s Text-to-Speech solves those issues better than voice morphing. A user with voice morphing still hears their own voice, reducing the fantasy of their avatar; the user can often still be identified; a male voice with a higher pitch is not a female voice; and voice morphing does nothing for non-native speakers.

Of course, traditional Text-to-Speech doesn’t solve those problems, since most vendors offer one or two male and female voices. Cepstral’s differentiator is that we offer over 30 voices today, and we’re adding more. And those voices each have a distinct sound and personality. There’s a male Texan voice, a female British voice, a female African voice, a male Brooklyn voice, even a voice called “Damien” that sounds demonic. The user can select the voice that they want for their avatar, much like they select the clothes for their avatar – to match their avatar’s personality.

Lowell: What further developments in text-to-speech do you see occurring in coming years?

Craig: Cepstral is focusing on adding new voices, to give users more options. We have many new voices in the pipeline and foresee hundreds if not thousands of unique voices in the future. We have an alpha version of a tool called the VoiceBank™ that allows users to create their own custom voices. And we’ll continue to improve the quality of the voices so that they sound even more natural.

What are your thoughts? Text-to-speech really appeals to me – is it something you’d use over voice morphing or standard voice chat?


  1. Stan Long says

    A friend of mine has recently been diagnosed with ALS. She is teaching an online college class. I’d like to be made aware of the best text-to-speech technology available for teaching online classes. And I’d also like to know more about text-to-speech avatars that could be used for teaching.


