This document outlines the requirements for the Playback API, which provides a unified interface for controlling text-to-speech (TTS) playback.
- Start, pause, resume, and stop
- Handle both individual and batched text/SSML input
- Report current playback state (playing, paused, stopped)
- Accept plain text and SSML input
- Support multiple utterances
- Emit events for state changes
- Provide word/sentence boundary information
- Report errors and warnings
- Select from available voices
- Configure voice parameters (rate, pitch, volume)
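
The state and event requirements listed above could be modeled roughly as follows. This is a minimal sketch, not the actual API: `PlaybackState`, `PlaybackEvent`, and `PlaybackStateTracker` are hypothetical names, and the rule that `pause()` only applies while playing is an assumption.

```typescript
// Hypothetical names throughout; this document does not define these types.
type PlaybackState = "playing" | "paused" | "stopped";

// One possible event shape covering state changes, boundaries, errors, and warnings.
type PlaybackEvent =
  | { type: "statechange"; state: PlaybackState }
  | { type: "boundary"; kind: "word" | "sentence"; charIndex: number }
  | { type: "error"; message: string }
  | { type: "warning"; message: string };

type Listener = (event: PlaybackEvent) => void;

class PlaybackStateTracker {
  private state: PlaybackState = "stopped";
  private listeners: Listener[] = [];

  on(listener: Listener): void {
    this.listeners.push(listener);
  }

  getState(): PlaybackState {
    return this.state;
  }

  start(): void {
    this.transition("playing");
  }

  // Assumption: pausing is only meaningful while playing.
  pause(): void {
    if (this.state === "playing") this.transition("paused");
  }

  // Assumption: resuming is only meaningful while paused.
  resume(): void {
    if (this.state === "paused") this.transition("playing");
  }

  stop(): void {
    this.transition("stopped");
  }

  // Emits a statechange event only when the state actually changes.
  private transition(next: PlaybackState): void {
    if (next === this.state) return;
    this.state = next;
    for (const listener of this.listeners) {
      listener({ type: "statechange", state: next });
    }
  }
}
```

A consumer would register a listener with `on()` and query `getState()` at any time. Boundary, error, and warning events would be emitted by the engine itself, which this sketch leaves out.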
[WIP]
A PlaybackEngineProvider allows you to get available voices and create instances of the PlaybackEngine using one specific voice, language, etc.
Each PlaybackEngine uses one voice, has settable parameters, is loaded with utterances, can preload with context, and can speak the utterance at a given index.
A PlaybackNavigator then handles navigation, continuous play, etc.
Note a “hidden” requirement: it has to work as a standalone module for web consumers as well, who will not rely on a Navigator and Preferences API.
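
One way to picture the split between provider, engine, and navigator is sketched below. Every name and signature here is an assumption made for illustration (`Voice`, `Utterance`, `utteranceCount`, `createEngine`, the `onEnd` callback, and the `Fake*` classes are all hypothetical). The fake engine only records what it would speak and finishes each utterance immediately, whereas a real engine would call `onEnd` asynchronously.

```typescript
// All names and signatures below are assumptions for illustration only.
interface Voice {
  id: string;
  language: string;
}

interface Utterance {
  text: string;
}

interface PlaybackEngine {
  readonly voice: Voice;
  loadUtterances(utterances: Utterance[]): void;
  utteranceCount(): number;
  // Speaks the utterance at `index` and calls `onEnd` once it has finished.
  speak(index: number, onEnd: () => void): void;
}

interface PlaybackEngineProvider {
  getVoices(): Voice[];
  createEngine(voice: Voice): PlaybackEngine;
}

// In-memory stand-in: records spoken text instead of producing audio.
class FakeEngine implements PlaybackEngine {
  readonly voice: Voice;
  readonly spoken: string[] = [];
  private utterances: Utterance[] = [];

  constructor(voice: Voice) {
    this.voice = voice;
  }

  loadUtterances(utterances: Utterance[]): void {
    this.utterances = [...utterances];
  }

  utteranceCount(): number {
    return this.utterances.length;
  }

  speak(index: number, onEnd: () => void): void {
    const utterance = this.utterances[index];
    if (utterance === undefined) {
      throw new RangeError(`No utterance at index ${index}`);
    }
    this.spoken.push(utterance.text);
    onEnd(); // A real engine would call this later, when audio ends.
  }
}

class FakeProvider implements PlaybackEngineProvider {
  getVoices(): Voice[] {
    return [{ id: "fake-en", language: "en" }];
  }

  createEngine(voice: Voice): FakeEngine {
    return new FakeEngine(voice);
  }
}

// Continuous play: when one utterance ends, speak the next one.
class PlaybackNavigator {
  private engine: PlaybackEngine;

  constructor(engine: PlaybackEngine) {
    this.engine = engine;
  }

  playFrom(index: number): void {
    if (index >= this.engine.utteranceCount()) return;
    this.engine.speak(index, () => this.playFrom(index + 1));
  }
}
```

In this sketch the engine has no dependency on the navigator, so a standalone web consumer could call `loadUtterances` and `speak` directly and skip `PlaybackNavigator` entirely.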
In Readium Speech we will probably use an `init` where you pass your engines, and you can `getVoices()` after that. As discussed this morning, though, pitch, rate, volume, etc. should go into `loadUtterances` in some form, as some TTS engines require these to be set for each utterance and otherwise do not work well.
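
Attaching parameters to every utterance could look something like the sketch below. It is only an illustration: `VoiceParams`, `UtteranceInput`, `PreparedUtterance`, `DEFAULT_PARAMS`, and `prepareUtterances` are hypothetical names, and the default values and the precedence order (per-utterance over shared over defaults) are assumptions rather than decisions recorded here.

```typescript
// Hypothetical types; the real loadUtterances signature is not settled in this document.
interface VoiceParams {
  rate: number;
  pitch: number;
  volume: number;
}

interface UtteranceInput {
  text: string;
  params?: Partial<VoiceParams>;
}

interface PreparedUtterance {
  text: string;
  params: VoiceParams;
}

// Assumed defaults, chosen for illustration only.
const DEFAULT_PARAMS: VoiceParams = { rate: 1, pitch: 1, volume: 1 };

// Resolves a complete parameter set for each utterance, so an engine
// that needs rate, pitch, and volume on every utterance always gets them.
function prepareUtterances(
  inputs: UtteranceInput[],
  shared: Partial<VoiceParams> = {}
): PreparedUtterance[] {
  return inputs.map((input) => ({
    text: input.text,
    // Later spreads win: per-utterance overrides shared, shared overrides defaults.
    params: { ...DEFAULT_PARAMS, ...shared, ...input.params },
  }));
}
```

For example, `prepareUtterances([{ text: "Hello" }], { rate: 1.25 })` would yield one utterance carrying a rate of 1.25 along with the default pitch and volume.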