Daisys API JSON models¶
Pydantic classes representing the JSON interface for the Daisys API.
- class daisys.v1.speak.models.AffectProsody(*, pitch: int, pace: int, valence: int, dominance: int, arousal: int)¶
Prosody features based on analysis of affect. See also parent class ProsodyFeatures for other fields.
- valence¶
The valence; -10 for negativity, 10 for positivity, 0 for neutral.
- Type:
int
- arousal¶
The arousal; -10 for unexcited, 10 for very excited, 0 for neutral.
- Type:
int
- dominance¶
The dominance; -10 for docile, 10 for commanding, 0 for neutral.
- Type:
int
- model_config: ClassVar[ConfigDict] = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
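As an illustration of the fields above, the following is a minimal sketch using a hypothetical stand-in dataclass (`AffectProsodySketch`, not the library class) that mirrors the documented fields. The range check is an assumption based on the documented -10 to 10 scale; whether the real Pydantic model enforces it is not stated here.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AffectProsodySketch:
    """Hypothetical stand-in mirroring the documented AffectProsody fields."""

    pitch: int
    pace: int
    valence: int
    dominance: int
    arousal: int

    def __post_init__(self):
        # Assumption: every field uses the documented -10..10 scale, 0 = neutral.
        for name, value in asdict(self).items():
            if not -10 <= value <= 10:
                raise ValueError(f"{name} must be between -10 and 10, got {value}")


# Positive, fairly excited, slightly commanding speech at neutral pitch and pace.
prosody = AffectProsodySketch(pitch=0, pace=0, valence=8, dominance=3, arousal=6)
payload = json.dumps(asdict(prosody))
```

The resulting `payload` is only a plausible JSON shape for the prosody object; check the API reference for the exact wire format.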
- class daisys.v1.speak.models.ProsodyFeatures(*, pitch: int, pace: int)¶
Base prosody features supported by all models.
- pitch¶
The normalized pitch; -10 to 10, where 0 is a neutral pitch.
- Type:
int
- pace¶
The normalized pace; -10 to 10, where 0 is a neutral pace.
- Type:
int
- model_config: ClassVar[ConfigDict] = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- daisys.v1.speak.models.ProsodyFeaturesUnion¶
A union type representing different prosody feature variations.
alias of
SimpleProsody
|AffectProsody
|SignalProsody
- class daisys.v1.speak.models.ProsodyType(value)¶
An enum representing different prosody feature types.
Not all models accept all prosody types. See the prosody_types field of TTSModel.
- SIMPLE¶
corresponds with SimpleProsody
- AFFECT¶
corresponds with AffectProsody
- SIGNAL¶
corresponds with SignalProsody
- static from_class(prosody: SimpleProsody | AffectProsody | SignalProsody)¶
Return an enum value based on the prosody class provided.
- Parameters:
prosody – The prosody object from which to derive the enum value.
- prosody(**kwargs)¶
Return a prosody object corresponding to this value, initialized with the given arguments.
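The relationship between the enum values and the prosody classes can be sketched as follows. This is not the library's implementation: the `...Sketch` classes, the string enum values, and the two helper functions are hypothetical stand-ins that only mimic the documented behaviour of `from_class` and `prosody`.

```python
from dataclasses import dataclass
from enum import Enum


@dataclass
class SimpleProsodySketch:
    pitch: int
    pace: int
    expression: int


@dataclass
class AffectProsodySketch:
    pitch: int
    pace: int
    valence: int
    dominance: int
    arousal: int


@dataclass
class SignalProsodySketch:
    pitch: int
    pace: int
    tilt: int
    pitch_range: int


class ProsodyTypeSketch(Enum):
    # Assumption: the string values are illustrative only.
    SIMPLE = "simple"
    AFFECT = "affect"
    SIGNAL = "signal"


# Each enum value corresponds with exactly one prosody class.
_CLASS_FOR_TYPE = {
    ProsodyTypeSketch.SIMPLE: SimpleProsodySketch,
    ProsodyTypeSketch.AFFECT: AffectProsodySketch,
    ProsodyTypeSketch.SIGNAL: SignalProsodySketch,
}


def from_class(prosody):
    """Return the enum value matching the class of the given prosody object."""
    for kind, cls in _CLASS_FOR_TYPE.items():
        if isinstance(prosody, cls):
            return kind
    raise TypeError(f"not a prosody object: {prosody!r}")


def make_prosody(kind, **kwargs):
    """Build a prosody object of the class corresponding to the enum value."""
    return _CLASS_FOR_TYPE[kind](**kwargs)
```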
- class daisys.v1.speak.models.SignalProsody(*, pitch: int, pace: int, tilt: int, pitch_range: int)¶
Prosody features based on signal analysis. See also parent class ProsodyFeatures for other fields.
- tilt¶
The normalized spectral tilt; -10 for flat, 10 for bright, 0 for neutral.
- Type:
int
- pitch_range¶
The normalized pitch range; -10 for flat, 10 for highly varied pitch, 0 for neutral.
- Type:
int
- model_config: ClassVar[ConfigDict] = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class daisys.v1.speak.models.SimpleProsody(*, pitch: int, pace: int, expression: int)¶
Simplified prosody features, supported by all models. See also parent class ProsodyFeatures for other fields.
- expression¶
The normalized “expression”; -10 to 10, where 0 is neutral.
- Type:
int
- model_config: ClassVar[ConfigDict] = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class daisys.v1.speak.models.Status(value)¶
Represents the status of a take or voice generation process.
- WAITING¶
Item is waiting to be processed.
- STARTED¶
Processing has started for this item.
- PROGRESS_25¶
Item has been 25% processed.
- PROGRESS_50¶
Item has been 50% processed.
- PROGRESS_75¶
Item has been 75% processed.
- READY¶
Item is ready to be used; for takes, audio is available.
- ERROR¶
An error occurred during processing of this item.
- TIMEOUT¶
Processing did not finish for this item.
Note that TIMEOUT is used for very long intervals; it does not indicate a delay of a few seconds or minutes, but rather that an item has been in the queue for more than a day and has therefore been removed. It should only be considered to represent circumstances where processing errors were not detected by normal means.
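When polling, it can help to separate final states from in-progress ones. The sketch below uses a hypothetical stand-in enum (`StatusSketch`) with the documented member names; treating READY, ERROR and TIMEOUT as the final states is an assumption based on the descriptions above, and the numeric values are placeholders.

```python
from enum import Enum, auto
from typing import Optional


class StatusSketch(Enum):
    """Hypothetical stand-in with the documented Status member names."""

    WAITING = auto()
    STARTED = auto()
    PROGRESS_25 = auto()
    PROGRESS_50 = auto()
    PROGRESS_75 = auto()
    READY = auto()
    ERROR = auto()
    TIMEOUT = auto()


# Assumption: no further processing happens once one of these is reached.
FINAL_STATES = {StatusSketch.READY, StatusSketch.ERROR, StatusSketch.TIMEOUT}

# Rough completion percentage for the states that imply one.
_PROGRESS = {
    StatusSketch.WAITING: 0,
    StatusSketch.STARTED: 0,
    StatusSketch.PROGRESS_25: 25,
    StatusSketch.PROGRESS_50: 50,
    StatusSketch.PROGRESS_75: 75,
    StatusSketch.READY: 100,
}


def is_final(status: StatusSketch) -> bool:
    """Return True if polling can stop for this item."""
    return status in FINAL_STATES


def progress_percent(status: StatusSketch) -> Optional[int]:
    """Return a completion percentage, or None for ERROR and TIMEOUT."""
    return _PROGRESS.get(status)
```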
- class daisys.v1.speak.models.StreamMode(value)¶
Whether a websocket message should contain a whole part or chunks of parts.
Note: upper case in Python, lower case in JSON.
- Values:
PARTS, CHUNKS
- class daisys.v1.speak.models.StreamOptions(*, mode: StreamMode = StreamMode.PARTS)¶
Options for streaming.
- mode¶
The streaming mode to use.
- model_config: ClassVar[ConfigDict] = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
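The note about case can be illustrated with a small sketch. `StreamModeSketch` is a hypothetical stand-in rather than the library enum, and the `{"mode": ...}` object is an assumed JSON shape for `StreamOptions` derived from its single documented field.

```python
import json
from enum import Enum


class StreamModeSketch(str, Enum):
    """Hypothetical stand-in: upper-case names in Python, lower-case values in JSON."""

    PARTS = "parts"
    CHUNKS = "chunks"


# Serialize options that request chunked streaming.
encoded = json.dumps({"mode": StreamModeSketch.CHUNKS.value})

# Parse the JSON value back into the Python enum member.
decoded = StreamModeSketch(json.loads(encoded)["mode"])
```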
- class daisys.v1.speak.models.TTSModel(*, name: str, displayname: str, flags: list[str] = [], languages: list[str], genders: list[VoiceGender], styles: list[list[str]] = [], prosody_types: list[ProsodyType], voice_inputs: list[VoiceInputType] | None)¶
Information about a speech model.
- name¶
The unique identifier of this model.
- Type:
str
- displayname¶
A friendlier name that might contain spaces.
- Type:
str
- flags¶
A list of flags that indicate some features of this model.
- Type:
list[str]
- languages¶
A list of languages supported by this model.
- Type:
list[str]
- genders¶
A list of genders supported by this model.
- Type:
list[daisys.v1.speak.models.VoiceGender]
- styles¶
A list of style sets; each sublist is a list of mutually exclusive style tags.
- Type:
list[list[str]]
- prosody_types¶
A list of which prosody types are supported by this model.
- Type:
list[daisys.v1.speak.models.ProsodyType]
- voice_inputs¶
A list of which voice input types are supported by this model.
- Type:
list[daisys.v1.speak.models.VoiceInputType] | None
- model_config: ClassVar[ConfigDict] = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
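Because each sublist of `styles` holds mutually exclusive tags, a requested style list can be checked on the client side before sending it. The following is a minimal sketch; the helper name and the style tags in the example are hypothetical, and the server remains the authority on what is accepted.

```python
from typing import List


def styles_are_compatible(chosen: List[str], style_sets: List[List[str]]) -> bool:
    """Check a style selection against a model's style sets.

    Returns True when every chosen tag appears in some set and no set
    contributes more than one chosen tag.
    """
    known = {tag for group in style_sets for tag in group}
    if any(tag not in known for tag in chosen):
        return False
    return all(sum(tag in group for tag in chosen) <= 1 for group in style_sets)


# Hypothetical style sets, shaped like the documented `styles` field.
example_sets = [["narrator", "conversational"], ["whisper", "shout"]]
```

With these example sets, `["narrator", "whisper"]` passes, while `["whisper", "shout"]` fails because both tags come from the same set.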
- class daisys.v1.speak.models.TakeGenerate(*, text: str, override_language: str | None = None, style: list[str] | None = None, prosody: SimpleProsody | AffectProsody | SignalProsody | None = None, status_webhook: Webhook | None = None, done_webhook: Webhook | None = None, voice_id: str)¶
Parameters necessary to generate a “take”, an audio file containing an utterance of the given text by the given voice. See TakeGenerateWithoutVoice for documentation on the remaining fields.
- voice_id¶
The id of the voice to be used for generating audio. The voice is attached to a specific model.
- Type:
str
- model_config: ClassVar[ConfigDict] = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class daisys.v1.speak.models.TakeGenerateWithoutVoice(*, text: str, override_language: str | None = None, style: list[str] | None = None, prosody: SimpleProsody | AffectProsody | SignalProsody | None = None, status_webhook: Webhook | None = None, done_webhook: Webhook | None = None)¶
Parameters necessary to generate a “take”, an audio file containing an utterance of the given text. No voice is provided here, for the purpose of embedding in VoiceGenerate for the voice example.
- text¶
The text that the voice should say.
- Type:
str
- override_language¶
Normally a language classifier is used to detect the language of the speech; this allows for multilingual sentences. However, if the language should be enforced, it should be provided here. Currently accepted values are “nl-NL” and “en-GB”.
- Type:
str | None
- style¶
A list of styles to enable when speaking. Note that most styles are mutually exclusive, so a list of 1 value should be provided. Accepted styles can be retrieved from the associated voice’s VoiceInfo.styles or the model’s TTSModel.styles field. Note that not all models support styles, thus this can be left empty if specific styles are not desired.
- Type:
list[str] | None
- prosody¶
The characteristics of the desired speech not determined by the voice or style. Here you can provide a SimpleProsody; most models also accept the more detailed AffectProsody.
- status_webhook¶
An optional URL to be called using POST whenever the take’s status changes, with TakeResponse in the body content.
- Type:
daisys.v1.speak.models.Webhook | None
- done_webhook¶
An optional URL to be called exactly once using POST when the take is READY, ERROR, or TIMEOUT, with TakeResponse in the body content.
- Type:
daisys.v1.speak.models.Webhook | None
- model_config: ClassVar[ConfigDict] = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
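To show how these fields fit together, here is a minimal sketch that assembles a request body as a plain dictionary. The helper `build_take_request` is hypothetical, `voice_id` comes from `TakeGenerate`, and the nested JSON shapes for `prosody` and `done_webhook` are assumptions derived from the documented field names; the voice id and URL in the example are placeholders.

```python
import json
from typing import Any, Dict, List, Optional


def build_take_request(
    text: str,
    voice_id: str,
    override_language: Optional[str] = None,
    style: Optional[List[str]] = None,
    prosody: Optional[Dict[str, int]] = None,
    done_webhook_url: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble a take request, leaving out optional fields that are unset."""
    body: Dict[str, Any] = {"text": text, "voice_id": voice_id}
    if override_language is not None:
        body["override_language"] = override_language
    if style is not None:
        body["style"] = style
    if prosody is not None:
        body["prosody"] = prosody
    if done_webhook_url is not None:
        # Assumption: a webhook is given as an object with only post_url set.
        body["done_webhook"] = {"post_url": done_webhook_url}
    return body


request_body = build_take_request(
    text="Hello there, I am Daisys!",
    voice_id="v0123456789",  # placeholder id
    override_language="en-GB",
    prosody={"pitch": 0, "pace": 0, "expression": 5},
    done_webhook_url="https://example.com/hooks/take-done",  # placeholder URL
)
request_json = json.dumps(request_body)
```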
- class daisys.v1.speak.models.TakeInfo(*, duration: int, audio_rate: int, normalized_text: list[str])¶
Some information available when a take is READY, attached to the TakeResponse.
- duration¶
The length of the audio in samples. To get the length in seconds, divide by audio_rate.
- Type:
int
- audio_rate¶
The number of samples per second in the audio.
- Type:
int
- normalized_text¶
The text used for text-to-speech after normalization, i.e. translated from “as written” to “as spoken”. Provided as a list of sentences.
- Type:
list[str]
- model_config: ClassVar[ConfigDict] = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
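The conversion from samples to seconds described for `duration` is a single division. A small sketch, with a hypothetical helper name and made-up example values:

```python
def take_duration_seconds(duration: int, audio_rate: int) -> float:
    """Convert a duration in samples to seconds using the sample rate."""
    if audio_rate <= 0:
        raise ValueError("audio_rate must be positive")
    return duration / audio_rate


# Example values only: 110250 samples at 22050 samples per second.
seconds = take_duration_seconds(duration=110250, audio_rate=22050)  # 5.0
```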
- class daisys.v1.speak.models.TakeResponse(*, text: str, override_language: str | None = None, style: list[str] | None = None, prosody: SimpleProsody | AffectProsody | SignalProsody | None = None, status_webhook: Webhook | None = None, done_webhook: Webhook | None = None, voice_id: str, take_id: str, status: Status, timestamp_ms: int, info: TakeInfo | None = None)¶
Information about a take, returned during and after take generation. Also includes fields from TakeGenerate.
- take_id¶
The unique identifier of this take.
- Type:
str
- status¶
The status of this take, whether it is ready, in error, or in progress.
- timestamp_ms¶
The time at which this take generation was requested, in milliseconds since epoch.
- Type:
int
- model_config: ClassVar[ConfigDict] = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
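A `timestamp_ms` value can be turned into a timezone-aware datetime as sketched below. This assumes "epoch" means the Unix epoch (1970-01-01 UTC), which the text does not state explicitly; the helper name and the example value are hypothetical.

```python
from datetime import datetime, timezone


def timestamp_ms_to_datetime(timestamp_ms: int) -> datetime:
    """Convert milliseconds since the (assumed Unix) epoch to an aware UTC datetime."""
    return datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)


# Example value only.
requested_at = timestamp_ms_to_datetime(1_700_000_000_000)
```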
- class daisys.v1.speak.models.Version(*, version: int, minor: int)¶
Represents the version of the API.
- version¶
The major version number of the API.
- Type:
int
- minor¶
The minor version number of the API.
- Type:
int
- model_config: ClassVar[ConfigDict] = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class daisys.v1.speak.models.VoiceGender(value)¶
Represents the gender of a voice.
Note: upper case in Python, lower case in JSON.
- Values:
MALE, FEMALE, NONBINARY
- class daisys.v1.speak.models.VoiceGenerate(*, name: str, model: str, gender: VoiceGender, description: str | None = None, default_style: list[str] | None = None, default_prosody: SimpleProsody | AffectProsody | SignalProsody | None = None, example_take: TakeGenerateWithoutVoice | None = None, done_webhook: Webhook | None = None)¶
Parameters necessary to generate a voice.
- name¶
A name to give the voice, may be any string, and does not need to be unique.
- Type:
str
- gender¶
The gender of this voice.
- description¶
A description of this voice.
- Type:
str | None
- default_style¶
An optional list of styles to associate with this voice by default. It can be overridden by a take that uses this voice. Note that most styles are mutually exclusive, and not all models support styles.
- Type:
list[str] | None
- default_prosody¶
An optional default prosody to associate with this voice. It can be overridden by a take that uses this voice.
- example_take¶
Parameters for an example take to generate for this voice. If not provided, a default example text will be used, depending on the language of the model.
- Type:
daisys.v1.speak.models.TakeGenerateWithoutVoice | None
- done_webhook¶
An optional URL to call using POST when the voice is available, with the response of VoiceInfo in the body content. This shall be called once, after the voice and example take have been generated.
- Type:
daisys.v1.speak.models.Webhook | None
- model_config: ClassVar[ConfigDict] = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class daisys.v1.speak.models.VoiceInfo(*, name: str, model: str, gender: VoiceGender, description: str | None = None, default_style: list[str] | None = None, default_prosody: SimpleProsody | AffectProsody | SignalProsody | None = None, example_take: TakeGenerateWithoutVoice | None = None, done_webhook: Webhook | None = None, voice_id: str, status: Status, timestamp_ms: int, example_take_id: str | None = None)¶
Information about a voice.
- voice_id¶
The unique identifier of this voice.
- Type:
str
- status¶
The status of this voice, whether it is ready, in error, or in progress.
- timestamp_ms¶
The time at which this voice generation was requested, in milliseconds since epoch.
- Type:
int
- example_take_id¶
An optional identifier for a take that represents an example of this voice.
- Type:
str | None
- model_config: ClassVar[ConfigDict] = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class daisys.v1.speak.models.VoiceUpdate(*, name: str | None = None, gender: VoiceGender | None = None, default_style: list[str] | None = None, default_prosody: SimpleProsody | AffectProsody | SignalProsody | None = None)¶
Update parameters of a voice.
- name¶
A name to give the voice, may be any string, and does not need to be unique.
- Type:
str | None
- gender¶
The gender of this voice.
- Type:
daisys.v1.speak.models.VoiceGender | None
- default_style¶
An optional list of styles to associate with this voice by default. It can be overridden by a take that uses this voice. Note that most styles are mutually exclusive, and not all models support styles.
- Type:
list[str] | None
- default_prosody¶
An optional default prosody to associate with this voice. It can be overridden by a take that uses this voice.
- model_config: ClassVar[ConfigDict] = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
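Since every field of `VoiceUpdate` is optional, an update can carry only the fields that should change. The sketch below builds such a partial body; `build_voice_update` is a hypothetical helper, and omitting unset fields (rather than sending them as null) is an assumption about what the API expects.

```python
from typing import Any, Dict, List, Optional


def build_voice_update(
    name: Optional[str] = None,
    gender: Optional[str] = None,
    default_style: Optional[List[str]] = None,
    default_prosody: Optional[Dict[str, int]] = None,
) -> Dict[str, Any]:
    """Return a body containing only the fields that were actually provided."""
    fields = {
        "name": name,
        "gender": gender,
        "default_style": default_style,
        "default_prosody": default_prosody,
    }
    return {key: value for key, value in fields.items() if value is not None}


# Rename a voice and change its gender; gender is lower case in JSON.
update_body = build_voice_update(name="Narrator 2", gender="female")
```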
- class daisys.v1.speak.models.Webhook(*, post_url: str, timestamp_ms: int | None = None, status_code: int | None = None)¶
Store information about a registered webhook and its status.
When specifying a webhook, only post_url needs to be provided.
- post_url¶
The URL to be called with POST.
- Type:
str
- timestamp_ms¶
The time at which it was last called, in milliseconds since epoch.
- Type:
int | None
- status_code¶
The HTTP status code of the last response from the webhook.
- Type:
int | None
- model_config: ClassVar[ConfigDict] = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
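On the receiving side, a webhook endpoint gets the documented response model in the POST body. The sketch below shows one way to interpret a `done_webhook` call for a take. The function name is hypothetical, the body is assumed to be JSON with `take_id` and `status` fields as in `TakeResponse`, and the status is compared case-insensitively because its JSON spelling is not specified here.

```python
import json


def summarize_done_webhook(body: bytes) -> str:
    """Turn the body of a take's done_webhook call into a short summary."""
    take = json.loads(body)
    take_id = take["take_id"]
    # Assumption: normalize case, since the JSON spelling of Status is not documented here.
    status = str(take["status"]).upper()
    if status == "READY":
        return f"take {take_id}: audio is available"
    if status in ("ERROR", "TIMEOUT"):
        return f"take {take_id}: generation failed ({status})"
    # done_webhook is documented to fire only for the three states above.
    return f"take {take_id}: unexpected status {status}"


# Placeholder payload for illustration.
example = json.dumps({"take_id": "t0123", "status": "ready"}).encode()
summary = summarize_done_webhook(example)
```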