Daisys API JSON models¶

Pydantic classes representing the JSON interface for the Daisys API.

class daisys.v1.speak.models.AffectProsody(*, pitch: int, pace: int, valence: int, dominance: int, arousal: int)¶

Prosody features based on analysis of affect. See also parent class ProsodyFeatures for other fields.

valence¶

The valence; -10 for negativity, 10 for positivity, 0 for neutral.

Type:: int

arousal¶

The arousal; -10 for unexcited, 10 for very excited, 0 for neutral.

Type:: int

dominance¶

The dominance; -10 for docile, 10 for commanding, 0 for neutral.

Type:: int

model_config: ClassVar[ConfigDict] = {}¶: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class daisys.v1.speak.models.ProsodyFeatures(*, pitch: int, pace: int)¶

Base prosody features supported by all models.

pitch¶

The normalized pitch; -10 to 10, where 0 is a neutral pitch.

Type:: int

pace¶

The normalized pace; -10 to 10, where 0 is a neutral pace.

Type:: int

model_config: ClassVar[ConfigDict] = {}¶: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
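The -10 to 10 range described above can be checked on the client side before a request is sent. The following is a minimal sketch using only the standard library; `check_prosody` is a hypothetical helper, not part of the daisys package, and whether the API itself rejects out-of-range values is not stated here.

```python
# Hypothetical helper, not part of the daisys package: checks that
# prosody values lie in the documented -10..10 range before sending.
PROSODY_MIN = -10
PROSODY_MAX = 10


def check_prosody(values: dict[str, int]) -> dict[str, int]:
    """Return the values unchanged, or raise if any is not an int in range."""
    for name, value in values.items():
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(value, int) or isinstance(value, bool):
            raise TypeError(f"{name} must be an int, got {type(value).__name__}")
        if not PROSODY_MIN <= value <= PROSODY_MAX:
            raise ValueError(f"{name}={value} is outside {PROSODY_MIN}..{PROSODY_MAX}")
    return values


# Neutral pitch and pace, as described for ProsodyFeatures.
neutral = check_prosody({"pitch": 0, "pace": 0})
```

The same check applies to the additional fields of the subclasses (for example valence or tilt), since they are documented with the same range.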

daisys.v1.speak.models.ProsodyFeaturesUnion¶

A union type representing different prosody feature variations.

alias of SimpleProsody | AffectProsody | SignalProsody

class daisys.v1.speak.models.ProsodyType(value)¶

An enum representing different prosody feature types.

Not all models accept all prosody types. See the prosody_types field of TTSModel.

SIMPLE¶: corresponds with SimpleProsody

AFFECT¶: corresponds with AffectProsody

SIGNAL¶: corresponds with SignalProsody

static from_class(prosody: SimpleProsody | AffectProsody | SignalProsody)¶

Return an enum value based on the prosody class provided.

Parameters:: prosody – The prosody object from which to derive the enum value.

prosody(**kwargs)¶: Return a prosody object corresponding to this value, initialized with the given arguments.

class daisys.v1.speak.models.SignalProsody(*, pitch: int, pace: int, tilt: int, pitch_range: int)¶

Prosody features based on signal analysis. See also parent class ProsodyFeatures for other fields.

tilt¶

The normalized spectral tilt; -10 for flat, 10 for bright, 0 for neutral.

Type:: int

pitch_range¶

The normalized pitch range; -10 for flat, 10 for highly varied pitch, 0 for neutral.

Type:: int

model_config: ClassVar[ConfigDict] = {}¶: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class daisys.v1.speak.models.SimpleProsody(*, pitch: int, pace: int, expression: int)¶

Simplified prosody features, supported by all models. See also parent class ProsodyFeatures for other fields.

expression¶

The normalized “expression”; -10 to 10, where 0 is neutral.

Type:: int

model_config: ClassVar[ConfigDict] = {}¶: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class daisys.v1.speak.models.Status(value)¶

Represents the status of a take or voice generation process.

WAITING¶: Item is waiting to be processed.

STARTED¶: Processing has started for this item.

PROGRESS_25¶: Item has been 25% processed.

PROGRESS_50¶: Item has been 50% processed.

PROGRESS_75¶: Item has been 75% processed.

READY¶: Item is ready to be used; for takes, audio is available.

ERROR¶: An error occurred during processing of this item.

TIMEOUT¶: Processing did not finish for this item.

Note that TIMEOUT is used for very long intervals; it does not indicate a few seconds or minutes, but rather that an item has been in the queue for more than a day and has therefore been removed. It should only be considered to represent circumstances where processing errors were not detected by normal means.
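When polling, it can help to separate statuses that will still change from those that will not. The sketch below uses a stand-in enum with the member names listed above; the member values are placeholders, and treating READY, ERROR, and TIMEOUT as final is an assumption based on the description of done_webhook, which fires on exactly those three.

```python
from enum import Enum, auto


class Status(Enum):
    """Stand-in for illustration only; the values are placeholders,
    not the ones used by daisys.v1.speak.models.Status."""
    WAITING = auto()
    STARTED = auto()
    PROGRESS_25 = auto()
    PROGRESS_50 = auto()
    PROGRESS_75 = auto()
    READY = auto()
    ERROR = auto()
    TIMEOUT = auto()


# Assumption: these three are final, matching the statuses that
# trigger done_webhook.
FINAL_STATUSES = frozenset({Status.READY, Status.ERROR, Status.TIMEOUT})


def is_finished(status: Status) -> bool:
    """True once no further status change is expected."""
    return status in FINAL_STATUSES


still_running = [s.name for s in Status if not is_finished(s)]
```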

class daisys.v1.speak.models.StreamMode(value)¶

Whether websocket messages should contain whole parts or chunks of parts.

Note: upper case in Python, lower case in JSON.

Values:: PARTS, CHUNKS

class daisys.v1.speak.models.StreamOptions(*, mode: StreamMode = StreamMode.PARTS)¶

Options for streaming.

mode¶

The streaming mode to use.

Type:: daisys.v1.speak.models.StreamMode

model_config: ClassVar[ConfigDict] = {}¶: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
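The note on case can be illustrated with a stand-in enum. This is a sketch under the assumption that the JSON values are simply the lower-cased member names ("parts", "chunks"); the class below is local to the example and is not the library's StreamMode.

```python
import json
from enum import Enum


class StreamMode(str, Enum):
    """Stand-in for illustration: upper-case names in Python,
    assumed lower-case string values in JSON."""
    PARTS = "parts"
    CHUNKS = "chunks"


# A StreamOptions-shaped object with its single documented field, mode.
options_json = json.dumps({"mode": StreamMode.CHUNKS.value})

# Reading the JSON value back yields the Python member.
mode = StreamMode(json.loads(options_json)["mode"])
```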

class daisys.v1.speak.models.TTSModel(*, name: str, displayname: str, flags: list[str] = [], languages: list[str], genders: list[VoiceGender], styles: list[list[str]] = [], prosody_types: list[ProsodyType], voice_inputs: list[VoiceInputType] | None)¶

Information about a speech model.

name¶

The unique identifier of this model.

Type:: str

displayname¶

A friendlier name that might contain spaces.

Type:: str

flags¶

A list of flags that indicate some features of this model.

Type:: list[str]

languages¶

A list of languages supported by this model.

Type:: list[str]

genders¶

A list of genders supported by this model.

Type:: list[daisys.v1.speak.models.VoiceGender]

styles¶

A list of style sets; each sublist is a list of mutually exclusive style tags.

Type:: list[list[str]]

prosody_types¶

A list of which prosody types are supported by this model.

Type:: list[daisys.v1.speak.models.ProsodyType]

voice_inputs¶

A list of which voice input types are supported by this model.

Type:: list[daisys.v1.speak.models.VoiceInputType] | None

model_config: ClassVar[ConfigDict] = {}¶: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
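Because each sublist of styles holds mutually exclusive tags, a requested style list should contain at most one tag from each sublist. A minimal sketch of that check follows; `check_style_choice` is a hypothetical helper and the style tags are made up for the example, not taken from any real model.

```python
def check_style_choice(chosen: list[str], style_sets: list[list[str]]) -> list[str]:
    """Return chosen unchanged if it uses only known tags and picks at
    most one tag from each mutually exclusive set; otherwise raise."""
    known = {tag for style_set in style_sets for tag in style_set}
    unknown = [tag for tag in chosen if tag not in known]
    if unknown:
        raise ValueError(f"unknown style tags: {unknown}")
    for style_set in style_sets:
        picked = [tag for tag in chosen if tag in style_set]
        if len(picked) > 1:
            raise ValueError(f"mutually exclusive styles requested: {picked}")
    return chosen


# Made-up style sets in the shape of TTSModel.styles (list[list[str]]).
example_sets = [["narration", "conversation"], ["whisper"]]
ok = check_style_choice(["narration", "whisper"], example_sets)
```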

class daisys.v1.speak.models.TakeGenerate(*, text: str, override_language: str | None = None, style: list[str] | None = None, prosody: SimpleProsody | AffectProsody | SignalProsody | None = None, status_webhook: Webhook | None = None, done_webhook: Webhook | None = None, user_data: Annotated[str, StringConstraints(strip_whitespace=None, to_upper=None, to_lower=None, strict=None, min_length=None, max_length=256, pattern=None)] | int | float | None = None, voice_id: str)¶

Parameters necessary to generate a “take”, an audio file containing an utterance of the given text by the given voice. See TakeGenerateWithoutVoice for documentation on the remaining fields.

voice_id¶

The id of the voice to be used for generating audio. The voice is attached to a specific model.

Type:: str

model_config: ClassVar[ConfigDict] = {}¶: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class daisys.v1.speak.models.TakeGenerateWithoutVoice(*, text: str, override_language: str | None = None, style: list[str] | None = None, prosody: SimpleProsody | AffectProsody | SignalProsody | None = None, status_webhook: Webhook | None = None, done_webhook: Webhook | None = None, user_data: Annotated[str, StringConstraints(strip_whitespace=None, to_upper=None, to_lower=None, strict=None, min_length=None, max_length=256, pattern=None)] | int | float | None = None)¶

Parameters necessary to generate a “take”, an audio file containing an utterance of the given text. No voice is specified here, so that these parameters can be embedded in VoiceGenerate to describe the voice's example take.

text¶

The text that the voice should say.

Type:: str

override_language¶

Normally a language classifier is used to detect the language of the speech; this allows for multilingual sentences. However, if the language should be enforced, it should be provided here. Currently accepted values are “nl-NL” and “en-GB”.

Type:: str | None

style¶

A list of styles to enable when speaking. Note that most styles are mutually exclusive, so a list containing a single value should usually be provided. Accepted styles can be retrieved from the associated voice’s VoiceInfo.styles or the model’s TTSModel.styles field. Not all models support styles, so this can be left empty if no specific style is desired.

Type:: list[str] | None

prosody¶

The characteristics of the desired speech not determined by the voice or style. You can provide a SimpleProsody; most models also accept the more detailed AffectProsody.

Type:: daisys.v1.speak.models.SimpleProsody | daisys.v1.speak.models.AffectProsody | daisys.v1.speak.models.SignalProsody | None

status_webhook¶

An optional URL to be called using POST whenever the take’s status changes, with TakeResponse in the body content.

Type:: daisys.v1.speak.models.Webhook | None

done_webhook¶

An optional URL to be called exactly once using POST when the take is READY, ERROR, or TIMEOUT, with TakeResponse in the body content.

Type:: daisys.v1.speak.models.Webhook | None

user_data¶

An optional string (max 256 chars) or numerical value that can be attached to a take for use in user applications; for example, storing video timestamps, sentence index, or external database keys.

Type:: str | int | float | None

model_config: ClassVar[ConfigDict] = {}¶: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
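The fields above, together with voice_id from TakeGenerate, can be assembled into a JSON body with the standard library alone. This is a sketch, assuming optional fields may simply be omitted because they default to None; `take_request_body` is a hypothetical helper, the voice id is a placeholder, and the endpoint that receives the body is not covered here.

```python
import json
from typing import Optional, Union

MAX_USER_DATA_CHARS = 256  # documented limit for string user_data


def take_request_body(
    text: str,
    voice_id: str,
    style: Optional[list[str]] = None,
    user_data: Union[str, int, float, None] = None,
) -> str:
    """Build a TakeGenerate-shaped JSON string from the documented fields."""
    if isinstance(user_data, str) and len(user_data) > MAX_USER_DATA_CHARS:
        raise ValueError(f"user_data longer than {MAX_USER_DATA_CHARS} characters")
    body: dict = {"text": text, "voice_id": voice_id}
    # Optional fields are left out when not set (assumption: omitted == None).
    if style is not None:
        body["style"] = style
    if user_data is not None:
        body["user_data"] = user_data
    return json.dumps(body)


# "v-placeholder" is not a real voice id; user_data carries a sentence index.
body = take_request_body("Hello there.", voice_id="v-placeholder", user_data=42)
```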

class daisys.v1.speak.models.TakeInfo(*, duration: int, audio_rate: int, normalized_text: list[str])¶

Some information available when a take is READY, attached to the TakeResponse.

duration¶

The length of the audio in samples. To get the length in seconds, divide by audio_rate.

Type:: int

audio_rate¶

The number of samples per second in the audio.

Type:: int

normalized_text¶

The text used for text-to-speech after normalization, i.e. translated from “as written” to “as spoken”. Provided as a list of sentences.

Type:: list[str]

model_config: ClassVar[ConfigDict] = {}¶: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
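The conversion from samples to seconds described for duration is a single division. A small sketch with made-up sample values:

```python
def take_seconds(duration: int, audio_rate: int) -> float:
    """Length of a take in seconds: samples divided by samples per second."""
    if audio_rate <= 0:
        raise ValueError("audio_rate must be positive")
    return duration / audio_rate


# Made-up values: 66150 samples at 22050 samples per second.
seconds = take_seconds(duration=66150, audio_rate=22050)  # 3.0
```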

class daisys.v1.speak.models.TakeResponse(*, text: str, override_language: str | None = None, style: list[str] | None = None, prosody: SimpleProsody | AffectProsody | SignalProsody | None = None, status_webhook: Webhook | None = None, done_webhook: Webhook | None = None, user_data: Annotated[str, StringConstraints(strip_whitespace=None, to_upper=None, to_lower=None, strict=None, min_length=None, max_length=256, pattern=None)] | int | float | None = None, voice_id: str, take_id: str, status: Status, timestamp_ms: int, info: TakeInfo | None = None)¶

Information about a take, returned during and after take generation. Also includes fields from TakeGenerate.

take_id¶

The unique identifier of this take.

Type:: str

status¶

The status of this take, whether it is ready, in error, or in progress.

Type:: daisys.v1.speak.models.Status

timestamp_ms¶

The time at which this take generation was requested, in milliseconds since epoch.

Type:: int

info¶

Information available when the take is READY, see TakeInfo.

Type:: daisys.v1.speak.models.TakeInfo | None

model_config: ClassVar[ConfigDict] = {}¶: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
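A timestamp_ms value can be turned into a datetime with the standard library. The sketch below assumes that "epoch" means the Unix epoch (1970-01-01 UTC); `requested_at` is a hypothetical helper name and the input value is made up.

```python
from datetime import datetime, timezone


def requested_at(timestamp_ms: int) -> datetime:
    """Convert milliseconds since the Unix epoch to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)


# Made-up value for illustration.
when = requested_at(1_700_000_000_000)
```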

class daisys.v1.speak.models.Version(*, version: int, minor: int)¶

Represents the version of the API.

version¶

The major version number of the API.

Type:: int

minor¶

The minor version number of the API.

Type:: int

model_config: ClassVar[ConfigDict] = {}¶: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class daisys.v1.speak.models.VoiceGender(value)¶

Represents the gender of a voice.

Note: upper case in Python, lower case in JSON.

Values:: MALE, FEMALE, NONBINARY

class daisys.v1.speak.models.VoiceGenerate(*, name: str, model: str, gender: VoiceGender, description: str | None = None, default_style: list[str] | None = None, default_prosody: SimpleProsody | AffectProsody | SignalProsody | None = None, example_take: TakeGenerateWithoutVoice | None = None, done_webhook: Webhook | None = None)¶

Parameters necessary to generate a voice.

name¶

A name to give the voice, may be any string, and does not need to be unique.

Type:: str

model¶

The name of the model for this voice. Refers to the name entry in TTSModel.

Type:: str

gender¶

The gender of this voice.

Type:: daisys.v1.speak.models.VoiceGender

description¶

A description of this voice.

Type:: str | None

default_style¶

An optional list of styles to associate with this voice by default. It can be overridden by a take that uses this voice. Note that most styles are mutually exclusive, and not all models support styles.

Type:: list[str] | None

default_prosody¶

An optional default prosody to associate with this voice. It can be overridden by a take that uses this voice.

Type:: daisys.v1.speak.models.SimpleProsody | daisys.v1.speak.models.AffectProsody | daisys.v1.speak.models.SignalProsody | None

example_take¶

Parameters for an example take to generate for this voice. If not provided, a default example text will be used, depending on the language of the model.

Type:: daisys.v1.speak.models.TakeGenerateWithoutVoice | None

done_webhook¶

An optional URL to call using POST when the voice is available, with the response of VoiceInfo in the body content. This shall be called once, after the voice and example take have been generated.

Type:: daisys.v1.speak.models.Webhook | None

model_config: ClassVar[ConfigDict] = {}¶: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class daisys.v1.speak.models.VoiceInfo(*, name: str, model: str, gender: VoiceGender, description: str | None = None, default_style: list[str] | None = None, default_prosody: SimpleProsody | AffectProsody | SignalProsody | None = None, example_take: TakeGenerateWithoutVoice | None = None, done_webhook: Webhook | None = None, voice_id: str, status: Status, timestamp_ms: int, example_take_id: str | None = None)¶

Information about a voice.

voice_id¶

The unique identifier of this voice.

Type:: str

status¶

The status of this voice, whether it is ready, in error, or in progress.

Type:: daisys.v1.speak.models.Status

timestamp_ms¶

The time at which this voice generation was requested, in milliseconds since epoch.

Type:: int

example_take_id¶

An optional identifier for a take that represents an example of this voice.

Type:: str | None

model_config: ClassVar[ConfigDict] = {}¶: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

Update parameters of a voice.

name¶

A name to give the voice, may be any string, and does not need to be unique.

Type:: str | None

gender¶

The gender of this voice.

Type:: daisys.v1.speak.models.VoiceGender | None

default_style¶

Type:: list[str] | None

default_prosody¶

An optional default prosody to associate with this voice. It can be overridden by a take that uses this voice.

Type:: daisys.v1.speak.models.SimpleProsody | daisys.v1.speak.models.AffectProsody | daisys.v1.speak.models.SignalProsody | None

model_config: ClassVar[ConfigDict] = {}¶: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class daisys.v1.speak.models.Webhook(*, post_url: str, timestamp_ms: int | None = None, status_code: int | None = None)¶

Store information about a registered webhook and its status.

When specifying a webhook, only post_url needs to be provided.

post_url¶

The URL to be called with POST.

Type:: str

timestamp_ms¶

The time it was last called at, milliseconds since epoch.

Type:: int | None

status_code¶

The HTTP status code of the last response from the webhook.

Type:: int | None

model_config: ClassVar[ConfigDict] = {}¶: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
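As a sketch of both directions: a webhook is specified with post_url alone, and the receiving side parses a POST body that carries a TakeResponse. Both helpers below are hypothetical, the URL and sample body are made up, and the string form of status in JSON is an assumption, so it is passed through without interpretation.

```python
import json


def webhook_spec(post_url: str) -> dict:
    """Webhook-shaped dict; timestamp_ms and status_code are left unset."""
    return {"post_url": post_url}


def read_take_notification(body: bytes) -> tuple[str, str]:
    """Extract take_id and status from a TakeResponse-shaped JSON body."""
    payload = json.loads(body)
    return payload["take_id"], str(payload["status"])


spec = webhook_spec("https://example.com/hooks/take-done")

# Made-up body; only the field names come from TakeResponse.
sample = b'{"take_id": "t-placeholder", "status": "ready"}'
take_id, status = read_take_notification(sample)
```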