Daisys API JSON models

Pydantic classes representing the JSON interface for the Daisys API.

class daisys.v1.speak.models.AffectProsody(*, pitch: int, pace: int, valence: int, dominance: int, arousal: int)

Prosody features based on analysis of affect. See also parent class ProsodyFeatures for other fields.

valence

The valence; -10 for negativity, 10 for positivity, 0 for neutral.

Type:

int

arousal

The arousal; -10 for unexcited, 10 for very excited, 0 for neutral.

Type:

int

dominance

The dominance; -10 for docile, 10 for commanding, 0 for neutral.

Type:

int

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class daisys.v1.speak.models.ProsodyFeatures(*, pitch: int, pace: int)

Base prosody features supported by all models.

pitch

The normalized pitch; -10 to 10, where 0 is a neutral pitch.

Type:

int

pace

The normalized pace; -10 to 10, where 0 is a neutral pace.

Type:

int

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
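The prosody fields documented so far all use the same normalized scale of -10 to 10. The following is a minimal sketch of a client-side range check on a plain dictionary of prosody values before a request is built. The helper name `out_of_range_fields` is hypothetical and not part of the daisys package, and this reference does not state how the API itself reacts to out-of-range values.

```python
# Hypothetical helper; field names are taken from the class signatures above.
PROSODY_MIN, PROSODY_MAX = -10, 10


def out_of_range_fields(prosody: dict[str, int]) -> list[str]:
    """Return the sorted names of fields whose value lies outside -10..10."""
    return sorted(
        name
        for name, value in prosody.items()
        if not PROSODY_MIN <= value <= PROSODY_MAX
    )


affect = {"pitch": 0, "pace": 2, "valence": 11, "dominance": -3, "arousal": -12}
print(out_of_range_fields(affect))  # ['arousal', 'valence']
```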

daisys.v1.speak.models.ProsodyFeaturesUnion

A union type representing different prosody feature variations.

alias of SimpleProsody | AffectProsody | SignalProsody

class daisys.v1.speak.models.ProsodyType(value)

An enum representing different prosody feature types.

Not all models accept all prosody types. See the prosody_types field of TTSModel.

SIMPLE

corresponds with SimpleProsody

AFFECT

corresponds with AffectProsody

SIGNAL

corresponds with SignalProsody

static from_class(prosody: SimpleProsody | AffectProsody | SignalProsody)

Return an enum value based on the prosody class provided.

Parameters:

prosody – The prosody object from which to derive the enum value.

prosody(**kwargs)

Return a prosody object corresponding to this value, initialized with the given arguments.

class daisys.v1.speak.models.SignalProsody(*, pitch: int, pace: int, tilt: int, pitch_range: int)

Prosody features based on signal analysis. See also parent class ProsodyFeatures for other fields.

tilt

The normalized spectral tilt; -10 for flat, 10 for bright, 0 for neutral.

Type:

int

pitch_range

The normalized pitch range; -10 for flat, 10 for highly varied pitch, 0 for neutral.

Type:

int

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class daisys.v1.speak.models.SimpleProsody(*, pitch: int, pace: int, expression: int)

Simplified prosody features, supported by all models. See also parent class ProsodyFeatures for other fields.

expression

The normalized “expression”; -10 to 10, where 0 is neutral.

Type:

int

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
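Each ProsodyType value corresponds to one prosody class, and the three classes differ only in the fields they add to pitch and pace. As an illustration, the sketch below infers the ProsodyType name from the field names present in a plain dictionary. The function `prosody_type_name` is hypothetical; how the library or the server actually discriminates the union is not stated in this reference.

```python
# Hypothetical helper; field names are taken from the class signatures above.
BASE_FIELDS = {"pitch", "pace"}
EXTRA_FIELDS = {
    "SIMPLE": {"expression"},
    "AFFECT": {"valence", "dominance", "arousal"},
    "SIGNAL": {"tilt", "pitch_range"},
}


def prosody_type_name(fields: dict[str, int]) -> str:
    """Map a dictionary of prosody fields to 'SIMPLE', 'AFFECT' or 'SIGNAL'."""
    extras = set(fields) - BASE_FIELDS
    for name, expected in EXTRA_FIELDS.items():
        if extras == expected:
            return name
    raise ValueError(f"unrecognized prosody fields: {sorted(extras)}")


print(prosody_type_name({"pitch": 0, "pace": 0, "expression": 3}))  # SIMPLE
```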

class daisys.v1.speak.models.Status(value)

Represents the status of a take or voice generation process.

WAITING

Item is waiting to be processed.

STARTED

Processing has started for this item.

PROGRESS_25

Item has been 25% processed.

PROGRESS_50

Item has been 50% processed.

PROGRESS_75

Item has been 75% processed.

READY

Item is ready to be used; for takes, audio is available.

ERROR

An error occurred during processing of this item.

TIMEOUT

Processing did not finish for this item.

Note that TIMEOUT is used for very long intervals; it does not indicate a few seconds or minutes, but rather that an item has been in the queue for more than a day and has therefore been removed. It should only be considered to represent circumstances where processing errors were not detected by normal means.
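When polling an item, a client usually needs to know whether a status is final. The done_webhook description for takes names READY, ERROR, and TIMEOUT as the statuses at which generation is over, so a sketch of that check could look as follows. The helper `is_terminal` is hypothetical, and it compares Python-side member names only, because the JSON spelling of Status values is not stated in this reference.

```python
# Hypothetical helper, not part of the daisys package.
TERMINAL_STATUSES = frozenset({"READY", "ERROR", "TIMEOUT"})


def is_terminal(status_name: str) -> bool:
    """Return True when no further status changes are expected."""
    return status_name in TERMINAL_STATUSES


for name in ("WAITING", "PROGRESS_50", "READY"):
    print(name, is_terminal(name))
```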

class daisys.v1.speak.models.StreamMode(value)

Whether a websocket message should contain a whole part or chunks of parts.

Note: upper case in Python, lower case in JSON.

Values:

PARTS, CHUNKS

class daisys.v1.speak.models.StreamOptions(*, mode: StreamMode = StreamMode.PARTS)

Options for streaming.

mode

The streaming mode to use.

Type:

daisys.v1.speak.models.StreamMode

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
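The note on StreamMode says the values are upper case in Python and lower case in JSON. A minimal sketch of that convention, using a stand-in enum rather than the real class, is shown below. The exact JSON strings "parts" and "chunks" are inferred from that note and should be treated as an assumption.

```python
import json
from enum import Enum


class StreamMode(str, Enum):
    """Stand-in for daisys.v1.speak.models.StreamMode (illustration only)."""

    PARTS = "parts"
    CHUNKS = "chunks"


def stream_options_json(mode: StreamMode = StreamMode.PARTS) -> str:
    """Serialize streaming options, using the lower-case JSON value."""
    return json.dumps({"mode": mode.value})


print(stream_options_json())  # {"mode": "parts"}
```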

class daisys.v1.speak.models.TTSModel(*, name: str, displayname: str, flags: list[str] = [], languages: list[str], genders: list[VoiceGender], styles: list[list[str]] = [], prosody_types: list[ProsodyType], voice_inputs: list[VoiceInputType] | None)

Information about a speech model.

name

The unique identifier of this model.

Type:

str

displayname

A friendlier name that might contain spaces.

Type:

str

flags

A list of flags that indicate some features of this model.

Type:

list[str]

languages

A list of languages supported by this model.

Type:

list[str]

genders

A list of genders supported by this model.

Type:

list[daisys.v1.speak.models.VoiceGender]

styles

A list of style sets; each sublist is a list of mutually exclusive style tags.

Type:

list[list[str]]

prosody_types

A list of which prosody types are supported by this model.

Type:

list[daisys.v1.speak.models.ProsodyType]

voice_inputs

A list of which voice input types are supported by this model.

Type:

list[daisys.v1.speak.models.VoiceInputType] | None

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
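Because each sublist of styles holds mutually exclusive tags, a client may want to check a chosen style list against the model's style sets before requesting a take. The sketch below does this on plain lists. The helper `conflicting_styles` and the style tags in the example are hypothetical; real tags come from TTSModel.styles.

```python
def conflicting_styles(
    chosen: list[str], style_sets: list[list[str]]
) -> list[list[str]]:
    """Return each group of chosen tags that share a mutually exclusive set."""
    conflicts = []
    for style_set in style_sets:
        picked = [tag for tag in chosen if tag in style_set]
        if len(picked) > 1:
            conflicts.append(picked)
    return conflicts


# Hypothetical style sets, for illustration only.
style_sets = [["narration", "conversation"], ["whisper", "shout"]]
print(conflicting_styles(["narration", "whisper"], style_sets))       # []
print(conflicting_styles(["narration", "conversation"], style_sets))  # [['narration', 'conversation']]
```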

class daisys.v1.speak.models.TakeGenerate(*, text: str, override_language: str | None = None, style: list[str] | None = None, prosody: SimpleProsody | AffectProsody | SignalProsody | None = None, status_webhook: Webhook | None = None, done_webhook: Webhook | None = None, voice_id: str)

Parameters necessary to generate a “take”, an audio file containing an utterance of the given text by the given voice. See TakeGenerateWithoutVoice for documentation on the remaining fields.

voice_id

The id of the voice to be used for generating audio. The voice is attached to a specific model.

Type:

str

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class daisys.v1.speak.models.TakeGenerateWithoutVoice(*, text: str, override_language: str | None = None, style: list[str] | None = None, prosody: SimpleProsody | AffectProsody | SignalProsody | None = None, status_webhook: Webhook | None = None, done_webhook: Webhook | None = None)

Parameters necessary to generate a “take”, an audio file containing an utterance of the given text. No voice is specified here, so that these parameters can be embedded in VoiceGenerate to describe the example take for a voice.

text

The text that the voice should say.

Type:

str

override_language

Normally a language classifier is used to detect the language of the speech; this allows for multilingual sentences. However, if the language should be enforced, it should be provided here. Currently accepted values are “nl-NL” and “en-GB”.

Type:

str | None

style

A list of styles to enable when speaking. Note that most styles are mutually exclusive, so a list containing a single value should normally be provided. Accepted styles can be retrieved from the associated voice’s VoiceInfo.styles or the model’s TTSModel.styles field. Not all models support styles, so this can be left empty if no specific style is desired.

Type:

list[str] | None

prosody

The characteristics of the desired speech not determined by the voice or style. A SimpleProsody can be provided here; most models also accept the more detailed AffectProsody.

Type:

daisys.v1.speak.models.SimpleProsody | daisys.v1.speak.models.AffectProsody | daisys.v1.speak.models.SignalProsody | None

status_webhook

An optional URL to be called using POST whenever the take’s status changes, with TakeResponse in the body content.

Type:

daisys.v1.speak.models.Webhook | None

done_webhook

An optional URL to be called exactly once using POST when the take is READY, ERROR, or TIMEOUT, with TakeResponse in the body content.

Type:

daisys.v1.speak.models.Webhook | None

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
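To make the field layout concrete, here is a sketch of a JSON body for a take request built from a plain dictionary, combining the fields above with voice_id from TakeGenerate. It assumes that JSON keys match the Python field names and that optional fields may simply be omitted. The helper `take_request_body`, the voice identifier, and the URL are placeholders, and how the body is sent to the API is outside the scope of this reference.

```python
import json
from typing import Optional


def take_request_body(
    text: str,
    voice_id: str,
    prosody: Optional[dict] = None,
    done_webhook_url: Optional[str] = None,
) -> dict:
    """Build a take request body, leaving out optional fields that are unset."""
    body = {"text": text, "voice_id": voice_id}
    if prosody is not None:
        body["prosody"] = prosody
    if done_webhook_url is not None:
        # Only post_url is supplied when specifying a webhook.
        body["done_webhook"] = {"post_url": done_webhook_url}
    return body


body = take_request_body(
    text="Hello there.",
    voice_id="voice-id-placeholder",  # placeholder, not a real identifier
    prosody={"pitch": 0, "pace": 0, "expression": 4},
    done_webhook_url="https://example.com/hooks/take-done",  # placeholder URL
)
print(json.dumps(body, indent=2))
```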

class daisys.v1.speak.models.TakeInfo(*, duration: int, audio_rate: int, normalized_text: list[str])

Some information available when a take is READY, attached to the TakeResponse.

duration

The length of the audio in samples. To get the length in seconds, divide by audio_rate.

Type:

int

audio_rate

The number of samples per second in the audio.

Type:

int

normalized_text

The text used for text-to-speech after normalization, i.e., translated from “as written” to “as spoken”. Provided as a list of sentences.

Type:

list[str]

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
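Since duration is expressed in samples, the length in seconds follows from dividing by audio_rate. A small sketch of that arithmetic is given below; the function name `take_duration_seconds` and the sample values are illustrative only.

```python
def take_duration_seconds(duration: int, audio_rate: int) -> float:
    """Convert a take's duration from samples to seconds."""
    if audio_rate <= 0:
        raise ValueError("audio_rate must be positive")
    return duration / audio_rate


# Illustrative numbers: 66150 samples at 22050 samples per second.
print(take_duration_seconds(66150, 22050))  # 3.0
```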

class daisys.v1.speak.models.TakeResponse(*, text: str, override_language: str | None = None, style: list[str] | None = None, prosody: SimpleProsody | AffectProsody | SignalProsody | None = None, status_webhook: Webhook | None = None, done_webhook: Webhook | None = None, voice_id: str, take_id: str, status: Status, timestamp_ms: int, info: TakeInfo | None = None)

Information about a take, returned during and after take generation. Also includes fields from TakeGenerate.

take_id

The unique identifier of this take.

Type:

str

status

The status of this take, whether it is ready, in error, or in progress.

Type:

daisys.v1.speak.models.Status

timestamp_ms

The time at which this take generation was requested, in milliseconds since epoch.

Type:

int

info

Information available when the take is READY, see TakeInfo.

Type:

daisys.v1.speak.models.TakeInfo | None

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
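The timestamp_ms field is given in milliseconds, whereas Python's datetime functions expect seconds. The sketch below converts it to a timezone-aware datetime, assuming that "epoch" means the Unix epoch in UTC, which this reference does not state explicitly. The function name is hypothetical.

```python
from datetime import datetime, timezone


def timestamp_ms_to_datetime(timestamp_ms: int) -> datetime:
    """Convert milliseconds since the Unix epoch to an aware UTC datetime."""
    return datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)


print(timestamp_ms_to_datetime(1_700_000_000_000).isoformat())
# 2023-11-14T22:13:20+00:00
```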

class daisys.v1.speak.models.Version(*, version: int, minor: int)

Represents the version of the API.

version

The major version number of the API.

Type:

int

minor

The minor version number of the API.

Type:

int

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class daisys.v1.speak.models.VoiceGender(value)

Represents the gender of a voice.

Note: upper case in Python, lower case in JSON.

Values:

MALE, FEMALE, NONBINARY

class daisys.v1.speak.models.VoiceGenerate(*, name: str, model: str, gender: VoiceGender, description: str | None = None, default_style: list[str] | None = None, default_prosody: SimpleProsody | AffectProsody | SignalProsody | None = None, example_take: TakeGenerateWithoutVoice | None = None, done_webhook: Webhook | None = None)

Parameters necessary to generate a voice.

name

A name to give the voice, may be any string, and does not need to be unique.

Type:

str

model

The name of the model for this voice. Refers to the name entry in TTSModel.

Type:

str

gender

The gender of this voice.

Type:

daisys.v1.speak.models.VoiceGender

description

A description of this voice.

Type:

str | None

default_style

An optional list of styles to associate with this voice by default. It can be overridden by a take that uses this voice. Note that most styles are mutually exclusive, and not all models support styles.

Type:

list[str] | None

default_prosody

An optional default prosody to associate with this voice. It can be overridden by a take that uses this voice.

Type:

daisys.v1.speak.models.SimpleProsody | daisys.v1.speak.models.AffectProsody | daisys.v1.speak.models.SignalProsody | None

example_take

Parameters for an example take to generate for this voice. If not provided, a default example text will be used, depending on the language of the model.

Type:

daisys.v1.speak.models.TakeGenerateWithoutVoice | None

done_webhook

An optional URL to call using POST when the voice is available, with the response of VoiceInfo in the body content. This shall be called once, after the voice and example take have been generated.

Type:

daisys.v1.speak.models.Webhook | None

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class daisys.v1.speak.models.VoiceInfo(*, name: str, model: str, gender: VoiceGender, description: str | None = None, default_style: list[str] | None = None, default_prosody: SimpleProsody | AffectProsody | SignalProsody | None = None, example_take: TakeGenerateWithoutVoice | None = None, done_webhook: Webhook | None = None, voice_id: str, status: Status, timestamp_ms: int, example_take_id: str | None = None)

Information about a voice.

voice_id

The unique identifier of this voice.

Type:

str

status

The status of this voice, whether it is ready, in error, or in progress.

Type:

daisys.v1.speak.models.Status

timestamp_ms

The time at which this voice generation was requested, in milliseconds since epoch.

Type:

int

example_take_id

An optional identifier for a take that represents an example of this voice.

Type:

str | None

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class daisys.v1.speak.models.VoiceUpdate(*, name: str | None = None, gender: VoiceGender | None = None, default_style: list[str] | None = None, default_prosody: SimpleProsody | AffectProsody | SignalProsody | None = None)

Update parameters of a voice.

name

A name to give the voice, may be any string, and does not need to be unique.

Type:

str | None

gender

The gender of this voice.

Type:

daisys.v1.speak.models.VoiceGender | None

default_style

An optional list of styles to associate with this voice by default. It can be overridden by a take that uses this voice. Note that most styles are mutually exclusive, and not all models support styles.

Type:

list[str] | None

default_prosody

An optional default prosody to associate with this voice. It can be overridden by a take that uses this voice.

Type:

daisys.v1.speak.models.SimpleProsody | daisys.v1.speak.models.AffectProsody | daisys.v1.speak.models.SignalProsody | None

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class daisys.v1.speak.models.Webhook(*, post_url: str, timestamp_ms: int | None = None, status_code: int | None = None)

Store information about a registered webhook and its status.

When specifying a webhook, only post_url needs to be provided.

post_url

The URL to be called with POST.

Type:

str

timestamp_ms

The time at which it was last called, in milliseconds since epoch.

Type:

int | None

status_code

The HTTP status code of the last response from the webhook.

Type:

int | None

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
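A Webhook object returned by the API carries the outcome of the last call alongside the URL. The sketch below reads that state from a plain dictionary. It assumes that a missing or null status_code means the webhook has not been called yet and that any 2xx code counts as a successful delivery; neither point is stated in this reference, and the helper `webhook_delivery_state` is hypothetical.

```python
def webhook_delivery_state(webhook: dict) -> str:
    """Summarize the last delivery attempt recorded on a webhook."""
    status_code = webhook.get("status_code")
    if status_code is None:
        # Assumption: no recorded status code means no call has been made.
        return "not called yet"
    if 200 <= status_code < 300:
        return "delivered"
    return f"failed with HTTP {status_code}"


# Placeholder URL, for illustration only.
hook = {"post_url": "https://example.com/hooks/take-done"}
print(webhook_delivery_state(hook))                           # not called yet
print(webhook_delivery_state({**hook, "status_code": 204}))   # delivered
print(webhook_delivery_state({**hook, "status_code": 500}))   # failed with HTTP 500
```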