Getting started with the Python client library¶

For more on using the Python client library, see Daisys API examples.

The examples below use the default “synchronous” client. If you prefer asyncio, with DaisysAPI() can be replaced in any example by async with DaisysAPI(), which returns an asynchronous client whose functions are called with the await keyword.

Installing the library¶

The library is available on pypi.org and can be installed via pip. The Daisys API requires Python version 3.10 or greater. First create a Python venv, activate it, install the library, and then download and run the examples:

Installing the library¶

$ # set up a virtual environment
$ mkdir daisys_project
$ cd daisys_project
$ python3 -m venv venv
$ . venv/bin/activate
$
$ # install the library
$ python3 -m pip install daisys
$
$ # or if Python websocket support is needed
$ python3 -m pip install "daisys[ws]"

Of course, pip is only one option; you can use any Python project management software, such as uv or Poetry.

Running an example¶

Within the Python virtual environment, the hello_daisys.py example can be run. The examples read your email and password from environment variables, as shown:

Running an example¶

$ curl -O https://raw.githubusercontent.com/daisys-ai/daisys-api-python/main/examples/hello_daisys.py
$ export DAISYS_EMAIL=user@example.com
$ export DAISYS_PASSWORD=example_password123
$ python3 hello_daisys.py
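Your own scripts can follow the same convention. The following is a minimal sketch and is not taken from hello_daisys.py itself: the helper name credentials_from_env is hypothetical, and only the two variable names DAISYS_EMAIL and DAISYS_PASSWORD come from the commands above.

```python
import os


def credentials_from_env(environ=os.environ):
    """Return (email, password) read from DAISYS_EMAIL and DAISYS_PASSWORD.

    Hypothetical helper: it raises a descriptive error when a variable is
    unset or empty, instead of failing later with a login error.
    """
    names = ("DAISYS_EMAIL", "DAISYS_PASSWORD")
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variable(s): " + ", ".join(missing))
    return environ["DAISYS_EMAIL"], environ["DAISYS_PASSWORD"]
```

The returned pair can be passed as the email and password arguments when creating a client, which keeps credentials out of the source code.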

Getting a client¶

Getting a client by context manager¶

from daisys import DaisysAPI
with DaisysAPI('speak', email='user@example.com', password='pw') as speak:
    ...

# or for asyncio support:
async with DaisysAPI('speak', email='user@example.com', password='pw') as speak:
    ...

As mentioned, an asyncio-enabled client can be instantiated by using async with in the line above. Additionally, the context manager interface (with) is optional; it is also possible to create a client by normal function call:

Getting a client by function¶

from daisys import DaisysAPI
speak = DaisysAPI('speak', email=EMAIL, password=PASSWORD).get_client()
# or..
speak = DaisysAPI('speak', email=EMAIL, password=PASSWORD).get_async_client()

The main difference is that, when an email and password are used, the context manager logs out automatically when the program exits the context, whereas a client obtained through get_client or get_async_client must be logged out explicitly by calling its .logout() function. Logging out invalidates the refresh token so that no further sessions can be renewed without logging in again. Auto-logout does not occur when an access token is provided.

The rest of this documentation will assume the normal, synchronous client. In all cases, functions should be called with await when used with the asyncio client.

Listing the models¶

Using the client library, it is easy to log into the API and start requesting text to speech services. The following Python code can be used to list the available models:

Listing the models¶

from daisys import DaisysAPI
with DaisysAPI('speak', email='user@example.com', password='pw') as speak:
    print('Found models:')
    for model in speak.get_models():
        print(model)

Listing the voices¶

You can use a model by using a voice associated with that model. Voices are identified by a voice_id field.

Listing the voices¶

from daisys import DaisysAPI
with DaisysAPI('speak', email='user@example.com', password='pw') as speak:
    print('Found voices:')
    for voice in speak.get_voices():
        print(f'{voice.name}, a {voice.gender} voice of {voice.model} with id {voice.voice_id}.')
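For comparison, the same listing with the asyncio client might look like the sketch below. It relies only on what this page states: the client is entered with async with, and its functions are called with await. The function name voice_names is hypothetical, and the api argument is assumed to be a DaisysAPI instance.

```python
import asyncio


async def voice_names(api):
    """Return the names of all voices using an asyncio-enabled client.

    `api` is assumed to be a DaisysAPI instance; any object that supports
    `async with` and yields a client with an awaitable get_voices() works.
    """
    async with api as speak:
        voices = await speak.get_voices()
    return [voice.name for voice in voices]
```

It would be called as asyncio.run(voice_names(DaisysAPI('speak', email='user@example.com', password='pw'))).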

Generating a voice¶

If you do not yet have any voices, you should generate one. Voices can be requested for a given gender and with default prosody information. Voices must be given names.

For instance, the following block of code creates an expressive female voice for the shakespeare model:

Generating a voice¶

from daisys import DaisysAPI, VoiceGender
from pprint import pprint
with DaisysAPI('speak', email='user@example.com', password='pw') as speak:
    print('Creating a voice:')
    voice = speak.generate_voice(name="Deirdre", gender=VoiceGender.FEMALE, model="shakespeare")
    pprint(voice.model_dump())

Note that voice generation can take a few seconds! In this example, the speak.generate_voice command waits for the operation to finish, and therefore we can print the result immediately.

It is also possible to adopt a more asynchronous style by providing wait=False to speak.generate_voice(). Alternatively, as mentioned above you can use the asyncio client to allow the await speak.generate_voice() syntax.

The above code gives the following details:

Generating a voice: output¶

Creating a voice:
{'default_style': [],
 'default_prosody': None,
 'done_webhook': None,
 'example_take': None,
 'example_take_id': 't01hasgezqkx4vth62xckymk3x3',
 'gender': <VoiceGender.FEMALE: 'female'>,
 'model': 'shakespeare',
 'name': 'Deirdre',
 'status': <Status.READY: 'ready'>,
 'timestamp_ms': 1695218371261,
 'voice_id': 'v01hasgezqjcsnc91zdfzpx0apj'}

We can see that the voice has a female gender and has an example take associated with it. The example_take_id can already be used to hear the voice.
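A script that runs repeatedly may prefer to reuse a voice it created earlier rather than generate another one. The sketch below combines get_voices() and generate_voice() as they are used on this page; the helper name get_or_create_voice and the choice to match on name and model are assumptions made for illustration.

```python
def get_or_create_voice(speak, name, gender, model):
    """Return an existing voice with this name and model, or generate one.

    `speak` is assumed to be a synchronous Daisys client. Matching on
    `voice.name` and `voice.model` is an illustrative choice; a voice_id is
    the reliable identifier once you have one.
    """
    for voice in speak.get_voices():
        if voice.name == name and voice.model == model:
            return voice
    # No match found: request a new voice and wait for it to be ready.
    return speak.generate_voice(name=name, gender=gender, model=model)
```

For example, get_or_create_voice(speak, "Deirdre", VoiceGender.FEMALE, "shakespeare") would only generate a voice on the first run.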

Generating a take¶

Now that you have a voice, text to speech can be requested with the speak.generate_take() command:

Generating a take¶

from daisys import DaisysAPI
from pprint import pprint
with DaisysAPI('speak', email='user@example.com', password='pw') as speak:
    print('Creating a take:')
    take = speak.generate_take(voice_id='v01hasgezqjcsnc91zdfzpx0apj',
                               text="Hello, Daisys! It's a beautiful day.")
    pprint(take.model_dump())

This gives:

Generating a take: output¶

Creating a take:
{'done_webhook': None,
 'info': {'audio_rate': 44100,
          'duration': 152576,
          'normalized_text': ['Hello, Daisys!', "It's a beautiful day."]},
 'override_language': None,
 'prosody': None,
 'status': <Status.READY: 'ready'>,
 'status_webhook': None,
 'style': None,
 'take_id': 't01hasgn2dnyg6jqrcym9cgxv75',
 'text': "Hello, Daisys! It's a beautiful day.",
 'timestamp_ms': 1695220926901,
 'voice_id': 'v01hasgezqjcsnc91zdfzpx0apj'}

Note that the status is “ready”, meaning that audio can now be retrieved. As with voice generation, an asynchronous approach is also available for generate_take.
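One practical use of the asyncio client is to request several takes at once. The following is a sketch under stated assumptions: speak is an asyncio client, generate_take accepts the voice_id and text arguments shown above, and the service accepts concurrent requests from one client. The function name generate_takes is hypothetical.

```python
import asyncio


async def generate_takes(speak, voice_id, texts):
    """Request one take per text concurrently; results keep the input order.

    `speak` is assumed to be an asyncio Daisys client, so each call to
    generate_take() returns an awaitable.
    """
    requests = [speak.generate_take(voice_id=voice_id, text=text) for text in texts]
    # asyncio.gather runs the requests concurrently and returns the results
    # in the same order as the awaitables it was given.
    return await asyncio.gather(*requests)
```

Inside an async with DaisysAPI(...) as speak: block, this would be called as takes = await generate_takes(speak, voice_id, ["First line.", "Second line."]).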

Retrieving a take’s audio¶

The take is ready, so now we can hear the result! Audio for a take can be retrieved as follows:

Retrieving audio (1)¶

from daisys import DaisysAPI
with DaisysAPI('speak', email='user@example.com', password='pw') as speak:
    print("Getting a take's audio.")
    audio_wav = speak.get_take_audio(take_id='t01hasghx0zgdc29gpzexw5r8wc', file='beautiful_day.wav')
    print('Length in bytes:', len(audio_wav))

In the above code, we retrieve a .wav file, which is (optionally) written to a file in addition to being returned. This can be decoded, for example, using scipy’s io.wavfile module:

Retrieving audio (2)¶

    from scipy.io import wavfile
    from io import BytesIO
    print(wavfile.read(BytesIO(audio_wav)))

    # Note: Since decoding the audio is outside the scope of the client library,
    # `scipy` is not a dependency and will not be automatically installed by `pip`.

which, along with the previous code block, prints:

Retrieving audio: output¶

Getting a take's audio.
Length in bytes: 292908
(44100, array([-111,  -46, -104, ..., -128,  -95,   -9], dtype=int16))
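If scipy is not available, the basic properties of the returned bytes can also be read with Python's standard wave module. This is a minimal sketch that assumes the audio is PCM-encoded WAV, as the int16 samples above suggest; the function name describe_wav is hypothetical.

```python
import io
import wave


def describe_wav(audio_wav: bytes) -> dict:
    """Return basic properties of PCM WAV data held in memory."""
    with wave.open(io.BytesIO(audio_wav), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate": rate,
            "frames": frames,
            "duration_seconds": frames / rate,
        }
```

Calling describe_wav(audio_wav) on the bytes returned by get_take_audio() gives the sample rate and duration without any third-party dependency.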

The resulting file beautiful_day.wav can be played using command line programs like aplay on Linux, or any audio player such as the excellent VLC. You can integrate the results into your creative projects!

It is also possible to retrieve the audio in other formats: mp3, flac, and m4a by providing the format parameter.

Streaming audio¶

The Daisys API supports two methods of streaming audio:

HTTP
Websocket

HTTP¶

The HTTP method downloads the audio file in chunks using a streaming response, and is convenient when a simple iterator interface is desired. When making the take request, set wait to False, then call stream_take_audio() (async stream_take_audio()). Alternatively, a signed URL can be retrieved using get_take_audio_url() (async get_take_audio_url()), which is useful for passing to an audio player running in a front-end browser.

Streaming audio, HTTP method¶

from daisys import DaisysAPI
with DaisysAPI('speak', email='user@example.com', password='pw') as speak:
    print("Streaming a take's audio.")
    with speak.stream_take_audio(take_id='t01hasghx0zgdc29gpzexw5r8wc') as stream:
        for chunk in stream:
            print('Length in bytes:', len(chunk))
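Rather than only printing chunk sizes, the chunks can be written to disk as they arrive. The sketch below works on any iterable of bytes, so it assumes nothing about the stream object beyond the for loop shown above; the function name save_chunks is hypothetical.

```python
from typing import Iterable


def save_chunks(chunks: Iterable[bytes], path: str) -> int:
    """Write each chunk to `path` as it arrives and return the bytes written."""
    total = 0
    with open(path, "wb") as out:
        for chunk in chunks:
            out.write(chunk)
            total += len(chunk)
    return total
```

Inside the with speak.stream_take_audio(...) as stream: block, this would be called as save_chunks(stream, 'beautiful_day.wav'). The file then contains exactly the bytes the server sent, in order.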

When using the HTTP method via endpoints outside of the Python library, please be aware of the use of 307 redirects and headers, outlined in Retrieving audio.

Websocket¶

See Daisys API websocket examples.

For lowest latency usage, it is additionally possible to use a websocket to create a connection directly to the worker node used for synthesizing audio. Requests are submitted to the worker and the same node streams back the audio as it is generated over the already-established connection.

Streaming audio, websocket method¶

from daisys import DaisysAPI
with DaisysAPI('speak', email='user@example.com', password='pw') as speak:
    print("Streaming a take's audio.")
    with speak.websocket(voice_id='v01hasgezqjcsnc91zdfzpx0apj') as ws:
        request_id = ws.generate_take(voice_id='v01hasgezqjcsnc91zdfzpx0apj',
                                      text="Hello, Daisys! It's a beautiful day.",
                                      audio_callback=my_audio_cb,
                                      status_callback=my_status_cb)

The specified callbacks will be called whenever the requested take’s status changes or audio data is generated. See Example: Websocket example, synchronous client for complete information on the signatures of these two callbacks and examples showing how they can be used to receive audio in chunks as it is generated.

In addition to the callback interface, iter_request() (async iter_request()) is provided to allow an iterator-based for-loop (or async for-loop) over incoming audio chunks, simplifying usage.

Finally, in applications where the backend should perform REST API calls but the front-end should stream audio, websocket_url() can be used to retrieve a URL that the front-end should connect a websocket to. Example: Websocket example, web client is provided to show how to manage the websocket connection using JavaScript.

Authentication with access tokens¶

All the above examples authenticate with the API using an email and password. In some scenarios, users will prefer to authenticate using only an access token. An access token and refresh token can be retrieved once and used until they are manually revoked.

By default, when the client library is used with email and password, the refresh token is automatically revoked when the client context is exited. When an access token is provided to the client context, this automatic revocation is skipped, so that the token can be refreshed on next usage. This can be controlled by setting speak.auto_logout to True or False.

To retrieve an access and refresh token for future use, the following program can thus be used:

Retrieving an access and refresh token¶

from daisys import DaisysAPI
with DaisysAPI('speak', email='user@example.com', password='pw') as speak:
    speak.auto_logout = False
    speak.login()
    access_token, refresh_token = speak.access_token, speak.refresh_token

These tokens can now be stored, and provided to the client as follows:

Using a stored access and refresh token¶

import json
from daisys import DaisysAPI

def store_tokens(speak, access_token: str, refresh_token: str):
    """Store the current Daisys access and refresh tokens."""
    with open('daisys_tokens.json', 'w') as token_file:
        json.dump([access_token, refresh_token], token_file)

# Load the tokens saved by a previous run.
with open('daisys_tokens.json') as token_file:
    access_token, refresh_token = json.load(token_file)
with DaisysAPI('speak', access_token=access_token, refresh_token=refresh_token) as speak:
    speak.token_callback = store_tokens
    ...

The library does not implement a storage and retrieval mechanism for these tokens, as it is presumed that users will have their own files or databases for this purpose.

Importantly, when an access token expires, a new one is automatically retrieved by the library. Therefore, it is useful to store speak.access_token and speak.refresh_token whenever they change. The token_callback is provided for this purpose. It is optional, but recommended if you are not using a permatoken and wish to avoid transmitting passwords.
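Since storage is left to the user, one possible file-based approach is sketched below. It is an illustration, not part of the library: the file name and the helpers save_tokens and load_tokens are hypothetical, and only the callback signature (speak, access_token, refresh_token) is taken from the store_tokens example above. The file is written to a temporary path and then renamed, so an interrupted write cannot leave a truncated token file behind.

```python
import json
import os
import tempfile

TOKEN_PATH = "daisys_tokens.json"  # hypothetical location


def save_tokens(access_token, refresh_token, path=TOKEN_PATH):
    """Write both tokens to `path`, replacing any previous file atomically."""
    directory = os.path.dirname(os.path.abspath(path))
    # mkstemp creates the file readable and writable by the owner only.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump([access_token, refresh_token], tmp)
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise


def load_tokens(path=TOKEN_PATH):
    """Return (access_token, refresh_token), or None if no usable file exists."""
    try:
        with open(path) as token_file:
            access_token, refresh_token = json.load(token_file)
    except (FileNotFoundError, ValueError, TypeError):
        return None
    return access_token, refresh_token


def store_tokens(speak, access_token, refresh_token):
    """Callback with the same signature as the store_tokens example above."""
    save_tokens(access_token, refresh_token)
```

A program could call load_tokens() at startup and fall back to logging in with an email and password when it returns None, assigning store_tokens to speak.token_callback in either case.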