Getting started with the Python client library¶
For more on using the Python client library, see Daisys API examples.
In the following, the default “synchronous” client will be demonstrated. Some users will prefer to use asyncio, and in the following examples, with DaisysAPI() can be replaced with async with DaisysAPI(), which returns an asynchronous client that can be used with the await keyword.
Installing the library¶
The library is available on pypi.org and can be installed via pip. The Daisys API requires Python version 3.10 or greater. First create a Python venv, activate it, install the library, and then download and run the examples:
$ # setup a virtual environment
$ mkdir daisys_project
$ cd daisys_project
$ python3 -m venv venv
$ . venv/bin/activate
$
$ # install the library
$ python3 -m pip install daisys
$
$ # or if Python websocket support is needed
$ python3 -m pip install 'daisys[ws]'
Of course, pip is only one option; you can use any Python project management software such as uv, Poetry, etc.
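For example, a rough equivalent with uv (a sketch only; see uv's documentation for project-based workflows such as uv add):

$ uv venv
$ uv pip install daisys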
Running an example¶
Within the Python virtual environment, the hello_daisys.py example can be run. The examples read your email and password from environment variables, as shown:
$ curl -O https://raw.githubusercontent.com/daisys-ai/daisys-api-python/main/examples/hello_daisys.py
$ export DAISYS_EMAIL=user@example.com
$ export DAISYS_PASSWORD=example_password123
$ python3 hello_daisys.py
Getting a client¶
from daisys import DaisysAPI
with DaisysAPI('speak', email='user@example.com', password='pw') as speak:
    ...

# or for asyncio support:
async with DaisysAPI('speak', email='user@example.com', password='pw') as speak:
    ...
As mentioned, an asyncio-enabled client can be instantiated by using async with in the line above. Additionally, the context manager interface (with) is optional; it is also possible to create a client by a normal function call:
from daisys import DaisysAPI
speak = DaisysAPI('speak', email=EMAIL, password=PASSWORD).get_client()
# or..
speak = DaisysAPI('speak', email=EMAIL, password=PASSWORD).get_async_client()
The main difference is that when an email and password are used, the context manager approach will automatically log out when the program exits the context, whereas when the client is retrieved by get_client or get_async_client, the .logout() function should be called. Logging out invalidates the refresh token so that no further sessions can be renewed without logging in again. Auto-logout will not occur when an access token is provided.
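For example, a minimal sketch of the non-context-manager flow, logging out explicitly when finished:

from daisys import DaisysAPI

speak = DaisysAPI('speak', email='user@example.com', password='pw').get_client()
try:
    ...  # use the client as in the examples below
finally:
    speak.logout()  # invalidate the refresh token when done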
The rest of this documentation will assume the normal, synchronous client. In all cases, functions should be called with await when used with the asyncio client.
Listing the models¶
Using the client library, it is easy to log into the API and start requesting text to speech services. The following Python code can be used to list the available models:
from daisys import DaisysAPI
with DaisysAPI('speak', email='user@example.com', password='pw') as speak:
    print('Found models:')
    for model in speak.get_models():
        print(model)
Listing the voices¶
You can use a model by using a voice associated with that model. Voices are identified by a voice_id field.
from daisys import DaisysAPI
with DaisysAPI('speak', email='user@example.com', password='pw') as speak:
    print('Found voices:')
    for voice in speak.get_voices():
        print(f'{voice.name}, a {voice.gender} voice of {voice.model} with id {voice.voice_id}.')
Generating a voice¶
If you do not yet have any voices, you should generate one. Voices can be requested for a given gender and with default prosody information. Voices must be given names.
For instance, the following block of code creates an expressive female voice for the shakespeare model:
from daisys import DaisysAPI, VoiceGender
from pprint import pprint
with DaisysAPI('speak', email='user@example.com', password='pw') as speak:
    print('Creating a voice:')
    voice = speak.generate_voice(name="Deirdre", gender=VoiceGender.FEMALE, model="shakespeare")
    pprint(voice.model_dump())
Note that voice generation can take a few seconds! In this example, the speak.generate_voice command waits for the operation to finish, and therefore we can print the result immediately.
It is also possible to adopt a more asynchronous style by providing wait=False to speak.generate_voice(). Alternatively, as mentioned above, you can use the asyncio client to allow the await speak.generate_voice() syntax.
The above code gives the following details:
Creating a voice:
{'default_style': [],
 'default_prosody': None,
 'done_webhook': None,
 'example_take': None,
 'example_take_id': 't01hasgezqkx4vth62xckymk3x3',
 'gender': <VoiceGender.FEMALE: 'female'>,
 'model': 'shakespeare',
 'name': 'Deirdre',
 'status': <Status.READY: 'ready'>,
 'timestamp_ms': 1695218371261,
 'voice_id': 'v01hasgezqjcsnc91zdfzpx0apj'}
We can see that the voice has a female gender and has an example take associated with it. This example take's id (example_take_id) can already be used to hear the voice.
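For example, here is a self-contained sketch that generates a voice and immediately fetches the audio of its example take using get_take_audio (covered in detail below); the output filename is arbitrary:

from daisys import DaisysAPI, VoiceGender
with DaisysAPI('speak', email='user@example.com', password='pw') as speak:
    voice = speak.generate_voice(name="Deirdre", gender=VoiceGender.FEMALE, model="shakespeare")
    # The example take generated along with the voice can be fetched like any other take.
    audio_wav = speak.get_take_audio(take_id=voice.example_take_id, file='deirdre_example.wav')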
Generating a take¶
Now that you have a voice, text to speech can be requested using the speak.generate_take() command:
from daisys import DaisysAPI
from pprint import pprint
with DaisysAPI('speak', email='user@example.com', password='pw') as speak:
    print('Creating a take:')
    take = speak.generate_take(voice_id='v01hasgezqjcsnc91zdfzpx0apj',
                               text="Hello, Daisys! It's a beautiful day.")
    pprint(take.model_dump())
Giving,
Creating a take:
{'done_webhook': None,
 'info': {'audio_rate': 44100,
          'duration': 152576,
          'normalized_text': ['Hello, Daisys!', "It's a beautiful day."]},
 'override_language': None,
 'prosody': None,
 'status': <Status.READY: 'ready'>,
 'status_webhook': None,
 'style': None,
 'take_id': 't01hasgn2dnyg6jqrcym9cgxv75',
 'text': "Hello, Daisys! It's a beautiful day.",
 'timestamp_ms': 1695220926901,
 'voice_id': 'v01hasgezqjcsnc91zdfzpx0apj'}
Note that the status is “ready”, meaning that audio can now be retrieved. As with voice generation, an asynchronous approach is also available for generate_take.
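For instance, here is a sketch of the same take request using the asyncio client; the arguments are identical to the synchronous example above, only the async context manager and await differ:

import asyncio
from daisys import DaisysAPI

async def main():
    async with DaisysAPI('speak', email='user@example.com', password='pw') as speak:
        # Same request as above, but awaited on the asyncio client.
        take = await speak.generate_take(voice_id='v01hasgezqjcsnc91zdfzpx0apj',
                                         text="Hello, Daisys! It's a beautiful day.")
        print(take.take_id, take.status)

asyncio.run(main())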
Retrieving a take’s audio¶
The take is ready, so now we can hear the result! Audio for a take can be retrieved as follows:
from daisys import DaisysAPI
with DaisysAPI('speak', email='user@example.com', password='pw') as speak:
    print("Getting a take's audio.")
    audio_wav = speak.get_take_audio(take_id='t01hasghx0zgdc29gpzexw5r8wc', file='beautiful_day.wav')
    print('Length in bytes:', len(audio_wav))
In the above code, we retrieve a .wav file, which is (optionally) written to a file in addition to being returned. This can be decoded, for example, using scipy's io.wavfile module:
from scipy.io import wavfile
from io import BytesIO
print(wavfile.read(BytesIO(audio_wav)))

# Note: Since decoding the audio is outside the scope of the client library,
# `scipy` is not a dependency and will not be automatically installed by `pip`.
which, along with the previous code block, prints:
Getting a take's audio.
Length in bytes: 292908
(44100, array([-111, -46, -104, ..., -128, -95, -9], dtype=int16))
The resulting file beautiful_day.wav can be played using command line programs like aplay on Linux, or any audio player such as the excellent VLC. You can integrate the results into your creative projects!
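For example, on Linux:

$ aplay beautiful_day.wav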
It is also possible to retrieve the audio in other formats (mp3, flac, and m4a) by providing the format parameter.
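For example, a sketch of retrieving the same take as mp3; the filename is arbitrary, and it is assumed here that the format values match the names above ('mp3', 'flac', 'm4a'):

from daisys import DaisysAPI
with DaisysAPI('speak', email='user@example.com', password='pw') as speak:
    audio_mp3 = speak.get_take_audio(take_id='t01hasghx0zgdc29gpzexw5r8wc',
                                     file='beautiful_day.mp3', format='mp3')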
Streaming audio¶
The Daisys API supports two methods of streaming audio:
HTTP
Websocket
HTTP¶
The HTTP method downloads the audio file in chunks using a streaming response, and can be convenient if a simple iterator interface is desired. When making the take request, set wait to False, and call stream_take_audio() (async stream_take_audio()). Alternatively, a signed URL can be retrieved using get_take_audio_url() (async get_take_audio_url()), useful for passing to an audio player running in a frontend browser.
from daisys import DaisysAPI
with DaisysAPI('speak', email='user@example.com', password='pw') as speak:
    print("Streaming a take's audio.")
    with speak.stream_take_audio(take_id='t01hasghx0zgdc29gpzexw5r8wc') as stream:
        for chunk in stream:
            print('Length in bytes:', len(chunk))
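The example above streams an existing take. As a sketch, the same pattern can be combined with a wait=False take request, streaming the audio in chunks as it becomes available (using the voice generated earlier):

from daisys import DaisysAPI
with DaisysAPI('speak', email='user@example.com', password='pw') as speak:
    # Request the take without waiting for it to reach "ready"...
    take = speak.generate_take(voice_id='v01hasgezqjcsnc91zdfzpx0apj',
                               text="Hello, Daisys! It's a beautiful day.",
                               wait=False)
    # ...and stream its audio as it is generated.
    with speak.stream_take_audio(take_id=take.take_id) as stream:
        for chunk in stream:
            print('Length in bytes:', len(chunk))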
When using the HTTP method via endpoints outside of the Python library, please be aware of the use of 307 redirects and headers, outlined in Retrieving audio.
Websocket¶
See Daisys API websocket examples.
For lowest latency usage, it is additionally possible to use a websocket to create a connection directly to the worker node used for synthesizing audio. Requests are submitted to the worker and the same node streams back the audio as it is generated over the already-established connection.
from daisys import DaisysAPI
with DaisysAPI('speak', email='user@example.com', password='pw') as speak:
    print("Streaming a take's audio.")
    with speak.websocket(voice_id='v01hasgezqjcsnc91zdfzpx0apj') as ws:
        # my_audio_cb and my_status_cb are user-defined callbacks; see the
        # websocket examples referenced below for their signatures.
        request_id = ws.generate_take(voice_id='v01hasgezqjcsnc91zdfzpx0apj',
                                      text="Hello, Daisys! It's a beautiful day.",
                                      audio_callback=my_audio_cb,
                                      status_callback=my_status_cb)
The specified callbacks will be called whenever the requested take’s status changes or audio data is generated. See Example: Websocket example, synchronous client for complete information on the signatures of these two callbacks and examples showing how they can be used to receive audio in chunks as it is generated.
In addition to the callback interface, iter_request() (async iter_request()) is provided to allow an iterator-based for-loop (or async for-loop) over incoming audio chunks, simplifying usage.
Finally, in applications where the backend should perform REST API calls but the front-end should stream audio, websocket_url() can be used to retrieve a URL that the front-end should connect a websocket to. Example: Websocket example, web client is provided to show how to manage the websocket connection using JavaScript.
Authentication with access tokens¶
All the above examples authenticate with the API using email and password. In some scenarios users will prefer to authenticate using only the access token. An access and refresh token can be retrieved once and used until they are manually revoked.
By default, when the client library is used with email and password, the refresh token is automatically revoked when the client context is exited. When an access token is provided to the client context, this automatic revocation is skipped, so that the token can be refreshed on next usage. This can be controlled by setting speak.auto_logout to True or False.
To retrieve an access and refresh token for future use, the following program can thus be used:
from daisys import DaisysAPI
with DaisysAPI('speak', email='user@example.com', password='pw') as speak:
    speak.auto_logout = False
    speak.login()
    access_token, refresh_token = speak.access_token, speak.refresh_token
These tokens can now be stored, and provided to the client as follows:
import json
from daisys import DaisysAPI

def store_tokens(speak, access_token: str, refresh_token: str):
    """Store the current Daisys access and refresh tokens."""
    with open('daisys_tokens.json', 'w') as token_file:
        json.dump([access_token, refresh_token], token_file)

access_token, refresh_token = json.load(open('daisys_tokens.json'))
with DaisysAPI('speak', access_token=access_token, refresh_token=refresh_token) as speak:
    speak.token_callback = store_tokens
    ...
The library does not implement a storage and retrieval mechanism for these tokens, as it is presumed that users will have their own files or databases for this purpose.
Importantly, when an access token expires, a new one will be automatically retrieved by the library. Therefore, it is useful to store speak.access_token and speak.refresh_token whenever they change. The token_callback is provided for this purpose. It is optional, but recommended if you are not using a permatoken and wish to avoid transmitting passwords.