Daisys API documentation

This is the documentation for the public wrapper library for the Daisys API. The main product is "Speak", which provides text-to-speech (TTS) services. Have your product talking in seconds!

Building with an LLM? Give it this single-file text version of these docs.

Example

from daisys import DaisysAPI
from daisys.v1.speak import SimpleProsody

with DaisysAPI('speak', email='user@example.com', password='pw') as speak:
    voice = speak.get_voices()[-1]
    print(f"{voice.name} speaking!")
    take = speak.generate_take(voice_id=voice.voice_id,
                               text="Hello there, I am Daisys!",
                               prosody=SimpleProsody(pace=-3, pitch=2, expression=10))
    audio_wav = speak.get_take_audio(take.take_id, file='hello_daisys.wav')

Getting started

Once you confirm your email address, you will be provided access to the Daisys API via the email account you registered. You have also already provided a password. At this point the API can be used. The steps are:

1. Authentication: provide your email and password to get an access token.
2. List the models that are available to you.
3. Create a voice for a model.
4. Create a "take" with that voice. (Request some speech.)
5. Download the audio for that take.

All these steps are taken care of by the Python client library. Therefore, we first show briefly how to use the library, and then show how to accomplish the above steps manually from the command line using curl. This can be useful if you want to build your own client in your preferred language.

Depending on your needs, you may prefer not to provide the email and password every time the API is used. It is also possible to refresh the access token using the provided refresh token, thereby continuing a previous session without logging out.

Getting started with the Python client library

For more on using the Python client library, see Daisys API examples.

In the following, the default "synchronous" client will be demonstrated. Some users will prefer to use asyncio; in the following examples, with DaisysAPI(...) can be replaced with async with DaisysAPI(...), which returns an asynchronous client that can be used with the await keyword.

Installing the library

The library is available on pypi.org and can be installed via pip. The Daisys API requires Python version 3.10 or greater. First create a Python venv, activate it, install the library, and then download and run the examples:

Installing the library

$ # set up a virtual environment
$ mkdir daisys_project
$ cd daisys_project
$ python3 -m venv venv
$ . venv/bin/activate
$
$ # install the library
$ python3 -m pip install daisys
$
$ # or, if Python websocket support is needed
$ python3 -m pip install 'daisys[ws]'

Of course pip is only one option; you can use any Python project management software such as uv, Poetry, etc.

Running an example

Within the Python virtual environment, the hello_daisys.py example can be run. The examples are programmed to take your email and password in the environment variables as shown:

Running an example

$ curl -O https://raw.githubusercontent.com/daisys-ai/daisys-api-python/main/examples/hello_daisys.py
$ export DAISYS_EMAIL=user@example.com
$ export DAISYS_PASSWORD=example_password123
$ python3 hello_daisys.py
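Before running the full example, it can be handy to confirm that your credentials and connectivity work. A minimal sketch, assuming the same environment variables as above and using the client's version() call:

import os
from daisys import DaisysAPI

# Read the same environment variables used by the bundled examples.
with DaisysAPI('speak',
               email=os.environ['DAISYS_EMAIL'],
               password=os.environ['DAISYS_PASSWORD']) as speak:
    # version() performs a simple request, so success here means both
    # authentication and network access are working.
    print('Connected to Daisys Speak API', speak.version())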
Getting a client

Getting a client by context manager

from daisys import DaisysAPI

with DaisysAPI('speak', email='user@example.com', password='pw') as speak:
    ...

# or for asyncio support:
async with DaisysAPI('speak', email='user@example.com', password='pw') as speak:
    ...

As mentioned, an asyncio-enabled client can be instantiated by using async with in the line above. Additionally, the context manager interface (with) is optional; it is also possible to create a client by normal function call:

Getting a client by function

from daisys import DaisysAPI

speak = DaisysAPI('speak', email=EMAIL, password=PASSWORD).get_client()
# or..
speak = DaisysAPI('speak', email=EMAIL, password=PASSWORD).get_async_client()

The main difference is that when an email and password are used, the context manager approach will automatically log out when the program exits the context, whereas when the client is retrieved by get_client or get_async_client, the .logout() function should be called. Logging out invalidates the refresh token so that no further sessions can be renewed without logging in again. Auto-logout will not occur when an access token is provided.

The rest of this documentation will assume the normal, synchronous client. In all cases, functions should be called with await when used with the asyncio client.
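For example, when not using the context manager, a try/finally block ensures the logout still happens. A minimal sketch of that flow (EMAIL and PASSWORD stand in for your credentials, as in the snippets above):

from daisys import DaisysAPI

speak = DaisysAPI('speak', email=EMAIL, password=PASSWORD).get_client()
try:
    print('Found Daisys Speak API', speak.version())
finally:
    # Invalidate the refresh token, mirroring what the context manager
    # does automatically on exit.
    speak.logout()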
Listing the models

Using the client library, it is easy to log into the API and start requesting text-to-speech services. The following Python code can be used to list the available models:

Listing the models

from daisys import DaisysAPI

with DaisysAPI('speak', email='user@example.com', password='pw') as speak:
    print('Found models:')
    for model in speak.get_models():
        print(model)

Listing the voices

You can use a model by using a voice associated with that model. Voices are identified by a voice_id field.

Listing the voices

from daisys import DaisysAPI

with DaisysAPI('speak', email='user@example.com', password='pw') as speak:
    print('Found voices:')
    for voice in speak.get_voices():
        print(f'{voice.name}, a {voice.gender} voice of {voice.model} with id {voice.voice_id}.')

Generating a voice

If you do not yet have any voices, you should generate one. Voices can be requested for a given gender and with default prosody information. Voices must be given names. For instance, the following block of code creates an expressive female voice for the shakespeare model:

Generating a voice

from daisys import DaisysAPI, VoiceGender
from pprint import pprint

with DaisysAPI('speak', email='user@example.com', password='pw') as speak:
    print('Creating a voice:')
    voice = speak.generate_voice(name="Deirdre", gender=VoiceGender.FEMALE, model="shakespeare")
    pprint(voice.model_dump())

Note that voice generation can take a few seconds! In this example, the speak.generate_voice command waits for the operation to finish, and therefore we can print the result immediately. It is also possible to adopt a more asynchronous style by providing wait=False to speak.generate_voice(); see the sketch after the output below. Alternatively, as mentioned above you can use the asyncio client to allow the await speak.generate_voice() syntax.

The above code gives the following details:

Generating a voice: output

Creating a voice:
{'default_style': [],
 'default_prosody': None,
 'done_webhook': None,
 'example_take': None,
 'example_take_id': 't01hasgezqkx4vth62xckymk3x3',
 'gender': <VoiceGender.FEMALE: 'female'>,
 'model': 'shakespeare',
 'name': 'Deirdre',
 'status': <Status.READY: 'ready'>,
 'timestamp_ms': 1695218371261,
 'voice_id': 'v01hasgezqjcsnc91zdfzpx0apj'}

We can see that the voice has a female gender, and has an example take associated with it. This example_take_id can already be used to hear the voice.
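As referenced above, here is a minimal sketch of the wait=False style, polling until the voice is ready. The one-second interval is an arbitrary choice, and a production loop should also allow for failure states:

import time
from daisys import DaisysAPI, VoiceGender
from daisys.v1.speak import Status

with DaisysAPI('speak', email='user@example.com', password='pw') as speak:
    # Request the voice without blocking until it is ready.
    voice = speak.generate_voice(name="Deirdre", gender=VoiceGender.FEMALE,
                                 model="shakespeare", wait=False)
    # Poll the voice status until generation has finished.
    while voice.status != Status.READY:
        time.sleep(1)
        voice = speak.get_voice(voice.voice_id)
    print(voice.name, 'is ready!')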
Generating a take

Now that you have a voice, text-to-speech can be requested by the speak.generate_take() command:

Generating a take

from daisys import DaisysAPI
from pprint import pprint

with DaisysAPI('speak', email='user@example.com', password='pw') as speak:
    print('Creating a take:')
    take = speak.generate_take(voice_id='v01hasgezqjcsnc91zdfzpx0apj',
                               text="Hello, Daisys! It's a beautiful day.")
    pprint(take.model_dump())

giving,

Generating a take: output

Creating a take:
{'done_webhook': None,
 'info': {'audio_rate': 44100,
          'duration': 152576,
          'normalized_text': ['Hello, Daisys!', "It's a beautiful day."]},
 'override_language': None,
 'prosody': None,
 'status': <Status.READY: 'ready'>,
 'status_webhook': None,
 'style': None,
 'take_id': 't01hasgn2dnyg6jqrcym9cgxv75',
 'text': "Hello, Daisys! It's a beautiful day.",
 'timestamp_ms': 1695220926901,
 'voice_id': 'v01hasgezqjcsnc91zdfzpx0apj'}

Note that the status is "ready", meaning that audio can now be retrieved. As with voice generation, an asynchronous approach is also available for generate_take.

Retrieving a take's audio

The take is ready, now we can hear the result! Audio for a take can be retrieved as follows:

Retrieving audio (1)

from daisys import DaisysAPI

with DaisysAPI('speak', email='user@example.com', password='pw') as speak:
    print("Getting a take's audio.")
    audio_wav = speak.get_take_audio(take_id='t01hasghx0zgdc29gpzexw5r8wc',
                                     file='beautiful_day.wav')
    print('Length in bytes:', len(audio_wav))

In the above code, we retrieve a .wav file, which is (optionally) written to a file in addition to being returned. This can be decoded for example using scipy's io.wavfile module:

Retrieving audio (2)

from scipy.io import wavfile
from io import BytesIO

print(wavfile.read(BytesIO(audio_wav)))

# Note: Since decoding the audio is outside the scope of the client library,
# `scipy` is not a dependency and will not be automatically installed by `pip`.

which, along with the previous code block, prints:

Retrieving audio: output

Getting a take's audio.
Length in bytes: 292908
(44100, array([-111,  -46, -104, ..., -128,  -95,   -9], dtype=int16))

The resulting file beautiful_day.wav can be played using command line programs like aplay on Linux, or any audio player such as the excellent VLC. You can integrate the results into your creative projects!

It is also possible to retrieve the audio in other formats: mp3, flac, and m4a, by providing the format parameter.
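For example, a minimal sketch of requesting the same take as mp3 instead of wav using the format parameter (inside the same client context as above):

# Retrieve the take as mp3; the bytes are returned and also written
# to the given file.
audio_mp3 = speak.get_take_audio(take_id='t01hasghx0zgdc29gpzexw5r8wc',
                                 file='beautiful_day.mp3',
                                 format='mp3')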
Streaming audio

The Daisys API supports two methods of streaming audio:

* HTTP
* Websocket

HTTP

The HTTP method downloads the audio file in chunks using a streaming response, and can be convenient if a simple iterator interface is desired. When making the take request, set wait to False, and call stream_take_audio() (async stream_take_audio()). Alternatively a signed URL can be retrieved using get_take_audio_url() (async get_take_audio_url()), useful for passing to an audio player running in a frontend browser.

Streaming audio, HTTP method

from daisys import DaisysAPI

with DaisysAPI('speak', email='user@example.com', password='pw') as speak:
    print("Streaming a take's audio.")
    with speak.stream_take_audio(take_id='t01hasghx0zgdc29gpzexw5r8wc') as stream:
        for chunk in stream:
            print('Length in bytes:', len(chunk))

When using the HTTP method via endpoints outside of the Python library, please be aware of the use of 307 redirects and headers, outlined in Retrieving audio.
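Building on the loop above, the chunks can just as easily be written to disk as they arrive. A minimal sketch combining a wait=False take request with the streaming download:

from daisys import DaisysAPI

with DaisysAPI('speak', email='user@example.com', password='pw') as speak:
    # Request a take without waiting for it to become ready...
    take = speak.generate_take(voice_id='v01hasgezqjcsnc91zdfzpx0apj',
                               text="Hello, Daisys! It's a beautiful day.",
                               wait=False)
    # ...and stream its audio to disk while it is being generated.
    with speak.stream_take_audio(take_id=take.take_id) as stream:
        with open('beautiful_day.wav', 'wb') as f:
            for chunk in stream:
                f.write(chunk)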
Websocket

See Daisys API websocket examples.

For lowest latency usage, it is additionally possible to use a websocket to create a connection directly to the worker node used for synthesizing audio. Requests are submitted to the worker, and the same node streams back the audio as it is generated over the already-established connection.

Streaming audio, websocket method

from daisys import DaisysAPI

with DaisysAPI('speak', email='user@example.com', password='pw') as speak:
    print("Streaming a take's audio.")
    with speak.websocket(voice_id='v01hasgezqjcsnc91zdfzpx0apj') as ws:
        request_id = ws.generate_take(voice_id='v01hasgezqjcsnc91zdfzpx0apj',
                                      text="Hello, Daisys! It's a beautiful day.",
                                      audio_callback=my_audio_cb,
                                      status_callback=my_status_cb)

The specified callbacks will be called whenever the requested take's status changes or audio data is generated. See Example: Websocket example, synchronous client for complete information on the signatures of these two callbacks and examples showing how they can be used to receive audio in chunks as it is generated.

In addition to the callback interface, iter_request() (async iter_request()) is provided to allow an iterator-based for-loop (or async for-loop) over incoming audio chunks, simplifying usage.

Finally, in applications where the backend should perform REST API calls but the front-end should stream audio, websocket_url() can be used to retrieve a URL that the front-end should connect a websocket to. Example: Websocket example, web client is provided to show how to manage the websocket connection using JavaScript.

Authentication with access tokens

All the above examples authenticate with the API using email and password. In some scenarios users will prefer to authenticate using only the access token. An access and refresh token can be retrieved once and used until it is manually revoked.

By default, when the client library is used with email and password, the refresh token is automatically revoked when the client context is exited. When an access token is provided to the client context, this automatic revocation is skipped, so that the token can be refreshed on next usage. This can be controlled by setting speak.auto_logout to True or False.

To retrieve an access and refresh token for future use, the following program can thus be used:

Retrieving an access and refresh token

from daisys import DaisysAPI

with DaisysAPI('speak', email='user@example.com', password='pw') as speak:
    speak.auto_logout = False
    speak.login()
    access_token, refresh_token = speak.access_token, speak.refresh_token

These tokens can now be stored, and provided to the client as follows:

Providing stored tokens

import json
from daisys import DaisysAPI

def store_tokens(speak, access_token: str, refresh_token: str):
    """Store the current Daisys access and refresh tokens."""
    with open('daisys_tokens.json', 'w') as token_file:
        json.dump([access_token, refresh_token], token_file)

access_token, refresh_token = json.load(open('daisys_tokens.json'))
with DaisysAPI('speak', access_token=access_token, refresh_token=refresh_token) as speak:
    speak.token_callback = store_tokens
    ...

The library does not implement a storage and retrieval mechanism for these tokens, as it is presumed that users will have their own files or databases for this purpose. Importantly, when an access token expires, a new one will be automatically retrieved by the library. Therefore, it is useful to store speak.access_token and speak.refresh_token whenever they change. The token_callback is provided for this purpose. It is optional, but recommended if you are not using a permatoken and wish to avoid transmitting passwords.
Getting started with the command line

The Daisys API can be used from the command line using curl and jq. Most application writers will want to use this guide to see how to make HTTP calls to the API for developing their own client libraries in their favorite language.

Running the curl example

The Python client library source code bundles an example of how to use the API this way. Instructions to run that example are provided on the linked page.

The rest of this document shall describe how to use the API one step at a time in a shell, rather than in a shell script. In the examples, the result of curl is piped to jq . for formatting purposes.

Authenticating

To access the Daisys Speak API, you must attach an access token to any HTTP calls, with the exception of the /version endpoint. Such an access token can be requested by providing an email and password as follows:

Authenticating: Getting an access token

$ TOKENS=$(curl -s -X POST -H 'content-type: application/json' \
    -d '{"email": "user@example.com", "password": "my_password123"}' \
    https://api.daisys.ai/auth/login)
$ export ACCESS_TOKEN=$(echo $TOKENS | jq -r .access_token)
$ export REFRESH_TOKEN=$(echo $TOKENS | jq -r .refresh_token)

You can keep using this access token for a limited time. It can be used by adding it into the string Bearer $ACCESS_TOKEN for the value of the Authorization header. If you receive a 401 response from any API request, the access token needs to be refreshed by issuing:

Authenticating: Refreshing the access token

$ TOKENS=$(curl -s -X POST -H 'content-type: application/json' \
    -H "Authorization: Bearer $ACCESS_TOKEN" \
    -d '{"refresh_token": "'$REFRESH_TOKEN'"}' \
    https://api.daisys.ai/auth/refresh)
$ export ACCESS_TOKEN=$(echo $TOKENS | jq -r .access_token)
$ export REFRESH_TOKEN=$(echo $TOKENS | jq -r .refresh_token)
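The same login and refresh flow can be reproduced in any HTTP library. As a reference point for client authors, here is a minimal sketch in Python using the third-party requests package (not a Daisys dependency), mirroring the two curl calls above:

import requests

API = 'https://api.daisys.ai'

# Log in once to obtain an access/refresh token pair.
tokens = requests.post(f'{API}/auth/login',
                       json={'email': 'user@example.com',
                             'password': 'my_password123'}).json()

def refresh(tokens: dict) -> dict:
    """On a 401 from any endpoint, trade the refresh token for a new pair."""
    r = requests.post(f'{API}/auth/refresh',
                      headers={'Authorization': f'Bearer {tokens["access_token"]}'},
                      json={'refresh_token': tokens['refresh_token']})
    r.raise_for_status()
    return r.json()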
Listing the models

Models can be listed by accessing the /models endpoint. More information on the options is found in Model-related Endpoints.

Listing the models

$ curl -s -X GET -H "Authorization: Bearer $ACCESS_TOKEN" https://api.daisys.ai/v1/speak/models | jq .
[
  {
    "name": "shakespeare",
    "displayname": "Shakespeare",
    "flags": [],
    "languages": ["en-GB"],
    "genders": ["female", "male"],
    "styles": [["base", "character", "narrator"]],
    "prosody_types": ["simple", "affect"]
  }
]

Listing the voices

Voices can be listed by accessing the /voices endpoint. More information on the options is found in Voice-related Endpoints.

Listing the voices

$ curl -s -X GET -H "Authorization: Bearer $ACCESS_TOKEN" https://api.daisys.ai/v1/speak/voices | jq .
[
  {
    "name": "Deirdre",
    "model": "shakespeare",
    "gender": "female",
    "default_style": [],
    "default_prosody": null,
    "example_take": null,
    "status_webhook": null,
    "done_webhook": null,
    "voice_id": "v01hasgezqjcsnc91zdfzpx0apj",
    "status": "ready",
    "timestamp_ms": 1695220727538,
    "example_take_id": "t01hasgezqkx4vth62xckymk3x3"
  }
]

Generating a voice

If you do not yet have any voices, you should generate one using the /voices/generate endpoint. Voices can be requested for a given gender and with default prosody information. Voices must be given names. More information on the options is found in Voice-related Endpoints.

For instance, the following command creates an expressive male voice for the shakespeare model:

Generating a voice

$ curl -s -X POST -H 'content-type: application/json' \
    -H "Authorization: Bearer $ACCESS_TOKEN" \
    -d '{"name": "Ignacio", "gender": "male", "model": "shakespeare"}' \
    https://api.daisys.ai/v1/speak/voices/generate | jq .
{
  "name": "Ignacio",
  "model": "shakespeare-pause_symbol-18-4-23",
  "gender": "male",
  "default_style": null,
  "default_prosody": null,
  "example_take": null,
  "done_webhook": null,
  "voice_id": "v01haxx5cggwz215gzv0hjbra9m",
  "status": "waiting",
  "timestamp_ms": 1695368262160,
  "example_take_id": "t01haxx5cgg3n8f2qzc8zkbn97y"
}

Note that voice generation can take a few seconds! In this example, the "status" is "waiting" and not yet "ready", therefore we should check in on it again after a second or two. For this, we need to use the voice_id provided in the response:

Checking the voice status

$ curl -s -X GET -H 'content-type: application/json' \
    -H "Authorization: Bearer $ACCESS_TOKEN" \
    https://api.daisys.ai/v1/speak/voices/v01haxx5cggwz215gzv0hjbra9m | jq .
{
  "name": "Ignacio",
  "model": "shakespeare-pause_symbol-18-4-23",
  "gender": "male",
  "default_style": null,
  "default_prosody": null,
  "example_take": null,
  "done_webhook": null,
  "voice_id": "v01haxx5cggwz215gzv0hjbra9m",
  "status": "ready",
  "timestamp_ms": 1695368262160,
  "example_take_id": "t01haxx5cgg3n8f2qzc8zkbn97y"
}

The voice is now "ready"! We can now get its example audio using the example_take_id field; see Retrieving a take's audio below.

Note: as seen in the response structure, a webhook can also be provided to get a notification when the result is ready. This webhook is called as a POST request with the same response structure as seen here, provided in the request body.

Generating a take

Now that you have a voice, text-to-speech can be requested via the /takes/generate endpoint. Here we generate one with default prosody for the voice, which we also left as default (neutral) when generating the voice above. More information on the options is found in Take-related Endpoints.

Generating a take

$ curl -s -X POST -H 'content-type: application/json' \
    -H "Authorization: Bearer $ACCESS_TOKEN" \
    -d '{"text": "Hello, Daisys! It'\''s a beautiful day.", "voice_id": "v01hasgezqjcsnc91zdfzpx0apj"}' \
    https://api.daisys.ai/v1/speak/takes/generate
{
  "text": "Hello, Daisys! It's a beautiful day.",
  "override_language": null,
  "style": null,
  "prosody": null,
  "status_webhook": null,
  "done_webhook": null,
  "voice_id": "v01hasgezqjcsnc91zdfzpx0apj",
  "take_id": "t01haybgb16dn9dk0p5je47qz74",
  "status": "waiting",
  "timestamp_ms": 1695383301158,
  "info": null
}

As with voice generation, take generation takes a couple of seconds, and the status can be retrieved by using the take_id:

Generating a take: checking status

$ curl -s -X GET -H "Authorization: Bearer $ACCESS_TOKEN" \
    https://api.daisys.ai/v1/speak/takes/t01haybgb16dn9dk0p5je47qz74 | jq .
{
  "text": "Hello, Daisys! It's a beautiful day.",
  "override_language": null,
  "style": null,
  "prosody": null,
  "status_webhook": null,
  "done_webhook": null,
  "voice_id": "v01hasgezqjcsnc91zdfzpx0apj",
  "take_id": "t01haybgb16dn9dk0p5je47qz74",
  "status": "ready",
  "timestamp_ms": 1695383301158,
  "info": {
    "duration": 150528,
    "audio_rate": 44100,
    "normalized_text": ["Hello, Daisys!", "It's a beautiful day."]
  }
}

Similar to voice generation, it is possible to use a webhook for the "done" notification. For longer texts, it is also possible to request a "status" webhook, which may be called several times whenever the progress for a take changes.

Here, we see the status is "ready", meaning that audio can now be retrieved.
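Outside of webhooks, the check-status step above is typically run in a small polling loop. A minimal sketch using the requests package again; the one-second interval is an arbitrary choice, and a real loop should also allow for failure states:

import time
import requests

API = 'https://api.daisys.ai'
ACCESS_TOKEN = '...'  # paste the token obtained from the login step
HEADERS = {'Authorization': f'Bearer {ACCESS_TOKEN}'}

def wait_for_take(take_id: str, interval: float = 1.0) -> dict:
    """Poll the take endpoint until its status is 'ready'."""
    while True:
        take = requests.get(f'{API}/v1/speak/takes/{take_id}',
                            headers=HEADERS).json()
        if take['status'] == 'ready':
            return take
        time.sleep(interval)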
Retrieving a take's audio

The take is ready, now we can hear the result! Audio for a take can be retrieved as follows:

Retrieving audio

$ curl -s -L -X GET -H "Authorization: Bearer $ACCESS_TOKEN" \
    -o beautiful_day.wav \
    https://api.daisys.ai/v1/speak/takes/t01haybgb16dn9dk0p5je47qz74/wav

In the above, we retrieve a .wav file and write it to disk as beautiful_day.wav. Note that the -L flag must be provided, since the file is returned through a 307 redirect.

The resulting file beautiful_day.wav can be played using command line programs like aplay on Linux, or any audio player such as the excellent VLC. You can integrate the results into your creative projects!

It is also possible to retrieve the audio in other formats: mp3, flac, webm, and m4a, by retrieving at the corresponding URL, ../speak/takes/t01haybgb16dn9dk0p5je47qz74/mp3, etc.
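For client authors, the equivalent download in Python is a one-liner with requests, which follows the 307 redirect automatically (the counterpart of curl's -L flag). A minimal sketch, reusing the API and ACCESS_TOKEN names from the polling sketch above:

import requests

r = requests.get(f'{API}/v1/speak/takes/t01haybgb16dn9dk0p5je47qz74/wav',
                 headers={'Authorization': f'Bearer {ACCESS_TOKEN}'})
r.raise_for_status()
with open('beautiful_day.wav', 'wb') as f:
    f.write(r.content)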
Daisys API examples

The following examples can be used to see how the Daisys API client library for Python can be used.

Running examples

export DAISYS_EMAIL=user@example.com DAISYS_PASSWORD=''
python3 -m venv venv
. venv/bin/activate
python3 -m pip install daisys
python3 -m daisys.examples.hello_daisys

Example: Hello Daisys, synchronous client

This example shows:

1. How to create the synchronous client using a context manager.
2. Get a list of voices.
3. If there are none, how to generate a voice.
4. Reference the voice to generate audio (a "take") for some text.
5. Download the resulting audio.

Example output

$ python3 -m examples.hello_daisys
Found Daisys Speak API version=1 minor=0
Found voices: []
Not enough voices!
Using model "shakespeare"
Generating a voice.
Sally speaking!
Read 198700 bytes of wav data, wrote "hello_daisys.wav".
Checking take: True
Checking list of takes: True
Deleting take t01hbbgw0zz4e9y6pb9qdxnrmag: True
Deleting voice v01hbbgtrvk50pxwyjvvsxygbza: True

examples/hello_daisys.py

import os, json
from daisys import DaisysAPI
from daisys.v1.speak import VoiceGender, SimpleProsody, DaisysTakeGenerateError, HTTPStatusError

# Override DAISYS_EMAIL and DAISYS_PASSWORD with your details!
EMAIL = os.environ.get('DAISYS_EMAIL', 'user@example.com')
PASSWORD = os.environ.get('DAISYS_PASSWORD', 'pw')

# Please see tokens_example.py for how to use an access token instead of a password.

def load_tokens():
    """A function to access and refresh tokens from a local file.  In practice you
    might store this somewhere more global like in a database, to re-use between
    sessions."""
    try:
        with open('daisys_tokens.json') as tokens_file:
            tokens = json.load(tokens_file)
            print('Loaded tokens from "daisys_tokens.json".')
            return tokens['access_token'], tokens['refresh_token']
    except (FileNotFoundError, json.JSONDecodeError):
        return None, None

ACCESS_TOKEN, REFRESH_TOKEN = load_tokens()

def main():
    with DaisysAPI('speak', email=EMAIL, password=PASSWORD) as speak:
        print('Found Daisys Speak API', speak.version())

        # The following is an example of how to use the Daisys API for generating a voice
        # and then using it in a speech generation task.  The API generates "takes"
        # representing one or more sentences from a speaker.  The same example is possible
        # with the asynchronous client, where 'await' keywords should be added.

        # Get a list of all voices
        voices = speak.get_voices()
        print('Found voices:', [voice.name for voice in voices])

        # Choose one
        if len(voices) > 0:
            voice = voices[-1]
            delete_voice = False
        else:
            print('Not enough voices!')

            # Okay, let's generate a voice.
            # First we need to know the model.
            models = speak.get_models()
            if len(models) > 0:
                model = models[0]
                print(f'Using model "{model.displayname}"')
            else:
                print('No models found!')
                return

            print('Generating a voice.')
            voice = speak.generate_voice(name='Lucy', gender=VoiceGender.FEMALE, model=model.name)
            delete_voice = True

            # Try to modify the voice's name
            voice.name = 'Sally'
            speak.update_voice(**dict(voice))
            voice = speak.get_voice(voice.voice_id)

        # Now we have a voice.
        print(voice.name, 'speaking!')

        try:
            # Synthesize some audio from text
            take = speak.generate_take(voice_id=voice.voice_id, text="Hello there, I am Daisys!",
                                       prosody=SimpleProsody(pace=-3, pitch=2, expression=10))
        except DaisysTakeGenerateError as e:
            print('Error generating take:', str(e))
            return

        # The take is now READY.  We get its associated audio file.  We provide a
        # filename so that it gets written to disk, but it is also returned.
        audio_wav = speak.get_take_audio(take.take_id, file='hello_daisys.wav')
        print(f'Read {len(audio_wav)} bytes of wav data, wrote "hello_daisys.wav".')

        # Let's check if we can get info on it again.
        check_take = speak.get_take(take.take_id)
        print('Checking take:', check_take == take)

        # Let's check if we can find it in the most recent 5 takes.
        last_5_takes = speak.get_takes(length=5)
        print('Checking list of takes:', take.take_id in [t.take_id for t in last_5_takes])

        # Delete the take
        print(f'Deleting take {take.take_id}:', speak.delete_take(take.take_id))

        # Delete the voice
        if delete_voice:
            print(f'Deleting voice {voice.voice_id}:', speak.delete_voice(voice.voice_id))

if __name__=='__main__':
    try:
        main()
    except HTTPStatusError as e:
        try:
            print(f'HTTP error status {e.response.status_code}: {e.response.json()["detail"]}, {e.request.url}')
        except:
            print(f'HTTP error status {e.response.status_code}: {e.response.text}, {e.request.url}')
Example: Hello Daisys, asynchronous client

This example shows:

1. How to create the asyncio client using a context manager.
2. Get a list of voices.
3. If there are none, how to generate a voice.
4. Reference the voice to generate audio (a "take") for some text.
5. Download the resulting audio.
6. Get the take information by identifier or as a filtered list.

To run it, you must replace the username and password with your credentials.

Example output

$ python3 -m examples.hello_daisys_async
Found Daisys Speak API version=1 minor=0
Found voices: []
Not enough voices!
Using model "shakespeare"
Generating a voice.
Sally speaking!
Read 208940 bytes of wav data, wrote "hello_daisys.wav".
Checking take: True
Checking list of takes: True
Deleting take t01hbbgyx2008ggp61pzh6jaemf: True
Deleting voice v01hbbgyrpxxbcj6q37f1yd03gd: True

examples/hello_daisys_async.py

import os, asyncio
from daisys import DaisysAPI
from daisys.v1.speak import VoiceGender, SimpleProsody, DaisysTakeGenerateError, HTTPStatusError

# Override DAISYS_EMAIL and DAISYS_PASSWORD with your details!
EMAIL = os.environ.get('DAISYS_EMAIL', 'user@example.com')
PASSWORD = os.environ.get('DAISYS_PASSWORD', 'pw')

# Please see tokens_example.py for how to use an access token instead of a password.

async def main():
    async with DaisysAPI('speak', email=EMAIL, password=PASSWORD) as speak:
        print('Found Daisys Speak API', await speak.version())

        # The following is an example of how to use the Daisys API for generating a voice
        # and then using it in a speech generation task.  The API generates "takes"
        # representing one or more sentences from a speaker.  The same example is possible
        # with the synchronous client, where the 'await' keywords should be removed.

        # Get a list of all voices
        voices = await speak.get_voices()
        print('Found voices:', [voice.name for voice in voices])

        # Choose one
        if len(voices) > 0:
            voice = voices[-1]
            delete_voice = False
        else:
            print('Not enough voices!')

            # Okay, let's generate a voice.
            # First we need to know the model.
            models = await speak.get_models()
            if len(models) > 0:
                model = models[0]
                print(f'Using model "{model.displayname}"')
            else:
                print('No models found!')
                return

            print('Generating a voice.')
            voice = await speak.generate_voice(name='Lucy', gender=VoiceGender.FEMALE, model=model.name)
            delete_voice = True

            # Try to modify the voice's name
            voice.name = 'Sally'
            await speak.update_voice(**dict(voice))
            voice = await speak.get_voice(voice.voice_id)

        # Now we have a voice.
        print(voice.name, 'speaking!')

        try:
            # Synthesize some audio from text
            take = await speak.generate_take(voice_id=voice.voice_id, text="Hello there, I am Daisys!",
                                             prosody=SimpleProsody(pace=-3, pitch=2, expression=10))
        except DaisysTakeGenerateError as e:
            print('Error generating take:', str(e))
            return

        # The take is now READY.  We get its associated audio file.  We provide a
        # filename so that it gets written to disk, but it is also returned.
        audio_wav = await speak.get_take_audio(take.take_id, file='hello_daisys.wav')
        print(f'Read {len(audio_wav)} bytes of wav data, wrote "hello_daisys.wav".')

        # Let's check if we can get info on it again.
        check_take = await speak.get_take(take.take_id)
        print('Checking take:', check_take == take)

        # Let's check if we can find it in the most recent 5 takes.
        last_5_takes = await speak.get_takes(length=5)
        print('Checking list of takes:', take.take_id in [t.take_id for t in last_5_takes])

        # Delete the take
        print(f'Deleting take {take.take_id}:', await speak.delete_take(take.take_id))

        # Delete the voice
        if delete_voice:
            print(f'Deleting voice {voice.voice_id}:', await speak.delete_voice(voice.voice_id))

if __name__=='__main__':
    try:
        asyncio.run(main())
    except HTTPStatusError as e:
        try:
            print(f'HTTP error status {e.response.status_code}: {e.response.json()["detail"]}')
        except:
            print(f'HTTP error status {e.response.status_code}: {e.response.text}')
Daisys API websocket examples

In addition to retrieving audio by means of the REST API (see Retrieving audio), takes can also be requested and audio can be streamed using a websocket connection.

Do note that if the /wav endpoint is accessed before a take is "ready", the .wav file will be streamed while it is generated, so the complexities of the websocket connection may not be necessary, depending on your application. However, the websocket connection does provide the lowest latency, since the connection is made directly to a worker node. On the other hand, since a single shared connection is used, requests over websocket are essentially serialized. For batch-style jobs where throughput rather than latency is a concern, the REST API is encouraged, since it distributes the jobs over multiple workers. Websocket streaming also involves more complexity than using the REST API.
A detailed definition of the websocket streaming protocol can be found at Daisys API websockets.

Parts vs chunks

There are two streaming modes: "parts" and "chunks". Each example below shows "parts" mode by default, but can be put in "chunks" mode by executing it with the argument --chunks. The difference is:

* For audio generation, an input paragraph or document is broken up into multiple parts that end in silence, usually corresponding with a sentence.
* In "parts" streaming mode, the default, each part is sent in a separate message. Each part contains a wav header. The intention is that this can be parsed and played directly by an audio player, and each part can be sequenced one after the other.
* In "chunks" streaming mode, the parts similarly are each composed of a wav file with a header; however, the file is sent in small chunks as it is generated. This results in reduced latency, at the expense that the chunks must be combined on reception, either by feeding them into an audio stream or concatenating them into a final wav file. Only the first chunk contains the wav header, and the length it indicates corresponds to the full part.

Furthermore, each request results in a stream of both text and binary messages. The former contain the entire contents of the take's TakeResponse structure, and are transmitted whenever the status field changes. The latter contain the audio parts or chunks.
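Since in "chunks" mode only the first chunk carries the wav header, the expected size of the full part can be read from that header before the remaining chunks arrive. A minimal sketch, relying only on the standard RIFF layout (the chunk size field at bytes 4..8 is little-endian and equals the file size minus 8):

import struct

def part_total_bytes(first_chunk: bytes) -> int:
    """Return the total size in bytes of the full part (wav file),
    as declared by the RIFF header in chunk 0."""
    assert first_chunk[:4] == b'RIFF', 'expected a wav header in chunk 0'
    return struct.unpack('<I', first_chunk[4:8])[0] + 8

This can be used, for instance, to pre-allocate a buffer or report progress while concatenating chunks.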
Callbacks vs iterator

For both the "parts" and "chunks" modes, the Python API provides a callback-based mechanism for receiving status and audio messages, as well as a wrapper, iter_request(), that provides an iterator-style interface to the same information and tends to simplify client code; see the corresponding examples for the synchronous and async clients.

When callbacks or iterators are executed, order has already been reconstructed, so parts and chunks are delivered to the user code in the correct order. Due to the nature of websocket connections, the order of incoming messages is not guaranteed. This is why the part_id and chunk_id values are included in all messages (chunk_id only if the "chunks" streaming option is specified), so that the correct order can be reconstructed in the receiving client. This is taken care of by the Python library, and is demonstrated for JavaScript in the websocket_client web app example.

Fetching the websocket URL directly

As mentioned, the last example, websocket_client, shows how to integrate websockets into a web app, and therefore in this case the stream ingestion is performed by JavaScript. The Python library is only used to retrieve the websocket URL using websocket_url() (which includes a lifetime-limited secret that is distinct from your access token, for security); the same work described above of connecting to the websocket, sending requests, and iterating over the ordered incoming status and audio messages is performed by the included JavaScript code.
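As an illustration only, a sketch of a backend handler that fetches such a URL for the front-end; whether websocket_url() takes arguments is an assumption here, so check the API reference for its exact signature:

from daisys import DaisysAPI

with DaisysAPI('speak', email='user@example.com', password='pw') as speak:
    # Retrieve a URL carrying a lifetime-limited secret, then hand it to
    # the front-end (e.g. as the response of a backend endpoint), which
    # opens its own websocket connection to it.
    url = speak.websocket_url()  # may require arguments; see the API reference
    print(url)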
Example: Websocket example, synchronous client

This example shows:

1. How to open a websocket connection using a context manager.
2. Generate a take, specifying status and audio callbacks.
3. The signature of each of these callbacks and how to interpret their arguments.
4. How to make a request with and without chunks enabled. (Add argument --chunks.)

Example output

$ python3 -m examples.websocket_example
Found Daisys Speak API version=1 minor=0
Status.WAITING
Status.STARTED
[0.739s] Received part_id=0 (chunk_id=None) for take_id='t01jqrjc9bfx8z0w6zarf8hcq8y' with audio length 235564
Read 235564 bytes of wav data, wrote "websocket_part1.wav".
Status.PROGRESS_50
[1.166s] Received part_id=1 (chunk_id=None) for take_id='t01jqrjc9bfx8z0w6zarf8hcq8y' with audio length 106540
Read 106540 bytes of wav data, wrote "websocket_part2.wav".
[1.166s] Received part_id=2 (chunk_id=None) for take_id='t01jqrjc9bfx8z0w6zarf8hcq8y' with audio length (empty -- done receiving)
Status.READY
Deleting take t01jqrjc9bfx8z0w6zarf8hcq8y: True

Example output (chunks enabled)

$ python3 -m examples.websocket_example --chunks
Found Daisys Speak API version=1 minor=0
Status.WAITING
Status.STARTED
[0.311s] Received part_id=0 (chunk_id=0) for take_id='t01jqrjq6xs1vbkdm1493e60gv2' with audio length 4140
[0.323s] Received part_id=0 (chunk_id=1) for take_id='t01jqrjq6xs1vbkdm1493e60gv2' with audio length 4096
[0.335s] Received part_id=0 (chunk_id=2) for take_id='t01jqrjq6xs1vbkdm1493e60gv2' with audio length 4096
[0.347s] Received part_id=0 (chunk_id=3) for take_id='t01jqrjq6xs1vbkdm1493e60gv2' with audio length 4096
[0.359s] Received part_id=0 (chunk_id=4) for take_id='t01jqrjq6xs1vbkdm1493e60gv2' with audio length 4096
[0.372s] Received part_id=0 (chunk_id=5) for take_id='t01jqrjq6xs1vbkdm1493e60gv2' with audio length 4096
[0.385s] Received part_id=0 (chunk_id=6) for take_id='t01jqrjq6xs1vbkdm1493e60gv2' with audio length 4096
[0.396s] Received part_id=0 (chunk_id=7) for take_id='t01jqrjq6xs1vbkdm1493e60gv2' with audio length 4096
[0.409s] Received part_id=0 (chunk_id=8) for take_id='t01jqrjq6xs1vbkdm1493e60gv2' with audio length 4096
[0.421s] Received part_id=0 (chunk_id=9) for take_id='t01jqrjq6xs1vbkdm1493e60gv2' with audio length 4096
[0.433s] Received part_id=0 (chunk_id=10) for take_id='t01jqrjq6xs1vbkdm1493e60gv2' with audio length 4096
[0.445s] Received part_id=0 (chunk_id=11) for take_id='t01jqrjq6xs1vbkdm1493e60gv2' with audio length 4096
[0.457s] Received part_id=0 (chunk_id=12) for take_id='t01jqrjq6xs1vbkdm1493e60gv2' with audio length 4096
[0.469s] Received part_id=0 (chunk_id=13) for take_id='t01jqrjq6xs1vbkdm1493e60gv2' with audio length 4096
[0.479s] Received part_id=0 (chunk_id=14) for take_id='t01jqrjq6xs1vbkdm1493e60gv2' with audio length 4096
[0.489s] Received part_id=0 (chunk_id=15) for take_id='t01jqrjq6xs1vbkdm1493e60gv2' with audio length 4096
[0.500s] Received part_id=0 (chunk_id=16) for take_id='t01jqrjq6xs1vbkdm1493e60gv2' with audio length 4096
[0.510s] Received part_id=0 (chunk_id=17) for take_id='t01jqrjq6xs1vbkdm1493e60gv2' with audio length 4096
[0.520s] Received part_id=0 (chunk_id=18) for take_id='t01jqrjq6xs1vbkdm1493e60gv2' with audio length 4096
[0.530s] Received part_id=0 (chunk_id=19) for take_id='t01jqrjq6xs1vbkdm1493e60gv2' with audio length 4096
[0.540s] Received part_id=0 (chunk_id=20) for take_id='t01jqrjq6xs1vbkdm1493e60gv2' with audio length 4096
[0.550s] Received part_id=0 (chunk_id=21) for take_id='t01jqrjq6xs1vbkdm1493e60gv2' with audio length 4096
[0.560s] Received part_id=0 (chunk_id=22) for take_id='t01jqrjq6xs1vbkdm1493e60gv2' with audio length 4096
[0.570s] Received part_id=0 (chunk_id=23) for take_id='t01jqrjq6xs1vbkdm1493e60gv2' with audio length 4096
[0.580s] Received part_id=0 (chunk_id=24) for take_id='t01jqrjq6xs1vbkdm1493e60gv2' with audio length 4096
[0.590s] Received part_id=0 (chunk_id=25) for take_id='t01jqrjq6xs1vbkdm1493e60gv2' with audio length 4096
[0.600s] Received part_id=0 (chunk_id=26) for take_id='t01jqrjq6xs1vbkdm1493e60gv2' with audio length 4096
[0.610s] Received part_id=0 (chunk_id=27) for take_id='t01jqrjq6xs1vbkdm1493e60gv2' with audio length 4096
[0.621s] Received part_id=0 (chunk_id=28) for take_id='t01jqrjq6xs1vbkdm1493e60gv2' with audio length 4096
[0.631s] Received part_id=0 (chunk_id=29) for take_id='t01jqrjq6xs1vbkdm1493e60gv2' with audio length 2560
[0.631s] Received part_id=0 (chunk_id=30) for take_id='t01jqrjq6xs1vbkdm1493e60gv2' with audio length (empty -- done receiving)
Read 121388 bytes of wav data, wrote "websocket_part1.wav".
Status.PROGRESS_50
[0.979s] Received part_id=1 (chunk_id=0) for take_id='t01jqrjq6xs1vbkdm1493e60gv2' with audio length 4140
[0.989s] Received part_id=1 (chunk_id=1) for take_id='t01jqrjq6xs1vbkdm1493e60gv2' with audio length 4096
[0.998s] Received part_id=1 (chunk_id=2) for take_id='t01jqrjq6xs1vbkdm1493e60gv2' with audio length 4096
[1.007s] Received part_id=1 (chunk_id=3) for take_id='t01jqrjq6xs1vbkdm1493e60gv2' with audio length 4096
[1.018s] Received part_id=1 (chunk_id=4) for take_id='t01jqrjq6xs1vbkdm1493e60gv2' with audio length 4096
[1.028s] Received part_id=1 (chunk_id=5) for take_id='t01jqrjq6xs1vbkdm1493e60gv2' with audio length 4096
[1.038s] Received part_id=1 (chunk_id=6) for take_id='t01jqrjq6xs1vbkdm1493e60gv2' with audio length 4096
[1.048s] Received part_id=1 (chunk_id=7) for take_id='t01jqrjq6xs1vbkdm1493e60gv2' with audio length 4096
[1.058s] Received part_id=1 (chunk_id=8) for take_id='t01jqrjq6xs1vbkdm1493e60gv2' with audio length 4096
[1.069s] Received part_id=1 (chunk_id=9) for take_id='t01jqrjq6xs1vbkdm1493e60gv2' with audio length 4096
[1.079s] Received part_id=1 (chunk_id=10) for take_id='t01jqrjq6xs1vbkdm1493e60gv2' with audio length 4096
[1.089s] Received part_id=1 (chunk_id=11) for take_id='t01jqrjq6xs1vbkdm1493e60gv2' with audio length 4096
[1.100s] Received part_id=1 (chunk_id=12) for take_id='t01jqrjq6xs1vbkdm1493e60gv2' with audio length 4096
[1.109s] Received part_id=1 (chunk_id=13) for take_id='t01jqrjq6xs1vbkdm1493e60gv2' with audio length 2048
[1.110s] Received part_id=1 (chunk_id=14) for take_id='t01jqrjq6xs1vbkdm1493e60gv2' with audio length (empty -- done receiving)
Read 55340 bytes of wav data, wrote "websocket_part2.wav".
[1.110s] Received part_id=2 (chunk_id=0) for take_id='t01jqrjq6xs1vbkdm1493e60gv2' with audio length (empty -- done receiving)
Status.READY
Deleting take t01jqrjq6xs1vbkdm1493e60gv2: True

examples/websocket_example.py

import sys, os, time
from typing import Optional
from daisys import DaisysAPI
from daisys.v1.speak import (DaisysWebsocketGenerateError, HTTPStatusError, Status,
                             TakeResponse, StreamOptions, StreamMode)

# Override DAISYS_EMAIL and DAISYS_PASSWORD with your details!
EMAIL = os.environ.get('DAISYS_EMAIL', 'user@example.com')
PASSWORD = os.environ.get('DAISYS_PASSWORD', 'pw')

# Please see tokens_example.py for how to use an access token instead of a password.

def main(chunks):
    with DaisysAPI('speak', email=EMAIL, password=PASSWORD) as speak:
        print('Found Daisys Speak API', speak.version())

        # A buffer to receive parts; we initialize with a single empty bytes()
        # because we will use it to accumulate chunks of the current wav file
        # there.  In total we will end with a list of wav files, one for each
        # part.  Parts are bits of speech, usually full sentences, that end with
        # silence.
        audio_wavs = [bytes()]

        # Assume at least one voice is available
        voice = speak.get_voices()[0]

        with speak.websocket(voice_id=voice.voice_id) as ws:
            # Flags we can use to only wait on our one take request; we wait
            # until the take is READY, and we also wait until we are done
            # receiving all audio parts.
            done = False
            ready = False

            # Time the latency from when we submit the request until each part
            # is received.
            t0 = time.time()

            # The audio callback receives "parts" consisting of audio .wav files
            # with WAV headers on each part.  Depending on the stream settings,
            # the file may be divided into chunks, where chunk_id==None indicates
            # the last chunk of a part.  If audio==None, then no more parts will
            # arrive for that take_id.
            def audio_cb(request_id: int, take_id: str, part_id: int,
                         chunk_id: Optional[int], audio: Optional[bytes]):
                nonlocal done

                # Report timing info and function arguments
                print(f'[{time.time()-t0:0.3f}s] Received {part_id=} ({chunk_id=}) for {take_id=} '
                      'with audio length', len(audio) if audio else '(empty -- done receiving)')

                # We only requested one take_id; the take_id is generated by the
                # Daisys API, so we do not know it until the first status
                # message arrives.  Therefore we can check that the request_id
                # is the expected one.
                assert request_id == generate_request_id
                assert generated_take is None or take_id == generated_take.take_id

                if audio is None:
                    # If stream is done for this part
                    if chunk_id in [0, None]:
                        # If we have any audio data, write out the last file
                        if len(audio_wavs[-1]) > 0:
                            with open(f'websocket_part{len(audio_wavs)}.wav', 'wb') as f:
                                f.write(audio_wavs[-1])
                            print(f'Read {len(audio_wavs[-1])} bytes of wav data, wrote "{f.name}".')
                        # Flag that we are done receiving audio
                        done = True
                    # If we are receiving the last chunk of a part
                    elif chunk_id > 0:
                        # Write out the part
                        with open(f'websocket_part{len(audio_wavs)}.wav', 'wb') as f:
                            f.write(audio_wavs[-1])
                        print(f'Read {len(audio_wavs[-1])} bytes of wav data, wrote "{f.name}".')
                        # Start a new part
                        audio_wavs.append(bytes())
                # Otherwise append the chunk.
                else:
                    audio_wavs[-1] = audio_wavs[-1] + audio
                    # If non-chunked stream, the part is ended immediately
                    if chunk_id is None:
                        # If we have any audio data, write out the file
                        with open(f'websocket_part{len(audio_wavs)}.wav', 'wb') as f:
                            f.write(audio_wavs[-1])
                        print(f'Read {len(audio_wavs[-1])} bytes of wav data, wrote "{f.name}".')
                        # Start a new part
                        audio_wavs.append(bytes())

            # The status callback is called every time the take's status
            # changes.  Here we use it to end the update loop.
            def status_cb(request_id: int, take: TakeResponse):
                nonlocal ready, generated_take
                assert request_id == generate_request_id
                generated_take = take
                print(take.status)
                if take.status == Status.READY:
                    ready = True

            # Submit a request to generate a take over the websocket connection.
            generate_request_id = ws.generate_take(
                voice_id=voice.voice_id,
                text='Hello from Daisys websockets! How may I help you?',
                status_callback=status_cb,
                audio_callback=audio_cb,
                # Optional
                stream_options=StreamOptions(mode=StreamMode.CHUNKS) if chunks else None,
            )

            # Will be filled in by callbacks.  On submitting the generate
            # request, we do not yet know what take_id will be assigned so we
            # must discover it by means of the status callback.
            generated_take = None

            # We loop on the websocket while waiting 5 seconds between updates,
            # and end when the take has been set to READY and all audio has been
            # received.  This update waits 1 second by default; here we set it to
            # 5 seconds, but it can also wait forever by setting timeout to None,
            # or be made a non-blocking operation by setting timeout to 0.
            # (Important: in the async client, timeout=0 leads to TimeoutError;
            # it cannot be used for non-blocking operations with asyncio.)
            while not (ready and done) and (time.time() - t0) < 60:
                try:
                    ws.update(timeout=5)
                except DaisysWebsocketGenerateError as e:
                    # As opposed to other websocket errors, if a generate error
                    # occurs it does not necessarily mean we want to close the
                    # stream.
                    print(e)
                    # In this example, however, we actually do, because we only
                    # requested a single take, so stop here.
                    break

        # Delete the take
        if generated_take:
            print(f'Deleting take {generated_take.take_id}:',
                  speak.delete_take(generated_take.take_id))

if __name__=='__main__':
    try:
        main('--chunks' in sys.argv[1:])
    except HTTPStatusError as e:
        try:
            print(f'HTTP error status {e.response.status_code}: {e.response.json()["detail"]}, {e.request.url}')
        except:
            print(f'HTTP error status {e.response.status_code}: {e.response.text}, {e.request.url}')
Example: Websocket example, synchronous client with iterator

This example shows:

1. How to open a websocket connection using a context manager.
2. Generate a take.
3. How to iterate over the resulting status and audio messages using iter_request().
4. How to make a request with and without chunks enabled. (Add argument --chunks.)

Example output

$ python3 -m examples.websocket_example_iter
Found Daisys Speak API version=1 minor=0
[0.002] Take status was changed to: WAITING.
[0.022] Take status was changed to: STARTED.
[0.748] New part being received.
[0.748] Received audio chunk of size 233472.
[1.204] Take status was changed to: PROGRESS_50.
[1.208] New part being received.
[1.208] Received audio chunk of size 116736.
[2.597] Take status was changed to: READY.
Deleting take t01jqrj9j7hyx49enqya9qeas3t: True

Example output (chunks enabled)

$ python3 -m examples.websocket_example_iter --chunks
Found Daisys Speak API version=1 minor=0
[0.002] Take status was changed to: WAITING.
[0.026] Take status was changed to: STARTED.
[0.314] New part being received.
[0.314] Received audio chunk of size 4096.
[0.328] Received audio chunk of size 4096.
[0.341] Received audio chunk of size 4096.
[0.351] Received audio chunk of size 4096.
[0.361] Received audio chunk of size 4096.
[0.371] Received audio chunk of size 4096.
[0.381] Received audio chunk of size 4096.
[0.391] Received audio chunk of size 4096.
[0.401] Received audio chunk of size 4096.
[0.411] Received audio chunk of size 4096.
[0.421] Received audio chunk of size 4096.
[0.431] Received audio chunk of size 4096.
[0.442] Received audio chunk of size 4096.
[0.452] Received audio chunk of size 4096.
[0.462] Received audio chunk of size 4096.
[0.472] Received audio chunk of size 4096.
[0.482] Received audio chunk of size 4096.
[0.492] Received audio chunk of size 4096.
[0.502] Received audio chunk of size 4096.
[0.512] Received audio chunk of size 4096.
[0.521] Received audio chunk of size 4096.
[0.532] Received audio chunk of size 4096.
[0.542] Received audio chunk of size 4096.
[0.551] Received audio chunk of size 4096.
[0.561] Received audio chunk of size 4096.
[0.572] Received audio chunk of size 4096.
[0.582] Received audio chunk of size 4096.
[0.592] Received audio chunk of size 4096.
[0.603] Received audio chunk of size 4096.
[0.613] Received audio chunk of size 1536.
[0.963] Take status was changed to: PROGRESS_50.
[0.966] New part being received.
[0.966] Received audio chunk of size 4096.
[0.976] Received audio chunk of size 4096.
[0.985] Received audio chunk of size 4096.
[0.994] Received audio chunk of size 4096.
[1.004] Received audio chunk of size 4096.
[1.014] Received audio chunk of size 4096.
[1.024] Received audio chunk of size 4096.
[1.034] Received audio chunk of size 4096.
[1.044] Received audio chunk of size 4096.
[1.055] Received audio chunk of size 4096.
[1.065] Received audio chunk of size 4096.
[1.075] Received audio chunk of size 4096.
[1.085] Received audio chunk of size 4096.
[1.095] Received audio chunk of size 3072.
[2.600] Take status was changed to: READY.
Deleting take t01jqrjxc257e1mr0r0z65ak4qb: True

examples/websocket_example_iter.py

import sys, os, time
from daisys import DaisysAPI
from daisys.v1.speak import (DaisysWebsocketGenerateError, HTTPStatusError, Status,
                             TakeResponse, StreamOptions, StreamMode)

# Override DAISYS_EMAIL and DAISYS_PASSWORD with your details!
EMAIL = os.environ.get('DAISYS_EMAIL', 'user@example.com')
PASSWORD = os.environ.get('DAISYS_PASSWORD', 'pw')

# Please see tokens_example.py for how to use an access token instead of a password.

def main(chunks):
    with DaisysAPI('speak', email=EMAIL, password=PASSWORD) as speak:
        print('Found Daisys Speak API', speak.version())

        # Assume at least one voice is available
        voice = speak.get_voices()[0]

        with speak.websocket(voice_id=voice.voice_id) as ws:
            # Time the latency from when we submit the request until each part
            # is received.
            t0 = time.time()

            # Submit a request to generate a take over the websocket connection.
            generate_request_id = ws.generate_take(
                voice_id=voice.voice_id,
                text='Hello from Daisys websockets! How may I help you?',
                # Optional
                stream_options=StreamOptions(mode=StreamMode.CHUNKS) if chunks else None,
            )

            # The use of an iterator simplifies streaming; here we show how to
            # get both status and audio chunks from the same iterator.
            for take_id, take, header, audio in ws.iter_request(generate_request_id):
                now = time.time() - t0
                if take is not None:
                    print(f'[{now:.03f}] Take status was changed to: {take.status.name}.')
                if header is not None:
                    print(f'[{now:.03f}] New part being received.')
                if audio is not None:
                    print(f'[{now:.03f}] Received audio chunk of size {len(audio)}.')

        # Delete the take
        if take_id:
            print(f'Deleting take {take_id}:', speak.delete_take(take_id))

if __name__=='__main__':
    try:
        main('--chunks' in sys.argv[1:])
    except HTTPStatusError as e:
        try:
            print(f'HTTP error status {e.response.status_code}: {e.response.json()["detail"]}, {e.request.url}')
        except:
            print(f'HTTP error status {e.response.status_code}: {e.response.text}, {e.request.url}')
Example: Websocket example, asynchronous client

This example shows:

1. How to open a websocket connection using an async context manager.
2. Generate a take, specifying status and audio callbacks.
3. The signature of each of these callbacks and how to interpret their arguments.
4. How to make a request with and without chunks enabled. (Add argument --chunks.)

Example output

$ python3 -m examples.websocket_example_async
Found Daisys Speak API version=1 minor=0
Status.WAITING
Status.STARTED
[0.751s] Received part_id=0 (chunk_id=None) for take_id='t01jqrjedpjb11hpg40h9kkydpk' with audio length 245804
appending audio 245804
Read 245804 bytes of wav data, wrote "websocket_part1.wav".
Status.PROGRESS_50
[1.183s] Received part_id=1 (chunk_id=None) for take_id='t01jqrjedpjb11hpg40h9kkydpk' with audio length 112684
appending audio 112684
Read 112684 bytes of wav data, wrote "websocket_part2.wav".
[1.184s] Received part_id=2 (chunk_id=None) for take_id='t01jqrjedpjb11hpg40h9kkydpk' with audio length (empty -- done receiving)
stream done
Status.READY
Deleting take t01jqrjedpjb11hpg40h9kkydpk: True

Example output (chunks enabled)

$ python3 -m examples.websocket_example_async --chunks
Found Daisys Speak API version=1 minor=0
Status.WAITING
Status.STARTED
[0.311s] Received part_id=0 (chunk_id=0) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4140
appending audio 4140
[0.324s] Received part_id=0 (chunk_id=1) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 8236
[0.338s] Received part_id=0 (chunk_id=2) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 12332
[0.351s] Received part_id=0 (chunk_id=3) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 16428
[0.365s] Received part_id=0 (chunk_id=4) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 20524
[0.378s] Received part_id=0 (chunk_id=5) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 24620
[0.389s] Received part_id=0 (chunk_id=6) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 28716
[0.399s] Received part_id=0 (chunk_id=7) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 32812
[0.409s] Received part_id=0 (chunk_id=8) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 36908
[0.419s] Received part_id=0 (chunk_id=9) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 41004
[0.429s] Received part_id=0 (chunk_id=10) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 45100
[0.439s] Received part_id=0 (chunk_id=11) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 49196
[0.449s] Received part_id=0 (chunk_id=12) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 53292
[0.459s] Received part_id=0 (chunk_id=13) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 57388
[0.469s] Received part_id=0 (chunk_id=14) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 61484
[0.479s] Received part_id=0 (chunk_id=15) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 65580
[0.489s] Received part_id=0 (chunk_id=16) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 69676
[0.500s] Received part_id=0 (chunk_id=17) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 73772
[0.510s] Received part_id=0 (chunk_id=18) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 77868
[0.520s] Received part_id=0 (chunk_id=19) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 81964
[0.530s] Received part_id=0 (chunk_id=20) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 86060
[0.540s] Received part_id=0 (chunk_id=21) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 90156
[0.550s] Received part_id=0 (chunk_id=22) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 94252
[0.560s] Received part_id=0 (chunk_id=23) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 98348
[0.570s] Received part_id=0 (chunk_id=24) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 102444
[0.580s] Received part_id=0 (chunk_id=25) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 106540
[0.590s] Received part_id=0 (chunk_id=26) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 110636
[0.600s] Received part_id=0 (chunk_id=27) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 114732
[0.610s] Received part_id=0 (chunk_id=28) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 118828
[0.620s] Received part_id=0 (chunk_id=29) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 2048
appending audio 120876
[0.621s] Received part_id=0 (chunk_id=30) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length (empty -- done receiving)
part done
Read 120876 bytes of wav data, wrote "websocket_part1.wav".
Status.PROGRESS_50
[0.976s] Received part_id=1 (chunk_id=0) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4140
appending audio 4140
[0.985s] Received part_id=1 (chunk_id=1) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 8236
[0.995s] Received part_id=1 (chunk_id=2) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 12332
[1.004s] Received part_id=1 (chunk_id=3) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 16428
[1.014s] Received part_id=1 (chunk_id=4) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 20524
[1.024s] Received part_id=1 (chunk_id=5) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 24620
[1.034s] Received part_id=1 (chunk_id=6) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 28716
[1.044s] Received part_id=1 (chunk_id=7) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 32812
[1.054s] Received part_id=1 (chunk_id=8) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 36908
[1.064s] Received part_id=1 (chunk_id=9) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 41004
[1.074s] Received part_id=1 (chunk_id=10) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 45100
[1.084s] Received part_id=1 (chunk_id=11) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 49196
[1.095s] Received part_id=1 (chunk_id=12) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 53292
[1.105s] Received part_id=1 (chunk_id=13) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 1024
appending audio 54316
[1.105s] Received part_id=1 (chunk_id=14) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length (empty -- done receiving)
part done
Read 54316 bytes of wav data, wrote "websocket_part2.wav".
[1.105s] Received part_id=2 (chunk_id=0) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length (empty -- done receiving)
stream done
Status.READY
Deleting take t01jqrjh3yrbzpd79q1nprcrbjy: True
[1.105s] Received part_id=2 (chunk_id=0) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length (empty -- done receiving)
stream done
Status.READY
Deleting take t01jqrjh3yrbzpd79q1nprcrbjy: True

examples/websocket_example_async.py

import sys, os, asyncio, time
from typing import Optional
from daisys import DaisysAPI
from daisys.v1.speak import (DaisysWebsocketGenerateError, HTTPStatusError, Status,
                             TakeResponse, StreamOptions, StreamMode)

# Override DAISYS_EMAIL and DAISYS_PASSWORD with your details!
EMAIL = os.environ.get('DAISYS_EMAIL', 'user@example.com')
PASSWORD = os.environ.get('DAISYS_PASSWORD', 'pw')

# Please see tokens_example.py for how to use an access token instead of a password.

async def main(chunks):
    async with DaisysAPI('speak', email=EMAIL, password=PASSWORD) as speak:
        print('Found Daisys Speak API', await speak.version())

        # A buffer to receive parts; we initialize with a single empty bytes()
        # because we will use it to accumulate chunks of the current wav file
        # there. In total we will end with a list of wav files, one for each
        # part. Parts are bits of speech, usually full sentences, that end with
        # silence.
        audio_wavs = [bytes()]

        # Assume at least one voice is available
        voice = (await speak.get_voices())[0]

        async with speak.websocket(voice_id=voice.voice_id) as ws:
            # Flags we can use to only wait on our one take request; we wait
            # until the take is READY, and we also wait until we are done
            # receiving all audio parts.
            done = False
            ready = False

            # Time the latency from when we submit the request until each part
            # is received.
            t0 = time.time()

            # The audio callback receives "parts" consisting of audio .wav files
            # with WAV headers on each part. Depending on the stream settings,
            # the file may be divided into chunks, where chunk_id==None indicates
            # the last chunk of a part. If audio==None, then no more parts will
            # arrive for that take_id.
            async def audio_cb(request_id: int, take_id: str, part_id: int,
                               chunk_id: int|None, audio: bytes|None):
                nonlocal done

                # Report timing info and function arguments
                print(f'[{time.time()-t0:0.3f}s] Received {part_id=} ({chunk_id=}) for {take_id=} '
                      'with audio length', len(audio) if audio else '(empty -- done receiving)')

                # We only requested one take_id; the take_id is generated by the
                # Daisys API, so we do not know it until the first status
                # message arrives. Therefore we can check that the request_id
                # is the expected one.
                assert request_id == generate_request_id
                assert generated_take is None or take_id == generated_take.take_id

                if audio is None:
                    # If stream is done for this part
                    if chunk_id in [0, None]:
                        print('stream done')
                        # If we have any audio data, write out the last file
                        if len(audio_wavs[-1]) > 0:
                            with open(f'websocket_part{len(audio_wavs)}.wav', 'wb') as f:
                                f.write(audio_wavs[-1])
                            print(f'Read {len(audio_wavs[-1])} bytes of wav data, wrote "{f.name}".')
                        # Flag that we are done receiving audio
                        done = True
                    # If we are receiving the last chunk of a part
                    elif chunk_id > 0:
                        print('part done')
                        # Write out the part
                        with open(f'websocket_part{len(audio_wavs)}.wav', 'wb') as f:
                            f.write(audio_wavs[-1])
                        print(f'Read {len(audio_wavs[-1])} bytes of wav data, wrote "{f.name}".')
                        # Start a new part
                        audio_wavs.append(bytes())
                # Otherwise append the chunk.
                else:
                    audio_wavs[-1] = audio_wavs[-1] + audio
                    print('appending audio', len(audio_wavs[-1]))
                    # If non-chunked stream, the part is ended immediately
                    if chunk_id is None:
                        # If we have any audio data, write out the file
                        with open(f'websocket_part{len(audio_wavs)}.wav', 'wb') as f:
                            f.write(audio_wavs[-1])
                        print(f'Read {len(audio_wavs[-1])} bytes of wav data, wrote "{f.name}".')
                        # Start a new part
                        audio_wavs.append(bytes())

            # The status callback is called every time the take's status
            # changes. Here we use it to end the update loop.
            async def status_cb(request_id: int, take: TakeResponse):
                nonlocal ready, generated_take
                assert request_id == generate_request_id
                generated_take = take
                print(take.status)
                if take.status == Status.READY:
                    ready = True

            # Submit a request to generate a take over the websocket connection.
            generate_request_id = await ws.generate_take(
                voice_id=voice.voice_id,
                text='Hello from Daisys websockets! How may I help you?',
                status_callback=status_cb,
                audio_callback=audio_cb,
                # Optional
                stream_options=StreamOptions(mode=StreamMode.CHUNKS) if chunks else None,
            )

            # Will be filled in by callbacks. On submitting the generate
            # request, we do not yet know what take_id will be assigned so we
            # must discover it by means of the status callback.
            generated_take = None

            # We loop on the websocket while waiting 5 seconds between updates,
            # and end when the take has been set to READY and all audio has
            # been received. This update waits 1 second by default; here we set
            # it to 5 seconds, but it can also wait forever by setting timeout
            # to None or be made a non-blocking operation by setting timeout to
            # 0. (Important: in the async client, timeout=0 leads to
            # TimeoutError; it cannot be used for non-blocking operations with
            # asyncio.)
            while not (ready and done) and (time.time() - t0) < 60:
                try:
                    await ws.update(timeout=5)
                except DaisysWebsocketGenerateError as e:
                    # As opposed to other websocket errors, if a generate error
                    # occurs it does not necessarily mean we want to close the
                    # stream.
                    print(e)
                    # In this example, however, we actually do, because we only
                    # requested a single take, so stop here.
                    break

            # Delete the take
            if generated_take:
                print(f'Deleting take {generated_take.take_id}:',
                      await speak.delete_take(generated_take.take_id))

if __name__=='__main__':
    try:
        asyncio.run(main(chunks='--chunks' in sys.argv[1:]))
    except HTTPStatusError as e:
        try:
            print(f'HTTP error status {e.response.status_code}: {e.response.json()["detail"]}, {e.request.url}')
        except:
            print(f'HTTP error status {e.response.status_code}: {e.response.text}, {e.request.url}')

Example: Websocket example, asynchronous client with iterator

This example shows:
1. How to open a websocket connection using an async context manager.
2. How to generate a take.
3. How to iterate over the resulting status and audio messages using iter_request().
4. How to make a request with and without chunks enabled. (Add argument --chunks.)

Example output

$ python3 -m examples.websocket_example_async_iter
Found Daisys Speak API version=1 minor=0
[0.002] Take status was changed to: WAITING.
[0.024] Take status was changed to: STARTED.
[0.761] New part being received.
[0.761] Received audio chunk of size 245760.
[1.197] Take status was changed to: PROGRESS_50.
[1.200] New part being received.
[1.200] Received audio chunk of size 108544.
[2.595] Take status was changed to: READY.
Deleting take t01jqrj756qrvrqaw59zgyxpcrw: True

Example output (chunks enabled)

$ python3 -m examples.websocket_example_async_iter --chunks
Found Daisys Speak API version=1 minor=0
[0.002] Take status was changed to: WAITING.
[0.023] Take status was changed to: STARTED.
[0.318] New part being received.
[0.318] Received audio chunk of size 4096.
[0.331] Received audio chunk of size 4096.
[0.344] Received audio chunk of size 4096.
[0.358] Received audio chunk of size 4096.
[0.371] Received audio chunk of size 4096.
[0.384] Received audio chunk of size 4096.
[0.397] Received audio chunk of size 4096.
[0.411] Received audio chunk of size 4096.
[0.424] Received audio chunk of size 4096.
[0.437] Received audio chunk of size 4096.
[0.450] Received audio chunk of size 4096.
[0.463] Received audio chunk of size 4096.
[0.472] Received audio chunk of size 4096.
[0.482] Received audio chunk of size 4096.
[0.492] Received audio chunk of size 4096.
[0.503] Received audio chunk of size 4096.
[0.513] Received audio chunk of size 4096.
[0.523] Received audio chunk of size 4096.
[0.533] Received audio chunk of size 4096.
[0.543] Received audio chunk of size 4096.
[0.553] Received audio chunk of size 4096.
[0.564] Received audio chunk of size 4096.
[0.575] Received audio chunk of size 4096.
[0.584] Received audio chunk of size 4096.
[0.595] Received audio chunk of size 4096.
[0.605] Received audio chunk of size 4096.
[0.615] Received audio chunk of size 4096.
[0.626] Received audio chunk of size 4096.
[0.636] Received audio chunk of size 4096.
[0.646] Received audio chunk of size 1024.
[1.002] Take status was changed to: PROGRESS_50.
[1.005] New part being received.
[1.005] Received audio chunk of size 4096.
[1.015] Received audio chunk of size 4096.
[1.024] Received audio chunk of size 4096.
[1.033] Received audio chunk of size 4096.
[1.043] Received audio chunk of size 4096.
[1.053] Received audio chunk of size 4096.
[1.064] Received audio chunk of size 4096.
[1.074] Received audio chunk of size 4096.
[1.084] Received audio chunk of size 4096.
[1.095] Received audio chunk of size 4096.
[1.105] Received audio chunk of size 4096.
[1.115] Received audio chunk of size 4096.
[1.125] Received audio chunk of size 4096.
[1.135] Received audio chunk of size 2048.
[2.641] Take status was changed to: READY.
Deleting take t01jqrk04k7fhrdgs764bv6h7p1: True

examples/websocket_example_async_iter.py

import sys, os, asyncio, time
from typing import Optional
from daisys import DaisysAPI
from daisys.v1.speak import (DaisysWebsocketGenerateError, HTTPStatusError, Status,
                             TakeResponse, StreamOptions, StreamMode)

# Override DAISYS_EMAIL and DAISYS_PASSWORD with your details!
EMAIL = os.environ.get('DAISYS_EMAIL', 'user@example.com')
PASSWORD = os.environ.get('DAISYS_PASSWORD', 'pw')

# Please see tokens_example.py for how to use an access token instead of a password.

async def main(chunks):
    async with DaisysAPI('speak', email=EMAIL, password=PASSWORD) as speak:
        print('Found Daisys Speak API', await speak.version())

        # A buffer to receive parts; we initialize with a single empty bytes()
        # because we will use it to accumulate chunks of the current wav file
        # there. In total we will end with a list of wav files, one for each
        # part. Parts are bits of speech, usually full sentences, that end with
        # silence.
        audio_wavs = [bytes()]

        # Assume at least one voice is available
        voice = (await speak.get_voices())[0]

        async with speak.websocket(voice_id=voice.voice_id) as ws:
            # Time the latency from when we submit the request until each part
            # is received.
            t0 = time.time()

            # Submit a request to generate a take over the websocket connection.
            generate_request_id = await ws.generate_take(
                voice_id=voice.voice_id,
                text='Hello from Daisys websockets! How may I help you?',
                # Optional
                stream_options=StreamOptions(mode=StreamMode.CHUNKS) if chunks else None,
            )

            # The use of an iterator simplifies streaming; here we show how to
            # get both status and audio chunks from the same iterator.
            async for take_id, take, header, audio in ws.iter_request(generate_request_id):
                now = time.time() - t0
                if take is not None:
                    print(f'[{now:.03f}] Take status was changed to: {take.status.name}.')
                if header is not None:
                    print(f'[{now:.03f}] New part being received.')
                if audio is not None:
                    print(f'[{now:.03f}] Received audio chunk of size {len(audio)}.')

            # Delete the take
            if take_id:
                print(f'Deleting take {take_id}:', await speak.delete_take(take_id))

if __name__=='__main__':
    try:
        asyncio.run(main('--chunks' in sys.argv[1:]))
    except HTTPStatusError as e:
        try:
            print(f'HTTP error status {e.response.status_code}: {e.response.json()["detail"]}, {e.request.url}')
        except:
            print(f'HTTP error status {e.response.status_code}: {e.response.text}, {e.request.url}')

Example: Websocket example, web client

This example shows how to use the Python client library to create a FastAPI-based web server that performs login to the Daisys API (so that credentials are kept secure) and interacts with the REST API to retrieve a websocket URL. This URL is passed to the front-end JavaScript application, which opens the websocket connection and makes take requests, playing back the audio in a streaming fashion using the Web Audio API.

The included JavaScript shows how to:
1. Retrieve the URL using websocket_url() and open the websocket connection, automatically doing so again when disconnected; see websocket_connector.js.
2. Define an async iterator that simplifies handling of streams for different requests by transforming the callback structure into a simple for loop; see websocket_stream.js and usage in websocket_client.js.
3. Send a request for generating a take; see websocket_client.js.
4. Use a single handler to handle incoming status messages and audio messages in both "parts" and "chunks" mode; see websocket_stream.js.
5. Play the audio in a simple way (parts, using audio sources) and a more complex way (chunks, using dynamic audio buffers) with the Web Audio API; see part_audio_player.js and chunk_audio_player.js respectively.

Unlike the other examples, this one consists of several files, so instead of repeating it in the documentation, the reader is invited to follow the code in the git repository. The application can be launched using:

python3 -m examples.websocket_client

or equivalently:

uvicorn examples.websocket_client:app
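To illustrate the approach, here is a minimal sketch (not the repository code) of how such a server can keep the credentials private while handing the browser a connection URL; the /websocket_url route and the first-voice selection are illustrative assumptions, not the routes of the actual example app:

import os
from fastapi import FastAPI
from daisys import DaisysAPI

app = FastAPI()

EMAIL = os.environ.get('DAISYS_EMAIL', 'user@example.com')
PASSWORD = os.environ.get('DAISYS_PASSWORD', 'pw')

@app.get('/websocket_url')  # illustrative route
async def get_websocket_url():
    # Log in server-side so that credentials are never exposed to the browser,
    # then retrieve a signed websocket URL the front end can connect to.
    async with DaisysAPI('speak', email=EMAIL, password=PASSWORD) as speak:
        voices = await speak.get_voices()
        url = await speak.websocket_url(voice_id=voices[0].voice_id)
    return {'websocket_url': url}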
Example: Using an access token instead of password

This example shows:
1. How to initially retrieve access and refresh tokens.
2. How to use them and store them if they change (are refreshed).

Example output

$ python3 -m examples.tokens_example

examples/tokens_example.py

import os, asyncio, json
from daisys import DaisysAPI
from daisys.v1.speak import VoiceGender, SimpleProsody, DaisysTakeGenerateError, HTTPStatusError

# This example shows how to use access and refresh tokens instead of an email
# address and password for authenticating with the Daisys API.

# Override DAISYS_EMAIL and DAISYS_PASSWORD with your details!
EMAIL = os.environ.get('DAISYS_EMAIL', 'user@example.com')
PASSWORD = os.environ.get('DAISYS_PASSWORD', 'pw')

def load_tokens():
    """A function to load access and refresh tokens from a local file.  In
    practice you might store this somewhere more global, like in a database, to
    re-use between sessions."""
    try:
        with open('daisys_tokens.json') as tokens_file:
            tokens = json.load(tokens_file)
        print('Loaded tokens from "daisys_tokens.json".')
        return tokens['access_token'], tokens['refresh_token']
    except (FileNotFoundError, json.JSONDecodeError):
        return None, None

ACCESS_TOKEN, REFRESH_TOKEN = load_tokens()

def store_tokens(access_token: str, refresh_token: str):
    """A function to store the access and refresh tokens to a local file.  In
    practice you might store this somewhere like in a database or larger
    configuration."""
    with open('daisys_tokens.json', 'w') as tokens_file:
        json.dump({'access_token': access_token, 'refresh_token': refresh_token},
                  tokens_file)
    print('Stored new tokens in "daisys_tokens.json".')

def initial_login():
    """Initially retrieve access and refresh tokens through a normal login."""
    # Initial login is only required if we don't have an access token yet.
    print(f'Initial login, attempting to log in with {EMAIL} to retrieve an access token.')
    with DaisysAPI('speak', email=EMAIL, password=PASSWORD) as speak:
        # Say what should happen when tokens are retrieved or changed.
        speak.token_callback = store_tokens
        # Explicit login is only necessary if no other operations occur here,
        # otherwise login is automatic.
        speak.login()
        # Login enables auto-logout. Disable it so that the token will not be
        # invalidated.
        speak.auto_logout = False
    print('Run again to use stored access token!')

def subsequent_login():
    """In subsequent uses, the previously stored access token can be used directly."""
    print('Using previously stored access token.')
    with DaisysAPI('speak', access_token=ACCESS_TOKEN, refresh_token=REFRESH_TOKEN) as speak:
        speak.token_callback = store_tokens
        # Here we are just showing how to authenticate with the Daisys API
        # using an access token instead of the password, so we just list some
        # voices.  See hello_daisys.py for an example of how to generate audio!

        # Get a list of all voices
        voices = speak.get_voices()
        print('Found voices:', [voice.name for voice in voices])

def main():
    if ACCESS_TOKEN:
        subsequent_login()
    else:
        initial_login()

if __name__=='__main__':
    try:
        main()
    except HTTPStatusError as e:
        try:
            print(f'HTTP error status {e.response.status_code}: {e.response.json()["detail"]}, {e.request.url}')
        except:
            print(f'HTTP error status {e.response.status_code}: {e.response.text}, {e.request.url}')

Example: curl example

This example shows:
1. How to use curl and jq in a shell script to access the Daisys API.
2. How to log in with curl to obtain an access token.
3. How to get a list of voices and select the last one.
4. How to reference the voice to generate audio (a "take") for some text.
5. How to download the resulting audio.
6. How to play the audio using aplay (Linux).

To run it, you must supply your email and password in the corresponding environment variables, as shown below.
Requires: curl, jq.

Example output

$ curl -O https://raw.githubusercontent.com/daisys-ai/daisys-api-python/main/examples/curl_example.sh
$ jq --version # "jq" is needed for the example program to parse API responses
$ export DAISYS_EMAIL=user@example.com
$ export DAISYS_PASSWORD=example_password123
$ bash examples/curl_example.sh
Found Daisys Speak API {"version":1,"minor":0}
GET https://api.daisys.ai/v1/speak/voices
"Deirdre" is speaking!
POST https://api.daisys.ai/v1/speak/takes/generate: {"voice_id": "v01hasgezqjcsnc91zdfzpx0apj", "text": "Hello there, I am Daisys!", "prosody": {"pace": -8, "pitch": 2, "expression": 8}}
Take is "waiting".
GET https://api.daisys.ai/v1/speak/takes/t01hawm80qzj60bf2w9z0np7wej
Take is "started".
GET https://api.daisys.ai/v1/speak/takes/t01hawm80qzj60bf2w9z0np7wej
Take is "ready".
Getting audio!
GET https://api.daisys.ai/v1/speak/takes/t01hawm80qzj60bf2w9z0np7wej/wav
Wrote 'hello_daisys.wav'.
Playing WAVE 'hello_daisys.wav' : Signed 16 bit Little Endian, Rate 44100 Hz, Mono

The "Playing" message will only appear if you have the aplay command installed; otherwise you may play the resulting hello_daisys.wav file in any audio player.

examples/curl_example.sh

#!/bin/bash

# The following is an example of how to use the Daisys API for generating a voice and then
# using it in a speech generation task using the "curl" program.  The API generates
# "takes" representing one or more sentences from a speaker.

# This program downloads the resulting .wav file and tries to play it using "aplay" if
# that program is available.

set -e # Stop if we hit any problems along the way.

EMAIL="${DAISYS_EMAIL:=user@example.com}"
PASSWORD="${DAISYS_PASSWORD:=example_password}"

DAISYS_AUTH="${DAISYS_AUTH_URL:=https://api.daisys.ai}"
DAISYS="${DAISYS_API_URL:=https://api.daisys.ai}"
API="$DAISYS/v1"
SPEAK="$API/speak"

TOKEN=$(curl -s -X POST -H 'Content-Type: application/json' -d '{"email": "'$EMAIL'", "password": "'$PASSWORD'"}' $DAISYS_AUTH/auth/login | jq -r ".access_token")

AUTH="Authorization: Bearer $TOKEN"

# Some functions for authenticated GET and POST methods using curl.
speak_get() {
    echo "GET $SPEAK/$1" >/dev/stderr
    curl -s -L -H "$AUTH" "$SPEAK/$1"
}
speak_post() {
    echo "POST $SPEAK/$1: $2" >/dev/stderr
    curl -s -H "Content-Type: application/json" -H "$AUTH" -d "$2" "$SPEAK/$1"
}

VERSION=$(curl -s $API/speak/version)
echo 'Found Daisys Speak API ' $VERSION

# Get a list of all voices, select the last one.
VOICE=$(speak_get voices | jq '.[-1]')

if [ "$VOICE" = null ]; then
    echo No voices found.
    MODEL=$(speak_get models | jq '.[-1]')
    if [ "$MODEL" = null ]; then
        echo No models found.
        exit 1
    fi
    echo Using model $(echo $MODEL | jq .displayname)
    echo Generating a voice.
    VOICE=$(speak_post voices/generate '{"name": "Tina", "gender": "female", "model": '$(echo $MODEL | jq .name)'}')
fi

echo "$(echo $VOICE | jq .name) is speaking!"
VOICE_ID="$(echo $VOICE | jq .voice_id)"

TAKE=$(speak_post takes/generate '{"voice_id": '$VOICE_ID', "text": "Hello there, I am Daisys!", "prosody": {"pace": -8, "pitch": 2, "expression": 8}}')
TAKE_ID="$(echo $TAKE | jq -r .take_id)"
echo "Take is $(echo $TAKE | jq .status)."

while [ $(echo $TAKE | jq -r .status) != 'ready' ] && [ $(echo $TAKE | jq -r .status) != 'error' ]; do
    sleep 0.5
    TAKE=$(speak_get takes/$TAKE_ID)
    echo "Take is $(echo $TAKE | jq .status)."
done

echo "Getting audio!"
speak_get takes/$TAKE_ID/wav > hello_daisys.wav
echo "Wrote 'hello_daisys.wav'."

# Play the audio if we have aplay (Linux), otherwise just print a nice message.
if which aplay >/dev/null; then
    aplay hello_daisys.wav
else
    echo "aplay not found, but audio was written to 'hello_daisys.wav'."
fi

Daisys API input

For generating speech audio, the Daisys API supports input text that can include certain directives, described in the following sections.

Input Text Customization

Our models employ a powerful Normalizer (an advanced text and SSML tag processing tool) designed to process and normalize input text, making it more readable and coherent. It is equipped with a default pipeline of operations to apply to the input text, but it also allows for customizing the normalization process according to specific needs.

The Normalizer applies a series of pre-defined steps by default to transform the text:

Default pipeline:
1. Abbreviations: Converts common abbreviations like "Mr." to their full spoken versions (e.g., "Mister").
2. Acronyms: Acronyms are processed by inserting a small pause between letters for better pronunciation (e.g., "10 AM" to "10 A M").
3. Road numbers: Converts road numbers by inserting a space, similar to acronyms (e.g., "A263" becomes "A 263").
4. URLs: Replaces URLs with a more human-readable description (e.g., "As an example of https://google.com" to "As an example of google dot com.").
5. Numbers: Converts numerical expressions to their spoken form (e.g., "10" to "ten", "$40" to "forty dollars").
6. Punctuation Repeats: Simplifies repeated punctuation (e.g., "Hey!!!!" to "Hey!").
7. Units: Converts units such as "km/h" to their spoken form ("kilometer(s) per hour").

Advanced Capabilities:

The Normalizer also provides advanced functionality for handling SSML tags. The supported SSML is based on the World Wide Web Consortium's "Speech Synthesis Markup Language Version 1.0". The voice, phoneme, and say-as tags are partially supported.

Voice Tag

The voice tag is used when a specific text segment comes from a different language than the base language of that model. Our automatic language prediction algorithm captures these parts and inserts a voice tag with the proper language attribute. This allows the model to apply the required language change for that section. The tag can also be added manually to the input text; normalization is then applied based on the language defined for the voice tag section.

Example usage:
Input: The parking season ticket was valid <voice language="nl">t/m 09-01-2010</voice>.
Normalizer output: The parking season ticket was valid tot en met negen januari tweeduizend tien.

Phoneme Tag

The phoneme tag provides control over the phonemization for the model; the model will use the given pronunciation.

Example usage: De gekleurde <phoneme ph="v l ɑ x">vlag</phoneme> van een land.

📌 Note: Phonemes need to be separated by a space. In case of multiple words, they should be separated by the @ symbol (e.g. De ernstige kapitein → d ə @ ɛ r n s t ə ɣ ə @ k ɑ p i t ɛɪ n)

Say-as Tag

The say-as tag allows users to indicate that specific types of text should be interpreted in a certain way.

Supported interpret-as values:
1. spell-out
2. year
3. date
4. time

Example usage:
Input: Mijn naam spel je als <say-as interpret-as="spell-out">Fred</say-as>. Het was <say-as interpret-as="year">1944</say-as>. Ik vertrek om <say-as interpret-as="time">13.10</say-as>. Ik ben geboren op <say-as interpret-as="date">11.4.1984</say-as>.
Output: Mijn naam spel je als F r e d. Het was negentien vierenveertig. Ik vertrek om tien over één. Ik ben geboren op elf april negentien vierentachtig.
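As a usage sketch, such tags are simply embedded in the text of a take request; the voice_id below is a placeholder, and the tag syntax follows the say-as example above:

from daisys import DaisysAPI

with DaisysAPI('speak', email='user@example.com', password='pw') as speak:
    # The normalizer interprets tags embedded in the input text.
    take = speak.generate_take(
        voice_id='v01...',  # placeholder: substitute one of your own voice ids
        text='Mijn naam spel je als <say-as interpret-as="spell-out">Fred</say-as>. '
             'Het was <say-as interpret-as="year">1944</say-as>.')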
w Tag

The <w> tag allows the user to select the correct pronunciation for a word based on its part of speech and meaning. The part-of-speech tags from the Penn Treebank (https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html) are used.

Example options:
* read: verb, present tense
* read: verb, past tense
* wind: noun
* live: adjective
* live: adverb
* bass: default meaning/pronunciation (in the example: the music-related sense)
* bass: first non-default meaning/pronunciation (in the example: the fish)

Emphasis Tag

The <emphasis> tag allows the user to select a word for emphasis. The intonation and duration behaviour of the model will be modified for the text between the tags. The strength of the emphasis can be modulated by the level attribute, which is moderate by default but can take the following values:
* <emphasis level="moderate">some text</emphasis>: somewhat emphasize the text
* <emphasis level="strong">some text</emphasis>: more strongly emphasize the text
* <emphasis level="reduced">some text</emphasis>: emphasize the text by going down in pitch
* <emphasis level="none">some text</emphasis>: avoid that the model automatically emphasizes this text

Note that if no hints are provided, the model will often choose which parts of a sentence to emphasize depending on context, so this level of control is critical if you want to avoid the wrong word being selected, or to ensure the right word is selected, for emphasis.

The pause attribute may also be added to insert a pause after the word, which can enhance emphasis; it takes the same values as the strength attribute of the break tag:
* <emphasis pause="weak">some text</emphasis>: somewhat emphasize the text, including a short pause
* <emphasis level="strong" pause="medium">some text</emphasis>: strongly emphasize the text, including a medium pause
* <emphasis pause="strong">some text</emphasis>: somewhat emphasize the text, including a long pause

Break Tag

The <break> tag inserts a pause at a given place in the text. The duration of the pause can be controlled by the strength attribute:
* <break strength="weak"/>: a short pause, such as after a comma
* <break strength="medium"/>: a medium-length pause, such as between sentences
* <break strength="strong"/>: a longer pause, such as between paragraphs

Daisys API top-level object

class daisys.factory.DaisysAPI(product='speak', version='v1', email: str | None = None, password: str | None = None, access_token: str | None = None, refresh_token: str | None = None, daisys_url: str = 'https://api.daisys.ai', auth_url: str = 'https://api.daisys.ai')
Factory class to get a Daisys API client. This class is intended (but not required) to be used in a with or async with clause.

__init__(product='speak', version='v1', email: str | None = None, password: str | None = None, access_token: str | None = None, refresh_token: str | None = None, daisys_url: str = 'https://api.daisys.ai', auth_url: str = 'https://api.daisys.ai')
Initialize the factory object for a given product and version. This object is intended to be short lived, only used to provide a client object. Login or token details may optionally be provided. If they are not provided here, they may be later provided to the client by calling client.login().
Parameters:
* product – The product to retrieve a client for.
* version – The version of the product to retrieve a client for.
* email – Optionally, email to use for logging in.
* password – Optionally, password to use for logging in.
* access_token – Optionally, access token to use. Specify if login was already performed.
* refresh_token – Optionally, refresh token to use. Specify if login was already performed.
* daisys_url – For overriding the default API URL, usually not needed.
* auth_url – For overriding the default authentication URL, usually not needed.

get_async_client() → DaisysAsyncSpeakClientV1
Retrieve a client for asynchronous usage of the Daisys Speak API.

get_client() → DaisysSyncSpeakClientV1
Retrieve a client for synchronous usage of the Daisys Speak API.

Daisys API clients

These objects should not be instantiated directly, but accessed through the DaisysAPI top-level factory object.
The synchronous client makes requests using synchronous, blocking calls. The asynchronous client uses an asyncio event loop. You should choose whichever implementation is most convenient for your application.

class daisys.v1.speak.sync_client.DaisysSyncSpeakClientV1(auth_url: str, product_url: str, email: str, password: str, access_token: str, refresh_token: str)
Wrapper for Daisys v1 API endpoints, synchronous version.

close()
To be called when the object is destroyed, to ensure any open HTTP connections are cleanly closed. This is done automatically if the client was created through a context manager.

delete_take(take_id: str, raise_on_error: bool = True) → bool
Delete a take. The take will no longer appear in return values from get_takes.
Parameters:
* take_id – the id of a take to delete.
* raise_on_error – If True (default) a DaisysTakeDeletionException error will be raised if the take was not found. (That is, if the function would have returned False.)
Returns: True if the take was deleted successfully, otherwise False.
Return type: bool
Note that HTTP exceptions may be thrown for errors other than a take not being found.

delete_voice(voice_id: str, raise_on_error: bool = True) → bool
Delete a voice. The voice will no longer appear in return values from get_voices.
Parameters:
* voice_id – the id of a voice to delete.
* raise_on_error – If True (default) a DaisysVoiceDeletionException error will be raised if the voice was not found. (That is, if the function would have returned False.)
Returns: True if the voice was deleted successfully, otherwise False.
Return type: bool
Note that HTTP exceptions may be thrown for errors other than a voice not being found.

generate_take(voice_id: str, text: str, override_language: str | None = None, style: list[str] | None = None, prosody: SimpleProsody | AffectProsody | SignalProsody | None = None, status_webhook: str | None = None, done_webhook: str | None = None, wait: bool = True, raise_on_error: bool = True, timeout: float | None = None) → TakeResponse
Generate a "take", an audio file containing an utterance of the given text by the given voice.
Parameters:
* voice_id – The id of the voice to be used for generating audio. The voice is attached to a specific model.
* text – The text that the voice should say.
* override_language – Normally a language classifier is used to detect the language of the speech; this allows for multilingual sentences. However, if the language should be enforced, it should be provided here. Currently accepted values are "nl-NL" and "en-GB".
* style – A list of styles to enable when speaking. Note that most styles are mutually exclusive, so a list of 1 value should be provided. Accepted styles can be retrieved from the associated voice's VoiceInfo.styles or the model's TTSModel.styles field. Note that not all models support styles, thus this can be left empty if specific styles are not desired.
* prosody – The characteristics of the desired speech not determined by the voice or style. Here you can provide a SimpleProsody, or most models also accept the more detailed AffectProsody.
* status_webhook – An optional URL to be called using POST whenever the take's status changes, with TakeResponse in the body content.
* done_webhook – An optional URL to be called exactly once using POST when the take is READY, ERROR, or TIMEOUT, with TakeResponse in the body content.
* wait – if True, wait for the take to be ready before returning.
* raise_on_error – If True (default) a DaisysTakeGenerateException error will be raised.
If this behavior is not desired, set to False.
* timeout – Time limit to wait, in seconds. Note that if timeout is specified, some results may not have a "done" status (ready or error).
Returns: Information about the take being generated, including status.
Return type: TakeResponse

generate_takes(request: list[TakeGenerate], wait: bool = True, raise_on_error: bool = True, timeout: float | None = None) → list[TakeResponse]
Generate several "takes", each corresponding to an audio file containing an utterance of the given text by the given voice.
Parameters:
* request – a list of TakeGenerate objects describing multiple take generation requests.
* wait – if True, wait for all takes to be ready before returning.
* raise_on_error – If True (default) a DaisysTakeGenerateException error will be raised. If this behavior is not desired, set to False.
* timeout – Time limit to wait, in seconds. Note that if timeout is specified, some results may not have a "done" status (ready or error).
Returns: a list of TakeResponse objects containing information on the generation status of each result.
Return type: list[TakeResponse]

generate_voice(name: str, model: str, gender: VoiceGender, description: str | None = None, default_style: list[str] | None = None, default_prosody: SimpleProsody | AffectProsody | SignalProsody | None = None, example_take: TakeGenerateWithoutVoice | None = None, done_webhook: str | None = None, wait: bool = True, raise_on_error: bool = True, timeout: float | None = None) → VoiceInfo
Generate a random, novel voice for a given model with desired properties.
Parameters:
* name – A name to give the voice, may be any string, and does not need to be unique.
* model – The name of the model for this voice.
* gender – The gender of this voice.
* description – The description of this voice.
* default_style – An optional list of styles to associate with this voice by default. It can be overridden by a take that uses this voice. Note that most styles are mutually exclusive, and not all models support styles.
* default_prosody – An optional default prosody to associate with this voice. It can be overridden by a take that uses this voice.
* example_take – Information on the take to generate as an example of this voice.
* done_webhook – An optional URL to call exactly once using POST when the voice is available, with VoiceInfo in the body content.
* wait – True to wait for the result, or False to continue without waiting.
* raise_on_error – If True (default) a DaisysVoiceGenerateException error will be raised if an error status is detected in one of the takes. If this behavior is not desired, set to False.
* timeout – Time limit to wait, in seconds. Note that if timeout is specified, some results may not have a "done" status (ready or error).
Returns: Information about the generated voice.
Return type: VoiceInfo

get_model(model_name: str) → TTSModel
Get information about a model.
Parameters: model_name – The name of the model, which is a unique identifier.
Returns: Information about the model.
Return type: TTSModel

get_models() → list[TTSModel]
Get information about all available models.
Returns: Information about each model.
Return type: list[TTSModel]

get_take(take_id: str) → TakeResponse
Get information about a specific take.
Parameters: take_id – Unique identifier for a take.
Returns: Information about the requested take.
Return type: TakeResponse

get_take_audio(take_id: str, file: str | None = None, format: str = 'wav') → bytes
Get audio associated with a take.
Parameters:
* take_id – A take_id to retrieve the audio for.
* file – Optionally, the filename of a file to write, or a file stream to write to.
* format – A supported format, must be one of 'wav', 'mp3', 'flac', 'm4a'. Note: only 'wav' may be retrieved without waiting for 'ready' status.
Returns: The content of the audio file associated with the requested take.
Return type: bytes

get_take_audio_url(take_id: str, format: str = 'wav') → str
Get the signed URL for audio associated with a take. May be used to provide the URL to a download or streaming client that does not have the API access token.
Parameters:
* take_id – A take_id to retrieve the audio URL for.
* format – A supported format, must be one of 'wav', 'mp3', 'flac', 'm4a'. Note: only 'wav' may be retrieved without waiting for 'ready' status.
Returns: The URL that can be used to download the content of the audio associated with the requested take.
Return type: str

get_takes(take_ids: list[str] | None = None, length: int | None = None, page: int | None = None, older: int | None = None, newer: int | None = None) → list[TakeResponse]
Get a list of takes, optionally filtered.
Parameters:
* take_ids – A list of specific takes to retrieve.
* length – Maximum number of takes to return. Default: unlimited.
* page – Return page "page" of length "length". Default: 1.
* older – Find takes older than or equal to this timestamp (milliseconds).
* newer – Find takes newer than or equal to this timestamp (milliseconds).
Returns: Information about each take found. Empty list if none found.
Return type: list[TakeResponse]

get_voice(voice_id: str) → VoiceInfo
Get information about a voice.
Parameters: voice_id – The unique identifier for a voice.
Returns: Information about the voice.
Return type: VoiceInfo

get_voices(length: int | None = None, page: int | None = None, older: int | None = None, newer: int | None = None) → list[VoiceInfo]
Get a list of voices, optionally filtered.
Parameters:
* length – Maximum number of voices to return. Default: unlimited.
* page – Return page "page" of length "length". Default: 1.
* older – Find voices older than or equal to this timestamp (milliseconds).
* newer – Find voices newer than or equal to this timestamp (milliseconds).
Returns: Information about each voice found.
Return type: list[VoiceInfo]

login(email: str | None = None, password: str | None = None) → bool
Log in to the Daisys API using the provided credentials. If successful, nothing is returned. An access token is stored in the client for use in future requests.
May raise:
* DaisysCredentialsError: if insufficient credentials are provided.
* httpx.HTTPStatusError(401): if credentials do not successfully authenticate.
Parameters:
* email – User name for the Daisys API credentials.
* password – Password for the Daisys API credentials.

login_refresh() → bool | None
Refresh access and refresh tokens for API authorization. This function does not normally need to be called explicitly, since the authorization credentials shall be renewed automatically when needed; however, it is provided in case there is a need to do so explicitly by the user.
May raise:
* httpx.HTTPStatusError(401): if credentials do not successfully authenticate.
Returns: True if successful, False if unsuccessful, and None if no refresh token was available.
Return type: Optional[bool]

logout(refresh_token: str | None = None) → bool
Log out of the Daisys API. Revokes the refresh token and forgets the access and refresh tokens.
May raise:
* httpx.HTTPStatusError(401): if credentials do not successfully authenticate.
Note that further requests may auto-login again.
Returns: True if logout was successful, False if no tokens were provided to revoke.
Return type: bool

stream_take_audio(take_id: str)
Stream the audio by providing an iterator over chunks of bytes.
Parameters: take_id – A take_id to retrieve the audio for.
Returns: use "for" to read chunks of bytes for this take.
Return type: iterator

update_voice(voice_id: str, name: str | None = None, gender: VoiceGender | None = None, description: str | None = None, default_style: list[str] | None = None, default_prosody: SimpleProsody | AffectProsody | SignalProsody | None = None, raise_on_error: bool = True, **_kwargs) → bool
Update a voice.
Parameters:
* voice_id – the id of a voice to update.
* name – A name to give the voice, may be any string, and does not need to be unique.
* gender – The gender of this voice.
* description – The description of this voice.
* default_style – An optional list of styles to associate with this voice by default. It can be overridden by a take that uses this voice. Note that most styles are mutually exclusive, and not all models support styles.
* default_prosody – An optional default prosody to associate with this voice. It can be overridden by a take that uses this voice.
* raise_on_error – If True (default) a DaisysVoiceUpdateException error will be raised if the voice was not found. (That is, if the function would have returned False.)
Returns: True if the voice was updated successfully, otherwise False.
Return type: bool
Note that HTTP exceptions may be thrown for errors other than a voice not being found.

version() → Version
Get the version information for the API in use.
Returns: An object containing version information.
Return type: Version

wait_for_takes(take_ids: str | list[str], sleep_seconds=0.5, callback: Callable[[TakeResponse | list[TakeResponse]], None] | None = None, async_callback: Callable[[TakeResponse | list[TakeResponse]], Awaitable[None]] | None = None, raise_on_error: bool = True, timeout: float | None = None) → TakeResponse | list[TakeResponse]
Wait for a take or list of takes to be ready.
Parameters:
* take_ids – Either a single take_id, or a list of take_id to wait for at the same time. In the latter case, the function will return when all take_id are done.
* sleep_seconds – The number of seconds to wait while polling the take status.
* callback – A synchronous function to call whenever the status of one of the takes changes. The argument it receives corresponds to a list of all takes requested. (A single take will also be embedded in a list.)
* async_callback – An asynchronous function to call whenever the status of one of the takes changes. The argument it receives corresponds to a list of all takes requested.
* raise_on_error – If True (default) a DaisysTakeGenerateException error will be raised if an error status is detected in one of the takes. If this behavior is not desired, set to False.
* timeout – Time limit to wait, in seconds. Note that if timeout is specified, some results may not have a "done" status (ready or error).
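For instance, a minimal sketch of submitting takes without waiting and then polling them together; the voice id is a placeholder, and TakeGenerate is assumed to be importable from daisys.v1.speak like the other model classes:

from daisys import DaisysAPI
from daisys.v1.speak import TakeGenerate

with DaisysAPI('speak', email='user@example.com', password='pw') as speak:
    # Submit two takes without waiting for them to finish.
    takes = speak.generate_takes(
        [TakeGenerate(voice_id='v01...', text='First sentence.'),
         TakeGenerate(voice_id='v01...', text='Second sentence.')],
        wait=False)
    # Wait for both takes together; the callback receives the full list of
    # takes whenever any status changes.
    takes = speak.wait_for_takes(
        [take.take_id for take in takes],
        callback=lambda all_takes: print([t.status.name for t in all_takes]),
        timeout=60)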
wait_for_voices(voice_ids: str | list[str], sleep_seconds: float = 0.5, raise_on_error: bool = True, timeout: float | None = None) → VoiceInfo | list[VoiceInfo]
Wait for a voice or list of voices to be ready.
Parameters:
* voice_ids – Either a single voice_id, or a list of voice_id to wait for at the same time. In the latter case, the function will return when all voice_id are done.
* sleep_seconds – The number of seconds to wait while polling the voice status.
* raise_on_error – If True (default) a DaisysVoiceGenerateException error will be raised if an error status is detected in one of the voices. If this behavior is not desired, set to False.
* timeout – Time limit to wait, in seconds. Note that if timeout is specified, some results may not have a "done" status (ready or error).

websocket(model: str | None = None, voice_id: str | None = None) → DaisysSyncSpeakWebsocketV1
Get an interface to the websocket that manages the connection, allows making voice generate and take generate requests, and handles streaming the resulting audio. This provided interface is intended to be used in a with clause.
Parameters:
* model – a websocket connection requires specifying a model or voice
* voice_id – if model is not provided, voice_id must be provided
Returns: DaisysSyncSpeakWebsocketV1

websocket_url(model: str | None = None, voice_id: str | None = None, raise_on_error: bool = True) → str
Get a URL for connecting a websocket. Must specify model or voice_id in order to indicate the principal model to be used on this connection.
Parameters:
* model – the model for which we want to retrieve a websocket URL.
* voice_id – the id of a voice for which we want to retrieve a websocket URL.
* raise_on_error – If True (default) an error will be raised if the voice was not found or the URL could not be retrieved.
Returns: The URL to connect a websocket to.
Return type: str

class daisys.v1.speak.async_client.DaisysAsyncSpeakClientV1(auth_url: str, product_url: str, email: str, password: str, access_token: str, refresh_token: str)
Wrapper for Daisys v1 API endpoints, asynchronous version.

async close()
To be called when the object is destroyed, to ensure any open HTTP connections are cleanly closed. This is done automatically if the client was created through a context manager.

async delete_take(take_id: str, raise_on_error: bool = True) → bool
Delete a take. The take will no longer appear in return values from get_takes.
Parameters:
* take_id – the id of a take to delete.
* raise_on_error – If True (default) a DaisysTakeDeletionException error will be raised if the take was not found. (That is, if the function would have returned False.)
Returns: True if the take was deleted successfully, otherwise False.
Return type: bool
Note that HTTP exceptions may be thrown for errors other than a take not being found.

async delete_voice(voice_id: str, raise_on_error: bool = True) → bool
Delete a voice. The voice will no longer appear in return values from get_voices.
Parameters:
* voice_id – the id of a voice to delete.
* raise_on_error – If True (default) a DaisysVoiceDeletionException error will be raised if the voice was not found. (That is, if the function would have returned False.)
Returns: True if the voice was deleted successfully, otherwise False.
Return type: bool
Note that HTTP exceptions may be thrown for errors other than a voice not being found.

async generate_take(voice_id: str, text: str, override_language: str | None = None, style: list[str] | None = None, prosody: SimpleProsody | AffectProsody | SignalProsody | None = None, status_webhook: str | None = None, done_webhook: str | None = None, wait: bool = True, raise_on_error: bool = True, timeout: float | None = None) → TakeResponse
Generate a "take", an audio file containing an utterance of the given text by the given voice.
Parameters:
* voice_id – The id of the voice to be used for generating audio.
The voice is attached to a specific model.
* text – The text that the voice should say.
* override_language – Normally a language classifier is used to detect the language of the speech; this allows for multilingual sentences. However, if the language should be enforced, it should be provided here. Currently accepted values are "nl-NL" and "en-GB".
* style – A list of styles to enable when speaking. Note that most styles are mutually exclusive, so a list of 1 value should be provided. Accepted styles can be retrieved from the associated voice's VoiceInfo.styles or the model's TTSModel.styles field. Note that not all models support styles, thus this can be left empty if specific styles are not desired.
* prosody – The characteristics of the desired speech not determined by the voice or style. Here you can provide a SimpleProsody, or most models also accept the more detailed AffectProsody.
* status_webhook – An optional URL to be called using POST whenever the take's status changes, with TakeResponse in the body content.
* done_webhook – An optional URL to be called exactly once using POST when the take is READY, ERROR, or TIMEOUT, with TakeResponse in the body content.
* wait – if True, wait for the take to be ready before returning.
* raise_on_error – If True (default) a DaisysTakeGenerateException error will be raised. If this behavior is not desired, set to False.
* timeout – Time limit to wait, in seconds. Note that if timeout is specified, some results may not have a "done" status (ready or error).
Returns: Information about the take being generated, including status.
Return type: TakeResponse

async generate_takes(request: list[TakeGenerate], wait: bool = True, raise_on_error: bool = True, timeout: float | None = None) → list[TakeResponse]
Generate several "takes", each corresponding to an audio file containing an utterance of the given text by the given voice.
Parameters:
* request – a list of TakeGenerate objects describing multiple take generation requests.
* wait – if True, wait for all takes to be ready before returning.
* raise_on_error – If True (default) a DaisysTakeGenerateException error will be raised. If this behavior is not desired, set to False.
* timeout – Time limit to wait, in seconds. Note that if timeout is specified, some results may not have a "done" status (ready or error).
Returns: a list of TakeResponse objects containing information on the generation status of each result.
Return type: list[TakeResponse]

async generate_voice(name: str, model: str, gender: VoiceGender, description: str | None = None, default_style: list[str] | None = None, default_prosody: SimpleProsody | AffectProsody | SignalProsody | None = None, example_take: TakeGenerateWithoutVoice | None = None, done_webhook: str | None = None, wait: bool | None = True, raise_on_error: bool | None = True, timeout: float | None = None) → VoiceInfo
Generate a random, novel voice for a given model with desired properties.
Parameters:
* name – A name to give the voice, may be any string, and does not need to be unique.
* model – The name of the model for this voice.
* gender – The gender of this voice.
* description – The description of this voice.
* default_style – An optional list of styles to associate with this voice by default. It can be overridden by a take that uses this voice. Note that most styles are mutually exclusive, and not all models support styles.
* default_prosody – An optional default prosody to associate with this voice. It can be overridden by a take that uses this voice.
* example_take – Information on the take to generate as an example of this voice.
* done_webhook – An optional URL to call exactly once using POST when the voice is available, with VoiceInfo in the body content.
* wait – True to wait for the result, or False to continue without waiting.
* raise_on_error – If True (default) a DaisysVoiceGenerateException error will be raised if an error status is detected in one of the takes. If this behavior is not desired, set to False.
* timeout – Time limit to wait, in seconds. Note that if timeout is specified, some results may not have a "done" status (ready or error).
Returns: Information about the generated voice.
Return type: VoiceInfo

async get_model(model_name: str) → TTSModel
Get information about a model.
Parameters: model_name – The name of the model, which is a unique identifier.
Returns: Information about the model.
Return type: TTSModel

async get_models() → list[TTSModel]
Get information about all available models.
Returns: Information about each model.
Return type: list[TTSModel]

async get_take(take_id: str) → TakeResponse
Get information about a specific take.
Parameters: take_id – Unique identifier for a take.
Returns: Information about the requested take.
Return type: TakeResponse

async get_take_audio(take_id: str, file: str | None = None, format: str = 'wav') → bytes
Get audio associated with a take.
Parameters:
* take_id – A take_id to retrieve the audio for.
* file – Optionally, the filename of a file to write, or a file stream to write to.
* format – A supported format, must be one of 'wav', 'mp3', 'flac', 'm4a'. Note: only 'wav' may be retrieved without waiting for 'ready' status.
Returns: The content of the audio file associated with the requested take.
Return type: bytes

async get_take_audio_url(take_id: str, format: str = 'wav') → str
Get the signed URL for audio associated with a take. May be used to provide the URL to a download or streaming client that does not have the API access token.
Parameters:
* take_id – A take_id to retrieve the audio URL for.
* format – A supported format, must be one of 'wav', 'mp3', 'flac', 'm4a'. Note: only 'wav' may be retrieved without waiting for 'ready' status.
Returns: The URL that can be used to download the content of the audio associated with the requested take.
Return type: str

async get_takes(take_ids: list[str] | None = None, length: int | None = None, page: int | None = None, older: int | None = None, newer: int | None = None) → list[TakeResponse]
Get a list of takes, optionally filtered.
Parameters:
* take_ids – A list of specific takes to retrieve.
* length – Maximum number of takes to return. Default: unlimited.
* page – Return page "page" of length "length". Default: 1.
* older – Find takes older than or equal to this timestamp (milliseconds).
* newer – Find takes newer than or equal to this timestamp (milliseconds).
Returns: Information about each take found. Empty list if none found.
Return type: list[TakeResponse]

async get_voice(voice_id: str) → VoiceInfo
Get information about a voice.
Parameters: voice_id – The unique identifier for a voice.
Returns: Information about the voice.
Return type: VoiceInfo

async get_voices(length: int | None = None, page: int | None = None, older: int | None = None, newer: int | None = None) → list[VoiceInfo]
Get a list of voices, optionally filtered.
Parameters:
* length – Maximum number of voices to return. Default: unlimited.
* page – Return page "page" of length "length". Default: 1.
* older – Find voices older than or equal to this timestamp (milliseconds).
* newer – Find voices newer than or equal to this timestamp (milliseconds).
Returns: Information about each voice found.
Return type: list[VoiceInfo]

async login(email: str | None = None, password: str | None = None) → bool
Log in to the Daisys API using the provided credentials. If successful, nothing is returned. An access token is stored in the client for use in future requests.
May raise:
* DaisysCredentialsError: if insufficient credentials are provided.
* httpx.HTTPStatusError(401): if credentials do not successfully authenticate.
Parameters:
* email – User name for the Daisys API credentials.
* password – Password for the Daisys API credentials.

async login_refresh() → bool | None
Refresh access and refresh tokens for API authorization. This function does not normally need to be called explicitly, since the authorization credentials shall be renewed automatically when needed; however, it is provided in case there is a need to do so explicitly by the user.
May raise:
* httpx.HTTPStatusError(401): if credentials do not successfully authenticate.
Returns: True if successful, False if unsuccessful, and None if no refresh token was available.
Return type: Optional[bool]

async logout(refresh_token: str | None = None) → bool
Log out of the Daisys API. Revokes the refresh token and forgets the access and refresh tokens.
May raise:
* httpx.HTTPStatusError(401): if credentials do not successfully authenticate.
Note that further requests may auto-login again.
Returns: True if logout was successful, False if no tokens were provided to revoke.
Return type: bool

stream_take_audio(take_id: str)
Stream the audio by providing an iterator over chunks of bytes.
Parameters: take_id – A take_id to retrieve the audio for.
Returns: use "for" to read chunks of bytes for this take.
Return type: iterator

async update_voice(voice_id: str, name: str | None = None, gender: VoiceGender | None = None, description: str | None = None, default_style: list[str] | None = None, default_prosody: SimpleProsody | AffectProsody | SignalProsody | None = None, raise_on_error: bool = True, **_kwargs) → bool
Update a voice.
Parameters:
* voice_id – the id of a voice to update.
* name – A name to give the voice, may be any string, and does not need to be unique.
* gender – The gender of this voice.
* description – The description of this voice.
* default_style – An optional list of styles to associate with this voice by default. It can be overridden by a take that uses this voice. Note that most styles are mutually exclusive, and not all models support styles.
* default_prosody – An optional default prosody to associate with this voice. It can be overridden by a take that uses this voice.
* raise_on_error – If True (default) a DaisysVoiceUpdateException error will be raised if the voice was not found. (That is, if the function would have returned False.)
Returns: True if the voice was updated successfully, otherwise False.
Return type: bool
Note that HTTP exceptions may be thrown for errors other than a voice not being found.

async version() → Version
Get the version information for the API in use.
Returns: An object containing version information.
Return type: Version

async wait_for_takes(take_ids: str | list[str], sleep_seconds=0.5, callback: Callable[[TakeResponse | list[TakeResponse]], None] | None = None, async_callback: Callable[[TakeResponse | list[TakeResponse]], Awaitable[None]] | None = None, raise_on_error: bool = True, timeout: float | None = None) → TakeResponse | list[TakeResponse]
Wait for a take or list of takes to be ready.
Parameters:
* take_ids – Either a single take_id, or a list of take_id to wait for at the same time. In the latter case, the function will return when all take_id are done.
* sleep_seconds – The number of seconds to wait while polling the take status.
* callback – A synchronous function to call whenever the status of one of the takes changes. The argument it receives corresponds to a list of all takes requested. (A single take will also be embedded in a list.)
* async_callback – An asynchronous function to call whenever the status of one of the takes changes. The argument it receives corresponds to a list of all takes requested.
* raise_on_error – If True (default) a DaisysTakeGenerateException error will be raised if an error status is detected in one of the takes. If this behavior is not desired, set to False.
* timeout – Time limit to wait, in seconds. Note that if timeout is specified, some results may not have a "done" status (ready or error).

async wait_for_voices(voice_ids: str | list[str], sleep_seconds: float = 0.5, raise_on_error: bool = True, timeout: float | None = None) → VoiceInfo | list[VoiceInfo]
Wait for a voice or list of voices to be ready.
Parameters:
* voice_ids – Either a single voice_id, or a list of voice_id to wait for at the same time. In the latter case, the function will return when all voice_id are done.
* sleep_seconds – The number of seconds to wait while polling the voice status.
* raise_on_error – If True (default) a DaisysVoiceGenerateException error will be raised if an error status is detected in one of the voices. If this behavior is not desired, set to False.
* timeout – Time limit to wait, in seconds. Note that if timeout is specified, some results may not have a "done" status (ready or error).

websocket(model: str | None = None, voice_id: str | None = None) → DaisysAsyncSpeakWebsocketV1
Get an interface to the websocket that manages the connection, allows making voice generate and take generate requests, and handles streaming the resulting audio. This provided interface is intended to be used in an async with clause.
Parameters:
* model – a websocket connection requires specifying a model or voice
* voice_id – if model is not provided, voice_id must be provided
Returns: DaisysAsyncSpeakWebsocketV1

async websocket_url(model: str | None = None, voice_id: str | None = None, raise_on_error: bool = True) → str
Get a URL for connecting a websocket. Must specify model or voice_id in order to indicate the principal model to be used on this connection.
Parameters:
* model – the model for which we want to retrieve a websocket URL.
* voice_id – the id of a voice for which we want to retrieve a websocket URL.
* raise_on_error – If True (default) an error will be raised if the voice was not found or the URL could not be retrieved.
Returns: The URL to connect a websocket to.
Return type: str

Daisys API JSON models

Pydantic classes representing the JSON interface for the Daisys API.

class daisys.v1.speak.models.AffectProsody(*, pitch: int, pace: int, valence: int, dominance: int, arousal: int)
Prosody features based on analysis of affect.
Daisys API JSON models
Pydantic classes representing the JSON interface for the Daisys API.
class daisys.v1.speak.models.AffectProsody(*, pitch: int, pace: int, valence: int, dominance: int, arousal: int)
Prosody features based on analysis of affect. See also parent class ProsodyFeatures for other fields.
valence The valence; -10 for negativity, 10 for positivity, 0 for neutral. Type: int
arousal The arousal; -10 for unexcited, 10 for very excited, 0 for neutral. Type: int
dominance The dominance; -10 for docile, 10 for commanding, 0 for neutral. Type: int
model_config: ClassVar[ConfigDict] = {} Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
class daisys.v1.speak.models.ProsodyFeatures(*, pitch: int, pace: int)
Base prosody features supported by all models.
pitch The normalized pitch; -10 to 10, where 0 is a neutral pitch. Type: int
pace The normalized pace; -10 to 10, where 0 is a neutral pace. Type: int
model_config: ClassVar[ConfigDict] = {} Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
daisys.v1.speak.models.ProsodyFeaturesUnion
A union type representing different prosody feature variations. alias of SimpleProsody | AffectProsody | SignalProsody
class daisys.v1.speak.models.ProsodyType(value)
An enum representing different prosody feature types. Not all models accept all prosody types. See the prosody_types field of TTSModel. SIMPLE corresponds with SimpleProsody, AFFECT corresponds with AffectProsody, SIGNAL corresponds with SignalProsody.
static from_class(prosody: SimpleProsody | AffectProsody | SignalProsody) Return an enum value based on the prosody class provided. Parameters: prosody – The prosody object from which to derive the enum value.
prosody(**kwargs) Return a prosody object corresponding to this value, initialized with the given arguments.
class daisys.v1.speak.models.SignalProsody(*, pitch: int, pace: int, tilt: int, pitch_range: int)
Prosody features based on signal analysis. See also parent class ProsodyFeatures for other fields.
tilt The normalized spectral tilt; -10 for flat, 10 for bright, 0 for neutral. Type: int
pitch_range The normalized pitch range; -10 for flat, 10 for highly varied pitch, 0 for neutral. Type: int
model_config: ClassVar[ConfigDict] = {} Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
class daisys.v1.speak.models.SimpleProsody(*, pitch: int, pace: int, expression: int)
Simplified prosody features, supported by all models. See also parent class ProsodyFeatures for other fields.
expression The normalized “expression”; -10 to 10, where 0 is neutral. Type: int
model_config: ClassVar[ConfigDict] = {} Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
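To make the relation between these classes concrete, here is a small sketch constructing prosody objects and converting between class and enum value, using the names documented above:

Working with prosody models
from daisys.v1.speak.models import SimpleProsody, AffectProsody, ProsodyType

p = SimpleProsody(pitch=2, pace=-3, expression=10)
print(ProsodyType.from_class(p))   # ProsodyType.SIMPLE
print(p.model_dump())              # {'pitch': 2, 'pace': -3, 'expression': 10}

# The enum value can also construct a prosody object of the corresponding class:
q = ProsodyType.AFFECT.prosody(pitch=0, pace=0, valence=5, dominance=-2, arousal=3)
assert isinstance(q, AffectProsody)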
class daisys.v1.speak.models.Status(value)
Represents the status of a take or voice generation process.
WAITING Item is waiting to be processed.
STARTED Processing has started for this item.
PROGRESS_25 Item has been 25% processed.
PROGRESS_50 Item has been 50% processed.
PROGRESS_75 Item has been 75% processed.
READY Item is ready to be used; for takes, audio is available.
ERROR An error occurred during processing of this item.
TIMEOUT Processing did not finish for this item. Note that TIMEOUT is used for very long intervals; it does not indicate a few seconds or minutes, but rather that an item has been in the queue for more than a day and has therefore been removed. It should only be considered to represent circumstances where processing errors were not detected by normal means.
class daisys.v1.speak.models.StreamMode(value)
Whether websocket messages should contain a whole part or chunks of parts. Note: upper case in Python, lower case in JSON. Values: PARTS, CHUNKS
class daisys.v1.speak.models.StreamOptions(*, mode: StreamMode = StreamMode.PARTS)
Options for streaming.
mode The streaming mode to use. Type: daisys.v1.speak.models.StreamMode
model_config: ClassVar[ConfigDict] = {} Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
class daisys.v1.speak.models.TTSModel(*, name: str, displayname: str, flags: list[str] = [], languages: list[str], genders: list[VoiceGender], styles: list[list[str]] = [], prosody_types: list[ProsodyType], voice_inputs: list[VoiceInputType] | None)
Information about a speech model.
name The unique identifier of this model. Type: str
displayname A friendlier name that might contain spaces. Type: str
flags A list of flags that indicate some features of this model. Type: list[str]
languages A list of languages supported by this model. Type: list[str]
genders A list of genders supported by this model. Type: list[daisys.v1.speak.models.VoiceGender]
styles A list of style sets; each sublist is a list of mutually exclusive style tags. Type: list[list[str]]
prosody_types A list of which prosody types are supported by this model. Type: list[daisys.v1.speak.models.ProsodyType]
voice_inputs A list of which voice input types are supported by this model. Type: list[daisys.v1.speak.models.VoiceInputType] | None
model_config: ClassVar[ConfigDict] = {} Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
class daisys.v1.speak.models.TakeGenerate(*, text: str, override_language: str | None = None, style: list[str] | None = None, prosody: SimpleProsody | AffectProsody | SignalProsody | None = None, status_webhook: Webhook | None = None, done_webhook: Webhook | None = None, voice_id: str)
Parameters necessary to generate a “take”, an audio file containing an utterance of the given text by the given voice. See TakeGenerateWithoutVoice for documentation on the remaining fields.
voice_id The id of the voice to be used for generating audio. The voice is attached to a specific model. Type: str
model_config: ClassVar[ConfigDict] = {} Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
class daisys.v1.speak.models.TakeGenerateWithoutVoice(*, text: str, override_language: str | None = None, style: list[str] | None = None, prosody: SimpleProsody | AffectProsody | SignalProsody | None = None, status_webhook: Webhook | None = None, done_webhook: Webhook | None = None)
Parameters necessary to generate a “take”, an audio file containing an utterance of the given text. No voice is provided here, for the purpose of embedding in VoiceGenerate for the voice example.
text The text that the voice should say. Type: str
override_language Normally a language classifier is used to detect the language of the speech; this allows for multilingual sentences. However, if the language should be enforced, it should be provided here. Currently accepted values are “nl-NL” and “en-GB”. Type: str | None
style A list of styles to enable when speaking. Note that most styles are mutually exclusive, so a list of 1 value should be provided. Accepted styles can be retrieved from the associated voice’s VoiceInfo.styles or the model’s TTSModel.styles field. Note that not all models support styles, thus this can be left empty if specific styles are not desired.
Type: list[str] | None
prosody The characteristics of the desired speech not determined by the voice or style. Here you can provide a SimpleProsody, or most models also accept the more detailed AffectProsody. Type: daisys.v1.speak.models.SimpleProsody | daisys.v1.speak.models.AffectProsody | daisys.v1.speak.models.SignalProsody | None
status_webhook An optional URL to be called using POST whenever the take’s status changes, with TakeResponse in the body content. Type: daisys.v1.speak.models.Webhook | None
done_webhook An optional URL to be called exactly once using POST when the take is READY, ERROR, or TIMEOUT, with TakeResponse in the body content. Type: daisys.v1.speak.models.Webhook | None
model_config: ClassVar[ConfigDict] = {} Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
class daisys.v1.speak.models.TakeInfo(*, duration: int, audio_rate: int, normalized_text: list[str])
Some information available when a take is READY, attached to the TakeResponse.
duration The length of the audio in samples. To get the length in seconds, divide by audio_rate. Type: int
audio_rate The number of samples per second in the audio. Type: int
normalized_text The text used for text-to-speech after normalization, i.e. translated from “as written” to “as spoken”. Provided as a list of sentences. Type: list[str]
model_config: ClassVar[ConfigDict] = {} Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
class daisys.v1.speak.models.TakeResponse(*, text: str, override_language: str | None = None, style: list[str] | None = None, prosody: SimpleProsody | AffectProsody | SignalProsody | None = None, status_webhook: Webhook | None = None, done_webhook: Webhook | None = None, voice_id: str, take_id: str, status: Status, timestamp_ms: int, info: TakeInfo | None = None)
Information about a take, returned during and after take generation. Also includes fields from TakeGenerate.
take_id The unique identifier of this take. Type: str
status The status of this take, whether it is ready, in error, or in progress. Type: daisys.v1.speak.models.Status
timestamp_ms The timestamp that this take generation was requested, in milliseconds since epoch. Type: int
info Information available when the take is READY, see TakeInfo. Type: daisys.v1.speak.models.TakeInfo | None
model_config: ClassVar[ConfigDict] = {} Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
class daisys.v1.speak.models.Version(*, version: int, minor: int)
Represents the version of the API.
version The major version number of the API. Type: int
minor The minor version number of the API. Type: int
model_config: ClassVar[ConfigDict] = {} Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
class daisys.v1.speak.models.VoiceGender(value)
Represents the gender of a voice. Note: upper case in Python, lower case in JSON. Values: MALE, FEMALE, NONBINARY
class daisys.v1.speak.models.VoiceGenerate(*, name: str, model: str, gender: VoiceGender, description: str | None = None, default_style: list[str] | None = None, default_prosody: SimpleProsody | AffectProsody | SignalProsody | None = None, example_take: TakeGenerateWithoutVoice | None = None, done_webhook: Webhook | None = None)
Parameters necessary to generate a voice.
name A name to give the voice, may be any string, and does not need to be unique.
Type: str
model The name of the model for this voice. Refers to the name entry in TTSModel. Type: str
gender The gender of this voice. Type: daisys.v1.speak.models.VoiceGender
description A description of this voice. Type: str | None
default_style An optional list of styles to associate with this voice by default. It can be overridden by a take that uses this voice. Note that most styles are mutually exclusive, and not all models support styles. Type: list[str] | None
default_prosody An optional default prosody to associate with this voice. It can be overridden by a take that uses this voice. Type: daisys.v1.speak.models.SimpleProsody | daisys.v1.speak.models.AffectProsody | daisys.v1.speak.models.SignalProsody | None
example_take Parameters for an example take to generate for this voice. If not provided, a default example text will be used, depending on the language of the model. Type: daisys.v1.speak.models.TakeGenerateWithoutVoice | None
done_webhook An optional URL to call using POST when the voice is available, with the response of VoiceInfo in the body content. This shall be called once, after the voice and example take have been generated. Type: daisys.v1.speak.models.Webhook | None
model_config: ClassVar[ConfigDict] = {} Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
class daisys.v1.speak.models.VoiceInfo(*, name: str, model: str, gender: VoiceGender, description: str | None = None, default_style: list[str] | None = None, default_prosody: SimpleProsody | AffectProsody | SignalProsody | None = None, example_take: TakeGenerateWithoutVoice | None = None, done_webhook: Webhook | None = None, voice_id: str, status: Status, timestamp_ms: int, example_take_id: str | None = None)
Information about a voice.
voice_id The unique identifier of this voice. Type: str
status The status of this voice, whether it is ready, in error, or in progress. Type: daisys.v1.speak.models.Status
timestamp_ms The timestamp that this voice generation was requested, in milliseconds since epoch. Type: int
example_take_id An optional identifier for a take that represents an example of this voice. Type: str | None
model_config: ClassVar[ConfigDict] = {} Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
class daisys.v1.speak.models.VoiceUpdate(*, name: str | None = None, gender: VoiceGender | None = None, default_style: list[str] | None = None, default_prosody: SimpleProsody | AffectProsody | SignalProsody | None = None)
Update parameters of a voice.
name A name to give the voice, may be any string, and does not need to be unique. Type: str | None
gender The gender of this voice. Type: daisys.v1.speak.models.VoiceGender | None
default_style An optional list of styles to associate with this voice by default. It can be overridden by a take that uses this voice. Note that most styles are mutually exclusive, and not all models support styles. Type: list[str] | None
default_prosody An optional default prosody to associate with this voice. It can be overridden by a take that uses this voice. Type: daisys.v1.speak.models.SimpleProsody | daisys.v1.speak.models.AffectProsody | daisys.v1.speak.models.SignalProsody | None
model_config: ClassVar[ConfigDict] = {} Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
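As an illustration of how the example_take field of VoiceGenerate is used, here is a sketch that generates a voice with a custom example sentence. It assumes the client's generate_voice forwards an example_take argument mirroring the VoiceGenerate model; only name, gender, and model are shown in Getting started, so treat this parameter as an assumption:

Generating a voice with a custom example take
from daisys import DaisysAPI, VoiceGender
from daisys.v1.speak.models import TakeGenerateWithoutVoice, SimpleProsody

with DaisysAPI('speak', email='user@example.com', password='pw') as speak:
    voice = speak.generate_voice(
        name='Bram',
        gender=VoiceGender.MALE,
        model='shakespeare',
        # example_take is assumed to be forwarded to the VoiceGenerate body.
        example_take=TakeGenerateWithoutVoice(
            text='A custom example sentence for this voice.',
            prosody=SimpleProsody(pitch=0, pace=0, expression=5)))
    print(voice.example_take_id)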
class daisys.v1.speak.models.Webhook(*, post_url: str, timestamp_ms: int | None = None, status_code: int | None = None)
Store information about a registered webhook and its status. When specifying a webhook, only post_url needs to be provided.
post_url The URL to be called with POST. Type: str
timestamp_ms The time it was last called at, milliseconds since epoch. Type: int | None
status_code The HTTP status code of the last response from the webhook. Type: int | None
model_config: ClassVar[ConfigDict] = {} Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
Daisys API websockets
The Daisys API provides a websocket interface to enable direct communication with a single inference worker node, for applications that require lower latency.
Latency vs. throughput
While the websocket connection provides some convenience for certain applications, it should not be used for tasks where a batch approach is more appropriate, since generation requests through the REST API get distributed over multiple workers and will overall finish faster. However, applications that require real-time or near real-time interaction may benefit from keeping a connection open and receiving the response immediately, without making an extra HTTP GET request. Keep in mind that dedicating a connection effectively serializes the requests made over it, so the decision between the websocket and the REST API is a typical latency vs. throughput tradeoff. For this reason, to help guarantee latency, the websocket system reserves the right to occasionally drop the connection, forcing the client to request a new URL; this rebalances the distribution of connections to workers, helping to ensure lower latency overall.
In this document we describe:
* Connecting: How to get a websocket URL and make and maintain a connection.
* Message format: The message format used to send commands and receive responses.
* Python interface: How to use this Python library to communicate over the websocket.
Examples of using the websocket connection from the Python API, as well as communicating with the websocket from JavaScript in a browser application, are given in Daisys API websocket examples.
Connecting
In order to connect to a Daisys worker node, you must first be assigned a node through the API. In Python, this is taken care of for you, see Python interface below. However, when using another language or curl, you can get a URL via the Websocket Endpoints using a GET request. As mentioned there, the websocket may disconnect between requests for rebalancing worker load, although this should not happen frequently. Additionally, it is assumed that a websocket connection is for interaction with a specific model, which must be included in the URL. In fact any model can be used on a websocket connection, but only the specified model is kept from being unloaded; therefore, if latency is a concern, it is recommended to open one websocket connection per model. A reconnection scheme can be used to immediately request a new worker URL and reconnect if the connection is dropped. The Daisys API shall make every effort to ensure that all current requests are handled and results delivered before dropping any connections.
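For example, a worker URL can be requested with curl; the token is a placeholder, and the response contains a websocket_url field as described under Websocket Endpoints:

curl -H "Authorization: Bearer $TOKEN" "https://api.daisys.ai/v1/speak/websocket?model=<model_name>"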
Message format
In cases where you are not using Python and wish to develop your own client for the websocket, the format is kept rather simple and should be quite approachable for any language for which a websocket library is available.
The websocket supports text and bytes (binary) messages. Commands are sent using text messages, and status messages (text) and audio messages (bytes) are received. Both outgoing and incoming text messages are in JSON format. Outgoing messages have the following format:
{"command": "<command>", "data": {<command data>}, "request_id": <request_id>}
where command may be one of /takes/generate or /voices/generate. The data field corresponds to the same POST body given to the corresponding commands, i.e. the TakeGenerate and VoiceGenerate structures, respectively. Likewise, the status messages received for each correspond to the responses to those same commands, these being TakeResponse and VoiceInfo, respectively. They are similarly bundled into a response structure:
{"data": {<response data>}, "request_id": <request_id>}
Special to the websocket connection is request_id, which is needed to track which incoming responses go with which outgoing requests. Because websockets do not guarantee message order (shorter messages may arrive before longer messages), and because a take_id is not known until the first status message is received, there is otherwise no way to know which audio goes with which request. Therefore the request_id is a user-provided identifier, a string or an integer, which is included with the responses to that command. A simple incrementing integer per connection is recommended, and is what the Python interface implements.
Audio response messages are also simple; however, since it is necessary to carry some metadata, they contain two sections, delimited by a length prefix. Audio messages (bytes) are formatted thus:
JSON<length><metadata>RIFF<audio data..>
That is, they start with the literal string JSON, followed by a 32-bit little endian integer indicating how long the metadata section is. The metadata section can be converted to a string and parsed as JSON. This is immediately followed by a .wav file header, which always starts with the literal string RIFF. Therefore, starting at the R, the rest of the bytes can be passed to an audio player or a wav file parsing routine if chunking is not used. The metadata section consists of the following fields:
{"take_id": "<take_id>", "part_id": <int>, "chunk_id": <int>, "request_id": <request_id>}
where part_id and chunk_id are incrementing integers as specified in the next section, and request_id reflects whatever was provided when the associated command was issued.
Parts and chunks
If multiple sentences have been provided, then they are returned with separate part_id values, which are an incrementing integer, where each part consists of a complete wav file. The end of the stream for a take is indicated by a new part_id that has 0 bytes of audio. If chunking is enabled, the bytes must be concatenated to an existing part stream, either in real time or before writing the part to a file. Chunks are different from parts in that they are not prepended with a wav header, but are merely the individual pieces of a part that is not yet fully received. Similar to parts, chunks are identified with an incrementing integer chunk_id, which must be used to put them in order before playback. Also similarly, the end of the chunk stream for a part is indicated by a new chunk_id accompanied by 0 bytes of audio.
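For a client in another language, splitting a binary message into its metadata and audio sections takes only a few lines. Here is a sketch in Python of the parsing implied by the format above; the function name is illustrative:

import json
import struct

def parse_audio_message(msg: bytes):
    """Split a binary websocket message into (metadata dict, audio bytes)."""
    assert msg[:4] == b'JSON'
    (length,) = struct.unpack('<I', msg[4:8])  # 32-bit little endian metadata length
    metadata = json.loads(msg[8:8 + length].decode('utf-8'))
    audio = msg[8 + length:]  # for unchunked parts this begins with b'RIFF'
    return metadata, audio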
Finally then, a stream of parts without chunking appears like so:
[part_id=0, audio len=12340] [part_id=1, audio len=23450] [part_id=2, audio len=0]
and with chunking,
[part_id=0, chunk_id=0, audio len=4140] [part_id=0, chunk_id=1, audio len=4140] [part_id=0, chunk_id=2, audio len=0] [part_id=1, chunk_id=0, audio len=4140] [part_id=1, chunk_id=1, audio len=4140] [part_id=1, chunk_id=2, audio len=0] [part_id=2, chunk_id=0, audio len=0]
The above is for visual explanation only; in reality the take_id and request_id are also included in the metadata header, in order to know which audio belongs to which stream.
If a /voices/generate message was requested, audio of the associated example take will be sent. However, the status message will be a VoiceInfo object, and the take_id included in the audio messages will correspond with its example_take_id field.
Python interface
The Python interface consists of calling websocket() on the client object (see Daisys API clients) in a with context manager, which returns one of the objects below, depending on whether the synchronous or asynchronous client is in use. For example,
Streaming audio, websocket method
from daisys import DaisysAPI
with DaisysAPI('speak', email='user@example.com', password='pw') as speak:
    with speak.websocket(model='theatrical-v2') as ws:
        ....
        request_id = ws.generate_take(...
In each case, you can then issue a command to generate a take or a voice using the returned context object, as demonstrated above. Subsequently, the callbacks you provide are called whenever messages containing either status information or audio data are received on the websocket.
class daisys.v1.speak.sync_websocket.DaisysSyncSpeakWebsocketV1(client: DaisysSyncSpeakClientV1, model: str)
Wrapper for Daisys v1 API websocket connection, synchronous version. This class is intended to be used in a with clause.
disconnect() Disconnect this websocket.
generate_take(voice_id: str, text: str, override_language: str | None = None, style: list[str] | None = None, prosody: SimpleProsody | AffectProsody | SignalProsody | None = None, stream_options: StreamOptions | None = None, status_webhook: str | None = None, done_webhook: str | None = None, status_callback: Callable[[int, TakeResponse], None] | None = None, audio_callback: Callable[[int, str, int, int | None, bytes | None], None] | None = None, timeout: float | None = None) → int
Generate a “take”, an audio file containing an utterance of the given text by the given voice.
Parameters:
* voice_id – The id of the voice to be used for generating audio. The voice is attached to a specific model.
* text – The text that the voice should say.
* override_language – Normally a language classifier is used to detect the language of the speech; this allows for multilingual sentences. However, if the language should be enforced, it should be provided here. Currently accepted values are “nl-NL” and “en-GB”.
* style – A list of styles to enable when speaking. Note that most styles are mutually exclusive, so a list of 1 value should be provided. Accepted styles can be retrieved from the associated voice’s VoiceInfo.styles or the model’s TTSModel.styles field. Note that not all models support styles, thus this can be left empty if specific styles are not desired.
* prosody – The characteristics of the desired speech not determined by the voice or style. Here you can provide a SimpleProsody, or most models also accept the more detailed AffectProsody.
* stream_options – Configuration for streaming.
* status_webhook – An optional URL to be called using POST whenever the take’s status changes, with TakeResponse in the body content.
* done_webhook – An optional URL to be called exactly once using POST when the take is READY, ERROR, or TIMEOUT, with TakeResponse in the body content.
* status_callback – An optional function to call for status updates regarding this take.
* audio_callback – An optional function to call to provide the audio parts of the take.
Returns: A request_id identifying this generation request, for use with iter_request() and in the callbacks.
Return type: int
generate_voice(name: str, model: str, gender: VoiceGender, description: str | None = None, default_style: list[str] | None = None, default_prosody: SimpleProsody | AffectProsody | SignalProsody | None = None, example_take: TakeGenerateWithoutVoice | None = None, stream_options: StreamOptions | None = None, done_webhook: str | None = None, status_callback: Callable[[int, TakeResponse], None] | None = None, audio_callback: Callable[[int, str, int, int | None, bytes | None], None] | None = None) → int
Generate a random, novel voice for a given model with desired properties.
Parameters:
* name – A name to give the voice, may be any string, and does not need to be unique.
* model – The name of the model for this voice.
* gender – The gender of this voice.
* description – The description of this voice.
* default_style – An optional list of styles to associate with this voice by default. It can be overridden by a take that uses this voice. Note that most styles are mutually exclusive, and not all models support styles.
* default_prosody – An optional default prosody to associate with this voice. It can be overridden by a take that uses this voice.
* example_take – Information on the take to generate as an example of this voice.
* stream_options – Configuration for streaming.
* done_webhook – An optional URL to call exactly once using POST when the voice is available, with VoiceInfo in the body content.
* status_callback – An optional function to call for status updates regarding this voice’s example take.
* audio_callback – An optional function to call to provide the audio parts of the take.
Returns: A request_id identifying this generation request, for use with iter_request() and in the callbacks.
Return type: int
iter_request(request_id)
Iterate over incoming text and audio messages for a given request_id.
Parameters: request_id – The id value associated with the request to be iterated over. Returned by generate_take() and generate_voice().
Returns: An iterator yielding tuples (take_id, take, header, audio), where:
* take_id: the take_id associated with this request
* take: the TakeResponse information if a text message, otherwise None
* header: the wav header if any, otherwise None
* audio: the audio bytes, if a binary message, otherwise None
reconnect()
Reconnect this websocket, by first fetching the URL and then opening the connection to it.
update(timeout: int | None = 1)
Retrieve a waiting message on the open websocket connection.
Parameters: timeout – Number of seconds to wait. Can be 0 if non-blocking usage is desired. If None, wait forever.
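For instance, the request_id returned by generate_take() can be fed to iter_request() to collect a take's audio synchronously — a minimal sketch; the voice_id is the hypothetical one from earlier examples, and iteration is assumed to end when the stream for the request completes:

Collecting websocket audio with iter_request
from daisys import DaisysAPI

with DaisysAPI('speak', email='user@example.com', password='pw') as speak:
    with speak.websocket(model='shakespeare') as ws:
        request_id = ws.generate_take(voice_id='v01hasgezqjcsnc91zdfzpx0apj',
                                      text='Hello from the websocket!')
        with open('hello_ws.wav', 'wb') as f:
            for take_id, take, header, audio in ws.iter_request(request_id):
                if header:
                    f.write(header)  # a wav header precedes each part's audio
                if audio:
                    f.write(audio)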
class daisys.v1.speak.async_websocket.DaisysAsyncSpeakWebsocketV1(client: DaisysAsyncSpeakClientV1, model: str | None, voice_id: str | None)
Wrapper for Daisys v1 API websocket connection, asynchronous version. This class is intended to be used in an async with clause.
async disconnect() Disconnect this websocket.
async generate_take(voice_id: str, text: str, override_language: str | None = None, style: list[str] | None = None, prosody: SimpleProsody | AffectProsody | SignalProsody | None = None, stream_options: StreamOptions | None = None, status_webhook: str | None = None, done_webhook: str | None = None, status_callback: Callable[[int, TakeResponse], None] | None = None, audio_callback: Callable[[int, str, int, int | None, bytes | None], None] | None = None, timeout: float | None = None) → int
Generate a “take”, an audio file containing an utterance of the given text by the given voice.
Parameters:
* voice_id – The id of the voice to be used for generating audio. The voice is attached to a specific model.
* text – The text that the voice should say.
* override_language – Normally a language classifier is used to detect the language of the speech; this allows for multilingual sentences. However, if the language should be enforced, it should be provided here. Currently accepted values are “nl-NL” and “en-GB”.
* style – A list of styles to enable when speaking. Note that most styles are mutually exclusive, so a list of 1 value should be provided. Accepted styles can be retrieved from the associated voice’s VoiceInfo.styles or the model’s TTSModel.styles field. Note that not all models support styles, thus this can be left empty if specific styles are not desired.
* prosody – The characteristics of the desired speech not determined by the voice or style. Here you can provide a SimpleProsody, or most models also accept the more detailed AffectProsody.
* stream_options – Configuration for streaming.
* status_webhook – An optional URL to be called using POST whenever the take’s status changes, with TakeResponse in the body content.
* done_webhook – An optional URL to be called exactly once using POST when the take is READY, ERROR, or TIMEOUT, with TakeResponse in the body content.
* status_callback – An optional function to call for status updates regarding this take.
* audio_callback – An optional function to call to provide the audio parts of the take.
Returns: A request_id identifying this generation request, for use with iter_request() and in the callbacks.
Return type: int
async generate_voice(name: str, model: str, gender: VoiceGender, description: str | None = None, default_style: list[str] | None = None, default_prosody: SimpleProsody | AffectProsody | SignalProsody | None = None, example_take: TakeGenerateWithoutVoice | None = None, stream_options: StreamOptions | None = None, done_webhook: str | None = None, status_callback: Callable[[int, TakeResponse], None] | None = None, audio_callback: Callable[[int, str, int, int | None, bytes | None], None] | None = None) → int
Generate a random, novel voice for a given model with desired properties.
Parameters:
* name – A name to give the voice, may be any string, and does not need to be unique.
* model – The name of the model for this voice.
* gender – The gender of this voice.
* description – The description of this voice.
* default_style – An optional list of styles to associate with this voice by default. It can be overridden by a take that uses this voice. Note that most styles are mutually exclusive, and not all models support styles.
* default_prosody – An optional default prosody to associate with this voice. It can be overridden by a take that uses this voice.
* example_take – Information on the take to generate as an example of this voice.
* stream_options – Configuration for streaming.
* done_webhook – An optional URL to call exactly once using POST when the voice is available, with VoiceInfo in the body content.
* status_callback – An optional function to call for status updates regarding this voice’s example take.
* audio_callback – An optional function to call to provide the audio parts of the take.
Returns: A request_id identifying this generation request, for use with iter_request() and in the callbacks.
Return type: int
async iter_request(request_id)
Iterate over incoming text and audio messages for a given request_id.
Parameters: request_id – The id value associated with the request to be iterated over. Returned by generate_take() and generate_voice().
Returns: An iterator yielding tuples (take_id, take, header, audio), where:
* take_id: the take_id associated with this request
* take: the TakeResponse information if a text message, otherwise None
* header: the wav header if any, otherwise None
* audio: the audio bytes, if a binary message, otherwise None
async reconnect()
Reconnect this websocket, by first fetching the URL and then opening the connection to it.
async update(timeout: int | None = 1)
Retrieve a waiting message on the open websocket connection.
Parameters: timeout – Number of seconds to wait. In the async implementation this cannot be 0. If None, wait forever.
Daisys API endpoints
While Daisys recommends the use of the Python client, the Daisys API endpoints are available for use with other languages. In addition to the current document, the FastAPI-generated documentation is available:
* Swagger UI: https://api.daisys.ai/v1/speak/docs
* Redoc: https://api.daisys.ai/v1/speak/redoc
* OpenAPI definition file: https://api.daisys.ai/v1/speak/openapi.json
See also the FastAPI documentation on how to generate clients for other languages. The “Speak” API provides a REST interface to its three main data structures: models, voices, and takes. This is best demonstrated in the curl example, where JSON objects are constructed as strings in a shell script. See JSON input structures for more information on JSON input.
Model-related Endpoints
A Daisys API user account may have access to one or more models. These models can be listed by accessing the models endpoint using a GET request:
https://api.daisys.ai/v1/speak/models
Furthermore a specific model can be accessed by providing its name:
https://api.daisys.ai/v1/speak/models/<model_name>
Voice-related Endpoints
A Daisys API user account may have access to one or more voices. These voices can be listed by accessing the voices endpoint using a GET request. Voice listing can also be filtered by providing the fields voice_id (a comma-separated list of voice_id to retrieve), length, page, older, and newer. In these cases a list is returned:
https://api.daisys.ai/v1/speak/voices
https://api.daisys.ai/v1/speak/voices?voice_id=<voice_id1>,<voice_id2>
https://api.daisys.ai/v1/speak/voices?length=5&page=2
https://api.daisys.ai/v1/speak/voices?newer=1690214050638
The argument for newer and older must be a timestamp in milliseconds since epoch. This can be retrieved in Python, for example, using:
import time
import requests

def seconds_ago(seconds: int = 2):
    return int((time.time() - seconds) * 1000)

response = requests.get(f'https://api.daisys.ai/v1/speak/voices?newer={seconds_ago(2)}',
                        headers={'Authorization': 'Bearer ' + access_token})
Furthermore a specific voice can be accessed by providing its voice_id. In this case a single item is returned instead of a list:
https://api.daisys.ai/v1/speak/voices/<voice_id>
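For example, with curl, where the token is a placeholder:

curl -H "Authorization: Bearer $TOKEN" https://api.daisys.ai/v1/speak/voices/<voice_id>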
User accounts may also have access to generate new voices. This can be done by making a POST request to the voices/generate endpoint:
https://api.daisys.ai/v1/speak/voices/generate
The body should contain the VoiceGenerate structure in JSON format. Example:
curl -X POST -H "Authorization: Bearer $TOKEN" -H 'content-type: application/json' \
    -d '{"name": "Bob", "gender": "male", "model": "my_model"}' \
    https://api.daisys.ai/v1/speak/voices/generate
where my_model should be the name of a model listed by the /speak/models endpoint.
Take-related Endpoints
The principal service of the Daisys API is to perform text-to-speech audio synthesis. This is done by generating “takes”, which encapsulate a TTS job. Previously generated takes can be retrieved via takes, and the list can be filtered similarly to voices:
https://api.daisys.ai/v1/speak/takes
https://api.daisys.ai/v1/speak/takes?take_id=<take_id1>,<take_id2>
https://api.daisys.ai/v1/speak/takes?length=5&page=2
https://api.daisys.ai/v1/speak/takes?newer=1690214050638
with similar semantics to /speak/voices described above. A single take can be retrieved by giving its identifier:
https://api.daisys.ai/v1/speak/takes/<take_id>
An audio take can be generated by making a POST request to takes/generate:
https://api.daisys.ai/v1/speak/takes/generate
and providing the TakeGenerate structure as input in the content body.
Retrieving audio
Finally, the audio can be retrieved by accessing the take’s /wav endpoint. Equivalently, other formats can also be retrieved this way; however, wav is the only format that can be retrieved before it is “ready”, allowing download while it is generated:
https://api.daisys.ai/v1/speak/takes/<take_id>/wav
https://api.daisys.ai/v1/speak/takes/<take_id>/mp3
https://api.daisys.ai/v1/speak/takes/<take_id>/m4a
https://api.daisys.ai/v1/speak/takes/<take_id>/flac
https://api.daisys.ai/v1/speak/takes/<take_id>/webm
Note that these endpoints return a 307 redirect to where the audio can be streamed or stored from. Important: a complication is that S3 presigned URLs must be accessed without the Daisys “Authorization” header, which some http clients will not drop automatically.
Therefore the following logic is recommended, and performed by the Python client library when following the redirect to url:
if 'X-Amz-Signature' in url:
    # Pre-signed URL, no auth needed.
    headers = {}
Note that browsers handle this automatically when changing origins; however, it is not recommended in any case to access the REST API endpoints directly from the browser, since they require the access token. Instead, backend software can access the /wav endpoint, retrieve the URL in the Location header, and forward this to the browser; that URL can be accessed without the Authorization header and has a limited lifetime. Therefore this redirect Location is convenient and more secure to pass directly to an audio player object on the client side.
Websocket Endpoints
The following endpoint can be used to retrieve a URL for making a direct websocket connection to a worker by issuing a GET request:
https://api.daisys.ai/v1/speak/websocket?model=<model_name>
As can be seen, the model to use must be specified when making a request for a worker URL, which allows the Daisys API to better distribute requests to workers with preloaded models. For the same reason, whenever a websocket is disconnected, a new URL must be requested through the above endpoint. Disconnection may happen from time to time but shall not happen during the processing of a request. The provided URLs expire after 1 hour. A connection may remain open longer than that, but new connections must request a new URL. The endpoint returns the following JSON body:
{ "websocket_url": "<websocket_url>" }
Authentication Endpoints
To make use of the Daisys API, first an access token must be granted. This can be retrieved by a POST request to the auth/login endpoint:
https://api.daisys.ai/auth/login
The content body should have the form:
{ "email": "<email>", "password": "<password>" }
On failure, a 401 HTTP status is returned. (In the client library, an exception is raised.) On success, a JSON object containing access_token and refresh_token fields is provided. The access_token string should be attached to all GET and POST requests in the HTTP header, in the following form:
Authorization: Bearer <access_token>
Furthermore, if the access_token is no longer working, the refresh_token can be used to get a new one without supplying the password:
https://api.daisys.ai/auth/refresh
In this case the POST request should have the form:
{ "email": "<email>", "refresh_token": "<refresh_token>" }
The response contains new access_token and refresh_token fields. This makes it possible to continually refresh an initial token whenever needed, so that the API can be used without providing a password. Note that this token refresh logic is taken care of automatically by the Python client library. The client can also be instantiated with just an email and refresh token rather than an email and password, so that credentials need not be provided to the Daisys API client. It is alternatively possible to request a permatoken, which does not need to be refreshed. On the other hand, refresh tokens can be revoked at any time through the following POST endpoint:
https://api.daisys.ai/auth/logout
with content body of the form:
{ "refresh_token": "<refresh_token>" }
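For example, the login step with curl might look as follows; the credentials are placeholders, and jq is used here only to extract the token from the JSON response:

TOKEN=$(curl -X POST -H 'content-type: application/json' \
    -d '{"email": "user@example.com", "password": "example_password123"}' \
    https://api.daisys.ai/auth/login | jq -r .access_token)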
JSON input structures
POST endpoints, namely takes/generate and voices/generate, take input in their content body in the form of JSON objects. The structure of all such objects can be inferred by reading the models, since the fields can be translated directly to JSON. Nonetheless some of the embedded structures and optional fields can be confusing, thus we give some examples here.
A minimal example of TakeGenerate:
{
  "text": "This is some text to speak.",
  "prosody": {"pace": -3, "pitch": 0, "expression": 4},
  "voice_id": "01h3anwqdh1q6zhf9s9s239wky"
}
Optional fields such as style, override_language, and done_webhook can be added as desired. Here is an example of TakeGenerate using all available fields:
{
  "text": "This is some text to speak.",
  "override_language": "en-GB",
  "prosody": {"pace": -3, "pitch": 0, "expression": 4},
  "voice_id": "01h3anwqdh1q6zhf9s9s239wky",
  "style": ["narrator"],
  "status_webhook": "https://myservice.com/daisys_webhooks/take_status/1234",
  "done_webhook": "https://myservice.com/daisys_webhooks/take_done/1234"
}
Note that override_language is provided here as an example, but if it is not provided (is null) then the Daisys API will attempt to pronounce words in the correct language on a per-word basis. If it is provided, then the model may for example mispronounce loan words, since it assumes a single language for the input text. The presence of the style field depends on the model in use, as do the supported prosody types, although all models support the simple prosody type with pace, pitch, and expression being integer values from -10 to 10. Specific information about the model can be retrieved by the /speak/models endpoint.
Finally, here is an example of input for voices/generate:
{
  "name": "Bob",
  "default_prosody": {"pace": 0, "pitch": 0, "expression": 0},
  "model": "eng_base",
  "gender": "male",
  "done_webhook": "https://myservice.com/daisys_webhooks/voice_done/1234"
}
Here, a default prosody is specified for the voice, which is adopted in subsequent /takes/generate requests if prosody is not provided (left as null).
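Putting the pieces together, a take can be generated and its audio fetched entirely with curl — a sketch; the voice_id is the hypothetical one from the examples above, and recent versions of curl drop the Authorization header themselves when redirected to a different host (unless --location-trusted is used):

curl -X POST -H "Authorization: Bearer $TOKEN" -H 'content-type: application/json' \
    -d '{"text": "This is some text to speak.", "prosody": {"pace": -3, "pitch": 0, "expression": 4}, "voice_id": "01h3anwqdh1q6zhf9s9s239wky"}' \
    https://api.daisys.ai/v1/speak/takes/generate

# Then, using the take_id from the response, follow the 307 redirect to download the audio:
curl -L -H "Authorization: Bearer $TOKEN" -o take.wav \
    https://api.daisys.ai/v1/speak/takes/<take_id>/wav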