Getting started with the Python client library
==============================================

For more on using the Python client library, see :doc:`examples`.

In the following, the default "synchronous" client will be demonstrated.  Some
users will prefer to use ``asyncio``; in the following examples, ``with
DaisysAPI()`` can be replaced with ``async with DaisysAPI()``, which returns an
asynchronous client that can be used with the ``await`` keyword.

Installing the library
......................

The library is available on pypi.org and can be installed via ``pip``.  The
Daisys API requires Python version 3.10 or greater.

First create a Python ``venv``, activate it, and install the library:

.. code-block:: shell
   :caption: Installing the library
   :linenos:

   $ # set up a virtual environment
   $ mkdir daisys_project
   $ cd daisys_project
   $ python3 -m venv venv
   $ . venv/bin/activate
   $
   $ # install the library
   $ python3 -m pip install daisys
   $
   $ # or, if Python websocket support is needed
   $ python3 -m pip install 'daisys[ws]'

Of course, ``pip`` is only one option; you can use any Python project
management tool such as ``uv``, Poetry, etc.

Running an example
..................

Within the Python virtual environment, the ``hello_daisys.py`` example can be
downloaded and run.  The examples read your email and password from the
environment variables shown:

.. code-block:: shell
   :caption: Running an example
   :linenos:

   $ curl -O https://raw.githubusercontent.com/daisys-ai/daisys-api-python/main/examples/hello_daisys.py
   $ export DAISYS_EMAIL=user@example.com
   $ export DAISYS_PASSWORD=example_password123
   $ python3 hello_daisys.py

Getting a client
................

.. code-block:: python
   :caption: Getting a client by context manager
   :linenos:

   from daisys import DaisysAPI

   with DaisysAPI('speak', email='user@example.com', password='pw') as speak:
       ...

   # or for asyncio support:
   async with DaisysAPI('speak', email='user@example.com', password='pw') as speak:
       ...

As mentioned, an ``asyncio``-enabled client can be instantiated by using
``async with`` in the line above.

Additionally, the context manager interface (``with``) is optional; it is also
possible to create a client by a normal function call:

.. code-block:: python
   :caption: Getting a client by function
   :linenos:

   from daisys import DaisysAPI

   speak = DaisysAPI('speak', email=EMAIL, password=PASSWORD).get_client()

   # or..
   speak = DaisysAPI('speak', email=EMAIL, password=PASSWORD).get_async_client()

The main difference is that when an email and password are used, the context
manager approach will automatically log out when the program exits the context,
whereas when the client is retrieved by ``get_client`` or ``get_async_client``,
the ``.logout()`` function should be called.  Logging out invalidates the
refresh token so that no further sessions can be renewed without logging in
again.  Auto-logout will not occur when an access token is provided.

The rest of this documentation will assume the normal, synchronous client.  In
all cases, functions should be called with ``await`` when used with the
``asyncio`` client.
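For instance, here is a minimal ``asyncio`` sketch that lists the available
models with the asynchronous client (the coroutine name ``main`` is arbitrary;
listing models is covered in the next section):

.. code-block:: python
   :caption: Using the asyncio client (sketch)
   :linenos:

   import asyncio
   from daisys import DaisysAPI

   async def main():
       async with DaisysAPI('speak', email='user@example.com', password='pw') as speak:
           # The same methods are available as on the synchronous client,
           # but they must be awaited.
           for model in await speak.get_models():
               print(model)

   asyncio.run(main())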
Listing the models
..................

Using the client library, it is easy to log into the API and start requesting
text to speech services.  The following Python code can be used to list the
available models:

.. code-block:: python
   :caption: Listing the models
   :linenos:

   from daisys import DaisysAPI

   with DaisysAPI('speak', email='user@example.com', password='pw') as speak:
       print('Found models:')
       for model in speak.get_models():
           print(model)

Listing the voices
..................

You can use a model by using a voice associated with that model.  Voices are
identified by a ``voice_id`` field.

.. code-block:: python
   :caption: Listing the voices
   :linenos:

   from daisys import DaisysAPI

   with DaisysAPI('speak', email='user@example.com', password='pw') as speak:
       print('Found voices:')
       for voice in speak.get_voices():
           print(f'{voice.name}, a {voice.gender} voice of {voice.model} with id {voice.voice_id}.')

Generating a voice
..................

If you do not yet have any voices, you should generate one.  Voices can be
requested for a given gender and with default prosody information.  Voices must
be given names.  For instance, the following block of code creates an
expressive female voice for the ``shakespeare`` model:

.. code-block:: python
   :caption: Generating a voice
   :linenos:

   from daisys import DaisysAPI, VoiceGender
   from pprint import pprint

   with DaisysAPI('speak', email='user@example.com', password='pw') as speak:
       print('Creating a voice:')
       voice = speak.generate_voice(name="Deirdre", gender=VoiceGender.FEMALE, model="shakespeare")
       pprint(voice.model_dump())

Note that voice generation can take a few seconds!  In this example, the
``speak.generate_voice`` command `waits` for the operation to finish, and
therefore we can print the result immediately.

It is also possible to adopt a more asynchronous style by providing
``wait=False`` to ``speak.generate_voice()``.  Alternatively, as mentioned
above, you can use the ``asyncio`` client to allow the ``await
speak.generate_voice()`` syntax.

The above code gives the following details:

.. code-block:: text
   :caption: Generating a voice: output
   :linenos:

   Creating a voice:
   {'default_style': [],
    'default_prosody': None,
    'done_webhook': None,
    'example_take': None,
    'example_take_id': 't01hasgezqkx4vth62xckymk3x3',
    'gender': <VoiceGender.FEMALE: 'female'>,
    'model': 'shakespeare',
    'name': 'Deirdre',
    'status': <Status.READY: 'ready'>,
    'timestamp_ms': 1695218371261,
    'voice_id': 'v01hasgezqjcsnc91zdfzpx0apj'}

We can see that the voice has a female gender, and has an example take
associated with it.  This ``take_id`` can already be used to hear the voice.

Generating a take
.................

Now that you have a voice, text to speech can be requested with the
``speak.generate_take()`` command:

.. code-block:: python
   :caption: Generating a take
   :linenos:

   from daisys import DaisysAPI
   from pprint import pprint

   with DaisysAPI('speak', email='user@example.com', password='pw') as speak:
       print('Creating a take:')
       take = speak.generate_take(voice_id='v01hasgezqjcsnc91zdfzpx0apj',
                                  text="Hello, Daisys! It's a beautiful day.")
       pprint(take.model_dump())

This gives:

.. code-block:: text
   :caption: Generating a take: output
   :linenos:

   Creating a take:
   {'done_webhook': None,
    'info': {'audio_rate': 44100,
             'duration': 152576,
             'normalized_text': ['Hello, Daisys!', "It's a beautiful day."]},
    'override_language': None,
    'prosody': None,
    'status': <Status.READY: 'ready'>,
    'status_webhook': None,
    'style': None,
    'take_id': 't01hasgn2dnyg6jqrcym9cgxv75',
    'text': "Hello, Daisys! It's a beautiful day.",
    'timestamp_ms': 1695220926901,
    'voice_id': 'v01hasgezqjcsnc91zdfzpx0apj'}

Note that the status is "ready", meaning that audio can now be retrieved.  As
with voice generation, an asynchronous approach is also available for
``generate_take``, as sketched below.
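As a rough illustration of the non-waiting style, the following sketch passes
``wait=False`` and checks on the take afterwards.  Here ``speak.get_take`` is
assumed as the status-check call for illustration; consult :doc:`examples` for
the exact helpers available:

.. code-block:: python
   :caption: Generating a take without waiting (sketch)
   :linenos:

   from daisys import DaisysAPI

   with DaisysAPI('speak', email='user@example.com', password='pw') as speak:
       # wait=False returns immediately; the take is processed in the background.
       take = speak.generate_take(voice_id='v01hasgezqjcsnc91zdfzpx0apj',
                                  text="Hello again, Daisys!",
                                  wait=False)
       print(take.take_id, take.status)

       # Later, check on the take; speak.get_take is assumed here for
       # illustration -- see the examples for the exact helper to use.
       print(speak.get_take(take_id=take.take_id).status)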
Retrieving a take's audio
.........................

The take is ready; now we can hear the result!  Audio for a take can be
retrieved as follows:

.. code-block:: python
   :caption: Retrieving audio (1)
   :linenos:

   from daisys import DaisysAPI

   with DaisysAPI('speak', email='user@example.com', password='pw') as speak:
       print("Getting a take's audio.")
       audio_wav = speak.get_take_audio(take_id='t01hasghx0zgdc29gpzexw5r8wc', file='beautiful_day.wav')
       print('Length in bytes:', len(audio_wav))

In the above code, we retrieve a ``.wav`` file, which is (optionally) written
to a file in addition to being returned.  It can be decoded, for example, using
``scipy``'s ``io.wavfile`` module:

.. code-block:: python
   :caption: Retrieving audio (2)
   :linenos:

   from scipy.io import wavfile
   from io import BytesIO

   print(wavfile.read(BytesIO(audio_wav)))

   # Note: Since decoding the audio is outside the scope of the client library,
   # `scipy` is not a dependency and will not be automatically installed by `pip`.

which, along with the previous code block, prints:

.. code-block:: text
   :caption: Retrieving audio: output
   :linenos:

   Getting a take's audio.
   Length in bytes: 292908
   (44100, array([-111, -46, -104, ..., -128, -95, -9], dtype=int16))

The resulting file ``beautiful_day.wav`` can be played using command line
programs like ``aplay`` on Linux, or any audio player such as the excellent
`VLC`_.  You can integrate the results into your creative projects!

It is also possible to retrieve the audio in other formats (``mp3``, ``flac``,
and ``m4a``) by providing the ``format`` parameter.

.. _VLC: https://www.videolan.org/
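For instance, a minimal sketch requesting the same take as ``mp3`` (the format
string ``'mp3'`` is assumed here to match the format names listed above):

.. code-block:: python
   :caption: Retrieving audio in another format (sketch)
   :linenos:

   from daisys import DaisysAPI

   with DaisysAPI('speak', email='user@example.com', password='pw') as speak:
       # Same call as above, but requesting mp3 instead of the default wav
       # (the exact format string is an assumption).
       audio_mp3 = speak.get_take_audio(take_id='t01hasghx0zgdc29gpzexw5r8wc',
                                        file='beautiful_day.mp3',
                                        format='mp3')
       print('Length in bytes:', len(audio_mp3))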
Streaming audio
...............

The Daisys API supports two methods of streaming audio:

* HTTP
* Websocket

HTTP
^^^^

The HTTP method downloads the audio file in chunks using a streaming response,
and can be convenient if a simple iterator interface is desired.  When making
the take request, set ``wait`` to ``False``, and call
:meth:`~daisys.v1.speak.sync_client.DaisysSyncSpeakClientV1.stream_take_audio`
(``async``
:meth:`~daisys.v1.speak.async_client.DaisysAsyncSpeakClientV1.stream_take_audio`).
Alternatively, a signed URL can be retrieved using
:meth:`~daisys.v1.speak.sync_client.DaisysSyncSpeakClientV1.get_take_audio_url`
(``async``
:meth:`~daisys.v1.speak.async_client.DaisysAsyncSpeakClientV1.get_take_audio_url`),
useful for passing to an audio player running in a frontend browser; a sketch
follows below.

.. code-block:: python
   :caption: Streaming audio, HTTP method
   :linenos:

   from daisys import DaisysAPI

   with DaisysAPI('speak', email='user@example.com', password='pw') as speak:
       print("Streaming a take's audio.")
       with speak.stream_take_audio(take_id='t01hasghx0zgdc29gpzexw5r8wc') as stream:
           for chunk in stream:
               print('Length in bytes:', len(chunk))

When using the HTTP method via endpoints outside of the Python library, please
be aware of the use of 307 redirects and headers, outlined in
:ref:`v1_speak_endpoints_retrieving_audio`.
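As an illustration of the signed-URL approach, here is a minimal sketch; only
``take_id`` is passed, and the exact parameters of ``get_take_audio_url`` may
differ, so see the method documentation linked above:

.. code-block:: python
   :caption: Retrieving a signed audio URL (sketch)
   :linenos:

   from daisys import DaisysAPI

   with DaisysAPI('speak', email='user@example.com', password='pw') as speak:
       # The returned URL can be handed to a frontend audio player;
       # exact parameters may differ from this sketch.
       url = speak.get_take_audio_url(take_id='t01hasghx0zgdc29gpzexw5r8wc')
       print(url)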
Websocket
^^^^^^^^^

See :ref:`websocket_examples`.

For lowest latency usage, it is additionally possible to use a websocket to
create a connection directly to the worker node used for synthesizing audio.
Requests are submitted to the worker, and the same node streams back the audio
as it is generated over the already-established connection.

.. code-block:: python
   :caption: Streaming audio, websocket method
   :linenos:

   from daisys import DaisysAPI

   with DaisysAPI('speak', email='user@example.com', password='pw') as speak:
       print("Streaming a take's audio.")
       with speak.websocket(voice_id='v01hasgezqjcsnc91zdfzpx0apj') as ws:
           request_id = ws.generate_take(voice_id='v01hasgezqjcsnc91zdfzpx0apj',
                                         text="Hello, Daisys! It's a beautiful day.",
                                         audio_callback=my_audio_cb,
                                         status_callback=my_status_cb)

The specified callbacks will be called whenever the requested take's status
changes or audio data is generated.  See :ref:`websocket_example` for complete
information on the signatures of these two callbacks and examples showing how
they can be used to receive audio in chunks as it is generated.

In addition to the callback interface,
:meth:`~daisys.v1.speak.sync_websocket.DaisysSyncSpeakWebsocketV1.iter_request`
(``async``
:meth:`~daisys.v1.speak.async_websocket.DaisysAsyncSpeakWebsocketV1.iter_request`)
is provided to allow an iterator-based for-loop (or async for-loop) over
incoming audio chunks, simplifying usage.

Finally, in applications where the backend should perform REST API calls but
the frontend should stream audio,
:meth:`~daisys.v1.speak.sync_client.DaisysSyncSpeakClientV1.websocket_url` can
be used to retrieve a URL that the frontend should connect a websocket to.
:ref:`websocket_client` is provided to show how to manage the websocket
connection using JavaScript.

Authentication with access tokens
.................................

All the above examples authenticate with the API using an email and password.
In some scenarios users will prefer to authenticate using only the access
token.  An access and refresh token can be retrieved once and used until it is
manually revoked.

By default, when the client library is used with email and password, the
refresh token is automatically revoked when the client context is exited.  When
an access token is provided to the client context, this automatic revocation is
skipped, so that the token can be refreshed on next usage.  This can be
controlled by setting ``speak.auto_logout`` to ``True`` or ``False``.

To retrieve an access and refresh token for future use, the following program
can thus be used:

.. code-block:: python
   :caption: Retrieving an access and refresh token
   :linenos:

   from daisys import DaisysAPI

   with DaisysAPI('speak', email='user@example.com', password='pw') as speak:
       speak.auto_logout = False
       speak.login()
       access_token, refresh_token = speak.access_token, speak.refresh_token

These tokens can now be stored, and provided to the client as follows:

.. code-block:: python
   :caption: Using a stored access and refresh token
   :linenos:

   import json

   from daisys import DaisysAPI

   def store_tokens(speak, access_token: str, refresh_token: str):
       """Store the current Daisys access and refresh tokens."""
       with open('daisys_tokens.json', 'w') as token_file:
           json.dump([access_token, refresh_token], token_file)

   access_token, refresh_token = json.load(open('daisys_tokens.json'))

   with DaisysAPI('speak', access_token=access_token, refresh_token=refresh_token) as speak:
       speak.token_callback = store_tokens
       ...

The library does *not* implement a storage and retrieval mechanism for these
tokens, as it is presumed that users will have their own files or databases for
this purpose.

Importantly, when an access token expires, a new one will be automatically
retrieved by the library.  Therefore, it is useful to store
``speak.access_token`` and ``speak.refresh_token`` whenever they change.  The
``token_callback`` is provided for this purpose.  It is optional, but
recommended if not using a permatoken and one wishes to avoid transmitting
passwords.
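When a long-lived access token (permatoken) is available, it can instead be
passed to the client directly.  A minimal sketch follows, assuming the token is
stored in an environment variable (the name ``DAISYS_ACCESS_TOKEN`` is
illustrative) and that no refresh token needs to be supplied in this case:

.. code-block:: python
   :caption: Using a permatoken (sketch)
   :linenos:

   import os
   from daisys import DaisysAPI

   # Illustrative variable name; store the token however suits your project.
   access_token = os.environ['DAISYS_ACCESS_TOKEN']

   # Assumes a permatoken can be passed without a refresh token.
   with DaisysAPI('speak', access_token=access_token) as speak:
       # No login or logout is performed; the token is used as-is.
       for voice in speak.get_voices():
           print(voice.name, voice.voice_id)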