.. _v1_speak_endpoints:

Daisys API endpoints
====================

While Daisys recommends the use of the Python client, the Daisys API endpoints
are available for use with other languages.  In addition to the current
document, the FastAPI-generated documentation is available:

- Swagger UI: https://api.daisys.ai/v1/speak/docs
- Redoc: https://api.daisys.ai/v1/speak/redoc
- OpenAPI definition file: https://api.daisys.ai/v1/speak/openapi.json

See also the FastAPI documentation on how to `generate clients`_ for other
languages.

.. _generate clients: https://fastapi.tiangolo.com/advanced/generate-clients

The "Speak" API provides a REST interface to its three main data structures:
models, voices, and takes.  This is best demonstrated in the :ref:`curl
example`, where JSON objects are constructed as strings in a shell script.
See `JSON input structures`_ for more information on JSON input.

.. _v1_speak_model_endpoints:

Model-related Endpoints
-----------------------

A Daisys API user account may have access to one or more *models*.  These
models can be listed by accessing the ``models`` endpoint using a ``GET``
request::

    https://api.daisys.ai/v1/speak/models

Furthermore, a specific model can be accessed by providing its name::

    https://api.daisys.ai/v1/speak/models/<model_name>

.. _v1_speak_voice_endpoints:

Voice-related Endpoints
-----------------------

A Daisys API user account may have access to one or more *voices*.  These
voices can be listed by accessing the ``voices`` endpoint using a ``GET``
request.  The voice listing can also be filtered by providing the fields
``voice_id`` (a comma-separated list of ``voice_id`` values to retrieve),
``length``, ``page``, ``older``, and ``newer``.  In these cases a list is
returned::

    https://api.daisys.ai/v1/speak/voices
    https://api.daisys.ai/v1/speak/voices?voice_id=<voice_id>
    https://api.daisys.ai/v1/speak/voices?length=5&page=2
    https://api.daisys.ai/v1/speak/voices?newer=1690214050638

The argument for ``newer`` and ``older`` must be a timestamp in milliseconds
since the epoch.  This can be computed in Python, for example, using::

    import time
    import requests

    def seconds_ago(seconds: int = 2):
        return int((time.time() - seconds) * 1000)

    response = requests.get(
        f'https://api.daisys.ai/v1/speak/voices?newer={seconds_ago(2)}',
        headers={'Authorization': 'Bearer ' + access_token})

Furthermore, a specific voice can be accessed by providing its identifier.  In
this case a single item is returned instead of a list::

    https://api.daisys.ai/v1/speak/voices/<voice_id>

User accounts may also have access to generate new voices.  This can be done
by making a ``POST`` request to the ``voices/generate`` endpoint::

    https://api.daisys.ai/v1/speak/voices/generate

The body should contain the :class:`VoiceGenerate` structure in JSON format.
Example::

    curl -X POST -H "Authorization: Bearer $TOKEN" -H 'content-type: application/json' \
         -d '{"name": "Bob", "gender": "male", "model": "my_model"}' \
         https://api.daisys.ai/v1/speak/voices/generate

where ``my_model`` should be the name of a model listed by the
``/speak/models`` endpoint.
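For example, the voice list can be walked page by page.  The following is a
minimal sketch, assuming the ``requests`` library, a valid ``access_token``,
that pages are numbered from 1, and that an empty list is returned past the
last page::

    import requests

    headers = {'Authorization': 'Bearer ' + access_token}

    # Walk the voice list five entries at a time.
    page = 1
    while True:
        voices = requests.get('https://api.daisys.ai/v1/speak/voices',
                              params={'length': 5, 'page': page},
                              headers=headers).json()
        if not voices:
            break  # Assumed: an empty page marks the end of the list.
        for voice in voices:
            print(voice['voice_id'])
        page += 1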
.. _v1_speak_take_endpoints:

Take-related Endpoints
----------------------

The principal service of the Daisys API is to perform text-to-speech audio
synthesis.  This is done by generating "takes", which encapsulate a TTS job.

Previously generated takes can be retrieved via ``takes``, and the list can be
filtered similarly to voices::

    https://api.daisys.ai/v1/speak/takes
    https://api.daisys.ai/v1/speak/takes?take_id=<take_id>
    https://api.daisys.ai/v1/speak/takes?length=5&page=2
    https://api.daisys.ai/v1/speak/takes?newer=1690214050638

with similar semantics to ``/speak/voices`` described above.

A single take can be retrieved by giving its identifier::

    https://api.daisys.ai/v1/speak/takes/<take_id>

An audio take can be generated by making a ``POST`` request to
``takes/generate``::

    https://api.daisys.ai/v1/speak/takes/generate

and providing the :class:`TakeGenerate` structure as input in the content
body.

.. _v1_speak_endpoints_retrieving_audio:

Retrieving audio
................

Finally, the audio can be retrieved by accessing the take's ``/wav`` endpoint.
Other formats can be retrieved the same way; however, ``wav`` is the only
format that can be retrieved before the take is "ready", allowing it to be
downloaded as it is generated::

    https://api.daisys.ai/v1/speak/takes/<take_id>/wav
    https://api.daisys.ai/v1/speak/takes/<take_id>/mp3
    https://api.daisys.ai/v1/speak/takes/<take_id>/m4a
    https://api.daisys.ai/v1/speak/takes/<take_id>/flac
    https://api.daisys.ai/v1/speak/takes/<take_id>/webm

Note that these endpoints return a 307 redirect to a location from which the
audio can be streamed or stored.

Important: a complication is that S3 presigned URLs must be accessed without
the Daisys ``Authorization`` header, which some HTTP clients will not drop
automatically.  Therefore the following logic is recommended, and is performed
by the Python client library when following the redirect to ``url``::

    if 'X-Amz-Signature' in url:
        # Pre-signed URL, no auth needed.
        headers = {}
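Expanded into a complete download, the recommended logic might look as
follows.  This is a minimal sketch assuming the ``requests`` library and a
valid ``access_token``, with error handling omitted::

    import requests

    def download_wav(take_id: str, access_token: str, path: str):
        headers = {'Authorization': 'Bearer ' + access_token}

        # Request the audio, but do not follow the 307 redirect
        # automatically, so that the Location header can be inspected.
        r = requests.get(
            f'https://api.daisys.ai/v1/speak/takes/{take_id}/wav',
            headers=headers, allow_redirects=False)
        url = r.headers['Location']

        if 'X-Amz-Signature' in url:
            # Pre-signed URL, no auth needed.
            headers = {}

        # Stream the audio to a file as it becomes available.
        with requests.get(url, headers=headers, stream=True) as audio:
            with open(path, 'wb') as f:
                for chunk in audio.iter_content(chunk_size=8192):
                    f.write(chunk)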
Note that browsers `handle this automatically`_ when changing origins;
however, it is in any case not recommended to access the REST API endpoints
directly from the browser, since they require the access token.  Instead,
backend software can access the ``/wav`` endpoint, retrieve the URL from the
``Location`` header, and forward this URL to the browser; it can be accessed
without the ``Authorization`` header and has a limited lifetime.  This makes
the redirect ``Location`` convenient and more secure to pass directly to an
Audio Player object on the client side.

.. _handle this automatically: https://github.com/whatwg/fetch/pull/1544

.. _websocket_endpoint:

Websocket Endpoints
-------------------

The following endpoint can be used to retrieve a URL for making a direct
websocket connection to a worker, by issuing a ``GET`` request::

    https://api.daisys.ai/v1/speak/websocket?model=<model_name>

As can be seen, the model to use must be specified when requesting a worker
URL, which allows the Daisys API to better distribute requests to workers with
preloaded models.  For the same reason, whenever a websocket is disconnected,
a new URL must be requested through the above endpoint.  Disconnection may
happen from time to time, but shall not happen during the processing of a
request.

The provided URLs expire after 1 hour.  A connection may remain open longer
than that, but new connections must request a new URL.

The endpoint returns the following JSON body::

    {
      "websocket_url": "<url>"
    }

Authentication Endpoints
------------------------

To make use of the Daisys API, first an access token must be granted.  This
can be retrieved by a ``POST`` request to the ``auth/login`` endpoint::

    https://api.daisys.ai/auth/login

The content body should have the form::

    {
      "email": "<email>",
      "password": "<password>"
    }

On failure, a 401 HTTP status is returned.  (In the client library, an
exception is raised.)  On success, a JSON object containing ``access_token``
and ``refresh_token`` fields is provided.  The ``access_token`` string should
be attached to all ``GET`` and ``POST`` requests in the HTTP header, in the
following form::

    Authorization: Bearer <access_token>

Furthermore, if the ``access_token`` is no longer working, the
``refresh_token`` can be used to get a new one without supplying the
password::

    https://api.daisys.ai/auth/refresh

In this case the ``POST`` request should have the form::

    {
      "email": "<email>",
      "refresh_token": "<refresh_token>"
    }

The response contains new ``access_token`` and ``refresh_token`` fields.  This
allows an initial token to be continually refreshed whenever needed, so that
the API can be used without providing a password.

Note that this token refresh logic is taken care of automatically by the
Python client library.  The client can also be initialized with just an email
and refresh token rather than an email and password, so that credentials need
not be provided to the Daisys API client.  Alternatively, it is possible to
request a permatoken, which does not need to be refreshed.  On the other hand,
refresh tokens can be revoked at any time through the following ``POST``
endpoint::

    https://api.daisys.ai/auth/logout

with a content body of the form::

    {
      "refresh_token": "<refresh_token>"
    }
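As a sketch of this flow, assuming the ``requests`` library and with only
minimal error handling (the example credentials are of course placeholders)::

    import requests

    def login(email: str, password: str) -> dict:
        # POST credentials to /auth/login; a 401 status indicates failure.
        r = requests.post('https://api.daisys.ai/auth/login',
                          json={'email': email, 'password': password})
        if r.status_code == 401:
            raise RuntimeError('login failed')
        return r.json()  # contains 'access_token' and 'refresh_token'

    def refresh(email: str, refresh_token: str) -> dict:
        # Exchange a refresh token for new access and refresh tokens.
        r = requests.post('https://api.daisys.ai/auth/refresh',
                          json={'email': email,
                                'refresh_token': refresh_token})
        return r.json()

    tokens = login('me@example.com', 'my_password')
    headers = {'Authorization': 'Bearer ' + tokens['access_token']}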
JSON input structures
---------------------

The ``POST`` endpoints, namely ``takes/generate`` and ``voices/generate``,
take input in their content body in the form of JSON objects.  The structure
of all such objects can be inferred by reading the :ref:`models`, since the
fields translate directly to JSON.  Nonetheless, some of the embedded
structures and optional fields can be confusing, so we give some examples
here.

A minimal example of :class:`TakeGenerate`::

    {
      "text": "This is some text to speak.",
      "prosody": {"pace": -3, "pitch": 0, "expression": 4},
      "voice_id": "01h3anwqdh1q6zhf9s9s239wky"
    }

Optional fields such as ``style``, ``override_language``, and ``done_webhook``
can be added as desired.  Here is an example of :class:`TakeGenerate` using
all available fields::

    {
      "text": "This is some text to speak.",
      "override_language": "en-GB",
      "prosody": {"pace": -3, "pitch": 0, "expression": 4},
      "voice_id": "01h3anwqdh1q6zhf9s9s239wky",
      "style": ["narrator"],
      "status_webhook": "https://myservice.com/daisys_webhooks/take_status/1234",
      "done_webhook": "https://myservice.com/daisys_webhooks/take_done/1234"
    }

Note that ``override_language`` is provided here as an example, but if it is
not provided (is ``null``), the Daisys API will attempt to pronounce words in
the correct language on a per-word basis.  If it is provided, the model may
for example mispronounce loan words, since it assumes a single language for
the input text.

The presence of the ``style`` field depends on the model in use, as do the
supported prosody types, although all models support the simple prosody type
with ``pace``, ``pitch``, and ``expression`` being integer values from -10 to
10.  Specific information about a model can be retrieved from the
``/speak/models`` endpoint.

Finally, here is an example of input for ``voices/generate``::

    {
      "name": "Bob",
      "default_prosody": {"pace": 0, "pitch": 0, "expression": 0},
      "model": "eng_base",
      "gender": "male",
      "done_webhook": "https://myservice.com/daisys_webhooks/voice_done/1234"
    }

Here, a default prosody is specified for the voice, which is adopted in
subsequent ``/takes/generate`` requests if ``prosody`` is not provided (left
as ``null``).
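To illustrate this last point, here is a minimal sketch in Python, assuming
the ``requests`` library, a valid ``access_token``, and that the response to
``voices/generate`` immediately contains the new ``voice_id``::

    import requests

    BASE = 'https://api.daisys.ai/v1/speak'
    headers = {'Authorization': 'Bearer ' + access_token}

    # Generate a voice with a default prosody.
    voice = requests.post(f'{BASE}/voices/generate', headers=headers, json={
        'name': 'Bob',
        'default_prosody': {'pace': 0, 'pitch': 0, 'expression': 0},
        'model': 'eng_base',
        'gender': 'male',
    }).json()

    # Generate a take without specifying "prosody"; the voice's
    # default_prosody is then adopted.
    take = requests.post(f'{BASE}/takes/generate', headers=headers, json={
        'text': 'This is some text to speak.',
        'voice_id': voice['voice_id'],
    }).json()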