Daisys API endpoints

While Daisys recommends the use of the Python client, the Daisys API endpoints are available for use with other languages. In addition to this document, FastAPI-generated documentation is available.

See also the FastAPI documentation on how to generate clients for other languages.

The “Speak” API provides a REST interface to its three main data structures: models, voices, and takes.

This is best demonstrated in the curl example, where JSON objects are constructed as strings in a shell script. See JSON input structures for more information on JSON input.

Websocket Endpoints

The following endpoint can be used to retrieve a URL for making a direct websocket connection to a worker by issuing a GET request:

https://api.daisys.ai/v1/speak/websocket?model=<model>

As can be seen, the model to use must be specified when making a request for a worker URL, which allows the Daisys API to better distribute requests to workers with preloaded models.

For the same reason, whenever a websocket is disconnected, a new URL must be requested through the above endpoint. Disconnection may happen from time to time but shall not happen during the processing of a request. The provided URLs expire after 1 hour. A connection may remain open longer than that, but new connections must request a new URL.

The endpoint returns the following JSON body:

{
  "websocket_url": "<url>"
}
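As a rough illustration, the request and response handling above might be sketched in Python as follows. The helper names build_websocket_url_request and parse_websocket_url are hypothetical, and the sketch assumes that this endpoint expects the same Authorization header described under Authentication Endpoints. The request is only built here, not sent.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.daisys.ai"


def build_websocket_url_request(model, access_token):
    """Build (but do not send) the GET request for a worker websocket URL.

    Hypothetical helper; assumes the bearer token header is required here.
    """
    query = urllib.parse.urlencode({"model": model})
    return urllib.request.Request(
        f"{BASE_URL}/v1/speak/websocket?{query}",
        headers={"Authorization": f"Bearer {access_token}"},
        method="GET",
    )


def parse_websocket_url(body):
    """Extract the worker URL from the JSON response body (str or bytes)."""
    return json.loads(body)["websocket_url"]


# Sending the request needs network access and a valid token, for example:
# with urllib.request.urlopen(build_websocket_url_request(model, token)) as resp:
#     url = parse_websocket_url(resp.read())
```

Because URLs expire and must be requested again after every disconnection, a client would typically call such a helper immediately before each new connection rather than caching the result.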

Authentication Endpoints

To make use of the Daisys API, an access token must first be obtained. This is done with a POST request to the auth/login endpoint:

https://api.daisys.ai/auth/login

The content body should have the form:

{
  "email": "<user@example.com>",
  "password": "<password>"
}

On failure, a 401 HTTP status is returned. (In the client library, an exception is raised.) On success, a JSON object containing access_token and refresh_token fields is provided.

The access_token string should be attached to all GET and POST requests in the HTTP header, in the following form:

Authorization: Bearer <access_token>
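The login step and the resulting header could be sketched in Python along these lines. The helper names build_login_request and auth_header are hypothetical, and the Content-Type header is an assumption not stated in this document. The sketch only constructs the request; it is not the Python client library's implementation.

```python
import json
import urllib.request

LOGIN_URL = "https://api.daisys.ai/auth/login"


def build_login_request(email, password):
    """Build (but do not send) the POST request for auth/login."""
    body = json.dumps({"email": email, "password": password}).encode("utf-8")
    return urllib.request.Request(
        LOGIN_URL,
        data=body,
        # Assumption: the API expects a JSON content type.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def auth_header(login_response_body):
    """Turn a successful login response into the Authorization header."""
    tokens = json.loads(login_response_body)
    return {"Authorization": f"Bearer {tokens['access_token']}"}


# Sending requires network access; urllib raises urllib.error.HTTPError
# for a 401 response:
# with urllib.request.urlopen(build_login_request(email, password)) as resp:
#     headers = auth_header(resp.read())
```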

Furthermore, if the access_token no longer works, the refresh_token can be used to obtain a new one without supplying the password:

https://api.daisys.ai/auth/refresh

In this case, the content body of the POST request should have the form:

{
  "email": "<user@example.com>",
  "refresh_token": "<refresh_token>"
}

The response contains new access_token and refresh_token fields. An initial token can thus be refreshed continually whenever needed, so that the API can be used without providing a password.
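A minimal Python sketch of this refresh step might look like the following. The names build_refresh_request and rotate_tokens are hypothetical, and the Content-Type header is an assumption. The important detail is that both tokens are replaced, since the response carries a new refresh_token as well as a new access_token.

```python
import json
import urllib.request

REFRESH_URL = "https://api.daisys.ai/auth/refresh"


def build_refresh_request(email, refresh_token):
    """Build (but do not send) the POST request for auth/refresh."""
    body = json.dumps({"email": email, "refresh_token": refresh_token}).encode("utf-8")
    return urllib.request.Request(
        REFRESH_URL,
        data=body,
        # Assumption: the API expects a JSON content type.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def rotate_tokens(current, refresh_response_body):
    """Return a copy of the stored tokens with both tokens replaced."""
    new = json.loads(refresh_response_body)
    return {
        **current,
        "access_token": new["access_token"],
        "refresh_token": new["refresh_token"],
    }
```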

Note that the Python client library takes care of this token refresh logic automatically. The client can also be initialized with just an email and refresh token rather than an email and password, so that the password need not be provided to the Daisys API client. Alternatively, a permatoken can be requested, which does not need to be refreshed.

On the other hand, refresh tokens can be revoked at any time through the following POST endpoint:

with content body of the form:

{
  "refresh_token": "<refresh_token>"
}

JSON input structures

POST endpoints, namely takes/generate and voices/generate, take input in their content body in the form of JSON objects.

The structure of all such objects can be inferred by reading the models, since the fields can be translated directly to JSON. Nonetheless, some of the embedded structures and optional fields can be confusing, so some examples are given here.

A minimal example of TakeGenerate:

{
  "text": "This is some text to speak.",
  "prosody": {"pace": -3, "pitch": 0, "expression": 4},
  "voice_id": "01h3anwqdh1q6zhf9s9s239wky"
}

Optional fields such as style, override_language, and done_webhook can be added as desired.

Here is an example of TakeGenerate using all available fields:

{
  "text": "This is some text to speak.",
  "override_language": "en-GB",
  "prosody": {"pace": -3, "pitch": 0, "expression": 4},
  "voice_id": "01h3anwqdh1q6zhf9s9s239wky",
  "style": ["narrator"],
  "status_webhook": "https://myservice.com/daisys_webhooks/take_status/1234",
  "done_webhook": "https://myservice.com/daisys_webhooks/take_done/1234"
}

Note that override_language is provided here only as an example. If it is not provided (is null), the Daisys API attempts to pronounce each word in the correct language on a per-word basis. If it is provided, the model assumes a single language for the input text and may, for example, mispronounce loan words. Whether the style field is available depends on the model in use, as do the supported prosody types, although all models support the simple prosody type, in which pace, pitch, and expression are integer values from -10 to 10. Specific information about a model can be retrieved from the /speak/models endpoint.
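To make the field rules concrete, here is a small Python sketch that assembles a TakeGenerate body and checks the simple prosody range before sending. The function make_take_generate is a hypothetical helper, not part of the Daisys client library, and the client-side validation is an illustration only; the API performs its own validation.

```python
import json


def make_take_generate(text, voice_id, pace=0, pitch=0, expression=0, **optional):
    """Assemble a TakeGenerate body using the simple prosody type.

    Optional fields (for example override_language, style, done_webhook)
    are passed as keyword arguments; those set to None are left out.
    """
    prosody = {"pace": pace, "pitch": pitch, "expression": expression}
    for name, value in prosody.items():
        is_int = isinstance(value, int) and not isinstance(value, bool)
        if not is_int or not -10 <= value <= 10:
            raise ValueError(f"{name} must be an integer from -10 to 10, got {value!r}")
    payload = {"text": text, "prosody": prosody, "voice_id": voice_id}
    payload.update({key: val for key, val in optional.items() if val is not None})
    return payload


body = json.dumps(
    make_take_generate(
        "This is some text to speak.",
        "01h3anwqdh1q6zhf9s9s239wky",
        pace=-3,
        expression=4,
        style=["narrator"],
        override_language=None,  # omitted, so language is chosen per word
    )
)
```

Serializing with json.dumps rather than building the string by hand also avoids trailing commas, which are not valid JSON.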

Finally, here is an example of input for voices/generate:

{
  "name": "Bob",
  "default_prosody": {"pace": 0, "pitch": 0, "expression": 0},
  "model": "eng_base",
  "gender": "male",
  "done_webhook": "https://myservice.com/daisys_webhooks/voice_done/1234"
}

Here, a default prosody is specified for the voice, which is adopted in subsequent takes/generate requests if prosody is not provided (left as null).
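The fallback rule can be pictured with a short Python sketch. This is only an illustration of the behaviour described above, which the Daisys API applies on its side; effective_prosody is a hypothetical name, not an API or client function.

```python
def effective_prosody(take_prosody, voice_default_prosody):
    """Return the prosody that applies to a take.

    Illustrates the described rule: a null (None) prosody in the take
    request falls back to the voice's default_prosody.
    """
    if take_prosody is None:
        return voice_default_prosody
    return take_prosody


voice = {
    "name": "Bob",
    "default_prosody": {"pace": 0, "pitch": 0, "expression": 0},
    "model": "eng_base",
    "gender": "male",
}

# A take without prosody adopts the voice default.
inherited = effective_prosody(None, voice["default_prosody"])

# A take with its own prosody keeps it.
explicit = effective_prosody(
    {"pace": -3, "pitch": 0, "expression": 4}, voice["default_prosody"]
)
```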