Daisys API endpoints¶
While Daisys recommends the use of the Python client, the Daisys API endpoints are also available for use from other languages. In addition to this document, the FastAPI-generated documentation is available:
Swagger UI: https://api.daisys.ai/v1/speak/docs
OpenAPI definition file: https://api.daisys.ai/v1/speak/openapi.json
See also the FastAPI documentation on how to generate clients for other languages.
The “Speak” API provides a REST interface to its three main data structures: models, voices, and takes.
This is best demonstrated in the curl example, where JSON objects are constructed as strings in a shell script. See JSON input structures for more information on JSON input.
Websocket Endpoints¶
The following endpoint can be used to retrieve a URL for making a direct websocket connection to a worker by issuing a GET request:
https://api.daisys.ai/v1/speak/websocket?model=<model>
As can be seen, the model to use must be specified when making a request for a worker URL, which allows the Daisys API to better distribute requests to workers with preloaded models.
For the same reason, whenever a websocket is disconnected, a new URL must be requested through the above endpoint. Disconnections may occur from time to time, but will not happen during the processing of a request. The provided URLs expire after 1 hour; a connection may remain open longer than that, but new connections must request a new URL.
The endpoint returns the following JSON body:
{
"websocket_url": "<url>"
}
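As a sketch (not using the official client), this URL could be fetched with Python's requests package; the model name below is a placeholder, and the request is assumed to carry the same Bearer-token authentication described under Authentication Endpoints below.

import requests

API = "https://api.daisys.ai"
headers = {"Authorization": "Bearer <access_token>"}  # see Authentication Endpoints

# Request a worker URL for a given model (placeholder name).
resp = requests.get(f"{API}/v1/speak/websocket",
                    params={"model": "eng_base"},
                    headers=headers)
resp.raise_for_status()
websocket_url = resp.json()["websocket_url"]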
Authentication Endpoints¶
To make use of the Daisys API, an access token must first be granted. This can be retrieved by a POST request to the auth/login endpoint:
https://api.daisys.ai/auth/login
The content body should have the form:
{
"email": <user@example.com>,
"password": <password>
}
On failure, a 401 HTTP status is returned. (In the client library, an exception is raised.) On success, a JSON object containing access_token and refresh_token fields is provided.
The access_token string should be attached to all GET and POST requests in the HTTP header, in the following form:
Authorization: Bearer <access_token>
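As a rough sketch in Python using the requests package (rather than the official client), logging in and attaching the token to later requests might look as follows; the credentials shown are placeholders.

import requests

API = "https://api.daisys.ai"

# Log in with email and password to obtain tokens.
resp = requests.post(f"{API}/auth/login",
                     json={"email": "user@example.com", "password": "<password>"})
if resp.status_code == 401:
    raise RuntimeError("login failed")
tokens = resp.json()
access_token = tokens["access_token"]
refresh_token = tokens["refresh_token"]

# Attach the access token to all subsequent GET and POST requests.
headers = {"Authorization": f"Bearer {access_token}"}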
Furthermore, if the access_token is no longer valid, the refresh_token can be used to get a new one without supplying the password:
https://api.daisys.ai/auth/refresh
In this case the POST request should have the form:
{
"email": <user@example.com>,
"refresh_token": <refresh_token>
}
The response contains new access_token and refresh_token fields. This allows an initial token to be continually refreshed whenever needed, so that the API can be used without providing a password.
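A corresponding sketch for the refresh step, again using the requests package with placeholder values:

import requests

API = "https://api.daisys.ai"
refresh_token = "<refresh_token>"  # from a previous auth/login or auth/refresh response

# Exchange the refresh token for a new access/refresh token pair.
resp = requests.post(f"{API}/auth/refresh",
                     json={"email": "user@example.com",
                           "refresh_token": refresh_token})
resp.raise_for_status()
tokens = resp.json()
access_token = tokens["access_token"]
refresh_token = tokens["refresh_token"]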
Note that this token refresh logic is handled automatically by the Python client library. The client can also be initialized with just an email and refresh token rather than an email and password, so that a password need not be provided to the Daisys API client. Alternatively, it is possible to request a permatoken, which does not need to be refreshed.
On the other hand, refresh tokens can be revoked at any time through a POST endpoint, with a content body of the form:
{
"refresh_token": "<refresh_token>"
}
JSON input structures¶
POST endpoints, namely takes/generate and voices/generate, take input in their content body in the form of JSON objects.
The structure of all such objects can be inferred by reading the models, since their fields translate directly to JSON. Nonetheless, some of the embedded structures and optional fields can be confusing, so we give some examples here.
A minimal example of TakeGenerate:
{
"text": "This is some text to speak.",
"prosody": {"pace": -3, "pitch": 0, "expression": 4},
"voice_id": "01h3anwqdh1q6zhf9s9s239wky",
}
Optional fields such as style, override_language, and done_webhook can be added as desired.
Here is an example of TakeGenerate using all available fields:
{
"text": "This is some text to speak.",
"override_language": "en-GB",
"prosody": {"pace": -3, "pitch": 0, "expression": 4},
"voice_id": "01h3anwqdh1q6zhf9s9s239wky",
"style": ["narrator"],
"status_webhook": "https://myservice.com/daisys_webhooks/take_status/1234",
"done_webhook": "https://myservice.com/daisys_webhooks/take_done/1234",
}
Note that override_language is provided here as an example, but if it is not provided (is null), then the Daisys API will attempt to pronounce words in the correct language on a per-word basis. If it is provided, then the model may for example mispronounce loan words, since it assumes a single language for the input text. The presence of the style field depends on the model in use, as do the supported prosody types, although all models support the simple prosody type, with pace, pitch, and expression being integer values from -10 to 10. Specific information about the model can be retrieved from the /speak/models endpoint.
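As a sketch, such a body could be posted with the requests package; the full path /v1/speak/takes/generate is assumed here from the URL pattern of the other endpoints, and the token and voice ID are placeholders.

import requests

API = "https://api.daisys.ai"
headers = {"Authorization": "Bearer <access_token>"}

# Minimal TakeGenerate body, as in the example above.
body = {
    "text": "This is some text to speak.",
    "prosody": {"pace": -3, "pitch": 0, "expression": 4},
    "voice_id": "01h3anwqdh1q6zhf9s9s239wky",
}
resp = requests.post(f"{API}/v1/speak/takes/generate", json=body, headers=headers)
resp.raise_for_status()
print(resp.json())  # the returned take object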
Finally, here is an example of input for voices/generate:
{
"name": "Bob",
"default_prosody": {"pace": 0, "pitch": 0, "expression": 0},
"model": "eng_base",
"gender": "male",
"done_webhook": "https://myservice.com/daisys_webhooks/voice_done/1234",
}
Here, a default prosody is specified for the voice, which is adopted in subsequent takes/generate requests if prosody is not provided (left as null).
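A corresponding sketch for voice generation, again assuming the full path /v1/speak/voices/generate from the same URL pattern:

import requests

API = "https://api.daisys.ai"
headers = {"Authorization": "Bearer <access_token>"}

# Voice generation body as in the example above; the default_prosody set here
# is used for later takes/generate requests that omit prosody.
body = {
    "name": "Bob",
    "default_prosody": {"pace": 0, "pitch": 0, "expression": 0},
    "model": "eng_base",
    "gender": "male",
}
resp = requests.post(f"{API}/v1/speak/voices/generate", json=body, headers=headers)
resp.raise_for_status()
print(resp.json())  # the returned voice object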