Getting started with the command line
The Daisys API can be used from the command line with curl and jq. This guide shows how
to make HTTP calls to the API directly, which is useful for application writers
developing their own client libraries in their favorite language.
Running the curl example
The Python client library source code bundles an example of how to use the API this way. Instructions to run that example are provided on the linked page.
The rest of this document describes how to use the API one step at a time in a shell,
rather than in a shell script. In the examples, the output of curl is piped to jq for
formatting purposes.
Authenticating
To access the Daisys Speak API, you must attach an access token to every HTTP call, with
the exception of the /version endpoint.
Such an access token can be requested by providing an email and password as follows:
TOKENS=$(curl -s -X POST -H 'content-type: application/json' \
  -d '{"email": "user@example.com", "password": "my_password123"}' \
  https://api.daisys.ai/auth/login)
export ACCESS_TOKEN=$(echo $TOKENS | jq -r .access_token)
export REFRESH_TOKEN=$(echo $TOKENS | jq -r .refresh_token)
You can keep using this access token for a limited time. To use it, set the Authorization
header to the string Bearer $ACCESS_TOKEN.
If you receive a 401 response from any API request, the access token needs to be refreshed by issuing:
$ TOKENS=$(curl -s -X POST -H 'content-type: application/json' \
  -H "Authorization: Bearer $ACCESS_TOKEN" \
  -d '{"refresh_token": "'$REFRESH_TOKEN'"}' \
  https://api.daisys.ai/auth/refresh)
$ export ACCESS_TOKEN=$(echo $TOKENS | jq -r .access_token)
$ export REFRESH_TOKEN=$(echo $TOKENS | jq -r .refresh_token)
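For automated use, this refresh step can be wrapped in a small helper that retries a
request once after refreshing the tokens. The following is only a sketch built from the
calls shown above; the daisys_get function name and the temporary file are illustrative:
# Sketch: perform a GET, and if the server answers 401, refresh the tokens
# (using the /auth/refresh call shown above) and retry once.
daisys_get() {
  local url=$1
  local code
  code=$(curl -s -o /tmp/daisys_response.json -w '%{http_code}' \
    -H "Authorization: Bearer $ACCESS_TOKEN" "$url")
  if [ "$code" = "401" ]; then
    TOKENS=$(curl -s -X POST -H 'content-type: application/json' \
      -H "Authorization: Bearer $ACCESS_TOKEN" \
      -d '{"refresh_token": "'$REFRESH_TOKEN'"}' \
      https://api.daisys.ai/auth/refresh)
    export ACCESS_TOKEN=$(echo $TOKENS | jq -r .access_token)
    export REFRESH_TOKEN=$(echo $TOKENS | jq -r .refresh_token)
    curl -s -o /tmp/daisys_response.json \
      -H "Authorization: Bearer $ACCESS_TOKEN" "$url"
  fi
  # Print the response body in either case.
  cat /tmp/daisys_response.json
}

# Example usage (same endpoint as the next section):
daisys_get https://api.daisys.ai/v1/speak/models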
Listing the models
Models can be listed by accessing the /models endpoint. More information on the options
can be found in Model-related Endpoints.
$ curl -s -X GET -H "Authorization: Bearer $ACCESS_TOKEN" https://api.daisys.ai/v1/speak/models | jq .
[
  {
    "name": "shakespeare",
    "displayname": "Shakespeare",
    "flags": [],
    "languages": [
      "en-GB"
    ],
    "genders": [
      "female",
      "male"
    ],
    "styles": [
      [
        "base",
        "character",
        "narrator"
      ]
    ],
    "prosody_types": [
      "simple",
      "affect"
    ]
  }
]
Listing the voices
Voices can be listed by accessing the /voices endpoint. More information on the options
can be found in Voice-related Endpoints.
$ curl -s -X GET -H "Authorization: Bearer $ACCESS_TOKEN" https://api.daisys.ai/v1/speak/voices | jq .
[
  {
    "name": "Deirdre",
    "model": "shakespeare",
    "gender": "female",
    "default_style": [],
    "default_prosody": null,
    "example_take": null,
    "status_webhook": null,
    "done_webhook": null,
    "voice_id": "v01hasgezqjcsnc91zdfzpx0apj",
    "status": "ready",
    "timestamp_ms": 1695220727538,
    "example_take_id": "t01hasgezqkx4vth62xckymk3x3"
  }
]
Generating a voice
If you do not yet have any voices, you should generate one using the /voices/generate
endpoint. Voices can be requested for a given gender and with default prosody
information. Voices must be given names. More information on the options can be found in
Voice-related Endpoints.
For instance, the following command creates a male voice for the shakespeare model:
$ curl -s -X POST -H 'content-type: application/json' \
  -H "Authorization: Bearer $ACCESS_TOKEN" \
  -d '{"name": "Ignacio", "gender": "male", "model": "shakespeare"}' \
  https://api.daisys.ai/v1/speak/voices/generate | jq .
{
  "name": "Ignacio",
  "model": "shakespeare-pause_symbol-18-4-23",
  "gender": "male",
  "default_style": null,
  "default_prosody": null,
  "example_take": null,
  "done_webhook": null,
  "voice_id": "v01haxx5cggwz215gzv0hjbra9m",
  "status": "waiting",
  "timestamp_ms": 1695368262160,
  "example_take_id": "t01haxx5cgg3n8f2qzc8zkbn97y"
}
Note that voice generation can take a few seconds! In this example, the “status” is
“waiting” and not yet “ready”, so we should check on it again after a second or two. For
this, we need the voice_id provided in the response:
$ curl -s -X GET -H 'content-type: application/json' \
  -H "Authorization: Bearer $ACCESS_TOKEN" \
  https://api.daisys.ai/v1/speak/voices/v01haxx5cggwz215gzv0hjbra9m | jq .
{
  "name": "Ignacio",
  "model": "shakespeare-pause_symbol-18-4-23",
  "gender": "male",
  "default_style": null,
  "default_prosody": null,
  "example_take": null,
  "done_webhook": null,
  "voice_id": "v01haxx5cggwz215gzv0hjbra9m",
  "status": "ready",
  "timestamp_ms": 1695368262160,
  "example_take_id": "t01haxx5cgg3n8f2qzc8zkbn97y"
}
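Rather than re-running this command by hand, the same check can be placed in a small
polling loop. This is only a sketch using the endpoint above; the one-second interval and
the fixed voice_id are illustrative:
# Poll the voice until its status becomes "ready".
VOICE_ID=v01haxx5cggwz215gzv0hjbra9m
while true; do
  STATUS=$(curl -s -H "Authorization: Bearer $ACCESS_TOKEN" \
    https://api.daisys.ai/v1/speak/voices/$VOICE_ID | jq -r .status)
  echo "status: $STATUS"
  [ "$STATUS" = "ready" ] && break
  sleep 1
done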
The voice is now “ready”! We can get its example audio using the example_take_id field;
see Retrieving a take’s audio below.
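For instance, assuming the example take is fetched through the same take-audio endpoint
described below, its audio could be downloaded like this (the output filename is
arbitrary):
# Download the example take's audio (sketch: uses the /takes/{take_id}/wav
# endpoint described in "Retrieving a take's audio" below).
curl -s -L -X GET -H "Authorization: Bearer $ACCESS_TOKEN" \
  -o ignacio_example.wav \
  https://api.daisys.ai/v1/speak/takes/t01haxx5cgg3n8f2qzc8zkbn97y/wav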
Note: as seen in the response structure, a webhook can also be provided to get a
notification when the result is ready. This webhook is called as a POST request with the
same response structure as seen here, provided in the request body.
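As a sketch, assuming the generate request accepts the same done_webhook field that
appears in the response, such a notification could be requested like this (the
example.com URL is a placeholder for an endpoint you control):
# Request a voice and ask for a POST notification when it is ready.
curl -s -X POST -H 'content-type: application/json' \
  -H "Authorization: Bearer $ACCESS_TOKEN" \
  -d '{"name": "Ignacio", "gender": "male", "model": "shakespeare",
       "done_webhook": "https://example.com/daisys-done"}' \
  https://api.daisys.ai/v1/speak/voices/generate | jq .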
Generating a take
Now that you have a voice, text to speech can be requested via the /takes/generate
endpoint. Here we generate a take with default prosody for the voice, which we also left
as default (neutral) when generating the voice above. More information on the options can
be found in Take-related Endpoints.
$ curl -s -X POST -H 'content-type: application/json' \
  -H "Authorization: Bearer $ACCESS_TOKEN" \
  -d '{"text": "Hello, Daisys! It'\''s a beautiful day.", "voice_id": "v01hasgezqjcsnc91zdfzpx0apj"}' \
  https://api.daisys.ai/v1/speak/takes/generate | jq .
{
  "text": "Hello, Daisys! It's a beautiful day.",
  "override_language": null,
  "style": null,
  "prosody": null,
  "status_webhook": null,
  "done_webhook": null,
  "voice_id": "v01hasgezqjcsnc91zdfzpx0apj",
  "take_id": "t01haybgb16dn9dk0p5je47qz74",
  "status": "waiting",
  "timestamp_ms": 1695383301158,
  "info": null
}
As with voice generation, take generation takes a couple of seconds, and the status can
be retrieved using the take_id:
$ curl -s -X GET -H "Authorization: Bearer $ACCESS_TOKEN" \
  https://api.daisys.ai/v1/speak/takes/t01haybgb16dn9dk0p5je47qz74 | jq .
{
  "text": "Hello, Daisys! It's a beautiful day.",
  "override_language": null,
  "style": null,
  "prosody": null,
  "status_webhook": null,
  "done_webhook": null,
  "voice_id": "v01hasgezqjcsnc91zdfzpx0apj",
  "take_id": "t01haybgb16dn9dk0p5je47qz74",
  "status": "ready",
  "timestamp_ms": 1695383301158,
  "info": {
    "duration": 150528,
    "audio_rate": 44100,
    "normalized_text": [
      "Hello, Daisys!",
      "It's a beautiful day."
    ]
  }
}
As with voice generation, it is possible to use a webhook for the “done” notification.
For longer texts, it is also possible to request a “status” webhook, which may be called
several times whenever the progress of a take changes.
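A minimal sketch, assuming the generate request accepts the status_webhook and
done_webhook fields that appear in the response (the example.com URLs are placeholders
for endpoints you control):
# Request a take with both a status and a done webhook.
curl -s -X POST -H 'content-type: application/json' \
  -H "Authorization: Bearer $ACCESS_TOKEN" \
  -d '{"text": "Hello, Daisys! It'\''s a beautiful day.",
       "voice_id": "v01hasgezqjcsnc91zdfzpx0apj",
       "status_webhook": "https://example.com/daisys-status",
       "done_webhook": "https://example.com/daisys-done"}' \
  https://api.daisys.ai/v1/speak/takes/generate | jq .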
In the take response above, we see the status is “ready”, meaning that audio can now be retrieved.
Retrieving a take’s audio
The take is ready; now we can hear the result! Audio for a take can be retrieved as follows:
$ curl -s -L -X GET -H "Authorization: Bearer $ACCESS_TOKEN" \
  -o beautiful_day.wav \
  https://api.daisys.ai/v1/speak/takes/t01haybgb16dn9dk0p5je47qz74/wav
In the above, we retrieve a .wav file and write it to disk as beautiful_day.wav.
Note that the -L flag must be provided, since the file is returned through a 307 redirect.
The resulting file beautiful_day.wav can be played using command line programs like aplay
on Linux, or any audio player such as the excellent VLC. You can integrate the results
into your creative projects!
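For example, on a Linux system with the ALSA utilities installed:
# Play the downloaded take from the command line.
aplay beautiful_day.wav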
It is also possible to retrieve the audio in other formats: mp3, flac, webm, and m4a, by
requesting the corresponding URL, ../speak/takes/t01haybgb16dn9dk0p5je47qz74/mp3, etc.
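For instance, to fetch the same take as an mp3 (the output filename is arbitrary):
# Same take, but in mp3 format; -L is again needed for the redirect.
curl -s -L -X GET -H "Authorization: Bearer $ACCESS_TOKEN" \
  -o beautiful_day.mp3 \
  https://api.daisys.ai/v1/speak/takes/t01haybgb16dn9dk0p5je47qz74/mp3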