Getting started with the command line
The Daisys API can be used from the command line with curl and jq. This guide shows how
to make HTTP calls to the API directly, which is useful for application writers
developing their own client libraries in their favorite language.
Running the curl example
The Python client library source code bundles an example of how to use the API this way. Instructions to run that example are provided on the linked page.
The rest of this document describes how to use the API one step at a time in a shell,
rather than in a shell script. In the examples, the output of curl is piped to jq for
formatting purposes.
Authenticating
To access the Daisys Speak API, you must attach an access token to every HTTP call, with
the exception of the /version endpoint.
Such an access token can be requested by providing an email and password as follows:
TOKENS=$(curl -s -X POST -H 'content-type: application/json' \
  -d '{"email": "user@example.com", "password": "my_password123"}' \
  https://api.daisys.ai/auth/login)
export ACCESS_TOKEN=$(echo $TOKENS | jq -r .access_token)
export REFRESH_TOKEN=$(echo $TOKENS | jq -r .refresh_token)
You can keep using this access token for a limited time. To use it, set the Authorization
header to the string Bearer $ACCESS_TOKEN.
If you receive a 401 response from any API request, the access token needs to be refreshed by issuing:
$ TOKENS=$(curl -s -X POST -H 'content-type: application/json' \
  -H "Authorization: Bearer $ACCESS_TOKEN" \
  -d '{"refresh_token": "'$REFRESH_TOKEN'"}' \
  https://api.daisys.ai/auth/refresh)
$ export ACCESS_TOKEN=$(echo $TOKENS | jq -r .access_token)
$ export REFRESH_TOKEN=$(echo $TOKENS | jq -r .refresh_token)
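For automated use, this refresh step can be wrapped in a small helper that retries a
request once after refreshing the tokens. The following is only a sketch built from the
calls shown above; the daisys_get function name and the temporary file are illustrative:
# Sketch: perform a GET, and if the server answers 401, refresh the tokens
# (using the /auth/refresh call shown above) and retry once.
daisys_get() {
  local url=$1
  local code
  code=$(curl -s -o /tmp/daisys_response.json -w '%{http_code}' \
    -H "Authorization: Bearer $ACCESS_TOKEN" "$url")
  if [ "$code" = "401" ]; then
    TOKENS=$(curl -s -X POST -H 'content-type: application/json' \
      -H "Authorization: Bearer $ACCESS_TOKEN" \
      -d '{"refresh_token": "'$REFRESH_TOKEN'"}' \
      https://api.daisys.ai/auth/refresh)
    export ACCESS_TOKEN=$(echo $TOKENS | jq -r .access_token)
    export REFRESH_TOKEN=$(echo $TOKENS | jq -r .refresh_token)
    curl -s -o /tmp/daisys_response.json \
      -H "Authorization: Bearer $ACCESS_TOKEN" "$url"
  fi
  # Print the response body in either case.
  cat /tmp/daisys_response.json
}

# Example usage (same endpoint as the next section):
daisys_get https://api.daisys.ai/v1/speak/models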
Listing the models
Models can be listed by accessing the /models endpoint. More information on the options
can be found in Model-related Endpoints.
$ curl -s -X GET -H "Authorization: Bearer $ACCESS_TOKEN" https://api.daisys.ai/v1/speak/models | jq .
[
  {
    "name": "shakespeare",
    "displayname": "Shakespeare",
    "flags": [],
    "languages": [
      "en-GB"
    ],
    "genders": [
      "female",
      "male"
    ],
    "styles": [
      [
        "base",
        "character",
        "narrator"
      ]
    ],
    "prosody_types": [
      "simple",
      "affect"
    ]
  }
]
Listing the voices
Voices can be listed by accessing the /voices endpoint. More information on the options
can be found in Voice-related Endpoints.
$ curl -s -X GET -H "Authorization: Bearer $ACCESS_TOKEN" https://api.daisys.ai/v1/speak/voices | jq .
[
  {
    "name": "Deirdre",
    "model": "shakespeare",
    "gender": "female",
    "default_style": [],
    "default_prosody": null,
    "example_take": null,
    "status_webhook": null,
    "done_webhook": null,
    "voice_id": "v01hasgezqjcsnc91zdfzpx0apj",
    "status": "ready",
    "timestamp_ms": 1695220727538,
    "example_take_id": "t01hasgezqkx4vth62xckymk3x3"
  }
]
Generating a voice
If you do not yet have any voices, you should generate one using the /voices/generate
endpoint. Voices can be requested for a given gender and with default prosody
information. Voices must be given names. More information on the options can be found in
Voice-related Endpoints.
For instance, the following command creates a male voice for the shakespeare model:
$ curl -s -X POST -H 'content-type: application/json' \
  -H "Authorization: Bearer $ACCESS_TOKEN" \
  -d '{"name": "Ignacio", "gender": "male", "model": "shakespeare"}' \
  https://api.daisys.ai/v1/speak/voices/generate | jq .
{
  "name": "Ignacio",
  "model": "shakespeare-pause_symbol-18-4-23",
  "gender": "male",
  "default_style": null,
  "default_prosody": null,
  "example_take": null,
  "done_webhook": null,
  "voice_id": "v01haxx5cggwz215gzv0hjbra9m",
  "status": "waiting",
  "timestamp_ms": 1695368262160,
  "example_take_id": "t01haxx5cgg3n8f2qzc8zkbn97y"
}
Note that voice generation can take a few seconds! In this example, the “status” is
“waiting” and not yet “ready”, so we should check on it again after a second or two. For
this, we need the voice_id provided in the response:
$ curl -s -X GET -H 'content-type: application/json' \
  -H "Authorization: Bearer $ACCESS_TOKEN" \
  https://api.daisys.ai/v1/speak/voices/v01haxx5cggwz215gzv0hjbra9m | jq .
{
  "name": "Ignacio",
  "model": "shakespeare-pause_symbol-18-4-23",
  "gender": "male",
  "default_style": null,
  "default_prosody": null,
  "example_take": null,
  "done_webhook": null,
  "voice_id": "v01haxx5cggwz215gzv0hjbra9m",
  "status": "ready",
  "timestamp_ms": 1695368262160,
  "example_take_id": "t01haxx5cgg3n8f2qzc8zkbn97y"
}
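Rather than re-running this command by hand, the same check can be placed in a small
polling loop. This is only a sketch using the endpoint above; the one-second interval and
the fixed voice_id are illustrative:
# Poll the voice until its status becomes "ready".
VOICE_ID=v01haxx5cggwz215gzv0hjbra9m
while true; do
  STATUS=$(curl -s -H "Authorization: Bearer $ACCESS_TOKEN" \
    https://api.daisys.ai/v1/speak/voices/$VOICE_ID | jq -r .status)
  echo "status: $STATUS"
  [ "$STATUS" = "ready" ] && break
  sleep 1
done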
The voice is now “ready”! We can get its example audio using the example_take_id field;
see Retrieving a take’s audio below.
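For instance, assuming the example take is fetched through the same take-audio endpoint
described below, its audio could be downloaded like this (the output filename is
arbitrary):
# Download the example take's audio (sketch: uses the /takes/{take_id}/wav
# endpoint described in "Retrieving a take's audio" below).
curl -s -L -X GET -H "Authorization: Bearer $ACCESS_TOKEN" \
  -o ignacio_example.wav \
  https://api.daisys.ai/v1/speak/takes/t01haxx5cgg3n8f2qzc8zkbn97y/wav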
Note: as seen in the response structure, a webhook can also be provided to get a
notification when the result is ready. This webhook is called as a POST request with the
same response structure as seen here, provided in the request body.
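As a sketch, assuming the generate request accepts the same done_webhook field that
appears in the response, such a notification could be requested like this (the
example.com URL is a placeholder for an endpoint you control):
# Request a voice and ask for a POST notification when it is ready.
curl -s -X POST -H 'content-type: application/json' \
  -H "Authorization: Bearer $ACCESS_TOKEN" \
  -d '{"name": "Ignacio", "gender": "male", "model": "shakespeare",
       "done_webhook": "https://example.com/daisys-done"}' \
  https://api.daisys.ai/v1/speak/voices/generate | jq .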
Generating a take
Now that you have a voice, text to speech can be requested via the /takes/generate
endpoint. Here we generate a take with default prosody for the voice, which we also left
as default (neutral) when generating the voice above. More information on the options can
be found in Take-related Endpoints.
$ curl -s -X POST -H 'content-type: application/json' \
  -H "Authorization: Bearer $ACCESS_TOKEN" \
  -d '{"text": "Hello, Daisys! It'\''s a beautiful day.", "voice_id": "v01hasgezqjcsnc91zdfzpx0apj"}' \
  https://api.daisys.ai/v1/speak/takes/generate | jq .
{
  "text": "Hello, Daisys! It's a beautiful day.",
  "override_language": null,
  "style": null,
  "prosody": null,
  "status_webhook": null,
  "done_webhook": null,
  "voice_id": "v01hasgezqjcsnc91zdfzpx0apj",
  "take_id": "t01haybgb16dn9dk0p5je47qz74",
  "status": "waiting",
  "timestamp_ms": 1695383301158,
  "info": null
}
As with voice generation, take generation takes a couple of seconds, and the status can
be retrieved using the take_id:
$ curl -s -X GET -H "Authorization: Bearer $ACCESS_TOKEN" \
  https://api.daisys.ai/v1/speak/takes/t01haybgb16dn9dk0p5je47qz74 | jq .
{
  "text": "Hello, Daisys! It's a beautiful day.",
  "override_language": null,
  "style": null,
  "prosody": null,
  "status_webhook": null,
  "done_webhook": null,
  "voice_id": "v01hasgezqjcsnc91zdfzpx0apj",
  "take_id": "t01haybgb16dn9dk0p5je47qz74",
  "status": "ready",
  "timestamp_ms": 1695383301158,
  "info": {
    "duration": 150528,
    "audio_rate": 44100,
    "normalized_text": [
      "Hello, Daisys!",
      "It's a beautiful day."
    ]
  }
}
As with voice generation, it is possible to use a webhook for the “done” notification.
For longer texts, it is also possible to request a “status” webhook, which may be called
several times whenever the progress of a take changes.
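A minimal sketch, assuming the generate request accepts the status_webhook and
done_webhook fields that appear in the response (the example.com URLs are placeholders
for endpoints you control):
# Request a take with both a status and a done webhook.
curl -s -X POST -H 'content-type: application/json' \
  -H "Authorization: Bearer $ACCESS_TOKEN" \
  -d '{"text": "Hello, Daisys! It'\''s a beautiful day.",
       "voice_id": "v01hasgezqjcsnc91zdfzpx0apj",
       "status_webhook": "https://example.com/daisys-status",
       "done_webhook": "https://example.com/daisys-done"}' \
  https://api.daisys.ai/v1/speak/takes/generate | jq .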
In the take response above, we see the status is “ready”, meaning that audio can now be retrieved.
Retrieving a take’s audio
The take is ready; now we can hear the result! Audio for a take can be retrieved as follows:
$ curl -s -L -X GET -H "Authorization: Bearer $ACCESS_TOKEN" \
  -o beautiful_day.wav \
  https://api.daisys.ai/v1/speak/takes/t01haybgb16dn9dk0p5je47qz74/wav
In the above, we retrieve a .wav file and write it to disk as beautiful_day.wav.
Note that the -L flag must be provided, since the file is returned through a 307 redirect.
The resulting file beautiful_day.wav can be played using command line programs like aplay
on Linux, or any audio player such as the excellent VLC. You can integrate the results
into your creative projects!
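For example, on a Linux system with the ALSA utilities installed:
# Play the downloaded take from the command line.
aplay beautiful_day.wav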
It is also possible to retrieve the audio in other formats: mp3, flac, webm, and m4a, by
requesting the corresponding URL, ../speak/takes/t01haybgb16dn9dk0p5je47qz74/mp3, etc.
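For instance, to fetch the same take as an mp3 (the output filename is arbitrary):
# Same take, but in mp3 format; -L is again needed for the redirect.
curl -s -L -X GET -H "Authorization: Bearer $ACCESS_TOKEN" \
  -o beautiful_day.mp3 \
  https://api.daisys.ai/v1/speak/takes/t01haybgb16dn9dk0p5je47qz74/mp3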