Example: Websocket example, asynchronous client

This example shows:

  1. How to open a websocket connection using an async context manager.

  2. How to generate a take, specifying status and audio callbacks.

  3. The signature of each of these callbacks and how to interpret their arguments.

  4. How to make a request with and without chunks enabled. (Pass --chunks on the command line to enable chunks.)
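
The way the audio callback's arguments combine can be summarized in a small helper. The following is a minimal sketch and not part of the Daisys client library: the function name `classify_audio_event` and the labels it returns are hypothetical, and the rules are read off the branching logic of `audio_cb` in the listing at the end of this page.

```python
from typing import Optional


def classify_audio_event(chunk_id: Optional[int], audio: Optional[bytes]) -> str:
    """Label one audio callback invocation (hypothetical helper).

    Rules mirror the branches of the example's audio_cb:
      - audio present, chunk_id None   -> a whole part (chunks disabled)
      - audio present, chunk_id set    -> one chunk of the current part
      - audio None, chunk_id > 0       -> the current part is complete
      - audio None, chunk_id 0 or None -> no more parts for this take
    """
    if audio is not None:
        return 'whole_part' if chunk_id is None else 'chunk'
    if chunk_id in (0, None):
        return 'stream_done'
    return 'part_done'


# (chunk_id, audio) pairs of the kinds seen in the example output
events = [
    (None, b'\x00' * 8),  # chunks disabled: a full part
    (0, b'\x00' * 8),     # chunks enabled: first chunk of a part
    (30, None),           # chunks enabled: end of a part
    (0, None),            # chunks enabled: end of the stream
    (None, None),         # chunks disabled: end of the stream
]
for chunk_id, audio in events:
    print(chunk_id, classify_audio_event(chunk_id, audio))
```

Keeping the classification in one pure function makes the four cases easy to test without a network connection.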

Example output
$ python3 -m examples.websocket_example_async
Found Daisys Speak API version=1 minor=0
Status.WAITING
Status.STARTED
[0.751s] Received part_id=0 (chunk_id=None) for take_id='t01jqrjedpjb11hpg40h9kkydpk' with audio length 245804
appending audio 245804
Read 245804 bytes of wav data, wrote "websocket_part1.wav".
Status.PROGRESS_50
[1.183s] Received part_id=1 (chunk_id=None) for take_id='t01jqrjedpjb11hpg40h9kkydpk' with audio length 112684
appending audio 112684
Read 112684 bytes of wav data, wrote "websocket_part2.wav".
[1.184s] Received part_id=2 (chunk_id=None) for take_id='t01jqrjedpjb11hpg40h9kkydpk' with audio length (empty -- done receiving)
stream done
Status.READY
Deleting take t01jqrjedpjb11hpg40h9kkydpk: True
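
Because each part in non-chunked mode arrives as a complete .wav file, it can be inspected in memory with Python's standard `wave` module before or instead of writing it to disk. A minimal sketch follows. The helper names `make_silent_wav` and `describe_wav` are hypothetical, and the synthetic audio (16-bit mono at 22050 Hz) is only a stand-in, since the audio format actually returned by the API is not stated here.

```python
import io
import wave


def make_silent_wav(n_frames: int, rate: int = 22050) -> bytes:
    """Build an in-memory WAV file of silence (stand-in for a received part)."""
    buf = io.BytesIO()
    with wave.open(buf, 'wb') as w:
        w.setnchannels(1)      # mono (assumption)
        w.setsampwidth(2)      # 16-bit samples (assumption)
        w.setframerate(rate)   # sample rate (assumption)
        w.writeframes(b'\x00\x00' * n_frames)
    return buf.getvalue()


def describe_wav(data: bytes) -> dict:
    """Read the header of WAV bytes and report basic properties."""
    with wave.open(io.BytesIO(data), 'rb') as w:
        frames = w.getnframes()
        rate = w.getframerate()
        return {
            'channels': w.getnchannels(),
            'sample_width': w.getsampwidth(),
            'rate': rate,
            'frames': frames,
            'seconds': frames / rate,
        }


part = make_silent_wav(n_frames=22050)
info = describe_wav(part)
print(len(part), info)
```

In the example's callback, the same `describe_wav` call could be applied to the `audio` bytes of a part, assuming they form a well-formed WAV file as the example's comments describe.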
Example output (chunks enabled)
$ python3 -m examples.websocket_example_async --chunks
Found Daisys Speak API version=1 minor=0
Status.WAITING
Status.STARTED
[0.311s] Received part_id=0 (chunk_id=0) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4140
appending audio 4140
[0.324s] Received part_id=0 (chunk_id=1) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 8236
[0.338s] Received part_id=0 (chunk_id=2) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 12332
[0.351s] Received part_id=0 (chunk_id=3) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 16428
[0.365s] Received part_id=0 (chunk_id=4) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 20524
[0.378s] Received part_id=0 (chunk_id=5) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 24620
[0.389s] Received part_id=0 (chunk_id=6) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 28716
[0.399s] Received part_id=0 (chunk_id=7) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 32812
[0.409s] Received part_id=0 (chunk_id=8) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 36908
[0.419s] Received part_id=0 (chunk_id=9) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 41004
[0.429s] Received part_id=0 (chunk_id=10) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 45100
[0.439s] Received part_id=0 (chunk_id=11) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 49196
[0.449s] Received part_id=0 (chunk_id=12) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 53292
[0.459s] Received part_id=0 (chunk_id=13) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 57388
[0.469s] Received part_id=0 (chunk_id=14) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 61484
[0.479s] Received part_id=0 (chunk_id=15) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 65580
[0.489s] Received part_id=0 (chunk_id=16) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 69676
[0.500s] Received part_id=0 (chunk_id=17) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 73772
[0.510s] Received part_id=0 (chunk_id=18) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 77868
[0.520s] Received part_id=0 (chunk_id=19) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 81964
[0.530s] Received part_id=0 (chunk_id=20) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 86060
[0.540s] Received part_id=0 (chunk_id=21) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 90156
[0.550s] Received part_id=0 (chunk_id=22) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 94252
[0.560s] Received part_id=0 (chunk_id=23) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 98348
[0.570s] Received part_id=0 (chunk_id=24) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 102444
[0.580s] Received part_id=0 (chunk_id=25) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 106540
[0.590s] Received part_id=0 (chunk_id=26) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 110636
[0.600s] Received part_id=0 (chunk_id=27) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 114732
[0.610s] Received part_id=0 (chunk_id=28) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 118828
[0.620s] Received part_id=0 (chunk_id=29) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 2048
appending audio 120876
[0.621s] Received part_id=0 (chunk_id=30) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length (empty -- done receiving)
part done
Read 120876 bytes of wav data, wrote "websocket_part1.wav".
Status.PROGRESS_50
[0.976s] Received part_id=1 (chunk_id=0) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4140
appending audio 4140
[0.985s] Received part_id=1 (chunk_id=1) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 8236
[0.995s] Received part_id=1 (chunk_id=2) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 12332
[1.004s] Received part_id=1 (chunk_id=3) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 16428
[1.014s] Received part_id=1 (chunk_id=4) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 20524
[1.024s] Received part_id=1 (chunk_id=5) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 24620
[1.034s] Received part_id=1 (chunk_id=6) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 28716
[1.044s] Received part_id=1 (chunk_id=7) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 32812
[1.054s] Received part_id=1 (chunk_id=8) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 36908
[1.064s] Received part_id=1 (chunk_id=9) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 41004
[1.074s] Received part_id=1 (chunk_id=10) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 45100
[1.084s] Received part_id=1 (chunk_id=11) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 49196
[1.095s] Received part_id=1 (chunk_id=12) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 4096
appending audio 53292
[1.105s] Received part_id=1 (chunk_id=13) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length 1024
appending audio 54316
[1.105s] Received part_id=1 (chunk_id=14) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length (empty -- done receiving)
part done
Read 54316 bytes of wav data, wrote "websocket_part2.wav".
[1.105s] Received part_id=2 (chunk_id=0) for take_id='t01jqrjh3yrbzpd79q1nprcrbjy' with audio length (empty -- done receiving)
stream done
Status.READY
Deleting take t01jqrjh3yrbzpd79q1nprcrbjy: True
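
With chunks enabled, the chunks of one part concatenate into a single .wav file. In the log above, the first chunk of each part is 4140 bytes and the following ones are 4096 bytes; the 44-byte difference is consistent with a standard WAV header travelling in the first chunk, although that is an inference from the log and not a documented guarantee. The sketch below shows one way to keep this bookkeeping out of the callback. `PartAssembler` and its method names are hypothetical and not part of the Daisys library.

```python
from typing import List, Optional


class PartAssembler:
    """Accumulate audio chunks into complete parts (hypothetical helper)."""

    def __init__(self) -> None:
        self.parts: List[bytes] = []   # completed parts, in arrival order
        self._current = bytearray()    # chunks of the part in progress
        self.done = False              # True once the stream has ended

    def feed(self, chunk_id: Optional[int], audio: Optional[bytes]) -> None:
        """Process the (chunk_id, audio) pair of one audio callback."""
        if audio is not None:
            self._current += audio
            if chunk_id is None:       # chunks disabled: part is complete
                self._finish_part()
        elif chunk_id in (0, None):    # empty message: end of stream
            if self._current:
                self._finish_part()
            self.done = True
        else:                          # empty message: end of current part
            self._finish_part()

    def _finish_part(self) -> None:
        self.parts.append(bytes(self._current))
        self._current = bytearray()


# Replay the chunk sizes of part 0 from the chunked example output:
# one 4140-byte chunk, 28 chunks of 4096 bytes, one 2048-byte chunk.
asm = PartAssembler()
sizes = [4140] + [4096] * 28 + [2048]
for i, n in enumerate(sizes):
    asm.feed(i, bytes(n))
asm.feed(len(sizes), None)   # empty message with chunk_id=30 ends the part
asm.feed(0, None)            # empty message with chunk_id=0 ends the stream
print(len(asm.parts), len(asm.parts[0]), asm.done)  # prints: 1 120876 True
```

The sketch appends to a `bytearray` in place, which avoids copying the whole buffer on every chunk as `bytes + bytes` does. The total of 120876 bytes matches the size reported for websocket_part1.wav in the log.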
examples/websocket_example_async.py
import sys, os, asyncio, time
from typing import Optional
from daisys import DaisysAPI
from daisys.v1.speak import (DaisysWebsocketGenerateError, HTTPStatusError, Status, TakeResponse,
                             StreamOptions, StreamMode)

# Override DAISYS_EMAIL and DAISYS_PASSWORD with your details!
EMAIL = os.environ.get('DAISYS_EMAIL', 'user@example.com')
PASSWORD = os.environ.get('DAISYS_PASSWORD', 'pw')

# Please see tokens_example.py for how to use an access token instead of a password.


async def main(chunks):
    async with DaisysAPI('speak', email=EMAIL, password=PASSWORD) as speak:
        print('Found Daisys Speak API', await speak.version())

        # A buffer to receive parts; we initialize it with a single empty
        # bytes() because its last entry is used to accumulate the chunks of
        # the current wav file.  In total we will end with a list of wav
        # files, one for each part.  Parts are bits of speech, usually full
        # sentences, that end with silence.
        audio_wavs = [bytes()]

        # Assume at least one voice is available
        voice = (await speak.get_voices())[0]

        async with speak.websocket(voice_id=voice.voice_id) as ws:
            # Flags we can use to only wait on our one take request; we wait
            # until the take is READY, and we also wait until we are done
            # receiving all audio parts.
            done = False
            ready = False

            # Will be filled in by callbacks. On submitting the generate
            # request, we do not yet know what take_id will be assigned so we
            # must discover it by means of the status callback.
            generated_take = None

            # Time the latency from when we submit the request until each part
            # is received.
            t0 = time.time()

            # The audio callback receives "parts" consisting of audio .wav files
            # with WAV headers on each part.  Depending on the stream settings,
            # a part may be divided into chunks.  chunk_id is None when chunks
            # are not enabled.  With chunks enabled, a message with audio==None
            # and chunk_id > 0 marks the end of a part.  A message with
            # audio==None and chunk_id of 0 or None means that no more parts
            # will arrive for that take_id.
            async def audio_cb(request_id: int, take_id: str, part_id: int,
                               chunk_id: Optional[int], audio: Optional[bytes]):
                nonlocal done

                # Report timing info and function arguments
                print(f'[{time.time()-t0:0.3f}s] Received {part_id=} ({chunk_id=}) for {take_id=} '
                      'with audio length', len(audio) if audio else '(empty -- done receiving)')

                # We only requested one take; the take_id is generated by the
                # Daisys API, so we do not know it until the first status
                # message arrives.  The request_id, however, is known, so we
                # can check that it is the expected one.
                assert request_id == generate_request_id
                assert generated_take is None or take_id == generated_take.take_id

                if audio is None:
                    # If the stream is done (no more parts for this take)
                    if chunk_id in [0, None]:
                        print('stream done')
                        # If we have any audio data, write out the last file
                        if len(audio_wavs[-1]) > 0:
                            with open(f'websocket_part{len(audio_wavs)}.wav', 'wb') as f:
                                f.write(audio_wavs[-1])
                                print(f'Read {len(audio_wavs[-1])} bytes of wav data, wrote "{f.name}".')
                        # Flag that we are done receiving audio
                        done = True

                    # Otherwise the current part has ended (chunks enabled)
                    elif chunk_id > 0:
                        print('part done')
                        # Write out the part
                        with open(f'websocket_part{len(audio_wavs)}.wav', 'wb') as f:
                            f.write(audio_wavs[-1])
                            print(f'Read {len(audio_wavs[-1])} bytes of wav data, wrote "{f.name}".')

                        # Start a new part
                        audio_wavs.append(bytes())

                # Otherwise append the chunk.
                else:
                    audio_wavs[-1] = audio_wavs[-1] + audio
                    print('appending audio', len(audio_wavs[-1]))

                    # If non-chunked stream, the part is ended immediately
                    if chunk_id is None:
                        # Write out the file
                        with open(f'websocket_part{len(audio_wavs)}.wav', 'wb') as f:
                            f.write(audio_wavs[-1])
                            print(f'Read {len(audio_wavs[-1])} bytes of wav data, wrote "{f.name}".')

                        # Start a new part
                        audio_wavs.append(bytes())

            # The status callback is called every time the take's status
            # changes.  Here we use it to record the take and to flag when
            # the take is READY, which helps end the update loop.
            async def status_cb(request_id: int, take: TakeResponse):
                nonlocal ready, generated_take
                assert request_id == generate_request_id
                generated_take = take
                print(take.status)
                if take.status == Status.READY:
                    ready = True

            # Submit a request to generate a take over the websocket connection.
            generate_request_id = await ws.generate_take(
                voice_id=voice.voice_id,
                text='Hello from Daisys websockets! How may I help you?',
                status_callback=status_cb,
                audio_callback=audio_cb,

                # Optional
                stream_options=StreamOptions(mode=StreamMode.CHUNKS) if chunks else None,
            )

            # We loop on the websocket, waiting up to 5 seconds per update,
            # and end when the take has been set to READY and all audio has
            # been received, or after 60 seconds in total.  The update waits
            # 1 second by default; here we set it to 5 seconds.  It can also
            # wait forever by setting timeout to None, or be made a
            # non-blocking operation by setting timeout to 0.
            # (Important: in the async client, timeout=0 leads to TimeoutError,
            # so it cannot be used for non-blocking operations with asyncio.)
            while not (ready and done) and (time.time() - t0) < 60:
                try:
                    await ws.update(timeout=5)
                except DaisysWebsocketGenerateError as e:
                    # As opposed to other websocket errors, if a generate error
                    # occurs it does not necessarily mean we want to close the
                    # stream.
                    print(e)

                    # In this example, however, we actually do, because we only
                    # requested a single take, so stop here.
                    break

        # Delete the take
        if generated_take:
            print(f'Deleting take {generated_take.take_id}:',
                  await speak.delete_take(generated_take.take_id))

if __name__ == '__main__':
    try:
        asyncio.run(main(chunks='--chunks' in sys.argv[1:]))
    except HTTPStatusError as e:
        try:
            print(f'HTTP error status {e.response.status_code}: {e.response.json()["detail"]}, {e.request.url}')
        except Exception:
            # Fall back to the raw response text if the body is not the expected JSON
            print(f'HTTP error status {e.response.status_code}: {e.response.text}, {e.request.url}')
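
The update loop in the listing combines two conditions: the flags set by the callbacks and an overall 60-second cap. The following minimal sketch isolates that pattern so it can be run without a Daisys account. `FakeWebsocket`, `wait_until_finished` and `demo` are hypothetical names introduced only for illustration, and the sketch uses `time.monotonic()` in place of the example's `time.time()`, since a monotonic clock is not affected by system clock adjustments.

```python
import asyncio
import time


class FakeWebsocket:
    """Stand-in for the real websocket object (hypothetical, for illustration).

    Each update() call delivers at most one queued event to the callback.
    """

    def __init__(self, events, callback):
        self._events = list(events)
        self._callback = callback

    async def update(self, timeout=1):
        await asyncio.sleep(0)  # yield to the event loop, as real I/O would
        if self._events:
            await self._callback(self._events.pop(0))


async def wait_until_finished(ws, is_finished, overall_timeout=60.0, step=5):
    """Call ws.update() until is_finished() is true or the deadline passes."""
    deadline = time.monotonic() + overall_timeout
    while not is_finished():
        if time.monotonic() >= deadline:
            return False  # gave up: overall deadline reached
        await ws.update(timeout=step)
    return True


async def demo():
    state = {'ready': False, 'done': False}

    async def on_event(name):
        state[name] = True  # plays the role of status_cb / audio_cb

    ws = FakeWebsocket(['done', 'ready'], on_event)
    ok = await wait_until_finished(ws, lambda: state['ready'] and state['done'])
    return ok, state


print(asyncio.run(demo()))
```

Returning a boolean lets the caller distinguish a finished take from a timeout, which the example's plain `while` condition does not report.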