Simli WebRTC API basics
This is a quick, shallow guide to using the Simli WebRTC API. It won’t explain how WebRTC works or go into the details of audio; see the following articles for that: WebRTC, Audio. This article assumes a basic understanding of JavaScript, Python, HTTP requests, and how to use APIs. The full code is posted at the end of this document.
What’s the Simli API?
The Simli API transforms any audio, whatever the source, into a talking-head video (a humanoid character that speaks the audio) with realistic motion and low latency. The API lets anyone add more life to their chatbots, virtual assistants, or any formless characters. For example, this article gives a face to any radio station that provides an audio stream over HTTP; the exact same principles apply to any other audio source.
The main input to the API is the audio stream. The audio must be in PCM Int16 format with a sampling rate of 16000 Hz and a single channel. The stream should preferably be sent in chunks of 6000 bytes; there’s no minimum chunk size, and the maximum is 65,536 bytes. The API initiates a WebRTC connection that is handled by the browser (or your WebRTC library of choice), so you don’t have to worry about playback details. The HTML and JavaScript components of this demo are adapted from the aiortc examples.
What’s being written for this to work?
- A basic HTML page with video and audio elements (to play back the WebRTC stream).
- A JavaScript file that handles the WebRTC connection and sends the audio stream to the Simli API.
- A small Python server that decodes the MP3 audio stream to PCM Int16 and sends it to the WebRTC connection (not mandatory if you have a different way to get a correctly formatted audio stream).
- An audio stream source (for this example, we’re using a radio station stream).
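As a rough illustration of what a "correctly formatted audio stream" means here, the sketch below generates one second of mono PCM Int16 audio at 16000 Hz and splits it into 6000-byte chunks, the preferred size mentioned earlier. The helper names `sine_pcm16` and `chunk_pcm` are hypothetical and not part of the Simli API, and little-endian byte order is an assumption, since the text does not state the endianness.

```python
import math
import struct

SAMPLE_RATE = 16000      # Hz, required by the API
BYTES_PER_SAMPLE = 2     # Int16 samples
CHUNK_BYTES = 6000       # preferred chunk size
MAX_CHUNK_BYTES = 65536  # stated maximum chunk size


def sine_pcm16(duration_s: float, freq_hz: float = 440.0) -> bytes:
    """Generate a mono test tone as PCM Int16 bytes.

    Little-endian byte order is an assumption, not something the text states.
    """
    n_samples = int(SAMPLE_RATE * duration_s)
    samples = (
        int(32767 * 0.5 * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE))
        for i in range(n_samples)
    )
    return b"".join(struct.pack("<h", s) for s in samples)


def chunk_pcm(pcm: bytes, chunk_bytes: int = CHUNK_BYTES):
    """Yield consecutive chunks of raw PCM; the last chunk may be shorter."""
    if not 0 < chunk_bytes <= MAX_CHUNK_BYTES:
        raise ValueError("chunk size must be between 1 and 65536 bytes")
    if chunk_bytes % BYTES_PER_SAMPLE:
        raise ValueError("chunk size must be a whole number of Int16 samples")
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]


if __name__ == "__main__":
    pcm = sine_pcm16(1.0)
    for chunk in chunk_pcm(pcm):
        print(len(chunk))
```

At 16000 Hz, mono, Int16, a 6000-byte chunk holds 3000 samples, which is 187.5 ms of audio.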
index.html
The HTML file is pretty simple. It has a video element to display the talking head video and an audio element to play the audio stream. The video element is hidden by default and will only be shown when the WebRTC connection is established.
client.js
The JavaScript file handles the WebRTC connection, sends the audio stream to the Simli API, and displays the video and audio stream.
server.py
This one is also relatively simple: it decodes the MP3 audio stream to PCM Int16 and sends it back.
python server.py
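The decoding step could be sketched as follows. This is one possible approach, not necessarily what the original server.py does: it assumes the `ffmpeg` command-line tool is installed and on the PATH, and the function names `ffmpeg_pcm_command` and `stream_pcm` are hypothetical. How the chunks are sent back to the client is left out, since the text does not specify it.

```python
import subprocess
from typing import Iterator, List


def ffmpeg_pcm_command(source_url: str) -> List[str]:
    """Build an ffmpeg command that decodes a stream to raw PCM on stdout."""
    return [
        "ffmpeg",
        "-loglevel", "error",    # keep stderr quiet
        "-i", source_url,        # input: e.g. an MP3 radio stream over HTTP
        "-f", "s16le",           # raw output container, signed 16-bit little-endian
        "-acodec", "pcm_s16le",  # PCM Int16 samples
        "-ar", "16000",          # 16000 Hz sampling rate
        "-ac", "1",              # single channel
        "pipe:1",                # write to stdout
    ]


def stream_pcm(source_url: str, chunk_bytes: int = 6000) -> Iterator[bytes]:
    """Yield decoded PCM chunks of up to chunk_bytes from the given stream."""
    proc = subprocess.Popen(ffmpeg_pcm_command(source_url), stdout=subprocess.PIPE)
    try:
        while True:
            chunk = proc.stdout.read(chunk_bytes)
            if not chunk:  # ffmpeg exited or the stream ended
                break
            yield chunk
    finally:
        proc.kill()
        proc.wait()
```

Each chunk yielded by `stream_pcm` already matches the format the API expects, so it can be forwarded as-is by whatever transport the server uses.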