
Simli WebRTC API basics

This is a quick, practical guide to the Simli WebRTC API. It won’t explain how WebRTC works or go into the details of the audio; you can check the following articles for that: WebRTC, Audio. In this article, we’re assuming you have a basic understanding of JavaScript, Python, HTTP requests, and how to use APIs. The full code is posted at the end of this document.

What’s the Simli API?

The Simli API transforms any audio, whatever the source, into a talking-head video (a humanoid character that speaks the audio) with realistic motion and low latency. The API lets anyone add more life to their chatbots, virtual assistants, or other formless characters. For example, this article will give a face to any radio station that provides an audio stream over HTTP; the exact same principles apply to any audio source.

The main input to the API is the audio stream. The audio must be in PCM Int16 format with a sampling rate of 16000 Hz and a single channel. The audio should preferably be sent in chunks of 6000 bytes; there is no minimum chunk size, and the maximum is 65,536 bytes. The API initiates a WebRTC connection which is handled by the browser (or your WebRTC library of choice), so you don’t have to worry about playback details. The HTML and JavaScript components of this demo are adapted from the aiortc examples.
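For example, a PCM Int16 byte stream can be split into API-sized chunks like this (a minimal sketch; the constants come from the limits above, and the function name is our own):

```python
CHUNK_SIZE = 6000   # preferred chunk size in bytes
MAX_CHUNK = 65536   # maximum chunk size accepted by the API

def chunk_pcm16(audio_bytes: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Split a PCM Int16 (16 kHz, mono) byte stream into chunks for sending."""
    if not 0 < chunk_size <= MAX_CHUNK:
        raise ValueError("chunk_size must be between 1 and 65,536 bytes")
    return [audio_bytes[i:i + chunk_size]
            for i in range(0, len(audio_bytes), chunk_size)]
```

One second of audio at 16 kHz mono Int16 is 32,000 bytes, so it splits into five 6000-byte chunks plus a 2000-byte remainder.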

What’s being written for this to work?

  • A basic HTML page with video and audio elements (to play back the WebRTC stream).
  • A JavaScript file to handle the WebRTC connection and send the audio stream to the Simli API.
  • A small Python server to decode the MP3 audio stream to PCM Int16 and send it to the WebRTC connection (not mandatory if you have another way to get a correctly formatted audio stream).
  • An audio stream source (for this example, we’re using a radio station stream).

index.html

The HTML file is pretty simple. It has a video element to display the talking head video and an audio element to play the audio stream. The video element is hidden by default and will only be shown when the WebRTC connection is established.
<!DOCTYPE html>
<html>
	<head>
		<meta charset="UTF-8" />
		<meta name="viewport" content="width=device-width, initial-scale=1.0" />
		<title>WebRTC demo</title>
		<style>
			button {
				padding: 8px 16px;
			}

			pre {
				overflow-x: hidden;
				overflow-y: auto;
			}

			video {
				width: 100%;
			}

			.option {
				margin-bottom: 8px;
			}

			#media {
				max-width: 1280px;
			}
		</style>
	</head>

	<body>
		<h2>Options</h2>

		<div class="option">
			<label for="apiKey">API key</label>
			<input id="apiKey" type="text" value="" />
		</div>

		<div class="option">
			<label for="faceId">Face ID</label>
			<input id="faceId" type="text" value="tmp9i8bbq7c" />
		</div>
		<div class="option">
			<label for="model">model</label>
			<input id="model" type="text" value="fasttalk" />
		</div>
		<div class="option">
			<input id="use-sfu" checked="checked" type="checkbox" />
			<label for="use-sfu">Use SFU</label>
		</div>
		<div class="option">
			<input id="use-stun" type="checkbox" checked="checked" />
			<label for="use-stun">Use STUN server</label>
		</div>
		<div class="option">
			<input id="handle-silence" type="checkbox" checked="checked" />
			<label for="handle-silence">Server handles no input case</label>
		</div>

		<div class="option">
			<label for="maxSessionLength">maxSessionLength</label>
			<input id="maxSessionLength" value="3600" />
		</div>
		<div class="option">
			<label for="maxIdleTime">maxIdleTime</label>
			<input id="maxIdleTime" value="300" />
		</div>

		<button id="start" onclick="start()">Start</button>
		<button id="stop" style="display: none" onclick="stop()">Stop</button>
		<label id="startTime">Startup Time: </label>

		<h2>State</h2>
		<p>ICE gathering state: <span id="ice-gathering-state"></span></p>
		<p>ICE connection state: <span id="ice-connection-state"></span></p>
		<p>Signaling state: <span id="signaling-state"></span></p>
		<input type="file" id="fileInput" />
		<input type="number" id="chunkSize" value="6000" />
		<button id="sendFile" onclick="sendFile()">Send file</button>
		<button id="playImmediate" onclick="playImmediate()">Play Immediate</button>
		<button id="playImmediateAndChunk" onclick="playImmediateAndChunk()">Play Immediate and chunk</button>
		<button id="sendZeros" onclick="sendZeros()">Send Zeros</button>
		<button id="skip" onclick="skip()">skip buffer</button>
		<div id="media" style="display: none">
			<h2>Media</h2>
			<audio id="audio" autoplay="true"></audio>
			<video id="video" autoplay="true" playsinline="true"></video>
		</div>

		<h2>Data channel</h2>
		<pre id="data-channel" style="height: 200px"></pre>

		<h2>SDP</h2>

		<h3>Offer</h3>
		<pre id="offer-sdp"></pre>

		<h3>Answer</h3>
		<pre id="answer-sdp"></pre>

		<script src="client.js"></script>
	</body>
</html>
The HTML file doesn’t do anything special: it has a button to initiate the connection, some text elements to show the state of the connection, and video and audio elements to display the video and audio streams respectively. It also imports a client.js file containing the JavaScript code that handles the WebRTC connection.

client.js

The JavaScript file handles the WebRTC connection, sends the audio stream to the Simli API, and displays the video and audio streams.
// get DOM elements
var dataChannelLog = document.getElementById("data-channel"),
	iceConnectionLog = document.getElementById("ice-connection-state"),
	iceGatheringLog = document.getElementById("ice-gathering-state"),
	signalingLog = document.getElementById("signaling-state");
This block of code gets the DOM elements that will be used to display the state of the connection and the data channel messages.
// peer connection
var pc = null;

var wsConnection = null;

function createPeerConnection() {
	var config = {
		sdpSemantics: "unified-plan",
	};

	config.iceServers = [{urls: ["stun:stun.l.google.com:19302"]}];

	pc = new RTCPeerConnection(config);
	// register some listeners to help debugging
	pc.addEventListener(
		"icegatheringstatechange",
		() => {
			iceGatheringLog.textContent += " -> " + pc.iceGatheringState;
		},
		false,
	);
	iceGatheringLog.textContent = pc.iceGatheringState;

	pc.addEventListener(
		"iceconnectionstatechange",
		() => {
			iceConnectionLog.textContent += " -> " + pc.iceConnectionState;
		},
		false,
	);
	iceConnectionLog.textContent = pc.iceConnectionState;

	pc.addEventListener(
		"signalingstatechange",
		() => {
			signalingLog.textContent += " -> " + pc.signalingState;
		},
		false,
	);
	signalingLog.textContent = pc.signalingState;

	// connect audio / video
	pc.addEventListener("track", (evt) => {
		if (evt.track.kind == "video") document.getElementById("video").srcObject = evt.streams[0];
		else document.getElementById("audio").srcObject = evt.streams[0];
	});

	pc.onicecandidate = (event) => {
		if (event.candidate === null) {
			console.log(JSON.stringify(pc.localDescription));
		} else {
			console.log(event.candidate);
			//   console.log(JSON.stringify(pc.localDescription));
			candidateCount += 1;
			//   console.log(candidateCount);
		}
	};

	return pc;
}
This block of code defines the function that creates the peer connection and registers some listeners to display the state of the connection. It also connects the audio and video tracks to the video and audio elements respectively.
let candidateCount = 0;
let prevCandidateCount = -1;
function CheckIceCandidates() {
	if (pc.iceGatheringState === "complete" || candidateCount === prevCandidateCount) {
		console.log(pc.iceGatheringState, candidateCount);
		connectToRemotePeer();
	} else {
		prevCandidateCount = candidateCount;
		setTimeout(CheckIceCandidates, 250);
	}
}

function negotiate() {
	return pc
		.createOffer()
		.then((offer) => {
			return pc.setLocalDescription(offer);
		})
		.then(() => {
			prevCandidateCount = candidateCount;
			setTimeout(CheckIceCandidates, 250);
		});
}
This block of code defines the function that initiates the negotiation process. It creates an offer and sets the local description, but it does not block until ICE gathering is finished; instead, it polls every 250 ms until the ICE gathering state is complete or the candidate count stops changing between polls.
async function connectToRemotePeer() {
	var offer = pc.localDescription;
	document.getElementById("offer-sdp").textContent = offer.sdp;

	const faceId = document.getElementById("faceId").value;
	if (faceId === "") {
		alert("Please enter faceId");
		return;
	}

	const metadata = {
		faceId: faceId,
		maxSessionLength: parseInt(document.getElementById("maxSessionLength").value),
		maxIdleTime: parseInt(document.getElementById("maxIdleTime").value),
	};
	console.log(metadata);
	console.log(parseInt(document.getElementById("maxSessionLength").value));

	const sessionPromise = await fetch("/compose/token", {
		method: "POST",
		body: JSON.stringify(metadata),
		headers: {
			"Content-Type": "application/json",
			"x-simli-api-key": document.getElementById("apiKey").value,
		},
	});
	session_token = await sessionPromise.json();
	wsURL = new URL(window.location.origin + "/compose/webrtc/p2p");
	wsURL.searchParams.set("session_token", session_token.session_token);
	const ws = new WebSocket(wsURL);
	wsConnection = ws;
	ws.addEventListener("message", async (evt) => {
		dataChannelLog.textContent += "< " + evt.data + "\n";
		if (evt.data === "START") {
			dcZeroAudio = setTimeout(() => {
				var message = new Uint8Array(64000);
				wsConnection.send(message);
				console.log("SEND");
			}, 100);
			return;
		}
		if (evt.data === "STOP") {
			stop();
			return;
		} else if (evt.data.slice(0, 4) === "pong") {
			console.log("PONG");
			var elapsed_ms = current_stamp() - parseInt(evt.data.substring(5), 10);
			dataChannelLog.textContent += " RTT " + elapsed_ms + " ms\n";
		} else {
			try {
				const message = JSON.parse(evt.data);
				if (message.type !== "answer") {
					return;
				}
				answer = message;
				document.getElementById("answer-sdp").textContent = answer.sdp;
			} catch (e) {
				console.log(e);
			}
		}
	});
	ws.addEventListener("close", () => {
		console.log("Websocket closed");
	});
	let answer = null;
	while (answer === null) {
		await new Promise((r) => setTimeout(r, 10));
	}
	await pc.setRemoteDescription(answer);
}
This block of code defines the function that connects to the remote peer. It sends the local description to the Simli API, gets the remote description, and sets it.
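For reference, the same session-token request can also be built outside the browser. The sketch below mirrors the fetch call in the JavaScript using only the Python standard library; the base URL is a placeholder for wherever you proxy /compose/token, and the function name is our own:

```python
import json
from urllib.request import Request

def build_token_request(api_key: str, face_id: str,
                        max_session_length: int = 3600,
                        max_idle_time: int = 300,
                        base_url: str = "http://localhost:8080") -> Request:
    """Build the POST /compose/token request that obtains a session token."""
    metadata = {
        "faceId": face_id,
        "maxSessionLength": max_session_length,
        "maxIdleTime": max_idle_time,
    }
    return Request(
        base_url + "/compose/token",
        data=json.dumps(metadata).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-simli-api-key": api_key,
        },
        method="POST",
    )
```

Send it with urllib.request.urlopen and read session_token from the JSON response, exactly as the JavaScript does.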
var time_start = null;

// current_stamp is also used by the WebSocket message handler in
// connectToRemotePeer, so it has to live at the top level.
const current_stamp = () => {
	if (time_start === null) {
		time_start = new Date().getTime();
		return 0;
	} else {
		return new Date().getTime() - time_start;
	}
};

function start() {
	document.getElementById("start").style.display = "none";

	pc = createPeerConnection();

	// Build media constraints.

	const constraints = {
		audio: true,
		video: true,
	};

	// Acquire media and start negotiation.

	document.getElementById("media").style.display = "block";
	navigator.mediaDevices.getUserMedia(constraints).then(
		(stream) => {
			stream.getTracks().forEach((track) => {
				pc.addTrack(track, stream);
			});
			return negotiate();
		},
		(err) => {
			alert("Could not acquire media: " + err);
		},
	);
	document.getElementById("stop").style.display = "inline-block";
}

function stop() {
	document.getElementById("stop").style.display = "none";

	// close transceivers
	if (pc.getTransceivers) {
		pc.getTransceivers().forEach((transceiver) => {
			if (transceiver.stop) {
				transceiver.stop();
			}
		});
	}

	// close local audio / video
	pc.getSenders().forEach((sender) => {
		if (sender.track) {
			sender.track.stop();
		}
	});

	// close peer connection
	setTimeout(() => {
		pc.close();
	}, 500);
}
This block of code defines the functions that start and stop the connection. The start function creates the peer connection, acquires the local microphone and camera, and starts the negotiation process. The stop function closes the transceivers, stops the local audio and video tracks, and closes the peer connection.
async function initializeWebsocketDecoder() {
	ws = new WebSocket("ws://localhost:8080/");
	ws.onopen = function (event) {
		console.log("connected");
	};
	ws.onmessage = function (event) {
		// forward the decoded PCM chunks to the Simli WebSocket connection
		wsConnection.send(event.data);
	};
	ws.binaryType = "arraybuffer";
	while (wsConnection === null || wsConnection.readyState !== WebSocket.OPEN) {
		await new Promise((r) => setTimeout(r, 100));
	}
	response = await fetch("https://radio.talksport.com/stream");
	if (!response.ok) {
		throw new Error("Network response was not ok");
	}
	const reader = response.body.getReader();
	function read() {
		return reader.read().then(({done, value}) => {
			if (done) {
				console.log("Stream complete");
				return;
			}
			ws.send(value);
			return read();
		});
	}

	return read();
}
The last block of code defines the function that initializes the WebSocket decoder. It opens a WebSocket connection to the Python server, forwards the decoded audio chunks to the Simli connection, and pipes the radio station’s MP3 stream into the decoder. Here’s an improved version used in our benchmark:
// get DOM elements
var dataChannelLog = document.getElementById("data-channel"),
	iceConnectionLog = document.getElementById("ice-connection-state"),
	iceGatheringLog = document.getElementById("ice-gathering-state"),
	signalingLog = document.getElementById("signaling-state");

let wsConnection = null;
let answer = null;
let offer = null;
let startStamp = null;

// peer connection
var pc = null;

var session_token = null;
async function createPeerConnection() {
	var config = {
		sdpSemantics: "unified-plan",
	};

	if (document.getElementById("use-stun").checked) {
		// Fetch ICE servers first
		try {
			const sfu = document.getElementById("use-sfu").checked;
			const faceId = document.getElementById("faceId").value;
			if (faceId === "") {
				alert("Please enter faceId");
				return;
			}

			const metadata = {
				// faceId: "tmp9i8bbq7c",
				faceId: faceId,
				handleSilence: document.getElementById("handle-silence").checked,
				maxSessionLength: parseInt(document.getElementById("maxSessionLength").value),
				maxIdleTime: parseInt(document.getElementById("maxIdleTime").value),
			};
			console.log(metadata);
			console.log(parseInt(document.getElementById("maxSessionLength").value));

			const sessionPromise = await fetch("/compose/token", {
				method: "POST",
				body: JSON.stringify(metadata),
				headers: {
					"Content-Type": "application/json",
					"x-simli-api-key": document.getElementById("apiKey").value,
				},
			});
			session_token = await sessionPromise.json();
			wsURL = new URL(window.location.origin + "/compose/webrtc/p2p");
			wsURL.searchParams.set("session_token", session_token.session_token);
			wsURL.searchParams.set("enableSFU", sfu);

			const ws = new WebSocket(wsURL);
			wsConnection = ws;
			ws.addEventListener("open", async () => {
				while (!offer) {
					await new Promise((r) => setTimeout(r, 10));
				}

				ws.send(
					JSON.stringify({
						sdp: offer.sdp,
						type: offer.type,
					}),
				);
				// wsConnection.send(session_token.session_token);
			});
			ws.addEventListener("message", async (evt) => {
				dataChannelLog.textContent += "< " + evt.data + "\n";
				if (evt.data === "START") {
					dcZeroAudio = setTimeout(() => {
						var message = new Uint8Array(64000);
						wsConnection.send(message);
						console.log("SEND");
					}, 100);
					return;
				}
				if (evt.data === "STOP") {
					stop();
					return;
				} else if (evt.data.slice(0, 4) === "pong") {
					console.log("PONG");
					var elapsed_ms = current_stamp() - parseInt(evt.data.substring(5), 10);
					dataChannelLog.textContent += " RTT " + elapsed_ms + " ms\n";
				} else {
					try {
						const message = JSON.parse(evt.data);
						if (message.type !== "answer") {
							return;
						}
						answer = message;
						document.getElementById("answer-sdp").textContent = answer.sdp;
					} catch (e) {
						console.log(e);
					}
				}
			});
			ws.addEventListener("close", () => {
				console.log("Websocket closed");
			});

			const icePromise = await fetch("/compose/ice", {
				method: "GET",
				headers: {
					"Content-Type": "application/json",
					"x-simli-api-key": document.getElementById("apiKey").value,
				},
			});
			// iceServers = await response.json();

			const results = await Promise.all([icePromise]);
			console.log(results);
			config.iceServers = await results[0].json();
		} catch (error) {
			console.error("Error fetching ICE servers:", error);
			// Fallback to Google STUN server if fetch fails
			config.iceServers = [{urls: ["stun:stun.l.google.com:19302"]}];
		}
	}

	console.log(config);
	pc = new RTCPeerConnection(config);
	// register some listeners to help debugging
	pc.addEventListener(
		"icegatheringstatechange",
		() => {
			iceGatheringLog.textContent += " -> " + pc.iceGatheringState;
		},
		false,
	);
	iceGatheringLog.textContent = pc.iceGatheringState;

	pc.addEventListener(
		"iceconnectionstatechange",
		() => {
			iceConnectionLog.textContent += " -> " + pc.iceConnectionState;
		},
		false,
	);
	iceConnectionLog.textContent = pc.iceConnectionState;

	pc.addEventListener(
		"signalingstatechange",
		() => {
			signalingLog.textContent += " -> " + pc.signalingState;
		},
		false,
	);
	signalingLog.textContent = pc.signalingState;

	// connect audio / video
	pc.addEventListener("track", (evt) => {
		if (evt.track.kind == "video") {
			document.getElementById("video").srcObject = evt.streams[0];
			document.getElementById("video").requestVideoFrameCallback(() => {
				const startupDuration = (new Date().getTime() - startStamp) / 1000;
				console.log("First video frame rendered!", startupDuration);
				document.getElementById("startTime").textContent += startupDuration.toString() + "s";
			});
		} else {
			document.getElementById("audio").srcObject = evt.streams[0];
		}
	});

	pc.onicecandidate = (event) => {
		if (event.candidate === null) {
			console.log(JSON.stringify(pc.localDescription));
		} else {
			console.log(event.candidate);
			//   console.log(JSON.stringify(pc.localDescription));
			candidateCount += 1;
			//   console.log(candidateCount);
		}
	};

	return pc;
}

function enumerateInputDevices() {
	const populateSelect = (select, devices) => {
		let counter = 1;
		devices.forEach((device) => {
			const option = document.createElement("option");
			option.value = device.deviceId;
			option.text = device.label || "Device #" + counter;
			select.appendChild(option);
			counter += 1;
		});
	};

	navigator.mediaDevices
		.enumerateDevices()
		.then((devices) => {
			populateSelect(
				document.getElementById("audio-input"),
				devices.filter((device) => device.kind == "audioinput"),
			);
			populateSelect(
				document.getElementById("video-input"),
				devices.filter((device) => device.kind == "videoinput"),
			);
		})
		.catch((e) => {
			alert(e);
		});
}

let candidateCount = 0;
let prevCandidateCount = -1;
function CheckIceCandidates() {
	if (pc.iceGatheringState === "complete" || candidateCount === prevCandidateCount) {
		console.log(pc.iceGatheringState, candidateCount);
		connectToRemotePeer().catch();
	} else {
		prevCandidateCount = candidateCount;
		setTimeout(CheckIceCandidates, 250);
	}
}

function negotiate() {
	return pc
		.createOffer()
		.then((offer) => {
			return pc.setLocalDescription(offer);
		})
		.then(() => {
			prevCandidateCount = candidateCount;
			setTimeout(CheckIceCandidates, 250);
		});
}

async function connectToRemotePeer() {
	offer = pc.localDescription;

	while (answer === null) {
		await new Promise((r) => setTimeout(r, 10));
	}
	console.log(answer);
	await pc.setRemoteDescription(answer);
}

var time_start = null;
const current_stamp = () => {
	if (time_start === null) {
		time_start = new Date().getTime();
		return 0;
	} else {
		return new Date().getTime() - time_start;
	}
};

function start() {
	startStamp = new Date().getTime();
	document.getElementById("start").style.display = "none";

	// createPeerConnection assigns the global pc itself; don't overwrite it with a Promise
	createPeerConnection().then((pc) => {
		document.getElementById("media").style.display = "block";
		pc.addTransceiver("audio", {direction: "recvonly"});
		pc.addTransceiver("video", {direction: "recvonly"});
		negotiate();

		document.getElementById("stop").style.display = "inline-block";
	});
}

function sendZeros() {
	// Add event listener for file input change event

	if (wsConnection && wsConnection.readyState === WebSocket.OPEN) {
		wsConnection.send(new Uint8Array(64000));
		console.log("SEND ZEROS", Date.now());
	}
	dataChannelLog.textContent += "- Sent Zeros: " + "64000" + "\n";
}

function sendFile() {
	// Add event listener for file input change event

	var file = document.getElementById("fileInput").files[0];
	var reader = new FileReader();
	reader.onload = async function (e) {
		var arrayBuffer = e.target.result;
		var uint8Array = new Uint8Array(arrayBuffer);
		var button = document.getElementById("sendFile");
		let chunkSize = parseInt(document.getElementById("chunkSize").value);
		// console.log(uint8Array)
		if (wsConnection && wsConnection.readyState === WebSocket.OPEN) {
			// for (var x = 0; x < 600; x++)
			{
				for (var i = 0; i < uint8Array.length; i += chunkSize) {
					wsConnection.send(uint8Array.slice(i, i + chunkSize));
					console.log("SEND", Date.now());
					button.textContent = "Sending...";
				}
				await new Promise((r) => setTimeout(r, 200));
			}
			// dc.send(uint8Array);
			// console.log("SUNEN");
			dataChannelLog.textContent += "- Sent file: " + file.name + "\n";
			button.textContent = "Sent";
		}
	};

	reader.readAsArrayBuffer(file);
}

function playImmediate() {
	// Add event listener for file input change event

	var file = document.getElementById("fileInput").files[0];
	var reader = new FileReader();
	reader.onload = async function (e) {
		const arrayBuffer = new Uint8Array(e.target.result);
		const asciiStr = "PLAY_IMMEDIATE";
		const encoder = new TextEncoder(); // Default is utf-8
		const strBytes = encoder.encode(asciiStr); // Uint8Array of " World!"

		var uint8Array = new Uint8Array(strBytes.length + arrayBuffer.length);
		uint8Array.set(strBytes, 0);
		uint8Array.set(arrayBuffer, strBytes.length);
		var button = document.getElementById("playImmediate");
		// let chunkSize = parseInt(document.getElementById("chunkSize").value);
		// console.log(uint8Array)
		if (wsConnection && wsConnection.readyState === WebSocket.OPEN) {
			// for (var x = 0; x < 600; x++)
			wsConnection.send(uint8Array);
			// dc.send(uint8Array);
			// console.log("SUNEN");
			dataChannelLog.textContent += "- Sent file: " + file.name + "\n";
			button.textContent = "Sent";
		}
	};

	reader.readAsArrayBuffer(file);
}

function playImmediateAndChunk() {
	// Add event listener for file input change event

	var file = document.getElementById("fileInput").files[0];
	var reader = new FileReader();
	reader.onload = async function (e) {
		const arrayBuffer = new Uint8Array(e.target.result);
		const asciiStr = "PLAY_IMMEDIATE";
		const encoder = new TextEncoder(); // Default is utf-8
		const strBytes = encoder.encode(asciiStr); // Uint8Array of " World!"

		const firstChunkSize = 16000 * 2 * 4;
		const firstSlice = arrayBuffer.slice(0, Math.min(firstChunkSize, arrayBuffer.length));

		var uint8Array = new Uint8Array(strBytes.length + firstSlice.length);
		uint8Array.set(strBytes, 0);
		uint8Array.set(firstSlice, strBytes.length);
		var button = document.getElementById("playImmediateAndChunk");
		let chunkSize = parseInt(document.getElementById("chunkSize").value);
		// console.log(uint8Array)

		if (wsConnection && wsConnection.readyState === WebSocket.OPEN) {
			// for (var x = 0; x < 600; x++)
			wsConnection.send(uint8Array);
			// await new Promise((r) => setTimeout(r, 1000));
			if (arrayBuffer.length > firstChunkSize) {
				for (var i = firstChunkSize; i < arrayBuffer.length; i += chunkSize) {
					wsConnection.send(arrayBuffer.slice(i, i + chunkSize));
					console.log("SEND", Date.now());
					button.textContent = "Sending...";
				}
			}

			// dc.send(uint8Array);
			// console.log("SUNEN");
			dataChannelLog.textContent += "- Sent file: " + file.name + "\n";
			button.textContent = "Sent";
		}
	};

	reader.readAsArrayBuffer(file);
}

function skip() {
	wsConnection.send("SKIP");
}

function stop() {
	if (wsConnection) {
		wsConnection.send("DONE");
	}
	// close transceivers
	if (pc.getTransceivers) {
		pc.getTransceivers().forEach((transceiver) => {
			if (transceiver.stop) {
				transceiver.stop();
			}
		});
	}

	// close local audio / video
	pc.getSenders().forEach((sender) => {
		if (sender.track) {
			sender.track.stop();
		}
	});

	// close peer connection
	pc.close();
}

addEventListener("beforeunload", (event) => {
	stop();
});

server.py

This one is also relatively simple: it decodes the MP3 audio stream to PCM Int16 and sends it back over the WebSocket.
from fastapi import FastAPI, WebSocket, WebSocketDisconnect
from starlette.websockets import WebSocketState
import asyncio
import uvicorn

app = FastAPI()


async def GetDecodeOutput(
    websocket: WebSocket, decodeProcess: asyncio.subprocess.Process
):
    while True:
        data = await decodeProcess.stdout.read(6000)
        if not data:
            break
        await websocket.send_bytes(data)


@app.websocket("/")
async def websocket_endpoint(websocket: WebSocket):
    await websocket.accept()
    try:
        decodeTask = await asyncio.subprocess.create_subprocess_exec(
            *[
                "ffmpeg",
                "-i",
                "pipe:0",
                "-f",
                "s16le",
                "-ar",
                "16000",
                "-ac",
                "1",
                "-acodec",
                "pcm_s16le",
                "-",
            ],
            stdin=asyncio.subprocess.PIPE,
            stdout=asyncio.subprocess.PIPE,
        )
        sendTask = asyncio.create_task(GetDecodeOutput(websocket, decodeTask))
        while (
            websocket.client_state == WebSocketState.CONNECTED
            and websocket.application_state == WebSocketState.CONNECTED
        ):
            data = await websocket.receive_bytes()
            decodeTask.stdin.write(data)
            await decodeTask.stdin.drain()
    except WebSocketDisconnect:
        pass
    finally:
        decodeTask.stdin.close()
        await decodeTask.wait()
        await sendTask
        await websocket.close()


if __name__ == "__main__":
    uvicorn.run(app, port=8080)
To run it, just type python server.py
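As a sanity check on the numbers used throughout: at 16000 Hz, mono, 2 bytes per sample, each 6000-byte chunk the server reads from ffmpeg carries 187.5 ms of audio.

```python
SAMPLE_RATE = 16000    # Hz
BYTES_PER_SAMPLE = 2   # PCM Int16
CHUNK_BYTES = 6000

# duration of one chunk in milliseconds
chunk_ms = CHUNK_BYTES / (SAMPLE_RATE * BYTES_PER_SAMPLE) * 1000
print(chunk_ms)  # 187.5
```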

Tada! That’s everything!