Simli WebRTC with WebSockets signaling user/migration guide

What is this?

If you’ve been using the Simli API directly, you might’ve noticed a couple of issues. Connecting over mobile data is finicky. Sometimes you get the elusive “datachannel is not open” error when you don’t know what the datachannel is used for or what you might’ve done wrong. Other times, you may’ve had some lag spikes. We’ve hit all of these issues ourselves when building the demo repos and our own website. So we fixed it! While the fix isn’t as simple as having the backend manage everything, it’s not that much different from what you already have!

WebSockets vs the datachannel

The datachannel is the part of the WebRTC spec used for bidirectional communication in WebRTC applications. It’s awesome, but some details haven’t aged nicely on a more security-conscious internet. It was originally built for chat messages; we, on the other hand (for many good reasons), are using it to send raw audio to our servers, so of course it’s complaining XD.

WebSockets, by contrast, is a relatively modern web standard that runs over HTTPS (a wss:// connection starts out as a regular HTTPS request), meaning it will work through most reasonably strict firewalls. It also works in effectively the same way as the datachannel (both in terms of ergonomics and its browser API), without the hassles that come with the datachannel.

Why the change?

It’s simple: the datachannel worked horribly as a session init mechanism over TURN, leading to exceedingly high latency, something we can’t live with as we aim to give you the lowest latency possible! There are other benefits to this change too, but we list them after the migration guide.

What to change?

Session Init

Previously, you made a POST call to https://api.simli.ai/StartWebRTCSession, sending a JSON body of {sdp: pc.localDescription.sdp, type: pc.localDescription.type}. Now, you open a WebSocket connection to wss://api.simli.ai/StartWebRTCSession and send the same JSON over it. You then receive two messages: the first is reserved for possible future behavior, and the second is what the POST call used to return: {sdp: remoteSDP, type: "answer"}. Pass that second message to pc.setRemoteDescription(ANSWER_RECEIVED_FROM_WEBSOCKET). That’s the main difference in session initialization, nothing more.
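To make the message flow above concrete, here is a minimal sketch of the new session init. The function name negotiateOverWebSocket and its parameters are made up for this example and are not part of the Simli API. It also assumes the second message arrives as JSON text, and it overwrites the socket’s onopen/onmessage/onerror handlers, so adapt it if you already use those.

```javascript
// Sketch only: `negotiateOverWebSocket` is a hypothetical helper, not a Simli API.
// `pc` is an RTCPeerConnection whose local description is already set.
// `ws` is a WebSocket pointed at wss://api.simli.ai/StartWebRTCSession.
function negotiateOverWebSocket(pc, ws) {
  return new Promise((resolve, reject) => {
    let messagesSeen = 0;

    // Send the same JSON the old POST call carried.
    const sendOffer = () => {
      ws.send(JSON.stringify({
        sdp: pc.localDescription.sdp,
        type: pc.localDescription.type,
      }));
    };

    if (ws.readyState === 1) {
      // 1 is the numeric value of WebSocket.OPEN: socket is already connected.
      sendOffer();
    } else {
      ws.onopen = sendOffer;
    }

    ws.onmessage = async (event) => {
      messagesSeen += 1;
      if (messagesSeen === 1) {
        // First message is reserved for possible future behavior: skip it.
        return;
      }
      if (messagesSeen === 2) {
        try {
          // Second message is what the POST used to return:
          // {sdp: remoteSDP, type: "answer"} (assumed to be JSON text here).
          const answer = JSON.parse(event.data);
          await pc.setRemoteDescription(answer);
          resolve(answer);
        } catch (err) {
          reject(err);
        }
      }
      // Later messages are ignored by this helper.
    };

    ws.onerror = () => reject(new Error("WebSocket error during session init"));
  });
}
```

In a browser you would create the offer and call pc.setLocalDescription() first, then open new WebSocket("wss://api.simli.ai/StartWebRTCSession") and await negotiateOverWebSocket(pc, ws). Keep the socket around afterwards, since the same connection carries everything that used to go over the datachannel.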

Sending and Receiving Data

All messages that used to go over the datachannel are now sent over the WebSocket connection. That means you send the session_token on the WebSocket connection instead of the datachannel, and then send the audio bytes on the WebSocket connection as well. Anything that was datachannel.send() is now websocket.send().
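As a rough illustration of the swap, the sketch below wraps the WebSocket in a tiny sender that enforces the order described above: session_token first, audio bytes after. The names createSimliSender, sendToken and sendAudio are invented for this example, and the payloads are assumed to be exactly what you were already passing to datachannel.send().

```javascript
// Sketch only: `createSimliSender` is a hypothetical helper, not a Simli API.
const WS_OPEN = 1; // numeric value of WebSocket.OPEN

function createSimliSender(ws, sessionToken) {
  let tokenSent = false;

  // Fail loudly instead of silently dropping data on a closed socket.
  function ensureOpen() {
    if (ws.readyState !== WS_OPEN) {
      throw new Error("WebSocket is not open");
    }
  }

  return {
    // Was: datachannel.send(sessionToken)
    sendToken() {
      ensureOpen();
      ws.send(sessionToken);
      tokenSent = true;
    },
    // Was: datachannel.send(audioBytes)
    sendAudio(audioBytes) {
      ensureOpen();
      if (!tokenSent) {
        throw new Error("Send the session_token before any audio");
      }
      ws.send(audioBytes);
    },
  };
}
```

Usage would look like: const sender = createSimliSender(ws, session_token); sender.sendToken(); and then sender.sendAudio(chunk) for each audio chunk. The readyState check is the WebSocket counterpart of the old “datachannel is not open” failure, except that you now get an explicit error at the call site.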

That’s it, we’ve migrated everything!

Possible updates and improvements that can be achieved (NOTHING IS BUILT YET AND NO PROMISES XD)

This WebSocket connection is pretty nice because it affords us better signaling (so an incorrectly terminated session doesn’t linger for too long, as it currently does with the datachannel). We also now have the ability to add a broadcast feature in the future (single face, multiple recipients). And with some work, we could support session reconnects.