Bring your own LLM
TL;DR: If you have an OpenAI-compatible API (test it out with the Python or JS SDKs), pass Simli the base URL, an API key, and the model name, and you're golden!
Background
Using Simli Auto (our end-to-end API) is awesome! But it can be a bit difficult to tailor to your needs. You may have a custom knowledge base, a RAG-powered system, or a self-hosted fine-tuned LLM that's the best doctor or tutor ever known to mankind, or you may just want to use a model that we don't readily support. If that's you, you're in the right place.
There's a fun trick that you may already know about if you're this deep into the weeds: if you look at the DeepSeek API docs, you can see that they're using the OpenAI SDK as if it were their own! They just put in the base_url for their API and their API key, and it works as if nothing weird is going on.
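For instance, here's a minimal sketch with the Python SDK (the base URL and model name below are DeepSeek's; any OpenAI-compatible provider works the same way):

```python
from openai import OpenAI

# The official OpenAI SDK pointed at a different provider:
# only the base URL and API key change.
client = OpenAI(
    base_url="https://api.deepseek.com",
    api_key="INSERT_SECRET_API_KEY",
)

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```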
Well, this is a result of DeepSeek, and a lot of other LLM API providers, copying most of OpenAI's homework in terms of API design: the same endpoint naming scheme and the same response body structure. The OpenAI API has a lot going on; however, there are three essential parts that everyone copied: the request path, the request structure, and the response structure.
The basic OpenAI request
To make an OpenAI-compatible LLM API, you need to have something resembling the following:
POST http(s)://my-awesome-llm-hosted-here/some/random/path/chat/completions (the path prefix can be anything, but it must end with /chat/completions)
with the Authorization header set to Bearer INSERT_SECRET_API_KEY, and a JSON request body.
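A rough sketch of what that body looks like (the field values here are illustrative; since Simli streams the reply, expect stream to be set to true):

```json
{
  "model": "my-model-name",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "Hello!" }
  ],
  "stream": true
}
```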
And the response(s) come back as a stream of chunks.
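Each chunk follows OpenAI's chat.completion.chunk schema; here's a sketch (the id, timestamp, and content are illustrative):

```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion.chunk",
  "created": 1700000000,
  "model": "my-model-name",
  "choices": [
    { "index": 0, "delta": { "content": "Hello" }, "finish_reason": null }
  ]
}
```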
The response has the mimetype text/event-stream,
and the structure above is used for all chunks. Each chunk is sent as a server-sent event: a data: prefix followed by the JSON chunk, so you must ensure the text body is correctly formatted. Every frame is followed by two newline delimiters (\n\n). Additionally, you must indicate that your response is done by sending a final data: [DONE] frame. Using FastAPI, for example, you would have an async generator looking like this:
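A minimal sketch of such a generator (the token list and chunk fields are illustrative stand-ins for real model output):

```python
import json


async def stream_chunks():
    # A real server would yield model output; these tokens are a stand-in.
    for token in ["Hello", " there", "!"]:
        chunk = {
            "id": "chatcmpl-123",
            "object": "chat.completion.chunk",
            "created": 1700000000,
            "model": "my-model-name",
            "choices": [
                {"index": 0, "delta": {"content": token}, "finish_reason": None}
            ],
        }
        # "data: " prefix, the JSON chunk, then the two newline delimiters.
        yield f"data: {json.dumps(chunk)}\n\n"
    # The DONE frame tells the client the stream is over.
    yield "data: [DONE]\n\n"
```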
Here's an example server sending out a mock response with FastAPI, which can simply be run using python app.py, passing the hosting URL to Simli (more examples in other languages coming soon).
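A sketch of such an app.py, assuming fastapi and uvicorn are installed (the endpoint ignores the request body and streams a fixed mock reply):

```python
# app.py
import json
import time

import uvicorn
from fastapi import FastAPI, Request
from fastapi.responses import StreamingResponse

app = FastAPI()


def make_chunk(content: str, finish_reason=None) -> str:
    """Wrap a piece of text in OpenAI's chat.completion.chunk schema."""
    chunk = {
        "id": "chatcmpl-mock",
        "object": "chat.completion.chunk",
        "created": int(time.time()),
        "model": "mock-model",
        "choices": [
            {"index": 0, "delta": {"content": content}, "finish_reason": finish_reason}
        ],
    }
    return f"data: {json.dumps(chunk)}\n\n"


@app.post("/chat/completions")
async def chat_completions(request: Request):
    # The OpenAI-style request body arrives here; this mock ignores it.
    async def stream():
        for token in ["Hello ", "from ", "my ", "mock ", "LLM!"]:
            yield make_chunk(token)
        yield make_chunk("", finish_reason="stop")
        # End of stream.
        yield "data: [DONE]\n\n"

    return StreamingResponse(stream(), media_type="text/event-stream")


if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
```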
And here's another example that just wraps the OpenAI SDK on the server.
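A sketch under the same assumptions, except this one forwards the incoming body to OpenAI with the async SDK and re-emits its stream (OPENAI_API_KEY is read from the environment; the fallback model name is an illustrative assumption):

```python
# app.py
import uvicorn
from fastapi import FastAPI, Request
from fastapi.responses import StreamingResponse
from openai import AsyncOpenAI

app = FastAPI()
client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment


@app.post("/chat/completions")
async def chat_completions(request: Request):
    body = await request.json()

    async def stream():
        # Forward the request to OpenAI and re-emit its chunks
        # in the same SSE format the client expects.
        response = await client.chat.completions.create(
            model=body.get("model", "gpt-4o-mini"),
            messages=body["messages"],
            stream=True,
        )
        async for chunk in response:
            yield f"data: {chunk.model_dump_json()}\n\n"
        yield "data: [DONE]\n\n"

    return StreamingResponse(stream(), media_type="text/event-stream")


if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
```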