Unified LLM Gateway
A unified LLM gateway that provides an OpenAI-compatible chat completion API. Access multiple model providers (OpenAI, Anthropic, Google, DeepSeek, etc.) through a single endpoint with token-based billing.
Chat Completions
POST
/v1/chat/completions
OpenAI-compatible chat completion endpoint. Access models from OpenAI, Anthropic, Google, DeepSeek, MiniMax, and more through a single unified API. Billed by token usage.
List Models
GET
/v1/models
Returns all currently available models with pricing info (input and output price per 1M tokens) and maximum context length. Use the model id from this response as the model parameter in chat completions.
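Because pricing is quoted per 1M tokens and each chat completion response carries a usage object, the cost of a single request can be estimated with simple arithmetic. The sketch below is illustrative only: the function name `estimate_cost` and the example prices are assumptions, and the field names under which /v1/models reports prices are not specified here, so the prices are passed in as plain arguments.

```python
def estimate_cost(usage, input_price_per_1m, output_price_per_1m):
    """Estimate the cost of one request from its usage object.

    Prices are per 1M tokens. Passing them in as arguments is an
    assumption of this sketch; read the real values from /v1/models.
    """
    input_cost = usage["prompt_tokens"] * input_price_per_1m / 1_000_000
    output_cost = usage["completion_tokens"] * output_price_per_1m / 1_000_000
    return input_cost + output_cost


# Usage object shaped like the one in a chat completion response.
# The prices below are made-up numbers for illustration.
usage = {"prompt_tokens": 12, "completion_tokens": 8, "total_tokens": 20}
cost = estimate_cost(usage, input_price_per_1m=2.5, output_price_per_1m=10.0)
print(f"{cost:.6f}")
```

This assumes input and output tokens are billed at their listed per-1M rates with no other fees; the gateway's actual billing rules may differ.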
Streaming (SSE)
Set "stream": true to receive Server-Sent Events. Each event contains a chat.completion.chunk object with incremental delta.content. The stream ends with a data: [DONE] message.
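As a minimal sketch of consuming such a stream, the helper below reads SSE lines, skips anything that is not a data line, stops at [DONE], and yields each delta.content fragment. The name `iter_stream_content` is hypothetical, and the sample lines are hard-coded so the block runs without a network call.

```python
import json


def iter_stream_content(lines):
    """Yield delta.content strings from an iterable of SSE lines (bytes or str)."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream marker
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                yield content


# Hard-coded sample events standing in for a live response.
sample = [
    b'data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"delta":{"content":"Hello"},"index":0}]}',
    b"",
    b'data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"delta":{"content":" world"},"index":0}]}',
    b"",
    b"data: [DONE]",
]
text = "".join(iter_stream_content(sample))
print(text)
```

With the requests library, the same helper could be fed `response.iter_lines()` from a request made with `stream=True`.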
Stream Event Format
data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"delta":{"content":"Hello"},"index":0}]}
data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"delta":{"content":" world"},"index":0}]}
data: [DONE]
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Required | Model name, e.g. 'gpt-4o', 'claude-sonnet-4', 'deepseek-chat'. Use GET /v1/models to list available models. |
| messages | array | Required | Array of message objects with 'role' (system/user/assistant) and 'content' fields. |
| temperature | float | Optional (default: 0.7) | Sampling temperature (0-2). Higher values produce more random output. |
| max_tokens | integer | Optional (default: 1024) | Maximum number of tokens in the response. |
| stream | boolean | Optional (default: false) | Enable SSE streaming for real-time token output. |
| top_p | float | Optional | Nucleus sampling (0-1). |
| stop | string or string[] | Optional | Stop sequences. |
| tools | array | Optional | Tool/function definitions for function calling. |
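The table above can be turned into a small client-side check before a request is sent. The following is a sketch under assumptions: `build_chat_payload` is a hypothetical helper, not part of the gateway, and it only enforces the ranges and required fields listed in the table. Optional parameters are omitted when unset so the server applies its own defaults.

```python
def build_chat_payload(model, messages, temperature=None, max_tokens=None,
                       stream=False, top_p=None, stop=None, tools=None):
    """Build a chat completion request body, validating documented constraints."""
    if not model:
        raise ValueError("model is required")
    if not messages:
        raise ValueError("messages must be a non-empty list")
    for message in messages:
        if "role" not in message or "content" not in message:
            raise ValueError("each message needs 'role' and 'content'")
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    if top_p is not None and not 0 <= top_p <= 1:
        raise ValueError("top_p must be between 0 and 1")

    payload = {"model": model, "messages": messages}
    # Only include optional parameters that were explicitly set.
    optional = {"temperature": temperature, "max_tokens": max_tokens,
                "top_p": top_p, "stop": stop, "tools": tools}
    payload.update({k: v for k, v in optional.items() if v is not None})
    if stream:
        payload["stream"] = True
    return payload


payload = build_chat_payload(
    "gpt-4o",
    [{"role": "user", "content": "Hello!"}],
    temperature=0.2,
    stop=["\n\n"],
)
print(payload)
```

The resulting dictionary can be passed as the `json=` argument of a POST to /v1/chat/completions.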
Response
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1710000000,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 8,
    "total_tokens": 20
  }
}
Examples
import requests

API_KEY = "your_api_key_here"
BASE_URL = "https://api.qinyanai.com"

# --- Chat completion ---
response = requests.post(
    f"{BASE_URL}/v1/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "gpt-4o",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "What is attention mechanism?"}
        ],
        "temperature": 0.7,
        "max_tokens": 1024
    }
)
result = response.json()
print(result["choices"][0]["message"]["content"])

# --- Streaming ---
response = requests.post(
    f"{BASE_URL}/v1/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": "Hello!"}],
        "stream": True
    },
    stream=True
)
# Print each raw SSE line as it arrives, skipping blank separator lines.
for line in response.iter_lines():
    if line:
        print(line.decode())

# --- List available models ---
response = requests.get(
    f"{BASE_URL}/v1/models",
    headers={"Authorization": f"Bearer {API_KEY}"}
)
for model in response.json()["data"]:
    print(f"{model['id']} (by {model['owned_by']})")