Unified LLM Gateway

A unified LLM gateway that provides an OpenAI-compatible chat completion API. Access multiple model providers (OpenAI, Anthropic, Google, DeepSeek, etc.) through a single endpoint with token-based billing.

Chat Completions

POST /v1/chat/completions

OpenAI-compatible chat completion endpoint. Access models from OpenAI, Anthropic, Google, DeepSeek, MiniMax and more through a single unified API. Billed by token usage.

List Models

GET /v1/models

Returns all currently available models with pricing info (input/output price per 1M tokens) and max context length. Use the model id from this response as the model parameter in chat completions.

Streaming (SSE)

Set "stream": true in the request body to receive the response as Server-Sent Events (SSE). Each event carries a chat.completion.chunk object whose delta.content holds the next piece of generated text. The stream ends with a data: [DONE] message.

Stream Event Format

data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"delta":{"content":"Hello"},"index":0}]}
data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"delta":{"content":" world"},"index":0}]}
data: [DONE]
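
The event format above can be consumed line by line. The following is a minimal sketch, not the gateway's official client: the helper name iter_delta_content is hypothetical, and it assumes every event arrives as a single data: line in the shape shown above.

```python
import json
from typing import Iterable, Iterator


def iter_delta_content(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield the text fragments carried by a stream of SSE lines."""
    for raw in lines:
        line = raw.decode("utf-8").strip()
        # Skip blank separator lines and anything that is not a data field.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        # A literal [DONE] marks the end of the stream.
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# The sample events from the section above, as raw bytes.
sample = [
    b'data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"delta":{"content":"Hello"},"index":0}]}',
    b"",
    b'data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"delta":{"content":" world"},"index":0}]}',
    b"data: [DONE]",
]
print("".join(iter_delta_content(sample)))  # prints "Hello world"
```

With the requests library, the same helper can be fed response.iter_lines() from a request made with stream=True.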

Parameters

Parameter    Type               Required                   Description
model        string             Required                   Model name, e.g. 'gpt-4o', 'claude-sonnet-4', 'deepseek-chat'. Use GET /v1/models to list available models.
messages     array              Required                   Array of message objects with 'role' (system/user/assistant) and 'content' fields.
temperature  float              Optional (default: 0.7)    Sampling temperature (0-2). Higher = more random.
max_tokens   integer            Optional (default: 1024)   Maximum tokens in the response.
stream       boolean            Optional (default: false)  Enable SSE streaming for real-time token output.
top_p        float              Optional                   Nucleus sampling (0-1).
stop         string | string[]  Optional                   Stop sequences.
tools        array              Optional                   Tool/function definitions for function calling.
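
As an illustration of how these parameters fit together, here is a minimal sketch of a request-body builder. The function name build_chat_payload is hypothetical, and the client-side range checks are an assumption based on the ranges listed above; the gateway may validate differently.

```python
from typing import Any, Dict, List, Optional, Union


def build_chat_payload(
    model: str,
    messages: List[Dict[str, Any]],
    temperature: float = 0.7,
    max_tokens: int = 1024,
    stream: bool = False,
    top_p: Optional[float] = None,
    stop: Union[str, List[str], None] = None,
    tools: Optional[List[Dict[str, Any]]] = None,
) -> Dict[str, Any]:
    """Assemble a request body from the documented parameters."""
    # Assumed client-side checks mirroring the documented ranges.
    if not messages:
        raise ValueError("messages must contain at least one message")
    if not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    if top_p is not None and not 0 <= top_p <= 1:
        raise ValueError("top_p must be between 0 and 1")

    payload: Dict[str, Any] = {
        "model": model,
        "messages": messages,
        "temperature": temperature,
        "max_tokens": max_tokens,
        "stream": stream,
    }
    # Optional parameters without a documented default are sent only when set.
    for name, value in (("top_p", top_p), ("stop", stop), ("tools", tools)):
        if value is not None:
            payload[name] = value
    return payload


payload = build_chat_payload(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    stop=["\n\n"],
)
print(sorted(payload))
```

The resulting dictionary can be passed as the json argument of requests.post when calling POST /v1/chat/completions.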

Response

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1710000000,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 8,
    "total_tokens": 20
  }
}
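
Because billing is by token usage and GET /v1/models reports prices per 1M tokens, the usage block can be turned into a rough cost estimate. This is a sketch under stated assumptions: the function name estimate_cost is hypothetical, the prices are placeholder values rather than real rates, and it assumes cost scales linearly with input and output token counts.

```python
from typing import Dict


def estimate_cost(
    usage: Dict[str, int],
    input_price_per_1m: float,
    output_price_per_1m: float,
) -> float:
    """Estimate the cost of one request from its usage block."""
    prompt_cost = usage["prompt_tokens"] / 1_000_000 * input_price_per_1m
    completion_cost = usage["completion_tokens"] / 1_000_000 * output_price_per_1m
    return prompt_cost + completion_cost


# Usage block copied from the sample response above.
usage = {"prompt_tokens": 12, "completion_tokens": 8, "total_tokens": 20}

# Placeholder prices (hypothetical); read the real ones from GET /v1/models.
cost = estimate_cost(usage, input_price_per_1m=2.50, output_price_per_1m=10.00)
print(f"{cost:.6f}")  # prints 0.000110
```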

Examples

import requests

API_KEY = "your_api_key_here"
BASE_URL = "https://api.qinyanai.com"

# --- Chat completion ---
response = requests.post(
    f"{BASE_URL}/v1/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "gpt-4o",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "What is the attention mechanism?"}
        ],
        "temperature": 0.7,
        "max_tokens": 1024
    }
)
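response.raise_for_status()  # fail fast on HTTP error responses
result = response.json()
# The generated text is in the first choice's message.
print(result["choices"][0]["message"]["content"])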

# --- Streaming ---
response = requests.post(
    f"{BASE_URL}/v1/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": "Hello!"}],
        "stream": True
    },
    stream=True
)
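response.raise_for_status()  # fail fast on HTTP error responses
# Each non-empty line is one SSE event: 'data: {...}' or the final 'data: [DONE]'.
for line in response.iter_lines():
    if line:
        print(line.decode("utf-8"))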

# --- List available models ---
response = requests.get(
    f"{BASE_URL}/v1/models",
    headers={"Authorization": f"Bearer {API_KEY}"}
)
for model in response.json()["data"]:
    print(f"{model['id']} (by {model['owned_by']})")