Unified LLM Gateway

A unified LLM gateway that provides an OpenAI-compatible chat completion API. Access multiple model providers (OpenAI, Anthropic, Google, DeepSeek, etc.) through a single endpoint with token-based billing.

Chat Completions

POST /v1/chat/completions

OpenAI-compatible chat completion endpoint. Access models from OpenAI, Anthropic, Google, DeepSeek, MiniMax, and more through a single unified API. Requests are billed by token usage.

List Models

GET /v1/models

Returns all currently available models with pricing info (input/output price per 1M tokens) and max context length. Use the model id from this response as the model parameter in chat completions.

Streaming (SSE)

Set "stream": true in the request body to receive the response as Server-Sent Events. Each event carries a chat.completion.chunk object whose delta.content holds the next piece of generated text. The stream ends with a data: [DONE] message.

Stream Event Format

data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"delta":{"content":"Hello"},"index":0}]}
data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"delta":{"content":" world"},"index":0}]}
data: [DONE]
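
To reassemble the full reply from a stream, a client can strip the data: prefix from each event, stop at [DONE], and concatenate the delta.content pieces. The following is a minimal sketch under the event format shown above; the helper name collect_stream_text is hypothetical and not part of the API, and it ignores any fields other than delta.content.

```python
import json
from typing import Iterable


def collect_stream_text(lines: Iterable[str]) -> str:
    """Concatenate delta.content from SSE lines until the [DONE] sentinel."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and any non-data SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream marker, not JSON
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)


# The two sample events from this section, followed by the terminator.
sample = [
    'data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"delta":{"content":"Hello"},"index":0}]}',
    "",
    'data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"delta":{"content":" world"},"index":0}]}',
    "",
    "data: [DONE]",
]
print(collect_stream_text(sample))  # Hello world
```

With requests, the same helper could be fed the decoded lines of a streaming response, for example (line.decode("utf-8") for line in response.iter_lines()).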

Parameters

Parameter	Type	Required	Description
`model`	string	Required	Model name, e.g. 'gpt-4o', 'claude-sonnet-4', 'deepseek-chat'. Use GET /v1/models to list available models.
`messages`	array	Required	Array of message objects with 'role' (system/user/assistant) and 'content' fields.
`temperature`	float	Optional (default: 0.7)	Sampling temperature (0-2). Higher = more random.
`max_tokens`	integer	Optional (default: 1024)	Maximum tokens in the response.
`stream`	boolean	Optional (default: false)	Enable SSE streaming for real-time token output.
`top_p`	float	Optional	Nucleus sampling (0-1).
`stop`	string | string[]	Optional	Stop sequences.
`tools`	array	Optional	Tool/function definitions for function calling.
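
The table above can be turned into a small request-body builder that applies the documented defaults and rejects out-of-range values before a request is sent. This is only a sketch: build_chat_payload is a hypothetical helper, the range checks are client-side assumptions based on the ranges listed in the table, and tools is left out because its schema is not described here.

```python
from typing import Any, Dict, List, Optional, Sequence, Union


def build_chat_payload(
    model: str,
    messages: List[Dict[str, Any]],
    temperature: float = 0.7,   # documented default
    max_tokens: int = 1024,     # documented default
    stream: bool = False,       # documented default
    top_p: Optional[float] = None,
    stop: Optional[Union[str, Sequence[str]]] = None,
) -> Dict[str, Any]:
    """Build a JSON-serializable body for POST /v1/chat/completions."""
    if not model:
        raise ValueError("model is required")
    if not messages:
        raise ValueError("messages must contain at least one message")
    for message in messages:
        if "role" not in message or "content" not in message:
            raise ValueError("each message needs 'role' and 'content'")
    if not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    if top_p is not None and not 0 <= top_p <= 1:
        raise ValueError("top_p must be between 0 and 1")

    payload: Dict[str, Any] = {
        "model": model,
        "messages": messages,
        "temperature": temperature,
        "max_tokens": max_tokens,
        "stream": stream,
    }
    # Optional parameters are only sent when the caller sets them.
    if top_p is not None:
        payload["top_p"] = top_p
    if stop is not None:
        payload["stop"] = stop if isinstance(stop, str) else list(stop)
    return payload


payload = build_chat_payload(
    "gpt-4o",
    [{"role": "user", "content": "Hello!"}],
    stop=["\n\n"],
)
print(sorted(payload))
```

The returned dictionary can be passed directly as the json argument of requests.post.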

Response

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1710000000,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 8,
    "total_tokens": 20
  }
}
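
Because billing is token-based and GET /v1/models reports prices per 1M tokens, the usage block can be combined with those prices to estimate what a request cost. The sketch below assumes separate input and output prices applied to prompt_tokens and completion_tokens respectively; the function name estimate_cost and the prices used are made-up placeholders, not real rates for any model.

```python
from typing import Dict


def estimate_cost(
    usage: Dict[str, int],
    input_price_per_1m: float,
    output_price_per_1m: float,
) -> float:
    """Estimate request cost from a usage block and per-1M-token prices."""
    input_cost = usage["prompt_tokens"] * input_price_per_1m
    output_cost = usage["completion_tokens"] * output_price_per_1m
    return (input_cost + output_cost) / 1_000_000


# Usage block from the sample response above.
usage = {"prompt_tokens": 12, "completion_tokens": 8, "total_tokens": 20}

# Placeholder prices (1.00 input, 2.00 output per 1M tokens), for illustration only.
cost = estimate_cost(usage, input_price_per_1m=1.00, output_price_per_1m=2.00)
print(f"{cost:.6f}")  # 0.000028
```

The result is in whatever currency the model prices are quoted in; the actual billed amount may differ if the gateway applies rounding or other rules not described here.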

Examples

import requests

API_KEY = "your_api_key_here"
BASE_URL = "https://api.qinyanai.com"

# --- Chat completion ---
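response = requests.post(
    f"{BASE_URL}/v1/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "gpt-4o",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "What is the attention mechanism?"}
        ],
        "temperature": 0.7,
        "max_tokens": 1024
    },
    timeout=60,  # avoid hanging forever on a stalled connection
)
response.raise_for_status()  # raise on 4xx/5xx instead of failing on a missing key
result = response.json()
# The generated text is in the first choice's message.
print(result["choices"][0]["message"]["content"])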

# --- Streaming ---
response = requests.post(
    f"{BASE_URL}/v1/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": "Hello!"}],
        "stream": True
    },
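    stream=True,  # let requests yield the body incrementally
    timeout=60,
)
response.raise_for_status()
for line in response.iter_lines():
    if line:  # skip the blank lines that separate SSE events
        print(line.decode("utf-8"))  # raw "data: {...}" lines, ending with "data: [DONE]"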

# --- List available models ---
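response = requests.get(
    f"{BASE_URL}/v1/models",
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=30,
)
response.raise_for_status()
# Each entry's id is the value to pass as the model parameter.
for model in response.json()["data"]:
    print(f"{model['id']} (by {model['owned_by']})")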