vLLM Integration

vLLM applications and proxy setups that target an OpenAI-compatible backend can be pointed at RelayRouter with a one-line config change. This lets you switch between local inference and hosted models without changing application code.

#OpenAI-compatible endpoint

RelayRouter exposes the same API shape as OpenAI, so any tool or framework that accepts a custom base_url works out of the box.

Setting	Value
`Base URL`	https://relayrouter.io/v1
`API Key`	Your RelayRouter API key
`Model name`	Any supported model ID, e.g. deepseek-v4-flash
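
Because the API shape mirrors OpenAI's, the three settings above are enough to build a request by hand. The sketch below uses only the Python standard library and assumes RelayRouter follows the OpenAI convention of a `/chat/completions` path under the base URL with a `Bearer` authorization header. `build_chat_request` is a hypothetical helper name, and the request is constructed but never sent.

```python
import json
import urllib.request

BASE_URL = "https://relayrouter.io/v1"


def build_chat_request(api_key, model, messages, base_url=BASE_URL):
    """Build (but do not send) an OpenAI-style chat completion request."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        # Assumption: the chat endpoint lives at <base_url>/chat/completions
        url=base_url.rstrip("/") + "/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_chat_request(
    api_key="YOUR_RELAYROUTER_API_KEY",  # placeholder, not a real key
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "Hello"}],
)
print(request.get_method(), request.full_url)
```

Passing the result to `urllib.request.urlopen` would send it. An OpenAI-compatible SDK performs the equivalent steps internally, which is why changing only the base URL and key is sufficient.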

#Pointing a vLLM client at RelayRouter

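If your client currently targets a local `vllm serve` instance, override the base URL and API key through environment variables to route requests through RelayRouter instead:

terminal (bash)

# Point your vLLM client at RelayRouter instead of localhost.
# OPENAI_BASE_URL is read by the openai Python SDK v1+; OPENAI_API_BASE is the
# older name that some tools still read, so both are set here.
export OPENAI_BASE_URL="https://relayrouter.io/v1"
export OPENAI_API_BASE="https://relayrouter.io/v1"
export OPENAI_API_KEY="$RELAYROUTER_API_KEY"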
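
The same switch can be made inside application code. The following is a minimal sketch, not a RelayRouter feature: `resolve_backend` is a hypothetical helper that picks RelayRouter when `RELAYROUTER_API_KEY` is set and otherwise falls back to a local server. It assumes `vllm serve` is listening on its default port 8000 and was started without `--api-key`, so a placeholder key such as `EMPTY` is accepted.

```python
import os

RELAYROUTER_BASE_URL = "https://relayrouter.io/v1"
# Assumption: a local `vllm serve` instance on its default port
LOCAL_VLLM_BASE_URL = "http://localhost:8000/v1"


def resolve_backend(env=None):
    """Return (base_url, api_key) for the backend selected by the environment."""
    env = os.environ if env is None else env
    api_key = env.get("RELAYROUTER_API_KEY")
    if api_key:
        # A RelayRouter key is present: use the hosted endpoint
        return RELAYROUTER_BASE_URL, api_key
    # No key: fall back to local inference with a placeholder key
    return LOCAL_VLLM_BASE_URL, "EMPTY"


base_url, api_key = resolve_backend()
print(f"Using backend: {base_url}")
```

The returned pair can be passed directly as `base_url` and `api_key` to an OpenAI-compatible client, so the rest of the application stays unchanged.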

#Python (openai SDK)

vllm_client.py (python)

from openai import OpenAI
import os

# Same code as any vLLM / OpenAI app, just change base_url
client = OpenAI(
    api_key=os.environ["RELAYROUTER_API_KEY"],
    base_url="https://relayrouter.io/v1",
)

response = client.chat.completions.create(
    model="deepseek-v4-flash",  # or any other supported model
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user",   "content": "Summarise the theory of relativity in 3 bullet points."},
    ],
    temperature=0.7,
    max_tokens=512,
)
print(response.choices[0].message.content)

#LiteLLM proxy

LiteLLM supports RelayRouter via the openai/ prefix and a custom api_base:

litellm_config.yaml (yaml)

model_list:
  - model_name: deepseek-v4-flash
    litellm_params:
      model: openai/deepseek-v4-flash
      api_base: https://relayrouter.io/v1
      api_key: os.environ/RELAYROUTER_API_KEY

#LangChain

langchain_example.py (python)

from langchain_openai import ChatOpenAI
import os

llm = ChatOpenAI(
    model="deepseek-v4-flash",
    openai_api_key=os.environ["RELAYROUTER_API_KEY"],
    openai_api_base="https://relayrouter.io/v1",
)

result = llm.invoke("Explain transformer architecture in 2 paragraphs.")
print(result.content)

#Streaming

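stream.py (python)

# Reuses the `client` created in vllm_client.py above
stream = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "Write a haiku about the ocean."}],
    stream=True,
)
for chunk in stream:
    # Some chunks (for example a trailing usage chunk) carry no choices; skip them
    if not chunk.choices:
        continue
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)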

Streaming responses are delivered as Server-Sent Events (SSE), the same wire format OpenAI and vLLM use, so existing SSE consumers work without modification.
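
For readers consuming the stream without an SDK, here is a rough sketch of what an SSE consumer does with each line. It assumes the OpenAI-style framing: each event is a `data: <json>` line and the stream ends with a `data: [DONE]` sentinel. `iter_sse_content` is a hypothetical helper, and the sample lines are hand-written for illustration rather than captured from RelayRouter.

```python
import json


def iter_sse_content(lines):
    """Yield the text fragments carried by OpenAI-style SSE lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and ":" keep-alive comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


# Hand-written sample events, for illustration only
sample = [
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"Waves "}}]}',
    "",
    ": keep-alive",
    'data: {"choices":[{"delta":{"content":"fold"}}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_sse_content(sample)))  # prints "Waves fold"
```

In practice `lines` would be the decoded lines of a streaming HTTP response body; the SDK examples above handle this parsing for you.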
