vLLM Integration
vLLM applications and proxy setups that target an OpenAI-compatible backend can be pointed at RelayRouter with a one-line config change. This lets you switch between local inference and hosted models without changing application code.
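One way to make that switch explicit is to resolve the backend from the environment at startup. The sketch below is illustrative only: the helper name `resolve_backend`, the `LLM_BACKEND` and `VLLM_API_KEY` variables, and the local address `http://localhost:8000/v1` (the default for `vllm serve`) are assumptions, not RelayRouter features.

```python
import os

# Assumed defaults for illustration; adjust to your deployment.
LOCAL_VLLM_URL = "http://localhost:8000/v1"  # default address of `vllm serve`
RELAYROUTER_URL = "https://relayrouter.io/v1"


def resolve_backend(env=None):
    """Return (base_url, api_key) for the backend selected by LLM_BACKEND.

    LLM_BACKEND is a hypothetical variable name: "local" selects a local
    vLLM server, any other value (or no value) selects RelayRouter.
    """
    env = os.environ if env is None else env
    if env.get("LLM_BACKEND", "relayrouter") == "local":
        # The openai SDK requires a non-empty key even when the local
        # server does not check it, so fall back to a placeholder.
        return LOCAL_VLLM_URL, env.get("VLLM_API_KEY", "EMPTY")
    return RELAYROUTER_URL, env["RELAYROUTER_API_KEY"]
```

The returned pair can be passed straight to `OpenAI(base_url=..., api_key=...)`, so the rest of the application does not need to know which backend is active.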
#OpenAI-compatible endpoint
RelayRouter exposes the same API shape as OpenAI, so any tool or framework that accepts a custom base_url works out of the box.
| Setting | Value |
|---|---|
| Base URL | https://relayrouter.io/v1 |
| API Key | Your RelayRouter API key |
| Model name | Any supported model ID, e.g. deepseek-v4-flash |
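To make the "same API shape" concrete, here is a rough sketch of the HTTP request an OpenAI-style client would send with these settings. It only builds the request and does not send it. The helper name `build_chat_request` and the key `sk-example` are placeholders, and the `/chat/completions` path is assumed from the OpenAI convention rather than confirmed for RelayRouter.

```python
import json
import urllib.request


def build_chat_request(base_url, api_key, model, user_message):
    """Build (but do not send) a POST to the chat completions route."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            # OpenAI-style bearer authentication
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request(
    "https://relayrouter.io/v1", "sk-example", "deepseek-v4-flash", "Hello"
)
print(req.get_method(), req.full_url)
# POST https://relayrouter.io/v1/chat/completions
```

Passing `req` to `urllib.request.urlopen` would perform the call. In practice an SDK handles this for you, which is why changing the base URL is normally the only edit needed.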
#Pointing a vLLM client at RelayRouter
If your client targets a local vllm serve instance and you want to route through RelayRouter instead:
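terminal (bash)
# Point your vLLM client at RelayRouter instead of localhost
export OPENAI_API_BASE="https://relayrouter.io/v1"
# The openai Python SDK v1+ reads OPENAI_BASE_URL rather than OPENAI_API_BASE
export OPENAI_BASE_URL="https://relayrouter.io/v1"
export OPENAI_API_KEY="$RELAYROUTER_API_KEY"
#Python (openai SDK)
vllm_client.py (python)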
import os

from openai import OpenAI

# Same code as any vLLM / OpenAI app, just change base_url
client = OpenAI(
    api_key=os.environ["RELAYROUTER_API_KEY"],
    base_url="https://relayrouter.io/v1",
)

response = client.chat.completions.create(
    model="deepseek-v4-flash",  # or any other supported model
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarise the theory of relativity in 3 bullet points."},
    ],
    temperature=0.7,
    max_tokens=512,
)
print(response.choices[0].message.content)
#LiteLLM proxy
LiteLLM supports RelayRouter via the openai/ prefix and a custom api_base:
litellm_config.yaml (yaml)
model_list:
  - model_name: deepseek-v4-flash
    litellm_params:
      model: openai/deepseek-v4-flash
      api_base: https://relayrouter.io/v1
      api_key: os.environ/RELAYROUTER_API_KEY
#LangChain
langchain_example.py (python)
import os

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="deepseek-v4-flash",
    openai_api_key=os.environ["RELAYROUTER_API_KEY"],
    openai_api_base="https://relayrouter.io/v1",
)

result = llm.invoke("Explain transformer architecture in 2 paragraphs.")
print(result.content)
#Streaming
stream.py (python)
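import os

from openai import OpenAI

# Same client setup as in vllm_client.py
client = OpenAI(
    api_key=os.environ["RELAYROUTER_API_KEY"],
    base_url="https://relayrouter.io/v1",
)

stream = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "Write a haiku about the ocean."}],
    stream=True,
)
for chunk in stream:
    # Skip chunks with no choices or an empty delta
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Streaming responses are delivered as Server-Sent Events (SSE), the same protocol OpenAI and vLLM use, so existing SSE consumers work without modification.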
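For consumers that read the stream without an SDK, the following is a minimal sketch of how OpenAI-style SSE lines can be turned back into text. It assumes RelayRouter follows the OpenAI convention of `data: <json>` events ending with a `data: [DONE]` sentinel. The function name `iter_sse_text` and the sample lines are made up for illustration and are not captured from a real response.

```python
import json


def iter_sse_text(lines):
    """Yield text deltas from an iterable of decoded SSE lines."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel in OpenAI-style streams
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# Hand-written sample events, not real server output
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Salt "}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "wind"}}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_sse_text(sample)))  # Salt wind
```

The parser ignores everything except `data:` fields, which is usually enough for chat completions. A full SSE client would also handle multi-line data fields and reconnection.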