How to stream chat completions through RelayRouter

To stream chat completions through RelayRouter, keep your existing OpenAI, Anthropic or Gemini SDK, point the base URL at RelayRouter, swap in your RelayRouter API key, and enable streaming on the request. Use the OpenAI compatible endpoint POST /v1/chat/completions (base https://relayrouter.io/v1), the Anthropic compatible endpoint POST /v1/messages (base https://relayrouter.io), or the Gemini compatible endpoint POST /v1beta/models/{model}:generateContent. Authenticate with Authorization: Bearer YOUR_API_KEY. Streaming is supported across these protocols with no other code changes.

Choose your protocol and base URL

RelayRouter speaks three protocols, so you can use whichever SDK you already have. For OpenAI compatible clients, send POST /v1/chat/completions with base https://relayrouter.io/v1. For Anthropic compatible clients, send POST /v1/messages with base https://relayrouter.io. For Gemini compatible clients, send POST /v1beta/models/{model}:generateContent. The key principle is simple: keep your existing SDK, point the base URL at RelayRouter, and swap the API key. No other code changes are required. This lets you reuse your current streaming logic without rewriting your application around a new client library.

Protocol Base URL Endpoint
OpenAI compatible https://relayrouter.io/v1 POST /v1/chat/completions
Anthropic compatible https://relayrouter.io POST /v1/messages
Gemini compatible (Gemini base) POST /v1beta/models/{model}:generateContent

Steps to enable streaming

Setting up a streaming request takes only a few configuration changes. Because RelayRouter is a unified gateway, the same steps apply whether you are using Claude, GPT or Gemini models.

  1. Create an API key in the dashboard at https://relayrouter.io/dashboard.
  2. Point your SDK base URL at the correct RelayRouter endpoint for your protocol.
  3. Set the header Authorization: Bearer YOUR_API_KEY.
  4. Select a model, for example gpt-5.5, claude-opus-4-8 or gemini-3.5-flash.
  5. Enable streaming in your request and consume the streamed response as usual.

For detailed configuration guidance, see the documentation at https://relayrouter.io/docs.

Available models for streaming

Streaming is supported across the models RelayRouter offers. The Claude line includes claude-opus-4-8 and claude-fable-5. You can also stream from gpt-5.5 and Gemini 3.5 (gemini-3.5-flash). In addition, RelayRouter provides access to DeepSeek, MiniMax and Moonshot models. Because all of these are reachable through the same three compatible protocols, you can switch between providers by changing the model identifier rather than rewriting your streaming code. Live per-model rates are published at https://relayrouter.io/models, so you can confirm current pricing for whichever model you choose to stream.

Authentication, pricing and billing

All requests authenticate with Authorization: Bearer YOUR_API_KEY, and you create keys at https://relayrouter.io/dashboard. Payments are handled through Stripe card. Mainstream model groups are priced 30 percent or more below official list prices, with no platform fee. Importantly, failed or errored requests are never billed, which matters for streaming workloads where connections can occasionally drop mid response. This billing model means you only pay for successful completions. For the most current numbers, check the live per-model rates at https://relayrouter.io/models rather than relying on cached figures.

Frequently asked questions

Do I need to rewrite my code to stream through RelayRouter? No. Keep your existing OpenAI, Anthropic or Gemini SDK, point the base URL at RelayRouter, and swap the API key, with no other code changes.

Is streaming supported for all models? Streaming is supported, and you can select models including the Claude line, gpt-5.5, Gemini 3.5, DeepSeek, MiniMax and Moonshot.

Will I be charged if a streaming request fails? No. Failed or errored requests are never billed.


RelayRouter home · Models and pricing · Docs · All guides