Frequently asked questions
Got another question? Write to us — we answer fast.
›Is my OpenAI/Anthropic key stored on your side?
No, never. That's the BYOK principle (Bring Your Own Key): your provider key travels in the x-provider-key header, stays in memory for the duration of the request, and is never logged or persisted. If our servers were compromised, there would be no keys to steal.
›How much latency does the proxy add?
Measured median overhead is under 1 ms (p95 ~3 ms): every check (auth, cap, budgets) happens in memory, with zero database access on the request path. Network latency depends on your distance to our servers, which are hosted in US-East, close to the OpenAI/Anthropic APIs.
›What happens if Outcap goes down?
Fail-open architecture: Outcap holds no data your calls depend on. In an incident, point baseURL back to the provider's API (one environment variable) and everything works as before. No dependency, no lock-in.
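As a minimal sketch of that one-variable fallback, the client can read its base URL from the environment and default to the provider's own API. The variable name LLM_BASE_URL and the Outcap URL shown in the usage note are illustrative assumptions, not documented settings.

```python
import os

# Assumption: LLM_BASE_URL is a hypothetical variable name chosen for this sketch.
# The default is the OpenAI API, so removing the variable bypasses the proxy.
PROVIDER_BASE_URL = "https://api.openai.com/v1"


def resolve_base_url(env=None):
    """Return the proxy URL if one is configured, else the provider's URL."""
    env = os.environ if env is None else env
    return env.get("LLM_BASE_URL", PROVIDER_BASE_URL)
```

With LLM_BASE_URL set to a proxy address (for example the placeholder https://outcap.example/v1), calls go through the proxy; unsetting it sends them straight to the provider.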
›Is streaming supported?
Yes, fully (SSE), for both OpenAI and Anthropic. In dry-run the stream is relayed byte-for-byte. In active mode, a small buffer even lets us finish the current sentence when the cap cuts a response.
›Can a cap break my JSON outputs?
Two cases, two strategies. Non-streaming (we have the full response): if our cap truncates a JSON response, we repair it before delivery. This is a tested invariant: our cut never produces invalid JSON. Streaming: repairing on the fly would mean buffering the whole response, which adds latency, so we don't. Instead, our active cap takes a ×1.5 margin (never above your own max_tokens) so it won't cut the JSON. And if it's YOUR max_tokens that truncates a streamed JSON, we pass it through untouched: we never rewrite a cut we didn't cause.
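The streaming margin described above can be sketched as a small calculation. The function name, the parameter names, and the choice to round up are assumptions made for illustration; only the ×1.5 factor and the "never above your own max_tokens" rule come from the answer.

```python
import math


def streaming_cap(learned_cap, client_max_tokens=None, margin=1.5):
    """Widen the learned cap by a margin, without exceeding the caller's max_tokens."""
    widened = math.ceil(learned_cap * margin)  # rounding up is an assumption
    if client_max_tokens is None:
        return widened
    return min(widened, client_max_tokens)
```

For a learned cap of 400 tokens, this gives 600 when the caller allows 1000 tokens, and 500 when the caller's own max_tokens is 500.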
›What's the difference between dry-run and active mode?
Dry-run (default): no request is modified — Outcap observes and computes what a cap WOULD have saved. Active mode (opt-in, per route): the learned max_tokens is actually applied. You enable it route by route, once the simulated numbers have convinced you, and it's reversible in one click.
›How does model routing work?
For each route, Outcap computes what your last 30 days of traffic would have cost on a cheaper model from the same provider (e.g. GPT-5 → GPT-5-mini, Sonnet → Haiku). You see the savings percentage BEFORE enabling. If you route, only the model field is rewritten — and the saving is measured exactly on every request (the difference between the two price lists). Reversible in one click; we never route across providers (your BYOK key wouldn't follow), and never without your explicit decision.
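A rough sketch of the "difference between the two price lists" on a single request follows. The prices in the usage note are invented for illustration and are not real provider prices; the function and field names are hypothetical.

```python
def request_cost(input_tokens, output_tokens, price):
    """Cost in dollars, with prices expressed per million tokens."""
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000


def routing_saving(input_tokens, output_tokens, original_price, routed_price):
    """Dollar saving from billing the same token counts on the cheaper model."""
    original = request_cost(input_tokens, output_tokens, original_price)
    routed = request_cost(input_tokens, output_tokens, routed_price)
    return original - routed
```

With made-up prices of $2.00/$10.00 per million input/output tokens for the original model and $0.50/$2.00 for the cheaper one, a request with 1000 input and 500 output tokens costs $0.0070 versus $0.0015, a saving of $0.0055.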
›How are savings calculated?
In dry-run: tokens beyond the simulated cap × the model's output price — measured on every real response. In active mode: a cut response doesn't reveal its true length (censored data), so we use the average overshoot observed during your dry-run phase, clearly marked as an estimate. Routing savings, on the other hand, are exact to the cent. The reference method remains before/after on average cost per request.
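The two cap formulas above can be written out as a short sketch. Function and parameter names are hypothetical, and prices are assumed to be expressed per million output tokens.

```python
def dry_run_saving(output_tokens, simulated_cap, output_price_per_mtok):
    """Exact dry-run saving: tokens beyond the simulated cap times the output price."""
    excess = max(0, output_tokens - simulated_cap)
    return excess * output_price_per_mtok / 1_000_000


def active_mode_estimate(capped_responses, avg_overshoot_tokens, output_price_per_mtok):
    """Estimated active-mode saving, using the average overshoot seen in dry-run.

    A cut response hides its true length, so this is an estimate, not a measurement.
    """
    return capped_responses * avg_overshoot_tokens * output_price_per_mtok / 1_000_000
```

For example, a 900-token response against a simulated cap of 600 at an illustrative $10 per million output tokens yields a dry-run saving of $0.003; a response under the cap yields zero.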
›Which providers and models are supported?
Anything that speaks the OpenAI format (/v1/chat/completions) or the Anthropic format (/v1/messages): OpenAI, Anthropic, and OpenAI-compatible providers. Reasoning models (o1/o3/o4) are never actively capped — their reasoning quality is non-negotiable.
›What exactly do you store?
Metadata only: token counts, cost, latency, model, finish reason, route. The contents of your prompts and responses are NOT retained. The route fingerprint is a hash of the system prompt, not the prompt itself.
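To illustrate what a fingerprint of this kind looks like, here is a minimal sketch. The answer above only says "a hash"; SHA-256 and the function name are assumptions made for this example.

```python
import hashlib


def route_fingerprint(system_prompt):
    """Return a fixed-length digest of the system prompt (SHA-256 is an assumption)."""
    return hashlib.sha256(system_prompt.encode("utf-8")).hexdigest()
```

The same system prompt always maps to the same 64-character hex string, so requests can be grouped by route without keeping the prompt text.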
›How do hard budgets work?
You set a dollar ceiling per day or per month, scoped globally, per API key, per feature, or per end user. Beyond it, requests are refused with a clear 429 and a retry-after header BEFORE any provider call, so a blocked request costs nothing. Budget enforcement has a propagation tolerance of a few seconds.
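On the client side, a budget refusal can be handled by reading the retry-after header. The sketch below assumes the header follows the standard HTTP forms (a number of seconds or an HTTP date) and that header names have been lowercased by the caller; the function name is hypothetical.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_delay_seconds(status, headers, now=None):
    """Return how long to wait after a 429, or None for any other status."""
    if status != 429:
        return None
    value = headers.get("retry-after", "").strip()  # assumes lowercased header names
    if value.isdigit():
        return float(value)  # delta-seconds form
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return 0.0  # header missing or unparseable: no guidance from the server
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())
```

Since a daily or monthly budget may not reset for hours, a caller would typically surface a long delay to the user or a scheduler rather than sleep on it.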
›Does my code have to change?
Two lines: baseURL points to Outcap, and your provider key goes in a header. The official OpenAI or Anthropic SDK keeps working as-is, streaming and tool calls included. The route tag (x-outcap-route) is optional but recommended.
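For readers not using an SDK, the same change can be sketched as a raw HTTP request. The base URL below is a placeholder, not the real endpoint, and the function name is hypothetical. How Outcap authenticates your own account is not covered in this answer, so it is left out of the sketch.

```python
import json
import urllib.request

OUTCAP_BASE_URL = "https://outcap.example/v1"  # placeholder, not the real endpoint


def build_chat_request(provider_key, payload, route=None, base_url=OUTCAP_BASE_URL):
    """Build (without sending) an OpenAI-format chat request routed through the proxy."""
    headers = {
        "Content-Type": "application/json",
        "x-provider-key": provider_key,  # BYOK: the provider key travels in this header
    }
    if route:
        headers["x-outcap-route"] = route  # optional route tag
    return urllib.request.Request(
        base_url + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Passing the provider's own base URL as base_url produces the same request shape against the provider directly, which is what makes the switch back a configuration change rather than a code change.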
›How do I cancel or get my data out?
During the beta, an email is enough (subscriptions don't exist yet). Your usage metadata is exportable on request. And since Outcap holds neither your keys nor your contents, leaving means changing one environment variable.
›I forgot my password.
Automatic email reset is coming soon. In the meantime, write to us and we'll unlock your account manually after verification.