Open beta: free during the beta
Stop paying for tokens you never asked for
Output tokens cost 3–5× more than input tokens, and in most apps nothing controls how long responses get. Outcap is a drop-in proxy that learns the real response length of every feature, caps max_tokens intelligently, and enforces hard budgets so your bill never blows past them.
Your app → Outcap → OpenAI · Anthropic
Outcap sits between your application and the AI provider. Every request passes through it with under a millisecond of median overhead: watched, capped, counted.
How much would Outcap save you?
A 10-second estimate. The real number is then measured on your actual traffic in observe-only mode, with no guessing.
Monthly LLM spend: $2,000
Estimated savings
$240–$400 / mo
≈ $3,840 / yr
Estimate based on trimming length overshoot. Your real number is measured on your own traffic in observe-only mode (zero requests modified), not projected.
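The example figures imply savings of 12% to 20% of spend, if the $2,000 input is read as monthly LLM spend. The yearly figure sits at the midpoint of the monthly range. The sketch below reproduces that arithmetic. It assumes those two rates, which are inferred from the example and are not a documented Outcap formula. `estimateSavings` is a hypothetical helper.

```typescript
// Hypothetical sketch of the calculator arithmetic.
// ASSUMPTION: the 12%–20% rates are inferred from the example
// ($2,000 → $240–$400 / mo); they are not a published Outcap formula.
const LOW_RATE = 0.12;
const HIGH_RATE = 0.2;

interface SavingsEstimate {
  lowPerMonth: number;
  highPerMonth: number;
  midPerYear: number; // midpoint of the monthly range, times 12
}

function estimateSavings(monthlySpend: number): SavingsEstimate {
  const lowPerMonth = Math.round(monthlySpend * LOW_RATE);
  const highPerMonth = Math.round(monthlySpend * HIGH_RATE);
  const midPerYear = Math.round(((lowPerMonth + highPerMonth) / 2) * 12);
  return { lowPerMonth, highPerMonth, midPerYear };
}

console.log(estimateSavings(2000));
// → { lowPerMonth: 240, highPerMonth: 400, midPerYear: 3840 }
```

The calculator is only a rough guide. The observe-only measurement replaces these assumed rates with your own numbers.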
In plain words
You don't need to be a developer to understand what this changes. Three pictures say it better than a long speech:
A thermostat for your AI bill
Outcap learns the normal response length of each feature and stops the excess, like a thermostat that cuts the heating when a window is open.
Your card's spending limit
You set a budget: per day, per month, per feature, or per customer. Once it is reached, requests are blocked BEFORE you pay for them. No more end-of-month surprises.
An emergency circuit breaker
An AI stuck in a loop at 3 a.m.? One button cuts off all traffic instantly and restores it just as fast. You always stay in control.
```typescript
import OpenAI from "openai";

const openai = new OpenAI({
  baseURL: "https://proxy-production-8c61.up.railway.app/v1", // ← the main change: route SDK traffic through Outcap
  apiKey: process.env.OUTCAP_KEY, // your Outcap key
  defaultHeaders: {
    "x-provider-key": process.env.OPENAI_API_KEY, // your own OpenAI key (BYOK)
    "x-outcap-route": "support-bot", // the feature this traffic is attributed to
  },
});
```

Works with the official OpenAI and Anthropic SDKs (JavaScript and Python, streaming included), plus LangChain, CrewAI, and any client that accepts a custom base URL. Your provider key is never logged or stored (BYOK).
How it works
From zero risk to real savings, at your own pace.
Observe (dry-run)
Plug in the proxy: no request is ever modified. Outcap discovers your routes, learns the response-length distribution (p50/p95/p99), and measures your actual waste.
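The p50/p95/p99 figures are percentiles of output length, computed per route. The sketch below uses the nearest-rank method. That method is an assumption, since Outcap's exact estimator isn't described. `Sample`, `percentile` and `routeStats` are hypothetical names.

```typescript
// Hypothetical sketch: per-route output-length percentiles using the
// nearest-rank method. Outcap's actual estimator is not documented here.
interface Sample {
  route: string; // the x-outcap-route label
  completionTokens: number; // output length of one logged response
}

interface LengthStats {
  p50: number;
  p95: number;
  p99: number;
}

// Nearest rank: the smallest value with at least p% of samples at or below it.
// Expects a non-empty array sorted in ascending order.
function percentile(sorted: number[], p: number): number {
  const rank = Math.ceil((p * sorted.length) / 100); // integer product first to limit float error
  return sorted[Math.max(0, rank - 1)];
}

function routeStats(samples: Sample[]): Map<string, LengthStats> {
  // Group observed lengths by route.
  const byRoute = new Map<string, number[]>();
  for (const s of samples) {
    const lengths = byRoute.get(s.route) ?? [];
    lengths.push(s.completionTokens);
    byRoute.set(s.route, lengths);
  }
  const stats = new Map<string, LengthStats>();
  for (const [route, lengths] of byRoute) {
    const sorted = [...lengths].sort((a, b) => a - b); // numeric, not lexicographic
    stats.set(route, {
      p50: percentile(sorted, 50),
      p95: percentile(sorted, 95),
      p99: percentile(sorted, 99),
    });
  }
  return stats;
}

// 100 logged responses of 1..100 tokens on one route.
const observed: Sample[] = Array.from({ length: 100 }, (_, i) => ({
  route: "support-bot",
  completionTokens: i + 1,
}));
console.log(routeStats(observed).get("support-bot")); // → { p50: 50, p95: 95, p99: 99 }
```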
Prove the savings
The dashboard shows “you would have saved $X” on every route, computed on YOUR traffic. No marketing projections: measured numbers.
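A "you would have saved $X" figure can be derived from logged output lengths alone. The sketch below assumes two things: every token beyond the cap would simply not have been generated, and output is billed at a flat per-million-token price. `simulatedSavingsUSD` is a hypothetical helper, not Outcap's code.

```typescript
// Hypothetical sketch of an observe-mode "you would have saved $X" figure.
// ASSUMPTIONS: tokens beyond the cap would not have been generated (or billed),
// and output is billed at a flat per-million-token price you supply.
function simulatedSavingsUSD(
  completionTokens: number[], // logged output lengths for one route
  cap: number, // the max_tokens cap that would have applied
  outputPricePerMillion: number, // USD per 1M output tokens
): number {
  let trimmed = 0;
  for (const tokens of completionTokens) {
    trimmed += Math.max(0, tokens - cap); // tokens the cap would have cut
  }
  return (trimmed * outputPricePerMillion) / 1_000_000;
}

// Three logged responses, a cap of 800 tokens, $10 per 1M output tokens:
// only the 2,500-token runaway is trimmed (1,700 tokens).
console.log(simulatedSavingsUSD([300, 500, 2500], 800, 10)); // → 0.017
```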
Cap it for real
Enable the cap route by route: runaway responses are cut cleanly (never broken JSON), hard budgets block with a 429 before the provider is called, and the kill switch stops everything in one click.
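Route-by-route enabling can be pictured as a mode flag on each route. In "observe" the request goes out untouched and only the would-be cap is recorded. In "enforce" the cap is actually applied. This is a hypothetical sketch: the `Mode`, `RouteConfig` and `applyRouteMode` names are illustrative, not Outcap's API.

```typescript
// Hypothetical sketch of per-route modes: "observe" forwards the request
// untouched and records the cap it would have applied; "enforce" applies it.
type Mode = "observe" | "enforce";

interface RouteConfig {
  mode: Mode;
  learnedCap: number; // e.g. from the route's learned length distribution
}

interface Decision {
  forwardedMaxTokens: number | undefined; // what the provider actually receives
  wouldCapAt: number; // recorded for the dashboard in both modes
}

function applyRouteMode(cfg: RouteConfig, requestedMaxTokens: number | undefined): Decision {
  // Never go above the caller's own max_tokens.
  const effective =
    requestedMaxTokens === undefined
      ? cfg.learnedCap
      : Math.min(requestedMaxTokens, cfg.learnedCap);
  if (cfg.mode === "observe") {
    return { forwardedMaxTokens: requestedMaxTokens, wouldCapAt: effective };
  }
  return { forwardedMaxTokens: effective, wouldCapAt: effective };
}

console.log(applyRouteMode({ mode: "observe", learnedCap: 650 }, 4096));
// → { forwardedMaxTokens: 4096, wouldCapAt: 650 }
console.log(applyRouteMode({ mode: "enforce", learnedCap: 650 }, 4096));
// → { forwardedMaxTokens: 650, wouldCapAt: 650 }
```

In this sketch, switching a route from observe to enforce changes one field. The dashboard keeps receiving the same `wouldCapAt` value either way.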
Everything you need to sleep at night
Smart auto-cap
The max_tokens ceiling is learned from your real traffic: p99 × 1.3 per route, adjusted automatically if more than 2% of responses get cut. It never exceeds the max_tokens you set yourself.
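The auto-cap rule above translates almost directly into code. What is not described is how much the cap widens when the cut rate passes 2%. The sketch below assumes a fixed 0.1 step on the multiplier. `autoCap` and `nextMultiplier` are hypothetical helpers.

```typescript
// Hypothetical sketch of the auto-cap rule described above.
// From the text: cap = p99 × 1.3 per route, never above the caller's max_tokens,
// adjusted when more than 2% of responses are cut.
// ASSUMPTION: the adjustment widens the multiplier by a fixed 0.1 step.
const INITIAL_MULTIPLIER = 1.3;
const MAX_CUT_RATE = 0.02;
const WIDEN_STEP = 0.1; // assumed, not documented

function autoCap(p99: number, multiplier: number, userMaxTokens?: number): number {
  // Subtract a tiny epsilon so float noise (e.g. 700.0000000000001) doesn't round up.
  const learned = Math.ceil(p99 * multiplier - 1e-9);
  return userMaxTokens === undefined ? learned : Math.min(learned, userMaxTokens);
}

function nextMultiplier(current: number, cutCount: number, totalCount: number): number {
  if (totalCount === 0) return current; // no traffic yet: keep the current setting
  return cutCount / totalCount > MAX_CUT_RATE ? current + WIDEN_STEP : current;
}

console.log(autoCap(500, INITIAL_MULTIPLIER, 4096)); // → 650
console.log(autoCap(500, INITIAL_MULTIPLIER, 256)); // → 256 (the caller's own limit wins)
// 5 cuts out of 100 responses is a 5% cut rate, above 2%, so the cap widens.
const widened = nextMultiplier(INITIAL_MULTIPLIER, 5, 100);
console.log(autoCap(500, widened, 4096)); // → 700
```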
Guaranteed clean cuts
A capped response ends at a sentence boundary. JSON cut by our cap is repaired (guaranteed for non-streaming responses, a tested invariant). In streaming, rather than buffering the whole response, we widen the margin so the JSON isn't cut at all.
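The sketch below covers only the sentence-boundary part; JSON repair is not shown. It assumes a deliberately simple definition of a boundary: `.`, `!` or `?` followed by whitespace or the end of the text. `cutAtSentenceBoundary` is a hypothetical helper.

```typescript
// Hypothetical sketch: trim a truncated response back to its last sentence boundary.
// "Boundary" here is simply ., ! or ? followed by whitespace or end of text; real
// detection (abbreviations, other languages, code blocks) needs more care.
function cutAtSentenceBoundary(text: string): string {
  const re = /[.!?](?=\s|$)/g;
  let lastEnd = -1;
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    lastEnd = m.index + 1; // position just after the punctuation mark
  }
  // With no complete sentence, keep the text rather than return nothing.
  return lastEnd === -1 ? text : text.slice(0, lastEnd);
}

console.log(cutAtSentenceBoundary("First point. Second point! Third poi"));
// → First point. Second point!
```

The lookahead keeps decimals like "2.5" from counting as sentence ends, because the period is followed by a digit rather than whitespace.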
Hard budgets + kill switch
Dollar ceilings per day or month: global, per key, per feature, or per end user. Exceeded? A clean 429 before any provider call. Plus a red button that stops everything when something loops at 3 a.m.
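A pre-call gate like this can run entirely on in-memory counters. The sketch below is hypothetical. The scope naming, the `BudgetGate` class, and answering 429 while the kill switch is on are all assumptions. Period resets are not shown.

```typescript
// Hypothetical sketch of a pre-call gate: kill switch first, then dollar budgets
// kept as in-memory counters. Scope names, BudgetGate, and using 429 for the
// kill switch are assumptions; resetting counters each day/month is not shown.
type Scope = "global" | `key:${string}` | `feature:${string}` | `user:${string}`;

interface GateResult {
  status: 200 | 429; // 200: forward to the provider; 429: reject before calling it
  reason?: string;
}

class BudgetGate {
  killSwitch = false;
  private limits = new Map<Scope, number>(); // USD ceiling for the current period
  private spent = new Map<Scope, number>(); // USD spent so far in that period

  setLimit(scope: Scope, usd: number): void {
    this.limits.set(scope, usd);
  }

  // Called after a response completes, with its actual cost.
  recordSpend(scopes: Scope[], usd: number): void {
    for (const s of scopes) this.spent.set(s, (this.spent.get(s) ?? 0) + usd);
  }

  check(scopes: Scope[]): GateResult {
    if (this.killSwitch) return { status: 429, reason: "kill switch engaged" };
    for (const s of scopes) {
      const limit = this.limits.get(s);
      if (limit !== undefined && (this.spent.get(s) ?? 0) >= limit) {
        return { status: 429, reason: `budget exceeded for ${s}` };
      }
    }
    return { status: 200 };
  }
}

const gate = new BudgetGate();
gate.setLimit("feature:support-bot", 50); // $50 for this period
gate.recordSpend(["global", "feature:support-bot"], 50);
const decision = gate.check(["global", "feature:support-bot"]);
console.log(decision.status, decision.reason); // → 429 budget exceeded for feature:support-bot
```

This version only compares spend already recorded, so a request in flight can still push a scope slightly past its ceiling. A stricter gate would reserve each request's worst-case cost (max_tokens × output price) before forwarding it.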
Model routing
“This route would cost 80% less on a smaller model”: simulated on your real tokens, enabled in one click, savings measured to the cent. GPT-5 → GPT-5 mini, Sonnet → Haiku.
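Simulating a cheaper model means re-pricing a route's logged token counts at the candidate's rates. The sketch below assumes the smaller model would produce the same number of tokens, which a real simulation would have to treat with care. The prices are placeholders, not real list prices.

```typescript
// Hypothetical sketch: re-price a route's logged token counts on a cheaper model.
// ASSUMPTIONS: the smaller model would emit the same token counts, and the
// prices are placeholders in USD per 1M tokens, not real list prices.
interface Price {
  inputPerM: number;
  outputPerM: number;
}

interface TokenUsage {
  promptTokens: number;
  completionTokens: number;
}

function costUSD(usage: TokenUsage[], price: Price): number {
  let total = 0;
  for (const u of usage) {
    total += (u.promptTokens * price.inputPerM + u.completionTokens * price.outputPerM) / 1_000_000;
  }
  return total;
}

// Percentage saved by running the same traffic on `candidate` instead of `current`.
function simulatedSavingsPct(usage: TokenUsage[], current: Price, candidate: Price): number {
  const before = costUSD(usage, current);
  if (before === 0) return 0; // no traffic, nothing to compare
  return (1 - costUSD(usage, candidate) / before) * 100;
}

const routeUsage: TokenUsage[] = [
  { promptTokens: 1200, completionTokens: 400 },
  { promptTokens: 900, completionTokens: 650 },
];
const largeModel: Price = { inputPerM: 5, outputPerM: 20 }; // placeholder prices
const smallModel: Price = { inputPerM: 1, outputPerM: 4 }; // placeholder prices
console.log(simulatedSavingsPct(routeUsage, largeModel, smallModel).toFixed(1)); // → 80.0
```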
BYOK: your keys stay yours
Your OpenAI/Anthropic key passes through memory for the duration of the request: never logged, never stored. Request bodies are not retained.
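Inside the proxy, BYOK can be pictured as a header rewrite. The caller's key arrives in `x-provider-key`, as in the client snippet, and leaves only as the provider's own auth header. The sketch below is hypothetical: `upstreamHeaders` is not Outcap's code, and only the two header names from the snippet come from the source.

```typescript
// Hypothetical sketch of BYOK header handling in a proxy. Only x-provider-key and
// x-outcap-route come from the client snippet; everything else is illustrative.
// The provider key is only copied into this request's outgoing headers: it is
// never logged or written anywhere.
type Provider = "openai" | "anthropic";

function upstreamHeaders(
  incoming: Record<string, string>,
  provider: Provider,
): Record<string, string> {
  const out: Record<string, string> = {};
  let providerKey: string | undefined;
  for (const [name, value] of Object.entries(incoming)) {
    const lower = name.toLowerCase(); // HTTP header names are case-insensitive
    const proxyOnly =
      lower.startsWith("x-outcap-") || lower === "authorization" || lower === "x-api-key";
    if (lower === "x-provider-key") {
      providerKey = value;
    } else if (!proxyOnly) {
      out[lower] = value; // ordinary headers pass through unchanged
    }
    // Proxy-only headers (including the caller's Outcap credentials) are dropped.
  }
  if (providerKey === undefined) {
    throw new Error("missing x-provider-key header");
  }
  // OpenAI authenticates with a Bearer token; Anthropic with the x-api-key header.
  if (provider === "openai") {
    out["authorization"] = `Bearer ${providerKey}`;
  } else {
    out["x-api-key"] = providerKey;
  }
  return out;
}

const forwarded = upstreamHeaders(
  {
    Authorization: "Bearer outcap_demo_key",
    "x-provider-key": "sk-demo",
    "x-outcap-route": "support-bot",
    "content-type": "application/json",
  },
  "openai",
);
console.log(JSON.stringify(forwarded));
// → {"content-type":"application/json","authorization":"Bearer sk-demo"}
```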
A dashboard that talks money
Cost per feature, per model, per customer. Simulated vs. realized savings. Detailed logs for every request. Finally, an LLM bill you can read.
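Per-feature, per-model and per-customer costs are group-by sums over per-request records. Each record holds metadata only, with no prompt or response text. The sketch below assumes hypothetical field names and uses illustrative numbers.

```typescript
// Hypothetical sketch: roll per-request metadata up into cost per feature (route),
// model, or customer. Records hold metadata only, never prompt or response text.
// Field names and numbers are illustrative.
interface RequestRecord {
  route: string;
  model: string;
  customer: string;
  promptTokens: number;
  completionTokens: number;
  latencyMs: number;
  costUSD: number;
}

type Dimension = "route" | "model" | "customer";

function costBy(records: RequestRecord[], dim: Dimension): Map<string, number> {
  const totals = new Map<string, number>();
  for (const r of records) {
    const key = r[dim];
    totals.set(key, (totals.get(key) ?? 0) + r.costUSD);
  }
  return totals;
}

const records: RequestRecord[] = [
  { route: "support-bot", model: "gpt-5-mini", customer: "acme", promptTokens: 800, completionTokens: 300, latencyMs: 900, costUSD: 0.5 },
  { route: "support-bot", model: "gpt-5", customer: "globex", promptTokens: 600, completionTokens: 200, latencyMs: 1200, costUSD: 0.25 },
  { route: "search", model: "gpt-5", customer: "acme", promptTokens: 2000, completionTokens: 900, latencyMs: 2100, costUSD: 1 },
];
console.log(costBy(records, "route").get("support-bot")); // → 0.75
console.log(costBy(records, "customer").get("acme")); // → 1.5
```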
31%
simulated savings on our demo traffic
< 1 ms
median overhead added by the proxy (continuously verified)
0
broken JSON responses delivered (repair is a tested invariant)
Built to plug into production without the stress
Putting a proxy on your critical path takes trust. Here's why it's safe.
BYOK: your key stays yours
Your provider key passes through memory for the duration of the request: never logged, never stored.
Your content isn't stored
Only metadata is kept (tokens, cost, latency), never your prompts or your responses.
Fail-open, zero lock-in
If Outcap goes down, point baseURL back to the provider (one environment variable) and everything keeps working.
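The fallback is a configuration switch on your side. The sketch below assumes a hypothetical `OUTCAP_ENABLED` variable that selects between Outcap and OpenAI's default base URL. Note that this is a manual switch, not automatic failover.

```typescript
// Hypothetical sketch of the "one env var" fallback. OUTCAP_ENABLED is an assumed
// variable name; setting it to "false" sends the SDK straight to OpenAI.
// This is a manual switch, not automatic failover.
interface ClientConfig {
  baseURL: string;
  apiKey: string | undefined;
  defaultHeaders: Record<string, string | undefined>;
}

function openAIClientConfig(env: Record<string, string | undefined>): ClientConfig {
  if (env.OUTCAP_ENABLED === "false") {
    // Direct to the provider: your OpenAI key is the API key, no proxy headers.
    return { baseURL: "https://api.openai.com/v1", apiKey: env.OPENAI_API_KEY, defaultHeaders: {} };
  }
  // Through Outcap, with the same settings as the client snippet.
  return {
    baseURL: "https://proxy-production-8c61.up.railway.app/v1",
    apiKey: env.OUTCAP_KEY,
    defaultHeaders: {
      "x-provider-key": env.OPENAI_API_KEY,
      "x-outcap-route": "support-bot",
    },
  };
}

// In a Node app with the openai package: new OpenAI(openAIClientConfig(process.env))
console.log(openAIClientConfig({ OUTCAP_ENABLED: "false" }).baseURL); // → https://api.openai.com/v1
```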
< 1 ms overhead
Everything is checked in memory, with no database access on the request path.
Zero risk to start
Dry-run mode modifies no requests: it measures your waste and touches nothing.
Open, free beta
No credit card required. You stay in control and can leave whenever you want.
Your next LLM bill can be the last bad surprise
5 minutes to plug in, a risk-free observation mode, and numbers measured on your own traffic. Free during the beta.
Create my account