Open beta — free during the beta

Stop paying for tokens you never asked for

Output tokens typically cost 3–5× as much as input tokens, yet nobody controls response length. Outcap is a drop-in proxy that learns the real response length of every feature, caps max_tokens intelligently, and guarantees your bill never exceeds your budget.

Join the free beta · Read the docs (5 min)

📱 Your app → 🛡️ Outcap (cap · budget · routing) → 🤖 OpenAI · Anthropic

Outcap sits between your application and the AI. Every request passes through with under 1 ms of median added overhead: watched, capped, counted.

How much would Outcap save you?

A 10-second estimate. The real number is then measured on your actual traffic, in observe-only mode — no guessing.

Your monthly LLM API spend: $2,000 (slider from $100 to $50,000+)

Estimated savings: $240 – $400 / mo (≈ $3,840 / yr), about 16% off a $2,000 bill

See my real number — free

Estimate based on trimming length overshoot. Your real number is measured on your own traffic in observe-only mode (zero requests modified) — not a projection.

In plain words

You don't need to be a developer to get what this changes. Three pictures beat a long speech:

🌡️

A thermostat for your AI bill

Outcap learns the normal response length of each feature and stops the excess — like a thermostat that cuts the heating when a window is open.

💳

Your card's spending limit

You set a budget — per day, per month, per feature, or per customer. Beyond it, everything stops BEFORE you pay. No more end-of-month surprises.

🔌

An emergency circuit breaker

An AI looping at 3 a.m.? One button cuts everything instantly — and brings it back just as fast. You always stay in control.

Integration: one line to change

import OpenAI from "openai";

const openai = new OpenAI({
  baseURL: "https://proxy-production-8c61.up.railway.app/v1",   // ← the only line that changes
  apiKey: process.env.OUTCAP_KEY,                                // your Outcap key
  defaultHeaders: {
    "x-provider-key": process.env.OPENAI_API_KEY,                // your own provider key (BYOK)
    "x-outcap-route": "support-bot",                             // route label for this feature
  },
});

Works with the official OpenAI and Anthropic SDKs, JavaScript and Python, streaming included, plus LangChain, CrewAI, and any client with a custom base_url. Your provider key travels with each request and is never logged or stored (BYOK).

How it works

From zero risk to real savings, at your own pace.

Observe (dry-run)

Plug in the proxy: no request is ever modified. Outcap discovers your routes, learns the response-length distribution (p50/p95/p99), and measures your actual waste.

Prove the savings

The dashboard shows “you would have saved $X” on every route — computed on YOUR traffic. No marketing projections: measured numbers.
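
One plausible way to picture the "you would have saved $X" figure is to count the output tokens above a candidate cap and price them. This is a minimal sketch under stated assumptions: `wouldHaveSaved`, the cap value, and the price per million output tokens are hypothetical, and Outcap's actual computation may differ.

```javascript
// Sketch only: wouldHaveSaved() is a hypothetical helper, not Outcap's API.
// It prices the output tokens that a given cap would have trimmed.
function wouldHaveSaved(outputTokenCounts, cap, usdPerMillionOutputTokens) {
  // Sum only the tokens produced beyond the cap on each response.
  const excessTokens = outputTokenCounts.reduce(
    (sum, n) => sum + Math.max(0, n - cap),
    0
  );
  return (excessTokens * usdPerMillionOutputTokens) / 1e6;
}

// Three observed responses on one route, with an assumed cap and price.
const observedLengths = [100, 500, 2000];
const savedUsd = wouldHaveSaved(observedLengths, 400, 10);
console.log(savedUsd.toFixed(4));
```

In observe-only mode nothing is cut, so a figure like this is a simulation over logged token counts rather than a realized saving.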

Cap it for real

Enable the cap route by route: runaways get cut cleanly (never broken JSON), hard budgets block with a 429 before the provider call, and the kill switch stops everything in one click.

Everything you need to sleep at night

🎯

Smart auto-cap

The max_tokens ceiling is learned from your real traffic: p99 × 1.3 per route, self-adjusting if the cut rate exceeds 2%. Never above your own max_tokens.
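
The rule above (p99 × 1.3 per route, never above your own max_tokens) can be sketched as follows. This is an illustration, not Outcap's implementation: `percentile`, `learnCap`, and `cutRate` are hypothetical names, and the nearest-rank percentile method is an assumption.

```javascript
// Sketch only: these helpers are hypothetical, not Outcap's API.

// Nearest-rank percentile of observed output-token counts (assumed method).
function percentile(samples, p) {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[Math.max(rank, 1) - 1];
}

// Learned ceiling for a route: p99 x 1.3, never above the caller's own max_tokens.
function learnCap(observedOutputTokens, requestMaxTokens, margin = 1.3) {
  const learned = Math.ceil(percentile(observedOutputTokens, 99) * margin);
  return requestMaxTokens == null ? learned : Math.min(learned, requestMaxTokens);
}

// Share of responses longer than the cap; the text says the cap adjusts above 2%.
function cutRate(observedOutputTokens, cap) {
  const cut = observedOutputTokens.filter((n) => n > cap).length;
  return cut / observedOutputTokens.length;
}

// 100 synthetic samples: 100, 102, ..., 298 output tokens.
const observed = Array.from({ length: 100 }, (_, i) => 100 + i * 2);
const cap = learnCap(observed, 4096);
console.log(cap, cutRate(observed, cap));
```

How the ceiling is adjusted once the cut rate passes 2% is not specified in the text, so the sketch only measures the rate.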

✂️

Guaranteed clean cuts

A capped response ends at a sentence boundary. JSON cut by our cap is repaired: this is guaranteed for non-streaming responses and is a tested invariant. In streaming, we widen the margin so the JSON is not cut at all, rather than buffering the whole response.
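
To illustrate what ending at a sentence boundary can mean, here is a minimal sketch that trims truncated text back to its last complete sentence. `trimToSentence` is a hypothetical helper and the regular expression is a simplification (English-style punctuation only). It does not cover JSON repair.

```javascript
// Sketch only: trimToSentence() is a hypothetical helper, not Outcap's API.
function trimToSentence(text) {
  // A boundary is . ! or ? (optionally followed by a closing quote or bracket)
  // that is itself followed by whitespace or the end of the text.
  const matches = [...text.matchAll(/[.!?]["')\]]?(?=\s|$)/g)];
  if (matches.length === 0) return text; // no boundary found: leave text untouched
  const last = matches[matches.length - 1];
  return text.slice(0, last.index + last[0].length);
}

console.log(trimToSentence("First sentence. Second one is cut mid"));
```

Decimal points such as "3.5" are not treated as boundaries because the period is not followed by whitespace.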

🛡️

Hard budgets + kill switch

Dollar ceilings per day or month — global, per key, per feature, or per end user. Exceeded? Clean 429 before any provider call. Plus a red button that stops everything when something loops at 3 a.m.
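
A budget check that runs before the provider call might look like the sketch below. `BudgetGuard`, its method names, and the scope strings are hypothetical, and returning 429 for the kill switch is an assumption (the text only states 429 for exceeded budgets).

```javascript
// Sketch only: BudgetGuard is a hypothetical class, not Outcap's API.
class BudgetGuard {
  constructor(limitsUsd) {
    this.limits = new Map(Object.entries(limitsUsd)); // scope -> ceiling in USD
    this.spent = new Map();                           // scope -> USD spent this period
    this.killed = false;                              // kill switch state
  }

  // Runs before the provider call, so a blocked request costs nothing.
  check(scopes) {
    if (this.killed) return { status: 429, reason: "kill switch active" }; // assumed status
    for (const scope of scopes) {
      const limit = this.limits.get(scope);
      if (limit !== undefined && (this.spent.get(scope) ?? 0) >= limit) {
        return { status: 429, reason: `budget exceeded for ${scope}` };
      }
    }
    return { status: 200 };
  }

  // Runs after the provider responds, with the measured cost of the request.
  record(scopes, costUsd) {
    for (const scope of scopes) {
      this.spent.set(scope, (this.spent.get(scope) ?? 0) + costUsd);
    }
  }
}

const guard = new BudgetGuard({ global: 100, "route:support-bot": 5 });
const scopes = ["global", "route:support-bot", "user:42"];
guard.record(scopes, 5); // the route has now used its whole $5 ceiling
console.log(guard.check(scopes));
```

Keeping both maps in memory matches the page's claim that nothing on the request path touches a database. Resetting the counters per day or month is left out of the sketch.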

🧠

Model routing

“This route would cost 80% less on a smaller model”: simulated on your real tokens, enabled in one click, savings measured to the cent. For example, GPT-5 → mini, Sonnet → Haiku.
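
A routing simulation of this kind can be pictured as re-pricing logged token counts at another model's rates. In this sketch the model names `large` and `small`, the prices, and the function names are all hypothetical. They are not real provider pricing and not Outcap's API.

```javascript
// Sketch only. Hypothetical prices in USD per million tokens, not real pricing.
const PRICES = {
  large: { input: 10, output: 40 },
  small: { input: 2, output: 8 },
};

// Cost of one logged request on a given model.
function costUsd(model, usage) {
  const p = PRICES[model];
  return (usage.inputTokens * p.input + usage.outputTokens * p.output) / 1e6;
}

// Re-price the same logged requests on another model and compare totals.
function simulateRouting(requests, from, to) {
  const current = requests.reduce((sum, r) => sum + costUsd(from, r), 0);
  const simulated = requests.reduce((sum, r) => sum + costUsd(to, r), 0);
  const savingsPct = Math.round((1 - simulated / current) * 100);
  return { current, simulated, savingsPct };
}

const logged = [
  { inputTokens: 1000, outputTokens: 500 },
  { inputTokens: 2000, outputTokens: 250 },
];
console.log(simulateRouting(logged, "large", "small"));
```

The sketch assumes the smaller model would produce the same number of output tokens, which real traffic may not confirm.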

🔑

BYOK — your keys stay yours

Your OpenAI/Anthropic key passes through memory for the duration of the request: never logged, never stored. Request bodies are not retained.

📊

A dashboard that talks money

Cost per feature, per model, per customer. Simulated vs. realized savings. Detailed logs for every request. Finally, an LLM bill you can read.

31%

simulated savings on our demo traffic

< 1 ms

median overhead added by the proxy (continuously verified)

0

broken JSON responses delivered: repair is a tested invariant

Built to plug into production without the stress

Putting a proxy on your critical path takes trust. Here's why it's safe.

🔑

BYOK — your key stays yours

Your provider key passes through memory for the duration of the request: never logged, never stored.

🙈

Your content isn't stored

We keep metadata only (tokens, cost, latency), never your prompts or your responses.

🔌

Fail-open, zero lock-in

If Outcap goes down, point baseURL back to the provider — one env var — and everything works.
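
One way to make that switch a single environment variable is to build the client options in one place. This is a sketch under assumptions: `USE_OUTCAP` and `OUTCAP_BASE_URL` are hypothetical variable names, and `clientOptions` is an illustrative helper. The direct URL is the OpenAI API's standard base URL.

```javascript
// Sketch only: USE_OUTCAP and OUTCAP_BASE_URL are hypothetical variable names.
function clientOptions(env) {
  if (env.USE_OUTCAP === "false") {
    // Bypass: call the provider directly, authenticating with the provider key.
    return { baseURL: "https://api.openai.com/v1", apiKey: env.OPENAI_API_KEY };
  }
  // Default: go through the proxy, passing the provider key in a header (BYOK).
  return {
    baseURL: env.OUTCAP_BASE_URL,
    apiKey: env.OUTCAP_KEY,
    defaultHeaders: {
      "x-provider-key": env.OPENAI_API_KEY,
      "x-outcap-route": "support-bot",
    },
  };
}

// Usage with the official SDK: const openai = new OpenAI(clientOptions(process.env));
console.log(clientOptions({ USE_OUTCAP: "false", OPENAI_API_KEY: "sk-test" }).baseURL);
```

Note that bypassing the proxy also means switching apiKey back to the provider key, which the helper handles alongside baseURL.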

⚡

< 1 ms overhead

Everything is checked in memory, with no database access on the request path.

🧪

Zero risk to start

Dry-run mode modifies no request: it measures your waste and touches nothing.

💚

Open, free beta

No credit card. You stay in control and can leave whenever you want.

Your next LLM bill can be the last bad surprise

5 minutes to plug in, a risk-free observation mode, and numbers measured on your own traffic. Free during the beta.

Create my account