TokenCap
OpenAI removed hard spending limits. One leaked key or looping agent can burn $47K in 11 days. TokenCap enforces real budget caps your AI provider won't.
The Idea
A managed API proxy that sits between your application and AI providers (OpenAI, Anthropic, Google), enforcing hard budget caps at the request level. Not observability. Not dashboards. Actual enforcement: when your budget hits zero, the next API call gets rejected instantly. Set per-project daily, weekly, or monthly limits. Get kill switches that work in milliseconds, not hours. Connect in five minutes by swapping your API base URL. No DevOps, no self-hosting, no YAML configs. Think of it as a circuit breaker for your AI spend. One leaked API key or one looping agent pipeline can no longer drain your account because TokenCap refuses the call before it reaches the provider.
Why Now
Three converging forces. First, OpenAI silently removed hard spending limits from their API in late 2025, replacing them with email alerts that fire after the damage is done. Their community forum has multiple threads of users charged $1,200+ despite setting $50 limits. Second, agentic AI workflows have exploded: a single LangChain pipeline looped for 11 days and generated a $47,283 bill before anyone intervened. Third, AI inference now comprises 85% of enterprise AI budgets (up from 40% in 2023), and 78% of IT leaders report unexpected charges from consumption-based AI pricing. The providers have zero incentive to help you spend less. That is the gap.
How to Build
A lightweight proxy server (Node.js or Go) that intercepts API requests, checks them against a Redis-backed budget ledger, and either forwards or rejects. The proxy pattern is proven: LiteLLM, Portkey, and OpenRouter all work this way. Your differentiation is simplicity and enforcement-first design for non-technical teams. Frontend: Next.js dashboard showing spend per project, per key, per day. Features: hard caps (reject at limit), soft caps (alert then reject at buffer), kill switches (instant manual shutoff), and anomaly detection (flag 10x normal spend). Integrate with OpenAI, Anthropic, and Google Vertex APIs. Users swap their base URL from api.openai.com to proxy.tokencap.io and everything works identically until they hit their cap.
Revenue Model
Usage-based pricing aligned with the value delivered. Free tier: up to $100/month in proxied API spend (covers hobbyists and testing). Starter ($29/month): up to $2,000 in proxied spend, 3 projects, email alerts. Pro ($79/month): up to $10,000 in proxied spend, unlimited projects, Slack alerts, anomaly detection, team access. Agency ($199/month): up to $50,000 in proxied spend, white-label, client-level budgets. Enterprise: custom. The pricing model is elegant because customers only pay you when they are actively spending on AI. At 200 Pro subscribers, that is $15,800 MRR. Cross-sell opportunity: cost optimization recommendations based on usage patterns (suggest cheaper models for simple tasks).
Effort
One to two weeks for a production-ready MVP. The core proxy is straightforward: intercept HTTP requests, check budget in Redis, forward or reject, log the token count from the response. Day 1-2: proxy server handling OpenAI-compatible endpoints with Redis budget tracking. Day 3-4: Next.js dashboard with project creation, budget setting, and spend visualization. Day 5-6: Stripe billing, onboarding flow, API key management. Day 7: Anthropic and Google API support, alerting via email and Slack. Week two: anomaly detection, team permissions, and the agency dashboard. The hardest part is not the proxy logic but achieving sub-50ms latency overhead so users do not notice the intermediary.
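The budget-check core described under How to Build and Effort, as an added Go example (not TokenCap's code). The Redis-backed ledger is replaced by an in-memory, mutex-protected stand-in, which is an assumption of this example, and the cost-estimate ratio and price are constructed placeholders rather than any provider's pricing. It shows the detail the prose glosses over: the proxy only learns the real token count from the provider's response, so it has to reserve an estimate before forwarding and settle the true cost afterwards, and the check-and-reserve must be atomic so two concurrent requests cannot both slip under the cap. Hard cap, soft cap, kill switch and the 10x anomaly rule from the feature list are all covered.

```go
package main

import (
	"fmt"
	"sync"
)

// Decision is the proxy's verdict for one incoming request.
type Decision int

const (
	Allow          Decision = iota // under budget: forward to the provider
	AllowWithAlert                 // soft cap crossed: forward, but fire an alert
	RejectBudget                   // hard cap reached: answer with an error, never forward
	RejectKilled                   // kill switch engaged: refuse everything
)

// Project is the per-project policy from the pitch: hard cap, soft cap,
// kill switch and a baseline for the 10x anomaly check.
type Project struct {
	HardCapUSD  float64 // reject once settled + in-flight spend would exceed this
	SoftCapUSD  float64 // alert (but still forward) at or above this; 0 disables
	BaselineUSD float64 // normal spend for the window; 0 disables anomaly detection
	Killed      bool    // manual kill switch
	spentUSD    float64 // settled spend in the current window
	reservedUSD float64 // estimates for requests still in flight
}

// Ledger is an in-memory stand-in (an assumption of this example) for the
// pitch's Redis-backed ledger. Reserve must be one atomic check-and-reserve
// (a Lua script in Redis) so concurrent requests cannot all squeeze under the cap.
type Ledger struct {
	mu       sync.Mutex
	projects map[string]*Project
}

func NewLedger() *Ledger { return &Ledger{projects: map[string]*Project{}} }

// Add registers a project with an empty spend window.
func (l *Ledger) Add(name string, p Project) { l.mu.Lock(); defer l.mu.Unlock(); l.projects[name] = &p }

// Kill flips the kill switch; it takes effect on the very next Reserve.
func (l *Ledger) Kill(name string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if p, ok := l.projects[name]; ok {
		p.Killed = true
	}
}

// EstimateUSD is the pre-flight estimate: the real token count is only known
// from the response, so the proxy reserves an upper bound and corrects it in
// Settle. The 4-chars-per-token ratio and the price are constructed placeholders.
func EstimateUSD(promptChars, maxOutputTokens int, usdPer1KTokens float64) float64 {
	return float64(promptChars/4+maxOutputTokens) / 1000 * usdPer1KTokens
}

// Reserve runs before forwarding; a rejection reserves nothing, so a refused call costs zero.
func (l *Ledger) Reserve(name string, estUSD float64) (Decision, error) {
	l.mu.Lock()
	defer l.mu.Unlock()
	p, ok := l.projects[name]
	if !ok {
		return RejectBudget, fmt.Errorf("unknown project %q", name) // fail closed
	}
	if p.Killed {
		return RejectKilled, nil
	}
	committed := p.spentUSD + p.reservedUSD
	if committed+estUSD > p.HardCapUSD {
		return RejectBudget, nil
	}
	p.reservedUSD += estUSD
	if p.SoftCapUSD > 0 && committed+estUSD >= p.SoftCapUSD {
		return AllowWithAlert, nil
	}
	return Allow, nil
}

// Settle runs after the provider responds: release the estimate and book the
// real cost taken from the response's usage field (0 if the call failed).
func (l *Ledger) Settle(name string, estUSD, actualUSD float64) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if p, ok := l.projects[name]; ok {
		p.reservedUSD -= estUSD
		p.spentUSD += actualUSD
	}
}

// Anomalous is the pitch's "flag 10x normal spend" rule against the baseline.
func (l *Ledger) Anomalous(name string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	p, ok := l.projects[name]
	return ok && p.BaselineUSD > 0 && p.spentUSD+p.reservedUSD >= 10*p.BaselineUSD
}
```

Wire Reserve in front of the forwarding step and Settle after the provider's usage field has been parsed (actual cost 0 on a failed call). RejectBudget and RejectKilled turn into an immediate error response instead of an upstream call, and a request for an unknown project fails closed rather than being forwarded unmetered. The in-flight reservation is what stops a looping agent from firing fifty concurrent requests that each individually fit under the remaining budget; moving this to Redis means doing the read, compare and increment in a single Lua script so that property survives across proxy instances.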
Reddit Signal
The pain signal is loud across multiple channels. On Hacker News, "Tell HN: OpenAI removed budget limits from their API" triggered immediate concern about leaked keys and runaway costs. The OpenAI Developer Community has dedicated threads: one user was charged $1,200 despite a $50 hard limit, another $1,000+ above their cap, with multiple reports of the same pattern. A thread titled "Should all users have hard spending caps?" shows the community actively requesting what providers refuse to build. The Ravoid blog documented a $47,283 incident from a single looping agent, explicitly noting that "dashboards record spend after the fact, alerts notify after the fact, provider caps fire days after the damage." The demand is not speculative. People are being burned right now.
Risk
Three risks. First, latency: any proxy adds milliseconds. For real-time voice or streaming applications, even 30ms matters. Mitigate with edge deployment (Cloudflare Workers or Fly.io) and async budget checks where possible. Second, provider lock-in: if OpenAI reintroduces hard caps or builds enforcement features, your core value proposition shrinks. Mitigate by being multi-provider from day one and adding cost optimization features that go beyond simple caps. Third, trust: you are handling API keys for paying customers. A security breach would be catastrophic. Mitigate with key encryption at rest, SOC 2 compliance roadmap, and the option for customers to use their own proxy endpoint. Competition from LiteLLM and Portkey exists but requires self-hosting or enterprise pricing.
Verdict
Strong problem, clear timing, buildable in a sprint. The $47K incident and OpenAI removing hard limits created an obvious gap that no simple, self-serve tool fills today. LiteLLM requires DevOps. Portkey targets enterprises. Neither gives a non-technical agency owner a "set $500/month max and forget it" experience. The revenue model aligns perfectly with customer value. Deducting points for the trust/security bar being high and the risk that providers eventually fix their own caps. But "eventually" could be years, and the multi-provider angle (one dashboard for all your AI spend) gives staying power beyond simple budget enforcement.
OpenAI removed hard spending limits. Agentic workflows can loop for days unnoticed. 78% of IT leaders report unexpected AI charges. TokenCap fills the gap with dead-simple budget enforcement that works across providers. One URL swap, hard caps that actually stop the next call, and a dashboard that shows where every dollar goes. The trust bar is high and providers might eventually fix this themselves, but right now the gap is real, the pain is documented, and the build is a clean two-week sprint.