A rate limit caps how often a caller can hit an endpoint in a window of time. The reason it matters more for vibe-coded SaaS than for a typical app is cost asymmetry: many AI-built apps expose endpoints that cost real money per call (an LLM request, an email, an image generation). Without a limit, every request costs you money while costing an attacker almost nothing, and a simple script can repeat it as fast as your server will answer.
Which endpoints to protect first
You don't need to limit everything on day one. Start with the routes that are expensive, abusable, or security-sensitive — in that order.
- AI and generation routes: anything that calls an LLM, image model, or audio model. These cost money per call.
- Auth routes: login, signup, password reset, and OTP verification, to slow brute-force attacks and account enumeration.
- Email and SMS senders: to stop your app from being used as a spam relay and to contain per-message costs.
- Upload and import routes: to limit storage abuse and large-payload denial of service.
- Any unauthenticated, public endpoint that does real work.
Per-IP vs. per-account limits
These two limits defend against different attackers, so most routes want both. Per-IP limits slow anonymous floods; per-account limits stop a single signed-up user (or a stolen token) from hammering an expensive feature.
| Limit type | Stops | Weakness |
|---|---|---|
| Per IP | Anonymous floods and scripts | Defeated by rotating IPs / proxies |
| Per account | One user abusing a paid feature | Defeated by mass signups |
| Per IP + per account | The common cases together | Needs both identifiers resolved correctly |
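A minimal sketch of checking both identifiers on every request, in TypeScript. It uses a fixed-window counter, which is only one of several common algorithms (sliding windows and token buckets are others). `CounterStore`, `InMemoryStore`, and `checkIpAndAccount` are hypothetical names, and the in-memory store only holds for a single long-running process; with several instances, the same interface would be implemented on a shared store.

```typescript
// Fixed-window counters keyed by IP and by account.
// CounterStore, InMemoryStore and checkIpAndAccount are hypothetical names.

interface CounterStore {
  // Increment the counter for `key` and return the new count. The count
  // starts over once `windowMs` has passed since the window opened.
  incr(key: string, windowMs: number, now: number): Promise<number>;
}

// Single-process store for illustration only. On serverless or with several
// instances, implement CounterStore on a shared store instead (for example
// Redis INCR with PEXPIRE set on the first hit).
class InMemoryStore implements CounterStore {
  private windows = new Map<string, { start: number; count: number }>();

  async incr(key: string, windowMs: number, now: number): Promise<number> {
    const current = this.windows.get(key);
    if (current === undefined || now - current.start >= windowMs) {
      this.windows.set(key, { start: now, count: 1 });
      return 1;
    }
    current.count += 1;
    return current.count;
  }
}

interface Limits {
  perIp: number;      // max requests per IP per window
  perAccount: number; // max requests per account per window
  windowMs: number;   // window length in milliseconds
}

type Decision = "allowed" | "blocked_ip" | "blocked_account";

async function checkIpAndAccount(
  store: CounterStore,
  route: string,
  ip: string,
  accountId: string | null, // null for anonymous callers: only the IP limit applies
  limits: Limits,
  now: number = Date.now(),
): Promise<Decision> {
  // Every request counts against its IP, including ones rejected below.
  const ipCount = await store.incr(`${route}:ip:${ip}`, limits.windowMs, now);
  if (ipCount > limits.perIp) return "blocked_ip";

  if (accountId !== null) {
    const accountCount = await store.incr(`${route}:acct:${accountId}`, limits.windowMs, now);
    if (accountCount > limits.perAccount) return "blocked_account";
  }
  return "allowed";
}
```

Counting rejected requests against the IP is a deliberate choice: a flood cannot keep its own counter low by failing, and anonymous callers still hit the IP limit even though they have no account to key on.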
Set usage caps, not just rate limits
A rate limit controls speed; a usage cap controls total spend. For AI endpoints you want both. A per-minute limit stops a burst, but a daily or monthly cap is what saves you from a slow, steady drain that stays under the rate limit.
- Add a hard per-account daily/monthly ceiling on AI usage, with a clear message when it's hit.
- Set a global kill-switch or budget alert so you find out before the invoice does.
- Tie generous limits to paid plans and tight limits to free/anonymous usage.
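One way to combine a per-account daily ceiling, plan-based limits, and a global budget alert, sketched in TypeScript under loud assumptions: the plan names, the ceilings, the per-call cost in cents, the 80% alert ratio, and the `UsageCaps` class are all hypothetical placeholders, days are counted in UTC, and state lives in memory where a real deployment would persist it.

```typescript
// Per-account daily caps by plan, plus a global daily budget with an alert.
// Plan names, ceilings and the UsageCaps class are hypothetical placeholders.

type Plan = "anonymous" | "free" | "pro";

// Placeholder ceilings: tight for anonymous/free usage, generous for paid plans.
const DAILY_AI_CALLS: Record<Plan, number> = { anonymous: 5, free: 50, pro: 1000 };

const ALERT_RATIO = 0.8; // alert once spend crosses 80% of the daily budget

interface CapResult {
  allowed: boolean;
  message?: string; // user-facing explanation when blocked
}

class UsageCaps {
  private callsByAccountDay = new Map<string, number>();
  private spendByDay = new Map<string, number>();
  private budgetCents: number;
  private onAlert: (spentCents: number, budgetCents: number) => void;

  constructor(budgetCents: number, onAlert: (spentCents: number, budgetCents: number) => void) {
    this.budgetCents = budgetCents;
    this.onAlert = onAlert;
  }

  tryConsume(accountId: string, plan: Plan, costCents: number, now: Date = new Date()): CapResult {
    const day = now.toISOString().slice(0, 10); // UTC date, e.g. "2024-05-01"
    const spent = this.spendByDay.get(day) ?? 0;

    // Global kill switch: refuse AI calls once the day's budget would be exceeded.
    if (spent + costCents > this.budgetCents) {
      return { allowed: false, message: "AI features are paused for today. Please try again tomorrow." };
    }

    const key = `${day}:${accountId}`;
    const used = this.callsByAccountDay.get(key) ?? 0;
    const cap = DAILY_AI_CALLS[plan];
    if (used >= cap) {
      return {
        allowed: false,
        message: `You have used all ${cap} AI requests included in the ${plan} plan today. The limit resets at 00:00 UTC.`,
      };
    }

    // Record usage only for calls that are allowed.
    this.callsByAccountDay.set(key, used + 1);
    this.spendByDay.set(day, spent + costCents);

    // Fire the alert once, on the call that crosses the threshold.
    const threshold = this.budgetCents * ALERT_RATIO;
    if (spent < threshold && spent + costCents >= threshold) {
      this.onAlert(spent + costCents, this.budgetCents);
    }
    return { allowed: true };
  }
}
```

Charging the budget only for allowed calls keeps rejected requests from eating into the day's spend, and the alert fires on the single call that crosses the threshold rather than on every call after it.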
Mistakes that make rate limiting useless
- Limiting in the client. A limit enforced in the browser is no limit at all, because an attacker can call your API directly; it must run on the server.
- In-memory counters on serverless. If each request can hit a fresh instance, an in-memory counter resets constantly; use a shared store (such as Redis/Upstash) or your platform's limiter.
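- Trusting a client-supplied IP header (such as X-Forwarded-For) as-is, which lets attackers send a different "IP" on every request and bypass per-IP limits.
- Returning a silent failure or a 500 instead of a clear 429 Too Many Requests with a retry hint (a Retry-After header), which breaks legitimate clients.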
- Forgetting the cost cap, so a slow drain under the rate limit still runs up the bill.
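A TypeScript sketch of two server-side details from this list that are easy to get wrong: reading the client IP only from the entries your own proxies appended to `X-Forwarded-For`, and answering with a 429 plus `Retry-After`. `trustedProxyHops` is an assumption about your infrastructure (how many proxies you run in front of the app), and the plain `HttpResult` object is a hypothetical stand-in for whatever response type your framework uses.

```typescript
// Client IP resolution behind trusted proxies, and a 429 response with Retry-After.
// clientIp, tooManyRequests and HttpResult are hypothetical names.

// X-Forwarded-For is a comma-separated list. A client can send any value it
// likes; each proxy you run then appends the address it received the request
// from. With N trusted proxies, the client's address is the Nth entry from the right.
function clientIp(
  xForwardedFor: string | undefined,
  socketAddress: string,
  trustedProxyHops: number,
): string {
  // No trusted proxies: the socket peer is the client and the header is attacker-controlled.
  if (trustedProxyHops <= 0 || xForwardedFor === undefined) return socketAddress;

  const entries = xForwardedFor
    .split(",")
    .map((entry) => entry.trim())
    .filter((entry) => entry.length > 0);
  const index = entries.length - trustedProxyHops;
  const candidate = index >= 0 ? entries[index] : undefined;
  // Fewer entries than expected means the request did not pass through the
  // proxy chain you configured; fall back to the socket address.
  return candidate ?? socketAddress;
}

interface HttpResult {
  status: number;
  headers: Record<string, string>;
  body: string;
}

// 429 Too Many Requests with Retry-After given in whole seconds.
function tooManyRequests(retryAfterSeconds: number): HttpResult {
  const seconds = Math.max(1, Math.ceil(retryAfterSeconds));
  return {
    status: 429,
    headers: {
      "Content-Type": "application/json",
      "Retry-After": String(seconds),
    },
    body: JSON.stringify({ error: "rate_limited", retryAfterSeconds: seconds }),
  };
}
```

The correct hop count depends entirely on your deployment; check your hosting platform's documentation for how many proxies sit in front of your app, or for a header it sets itself that clients cannot override.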
A sensible starting point
If you want concrete defaults to adjust later: a handful of auth attempts per IP per minute; a low double-digit number of AI calls per account per minute with a daily ceiling; and tight limits on anything unauthenticated. The exact numbers matter less than having a server-side limit and a hard cap in place before launch.
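Those defaults can be written down as data so they are easy to adjust later. In this TypeScript sketch every number and route prefix is a placeholder chosen to match the description, not a recommendation from any library; `Policy`, `POLICIES`, and `policiesFor` are hypothetical names, and the fallback entry is there so that no route ends up unlimited.

```typescript
// Starting-point limits as data. All numbers and route prefixes are placeholders.
// Policy, POLICIES and policiesFor are hypothetical names.

interface Policy {
  scope: "ip" | "account"; // which identifier the counter is keyed on
  limit: number;           // max requests per window
  windowMs: number;        // window length in milliseconds
  dailyCap?: number;       // optional hard ceiling per key per day
}

const MINUTE = 60_000;

// First matching prefix wins, so list more specific prefixes first.
const POLICIES: Array<[string, Policy[]]> = [
  // A handful of auth attempts per IP per minute.
  ["/api/auth/", [{ scope: "ip", limit: 5, windowMs: MINUTE }]],
  // Low double digits of AI calls per account per minute, plus a daily ceiling.
  ["/api/ai/", [
    { scope: "ip", limit: 30, windowMs: MINUTE },
    { scope: "account", limit: 10, windowMs: MINUTE, dailyCap: 200 },
  ]],
  // Tight limits on unauthenticated endpoints that do real work.
  ["/api/public/", [{ scope: "ip", limit: 10, windowMs: MINUTE }]],
];

// Fallback so that no route is left without any limit.
const DEFAULT_POLICIES: Policy[] = [{ scope: "ip", limit: 60, windowMs: MINUTE }];

function policiesFor(path: string): Policy[] {
  for (const [prefix, policies] of POLICIES) {
    if (path.startsWith(prefix)) return policies;
  }
  return DEFAULT_POLICIES;
}
```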
Rate limiting won't make your app secure on its own, but skipping it is one of the few mistakes that can hurt on the very first day of traffic. It's a small amount of work for a large amount of peace of mind.