A rate limit caps how often a caller can hit an endpoint in a window of time. The reason it matters more for vibe-coded SaaS than for a typical app is cost asymmetry: many AI-built apps expose endpoints that cost real money per call (an LLM request, an email, an image generation). Without a limit, a bad actor can run up your bill with a cheap script: each request costs them almost nothing and costs you the full price of the call.

[Figure: A rate limiter caps abusive traffic before it reaches an expensive endpoint. Requests pass through a rate limiter (per IP, per account) before reaching the AI / upload route, where usage is capped. Requests over the limit get a 429.]
A limiter sits in front of expensive endpoints; traffic over the limit gets a 429 instead of a bill.
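
To make the idea concrete, here is a minimal sketch of a fixed-window limiter and the 429 response it would produce. It is one simple way to count requests, not the only one, and every name in it (FixedWindowLimiter, check, tooManyRequests) is hypothetical. It keeps counters in memory and assumes a single long-lived server process; a real deployment would likely keep the counters in a shared store.

```typescript
// Illustrative sketch only. All names (FixedWindowLimiter, check,
// tooManyRequests) are hypothetical, not from any library.

interface LimitResult {
  allowed: boolean;
  remaining: number;
  retryAfterSeconds: number; // 0 when the request is allowed
}

class FixedWindowLimiter {
  // key -> start of the current window (ms) and hits counted in it.
  // Assumption: a single long-lived process. On serverless this map resets
  // per instance, so the counters would need to live in a shared store.
  private windows = new Map<string, { windowStart: number; count: number }>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit; // assumed to be >= 1
    this.windowMs = windowMs;
  }

  check(key: string, nowMs: number = Date.now()): LimitResult {
    const entry = this.windows.get(key);
    // First hit for this key, or the previous window has expired: start fresh.
    if (entry === undefined || nowMs - entry.windowStart >= this.windowMs) {
      this.windows.set(key, { windowStart: nowMs, count: 1 });
      return { allowed: true, remaining: this.limit - 1, retryAfterSeconds: 0 };
    }
    if (entry.count < this.limit) {
      entry.count += 1;
      return { allowed: true, remaining: this.limit - entry.count, retryAfterSeconds: 0 };
    }
    // Over the limit: report how long until the window resets.
    const msLeft = entry.windowStart + this.windowMs - nowMs;
    return { allowed: false, remaining: 0, retryAfterSeconds: Math.ceil(msLeft / 1000) };
  }
}

// Shape of the response an over-limit request gets, framework-agnostic.
function tooManyRequests(result: LimitResult) {
  return {
    status: 429,
    headers: { "Retry-After": String(result.retryAfterSeconds) },
    body: { error: "rate_limited", retryAfterSeconds: result.retryAfterSeconds },
  };
}
```

A route handler would call check with a key such as the caller's IP or account ID before doing any expensive work, and return tooManyRequests(result) when allowed is false.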

Which endpoints to protect first

You don't need to limit everything on day one. Start with the routes that are expensive, abusable, or security-sensitive — in that order.

  • AI and generation routes: anything calling an LLM, image or audio model. These cost money per call.
  • Auth routes: login, signup, password reset, OTP — to slow brute-force and enumeration.
  • Email and SMS senders: to prevent spam relays and cost abuse.
  • Upload and import routes: to limit storage abuse and large-payload denial of service.
  • Any unauthenticated, public endpoint that does real work.

Per-IP vs. per-account limits

These two limits defend against different attackers, so most routes want both. Per-IP limits slow anonymous floods; per-account limits stop a single signed-up user (or a stolen token) from hammering an expensive feature.

Limit type            Stops                            Weakness
Per IP                Anonymous floods and scripts     Defeated by rotating IPs / proxies
Per account           One user abusing a paid feature  Defeated by mass signups
Per IP + per account  The common cases together        Needs both identifiers resolved correctly
Combine per-IP and per-account limits; each covers the other's blind spot.
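
As a rough sketch of how the two limits can be combined, the example below counts every request against its IP and, when the caller is signed in, against the account as well. The names (makeDualLimiter, Caller, Decision) are hypothetical, the counters are in memory for illustration only, and the limits are passed in rather than recommended.

```typescript
// Illustrative sketch; makeDualLimiter, Caller and Decision are hypothetical names.

interface Caller {
  ip: string;
  accountId?: string; // undefined for anonymous requests
}

type Decision = { allowed: true } | { allowed: false; blockedBy: "ip" | "account" };

function makeDualLimiter(ipLimit: number, accountLimit: number, windowMs: number) {
  // In-memory counters for illustration only; assumes one long-lived process.
  const counters = new Map<string, { windowStart: number; count: number }>();

  // Counts one hit against `key`; returns false once `limit` is exceeded.
  function hit(key: string, limit: number, nowMs: number): boolean {
    const entry = counters.get(key);
    if (entry === undefined || nowMs - entry.windowStart >= windowMs) {
      counters.set(key, { windowStart: nowMs, count: 1 });
      return limit >= 1;
    }
    entry.count += 1;
    return entry.count <= limit;
  }

  return function allow(caller: Caller, nowMs: number = Date.now()): Decision {
    // Every request counts against its IP, signed in or not.
    if (!hit(`ip:${caller.ip}`, ipLimit, nowMs)) {
      return { allowed: false, blockedBy: "ip" };
    }
    // Signed-in requests also count against the account, whatever IP they use.
    if (caller.accountId !== undefined && !hit(`acct:${caller.accountId}`, accountLimit, nowMs)) {
      return { allowed: false, blockedBy: "account" };
    }
    return { allowed: true };
  };
}
```

With this shape, one account rotating through proxies still hits the account limit, and an anonymous script on one address still hits the IP limit. Note that in this sketch a request rejected by the account limit has already been counted against its IP; whether that is desirable is a design choice.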

Set usage caps, not just rate limits

A rate limit controls speed; a usage cap controls total spend. For AI endpoints you want both. A per-minute limit stops a burst, but a daily or monthly cap is what saves you from a slow, steady drain that stays under the rate limit.

  • Add a hard per-account daily/monthly ceiling on AI usage, with a clear message when it's hit.
  • Set a global kill-switch or budget alert so you find out before the invoice arrives.
  • Tie generous limits to paid plans and tight limits to free/anonymous usage.
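
The three points above can be sketched together as a small usage tracker: a per-account daily ceiling that depends on the plan, a global daily budget that acts as a kill-switch, and a check for raising a budget alert. Everything here is an assumption for illustration: the UsageCaps class, the plan names, the ceilings in DAILY_AI_CALLS, the cost in cents, and the in-memory storage.

```typescript
// Illustrative sketch; UsageCaps, DAILY_AI_CALLS and all numbers are assumptions.

type PlanName = "anonymous" | "free" | "paid";

// Hypothetical per-account daily ceilings: tight for free, generous for paid.
const DAILY_AI_CALLS: Record<PlanName, number> = { anonymous: 5, free: 20, paid: 500 };

type CapResult =
  | { ok: true }
  | { ok: false; reason: "account_cap" | "global_budget"; message: string };

class UsageCaps {
  // In-memory for illustration; real counters belong in a database or shared store.
  private callsByAccountDay = new Map<string, number>();
  private spentCentsByDay = new Map<string, number>();
  private readonly dailyBudgetCents: number;

  constructor(dailyBudgetCents: number) {
    this.dailyBudgetCents = dailyBudgetCents;
  }

  private static day(nowMs: number): string {
    return new Date(nowMs).toISOString().slice(0, 10); // UTC date, e.g. "2024-01-01"
  }

  tryConsume(accountId: string, plan: PlanName, costCents: number, nowMs: number = Date.now()): CapResult {
    const day = UsageCaps.day(nowMs);
    const spent = this.spentCentsByDay.get(day) ?? 0;
    // Global kill-switch: refuse everything once today's budget would be exceeded.
    if (spent + costCents > this.dailyBudgetCents) {
      return { ok: false, reason: "global_budget", message: "AI features are temporarily paused." };
    }
    const key = `${accountId}:${day}`;
    const used = this.callsByAccountDay.get(key) ?? 0;
    const cap = DAILY_AI_CALLS[plan];
    if (used >= cap) {
      return { ok: false, reason: "account_cap", message: `Daily limit of ${cap} AI calls reached.` };
    }
    this.callsByAccountDay.set(key, used + 1);
    this.spentCentsByDay.set(day, spent + costCents);
    return { ok: true };
  }

  // True once today's spend reaches alertPercent of the budget (integer math).
  shouldAlert(alertPercent: number = 80, nowMs: number = Date.now()): boolean {
    const spent = this.spentCentsByDay.get(UsageCaps.day(nowMs)) ?? 0;
    return spent * 100 >= this.dailyBudgetCents * alertPercent;
  }
}
```

This check runs in addition to the per-minute rate limit, not instead of it: the rate limit handles bursts, while tryConsume handles the slow drain.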

Mistakes that make rate limiting useless

  1. Limiting in the client. A limit enforced in the browser is no limit at all — it must run on the server.
  2. In-memory counters on serverless. If each request can hit a fresh instance, an in-memory counter resets constantly; use a shared store (such as Redis/Upstash) or your platform's limiter.
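  3. Trusting a spoofable IP header. If the limiter keys on a client-supplied header such as X-Forwarded-For without checking which entries your own proxies added, attackers can bypass per-IP limits by sending a different value on each request.
  4. Returning a silent failure or a 500. An over-limit request should get a clear 429 with a retry hint (for example a Retry-After header); anything else breaks legitimate clients.
  5. Forgetting the cost cap. A slow drain that stays under the rate limit still runs up the bill.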
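
For mistake 3, one common approach is to trust only the X-Forwarded-For entries added by proxies you control. The sketch below assumes each of your proxies appends the address it received the connection from, and that you know how many such proxies sit in front of the app. The function name resolveClientIp and the trustedProxyCount parameter are hypothetical; many frameworks and platforms offer their own setting for this, which is usually the better choice.

```typescript
// Illustrative sketch; resolveClientIp and trustedProxyCount are hypothetical.
// Assumption: each trusted proxy appends the address it saw to X-Forwarded-For.

function resolveClientIp(
  socketAddress: string,              // address of the direct TCP peer
  xForwardedFor: string | undefined,  // raw header value, if present
  trustedProxyCount: number           // proxies you operate in front of the app
): string {
  // No trusted proxy: the header is entirely client-controlled, so ignore it.
  if (trustedProxyCount <= 0 || xForwardedFor === undefined) {
    return socketAddress;
  }
  const hops = xForwardedFor
    .split(",")
    .map((part) => part.trim())
    .filter((part) => part.length > 0);
  // Full chain, oldest first. The rightmost `trustedProxyCount` entries are
  // your own proxies; the entry just before them is the client they saw.
  const chain = [...hops, socketAddress];
  const index = chain.length - 1 - trustedProxyCount;
  // Anything further left was supplied by the client and cannot be trusted.
  return chain[Math.max(index, 0)] ?? socketAddress;
}

// A client behind one load balancer (10.0.0.1) that tries to spoof 6.6.6.6:
const spoofAttempt = resolveClientIp("10.0.0.1", "6.6.6.6, 203.0.113.7", 1);
console.log(spoofAttempt); // the address the load balancer actually saw
```

Reading the leftmost entry instead would hand the attacker control of the limiter key, since they can put any value there.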

A sensible starting point

If you want concrete defaults to adjust later: a handful of auth attempts per IP per minute; a low double-digit number of AI calls per account per minute with a daily ceiling; and tight limits on anything unauthenticated. The exact numbers matter less than having a server-side limit and a hard cap in place before launch.
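
One way to write those defaults down is a small policy table keyed by route prefix. The numbers below (5, 10, 200, 20) are placeholder assumptions chosen to match "a handful" and "low double-digit", and the route prefixes and the names DEFAULT_POLICIES and policyFor are hypothetical. Treat it as a starting sketch to tune, not a recommendation.

```typescript
// Illustrative sketch; every prefix and number here is an assumption to adjust.

type Scope = "ip" | "account";

interface Policy {
  scope: Scope;          // which identifier the limit is keyed on
  limit: number;         // max requests per window
  windowSeconds: number;
  dailyCap?: number;     // optional hard ceiling per day
}

const DEFAULT_POLICIES: { prefix: string; policy: Policy }[] = [
  // A handful of auth attempts per IP per minute.
  { prefix: "/api/auth/", policy: { scope: "ip", limit: 5, windowSeconds: 60 } },
  // Low double-digit AI calls per account per minute, with a daily ceiling.
  { prefix: "/api/ai/", policy: { scope: "account", limit: 10, windowSeconds: 60, dailyCap: 200 } },
  // Tight limits on unauthenticated public routes.
  { prefix: "/api/public/", policy: { scope: "ip", limit: 20, windowSeconds: 60 } },
];

// Routes with no explicit entry still get a limit rather than none.
const FALLBACK_POLICY: Policy = { scope: "ip", limit: 20, windowSeconds: 60 };

function policyFor(path: string): Policy {
  const match = DEFAULT_POLICIES.find((entry) => path.startsWith(entry.prefix));
  return match !== undefined ? match.policy : FALLBACK_POLICY;
}
```

Keeping the numbers in one table makes them easy to adjust after launch without touching the enforcement code.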

Rate limiting won't make your app secure on its own, but skipping it is one of the few mistakes that can hurt on the very first day of traffic. It's a small amount of work for a large amount of peace of mind.
