Frequently asked questions

Which endpoints need rate limiting first?

Start with expensive and abusable routes: AI/generation endpoints, auth (login, signup, password reset), email/SMS senders, uploads, and any unauthenticated route that does real work. Expand from there.

Should I use per-IP or per-account rate limits?

Usually both. Per-IP limits slow anonymous floods; per-account limits stop a single user or stolen token from hammering a paid feature. Each covers the other's blind spot.

Why isn't a rate limit enough for AI endpoints?

A rate limit caps speed, not total spend. A slow, steady stream of calls can stay under the limit and still run up a large bill. Add a hard per-account usage cap and a budget alert alongside the rate limit.

Why does my rate limiter not work on serverless?

In-memory counters reset because each request can hit a fresh instance. Use a shared store (such as Redis/Upstash) or your platform's built-in limiter so the count persists across instances.

A rate limit caps how often a caller can hit an endpoint within a window of time. It matters more for vibe-coded SaaS than for a typical app because of cost asymmetry: many AI-built apps expose endpoints that cost real money per call (an LLM request, an email, an image generation). Without a limit, a bad actor with a simple script can generate costs far larger than anything your pricing was built to absorb, at almost no effort on their side.

A limiter sits in front of expensive endpoints; traffic over the limit gets a 429 instead of a bill.
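
To make the idea concrete, here is a minimal fixed-window limiter sketch. It is only an illustration of the counting logic: the names `FixedWindowLimiter` and `check` are hypothetical, and the in-memory `Map` is used to keep the example short, so it would only hold up on a single long-lived server process.

```typescript
// Illustrative sketch only. All names here are hypothetical.
type Decision = { allowed: boolean; status: 200 | 429; retryAfterSeconds: number };

class FixedWindowLimiter {
  // One counter per key (for example "ip:203.0.113.7").
  private readonly windows = new Map<string, { start: number; count: number }>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  check(key: string, now: number = Date.now()): Decision {
    const current = this.windows.get(key);

    // Start a new window if there is none yet or the previous one has expired.
    if (!current || now - current.start >= this.windowMs) {
      this.windows.set(key, { start: now, count: 1 });
      return { allowed: true, status: 200, retryAfterSeconds: 0 };
    }

    // Still inside the window and under the limit: count the call and allow it.
    if (current.count < this.limit) {
      current.count += 1;
      return { allowed: true, status: 200, retryAfterSeconds: 0 };
    }

    // Over the limit: reject with 429 and say when the window resets.
    const retryAfterSeconds = Math.ceil((current.start + this.windowMs - now) / 1000);
    return { allowed: false, status: 429, retryAfterSeconds };
  }
}
```

A route handler would call something like `new FixedWindowLimiter(5, 60_000).check(key)` before doing any expensive work and return early when `allowed` is false.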

Which endpoints to protect first

You don't need to limit everything on day one. Start with the routes that are expensive, abusable, or security-sensitive — in that order.

AI and generation routes: anything calling an LLM or an image or audio model. These cost money per call.
Auth routes: login, signup, password reset, OTP — to slow brute-force and enumeration.
Email and SMS senders: to prevent spam relays and cost abuse.
Upload and import routes: to limit storage abuse and large-payload denial of service.
Any unauthenticated, public endpoint that does real work.

Per-IP vs. per-account limits

These two limits defend against different attackers, so most routes want both. Per-IP limits slow anonymous floods; per-account limits stop a single signed-up user (or a stolen token) from hammering an expensive feature.

Limit type	Stops	Weakness
Per IP	Anonymous floods and scripts	Defeated by rotating IPs / proxies
Per account	One user abusing a paid feature	Defeated by mass signups
Per IP + per account	The common cases together	Needs both identifiers resolved correctly

Combine per-IP and per-account limits; each covers the other's blind spot.
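
One way to resolve both identifiers is sketched below. The request shape `Req`, the `TRUSTED_PROXIES` set, and the key format are assumptions made for illustration; real frameworks and hosting platforms expose the client address differently, so check what yours provides.

```typescript
// Hypothetical request shape; not tied to any particular framework.
type Req = {
  remoteAddress: string;                        // address of the direct TCP peer
  headers: Record<string, string | undefined>;  // lower-cased header names
  accountId?: string;                           // set only after authentication
};

// Assumption: the app sits behind exactly one reverse proxy that we control.
const TRUSTED_PROXIES = new Set<string>(["10.0.0.1"]);

function clientIp(req: Req): string {
  const forwarded = req.headers["x-forwarded-for"];
  // Only believe X-Forwarded-For when the direct peer is our own proxy.
  if (forwarded && TRUSTED_PROXIES.has(req.remoteAddress)) {
    const parts = forwarded.split(",").map((s) => s.trim()).filter((s) => s.length > 0);
    // The rightmost entry is the address our proxy saw; earlier entries are client-supplied.
    const last = parts[parts.length - 1];
    if (last !== undefined) return last;
  }
  return req.remoteAddress;
}

// Build one limiter key per identifier; each key is then checked against its own limit.
function limitKeys(req: Req, route: string): string[] {
  const keys = [`ip:${clientIp(req)}:${route}`];
  if (req.accountId) keys.push(`acct:${req.accountId}:${route}`);
  return keys;
}
```

A request is allowed only if every key returned by `limitKeys` is under its limit, so an anonymous caller is held to the IP limit alone and a signed-in caller to both.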

Set usage caps, not just rate limits

A rate limit controls speed; a usage cap controls total spend. For AI endpoints you want both. A per-minute limit stops a burst, but a daily or monthly cap is what saves you from a slow, steady drain that stays under the rate limit.

Add a hard per-account daily/monthly ceiling on AI usage, with a clear message when it's hit.
Set a global kill-switch or budget alert so you find out before the invoice arrives.
Tie generous limits to paid plans and tight limits to free/anonymous usage.
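
The three points above could be sketched roughly as follows. The plan names, the cap numbers, and the `DailyUsageCap` class are all hypothetical placeholders, and the in-memory `Map` stands in for whatever database or shared store actually records usage.

```typescript
type Plan = "anonymous" | "free" | "paid";

// Placeholder numbers for illustration only; pick values that match your own costs.
const DAILY_AI_CAP: Record<Plan, number> = { anonymous: 5, free: 25, paid: 500 };

class DailyUsageCap {
  private readonly used = new Map<string, number>();
  private killSwitch = false;

  // Usage is bucketed per account per UTC calendar day, e.g. "u1:2024-01-01".
  private bucket(accountId: string, now: Date): string {
    return `${accountId}:${now.toISOString().slice(0, 10)}`;
  }

  // Global switch to pause all AI usage, for example when a budget alert fires.
  setKillSwitch(on: boolean): void {
    this.killSwitch = on;
  }

  tryConsume(accountId: string, plan: Plan, now: Date = new Date()): { ok: boolean; message: string } {
    if (this.killSwitch) {
      return { ok: false, message: "AI features are temporarily paused." };
    }
    const key = this.bucket(accountId, now);
    const count = this.used.get(key) ?? 0;
    const cap = DAILY_AI_CAP[plan];
    if (count >= cap) {
      // Clear message so the user knows what happened and when it resets.
      return { ok: false, message: `Daily limit of ${cap} AI requests reached. It resets at 00:00 UTC.` };
    }
    this.used.set(key, count + 1);
    return { ok: true, message: "" };
  }
}
```

The cap check runs in addition to the rate limit, not instead of it: the rate limit rejects bursts, while `tryConsume` rejects the call that would push an account past its daily total.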

Mistakes that make rate limiting useless

Limiting in the client. A limit enforced in the browser is no limit at all — it must run on the server.
In-memory counters on serverless. If each request can hit a fresh instance, an in-memory counter resets constantly; use a shared store (such as Redis/Upstash) or your platform's limiter.
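Trusting a spoofable IP header. If the limiter keys on a client-supplied header such as X-Forwarded-For without confirming it was set by your own proxy, attackers can bypass per-IP limits by sending a different value on each request.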
Returning a silent failure or a 500 instead of a clear 429 with a retry hint, which breaks legitimate clients.
Forgetting the cost cap, so a slow drain under the rate limit still runs up the bill.
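
As a rough sketch of how the shared-store and 429 points fit together, the limiter below counts through a small `CounterStore` interface instead of local memory. The interface, `enforceLimit`, and `FakeSharedStore` are hypothetical names; with Redis the `increment` method could be backed by `INCR` plus `EXPIRE` on the first increment, but that wiring is left out and is an assumption about your setup.

```typescript
// Hypothetical interface over a shared store that all instances talk to.
interface CounterStore {
  // Atomically adds 1 to the counter and returns the new value.
  // Implementations should expire the key after ttlSeconds.
  increment(key: string, ttlSeconds: number): Promise<number>;
}

type LimitResult = { status: 200 | 429; headers: Record<string, string>; body: string };

async function enforceLimit(
  store: CounterStore,
  key: string,
  limit: number,
  windowSeconds: number,
  nowMs: number = Date.now(),
): Promise<LimitResult> {
  const windowMs = windowSeconds * 1000;
  // Every instance derives the same window id from the clock, so they share one counter.
  const windowId = Math.floor(nowMs / windowMs);
  const count = await store.increment(`${key}:${windowId}`, windowSeconds);

  if (count <= limit) return { status: 200, headers: {}, body: "" };

  // Explicit 429 with a retry hint instead of a silent failure or a 500.
  const retryAfter = Math.ceil(((windowId + 1) * windowMs - nowMs) / 1000);
  return {
    status: 429,
    headers: { "Retry-After": String(retryAfter) },
    body: JSON.stringify({ error: "rate_limited", retryAfterSeconds: retryAfter }),
  };
}

// In-memory stand-in for local testing only; it ignores the TTL and has the
// same reset problem on serverless that a shared store is meant to avoid.
class FakeSharedStore implements CounterStore {
  private readonly counts = new Map<string, number>();
  async increment(key: string, _ttlSeconds: number): Promise<number> {
    const next = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, next);
    return next;
  }
}
```

Keeping the store behind an interface means the same `enforceLimit` logic can run against a fake in tests and a real shared store in production.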

A sensible starting point

If you want concrete defaults to adjust later: a handful of auth attempts per IP per minute; a low double-digit number of AI calls per account per minute with a daily ceiling; and tight limits on anything unauthenticated. The exact numbers matter less than having a server-side limit and a hard cap in place before launch.
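
Written down as configuration, a starting point along those lines might look like the sketch below. Every route name and number here is a placeholder chosen for illustration, not a recommendation from measurements; `LimitRule`, `ROUTE_LIMITS`, and `ruleFor` are hypothetical names.

```typescript
type LimitRule = {
  scope: "ip" | "account"; // which identifier the limit is keyed on
  limit: number;           // max calls per window
  windowSeconds: number;
  dailyCap?: number;       // optional hard ceiling per day
};

// Placeholder values: "a handful" is written as 5 and "low double-digit" as 10.
const ROUTE_LIMITS = new Map<string, LimitRule>([
  ["POST /auth/login", { scope: "ip", limit: 5, windowSeconds: 60 }],
  ["POST /auth/reset-password", { scope: "ip", limit: 5, windowSeconds: 60 }],
  ["POST /ai/generate", { scope: "account", limit: 10, windowSeconds: 60, dailyCap: 200 }],
]);

// Tight default for anything not listed, including unauthenticated routes.
const FALLBACK_RULE: LimitRule = { scope: "ip", limit: 20, windowSeconds: 60 };

function ruleFor(route: string): LimitRule {
  return ROUTE_LIMITS.get(route) ?? FALLBACK_RULE;
}
```

Keeping the rules in one table makes it easy to see which routes have no explicit entry and are relying on the fallback.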

Rate limiting won't make your app secure on its own, but skipping it is one of the few mistakes that can hurt on the very first day of traffic. It's a small amount of work for a large amount of peace of mind.

Rate Limiting for Indie SaaS: A Practical Guide
