The problem
I run a lot of side projects. Some of them are hackathon detritus, some are things I actually use, and a few sit behind kaushik.cv doing real work. They all need API keys: Gemini for LLM calls, ElevenLabs for TTS, Groq for the fast path when latency matters, a Supabase URL, a Modal token, the usual grab bag.
For about two years I did what everyone does. One .env file per project. One key per provider. Copy-paste from a Notion page that had slowly turned into a security incident waiting to happen.
Two things broke that.
The first was quotas. Google's free tier for Gemini is generous until you have five projects hitting it, and then it isn't. One of my projects would burn through the daily quota by lunchtime and every other project would start returning 429s until UTC midnight. There was no fault isolation because there was no fleet — there was one key doing the work of eight.
The second was rotation. When I did rotate a key (because I'd accidentally leaked it to a public repo, or because the free tier had reset, or because I'd generated a fresh one and forgotten which project was using the old one), I had to grep across every project's .env and hope I got them all. I never got them all.
The insight
The naive answer to "I need API keys in my projects" is: put them in a .env file. The next-least-naive answer is: put them in a secrets manager and inject at deploy. That's better, but it still treats each key as a single point of failure. You have one Gemini key. When it hits its quota, you're done.
The insight I'd been avoiding is that provider keys aren't credentials — they're a fleet. If I have eight Gemini keys, the right primitive isn't "which key does this project use?" It's "give me an available Gemini key, and if this one gets rate-limited, tell me and I'll rotate you to another." The vault stops being a filing cabinet and becomes a scheduler.
That's the entire idea behind CipherStack. It holds about 200 keys across 15 provider groups (Gemini, OpenRouter, Groq, HuggingFace, Mistral, ElevenLabs, Cloudflare AI, GitHub Models, Vercel, Clerk, Supabase, Qdrant, Kaggle, Resend, Modal, and a misc bucket for the long tail). Every key sits in a Postgres row, encrypted at rest with AES-256-GCM, and every vend picks the least-recently-used active key in the requested group.
The rest of this post is the state machine that makes that work.
The state machine
Each key lives in one of four states.
┌─────────────┐
│  AVAILABLE  │◄─────────────┐
└──────┬──────┘              │
       │ vend                │ report success
       ▼                     │ (or TTL expiry)
┌─────────────┐              │
│  IN-FLIGHT  │──────────────┤
└──────┬──────┘              │
       │ report 429          │
       ▼                     │
┌─────────────┐              │
│  COOLDOWN   │──────────────┘
│  (60s TTL)  │
└──────┬──────┘
       │ quota exhausted
       │ (repeated 429)
       ▼
┌─────────────┐
│  EXHAUSTED  │
│  (until UTC │
│   midnight) │
└─────────────┘
The transitions are:
- available → in-flight: a client hits /api/v1/vend/{group}, the vault picks the least-recently-used available key, stamps last_vended_at = now(), and returns the plaintext key.
- in-flight → available: the client succeeds and (optionally) calls /api/v1/report with input/output token counts. The key returns to the available pool with a fresh last_vended_at, so the LRU ordering naturally spreads load across the fleet.
- in-flight → cooldown: the client hits a rate limit, calls /report with error: "429_rate_limited", and the key gets cooldown_until = now() + 60s. The next vend query skips any row where cooldown_until > now().
- cooldown → available: TTL expires. There's no cron job for this; the WHERE cooldown_until IS NULL OR cooldown_until < now() clause in the vend query does the work implicitly.
- cooldown → exhausted: repeated 429s in a short window suggest daily quota, not a temporary spike. The key gets pinned out of rotation until UTC midnight.
The "in-flight" state is more of a bookkeeping fiction than a real state — I don't actually wait for a report before vending the same key again. LRU ordering + a small pool size means a key almost never gets vended twice in the same second, and even if it does, the downstream provider is the source of truth on whether a request is legal.
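The transitions above can be modelled in memory roughly as follows. This is a hypothetical Python sketch, not CipherStack's code: the `Key` class and the `vend` / `report_429` helpers are made-up names, and the "three 429s within five minutes" rule is purely an assumption standing in for "repeated 429s in a short window". In-flight is deliberately not tracked, matching the bookkeeping-fiction point above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import List, Optional

COOLDOWN = timedelta(seconds=60)      # the 60s TTL from the post
STRIKE_WINDOW = timedelta(minutes=5)  # assumption: what counts as "a short window"
STRIKE_LIMIT = 3                      # assumption: how many 429s count as "repeated"


@dataclass
class Key:
    key_id: str
    last_vended_at: Optional[datetime] = None
    cooldown_until: Optional[datetime] = None
    exhausted_until: Optional[datetime] = None
    recent_429s: List[datetime] = field(default_factory=list)

    def is_available(self, now: datetime) -> bool:
        # Same shape as the vend filter: an expired TTL needs no cleanup job.
        cooled = self.cooldown_until is None or self.cooldown_until < now
        rested = self.exhausted_until is None or self.exhausted_until < now
        return cooled and rested


def next_utc_midnight(now: datetime) -> datetime:
    tomorrow = now.astimezone(timezone.utc).date() + timedelta(days=1)
    return datetime(tomorrow.year, tomorrow.month, tomorrow.day, tzinfo=timezone.utc)


def vend(keys: List[Key], now: datetime) -> Optional[Key]:
    """Pick the least-recently-used available key; never-vended keys go first."""
    candidates = [k for k in keys if k.is_available(now)]
    if not candidates:
        return None
    epoch = datetime.min.replace(tzinfo=timezone.utc)
    chosen = min(candidates, key=lambda k: k.last_vended_at or epoch)
    chosen.last_vended_at = now
    return chosen


def report_429(key: Key, now: datetime) -> None:
    """Cool the key down; pin it until UTC midnight if 429s keep repeating."""
    key.recent_429s = [t for t in key.recent_429s if now - t <= STRIKE_WINDOW]
    key.recent_429s.append(now)
    if len(key.recent_429s) >= STRIKE_LIMIT:
        key.exhausted_until = next_utc_midnight(now)
    else:
        key.cooldown_until = now + COOLDOWN
```

Passing `now` in explicitly keeps the model deterministic, which makes the cooldown and midnight boundaries easy to exercise without sleeping.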
The vend query
The whole state machine collapses into one SQL statement. This is the query behind /api/v1/vend/{group}:
UPDATE api_keys
SET last_vended_at = NOW(),
    vend_count = vend_count + 1
WHERE id = (
    SELECT id FROM api_keys
    WHERE group_slug = $1
      AND status = 'active'
      AND (cooldown_until IS NULL OR cooldown_until < NOW())
      AND (exhausted_until IS NULL OR exhausted_until < NOW())
    ORDER BY last_vended_at ASC NULLS FIRST
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING id, encrypted_key, provider, base_url;

The two lines that matter are ORDER BY last_vended_at ASC (that's the LRU) and FOR UPDATE SKIP LOCKED (that's what saved me under concurrency).
Client-side, a vend looks like this:
curl -H "Authorization: Bearer csk_..." \
  https://cipherstack.kaushik.cv/api/v1/vend/gemini
# {"key":"AIza...","key_id":"abc123","provider":"google",
#  "group_slug":"gemini","base_url":"https://..."}

The encrypted column gets decrypted in-process before the response is serialized, so the plaintext key exists in memory for the duration of the HTTP handler and never on disk.
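For projects that call the vault from Python rather than curl, the same request might look like the sketch below. It uses only the standard library. The helper names (`build_vend_request`, `parse_vend_response`, `vend`) are hypothetical; the host, path, Authorization header, and response fields are taken from the curl example, and the timeout is an assumption.

```python
import json
import urllib.request

BASE_URL = "https://cipherstack.kaushik.cv"  # host from the curl example


def build_vend_request(group: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the GET request for one vend."""
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/vend/{group}",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def parse_vend_response(body: bytes) -> dict:
    """Decode the JSON body and check the fields shown in the example response."""
    payload = json.loads(body)
    for name in ("key", "key_id", "provider", "group_slug", "base_url"):
        if name not in payload:
            raise ValueError(f"vend response missing {name!r}")
    return payload


def vend(group: str, token: str, timeout: float = 5.0) -> dict:
    """Send the vend over HTTPS; urllib raises HTTPError on error statuses."""
    request = build_vend_request(group, token)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return parse_vend_response(resp.read())
```

Splitting request construction from the network call keeps the token handling testable offline. The returned `key_id` is worth holding on to, since it is what a later report would most plausibly refer to.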
What surprised me
I'd braced for the cooldown mechanism to be the tricky bit. Cooldown was two lines of SQL. What actually bit me was the vend race.
The first version of the query didn't have FOR UPDATE SKIP LOCKED. It just did ORDER BY last_vended_at LIMIT 1. Under any concurrency at all, two simultaneous vends would read the same row, both update it, and both hand the same key to two different clients. In the single-user case this was fine — it just meant two projects were briefly sharing a key. In the "I let a friend hit the API from their hackathon project at the same time I was demoing mine" case, it meant we were both hammering the same Gemini key and hitting the same quota wall in half the expected time.
The fix was FOR UPDATE SKIP LOCKED, which is one of those Postgres features I'd read about, filed under "job queues," and never expected to use. What it does is: when the SELECT ... FOR UPDATE runs, if the row it wanted to lock is already locked by another transaction, it skips that row and picks the next one in the ordering. So two concurrent vends read different rows, each gets its own lock, each hands out a different key. Because of the LRU ordering, between them they get the two least-recently-used keys, which is exactly what you want anyway.
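The skip-instead-of-wait behaviour can be imitated in plain Python with non-blocking locks. This is only an analogy under stated assumptions: `Row`, `select_for_update_skip_locked`, and `commit` are hypothetical names, a `threading.Lock` stands in for a Postgres row lock, and releasing it stands in for committing the transaction.

```python
import threading
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Row:
    key_id: str
    last_vended_at: float = 0.0
    lock: threading.Lock = field(default_factory=threading.Lock)


def select_for_update_skip_locked(rows: List[Row]) -> Optional[Row]:
    """Walk rows in LRU order and return the first one whose lock we can take.

    Rows already locked by another "transaction" are skipped, not waited on.
    The caller owns the returned row's lock until it calls commit().
    """
    for row in sorted(rows, key=lambda r: r.last_vended_at):
        if row.lock.acquire(blocking=False):
            return row
    return None


def commit(row: Row, now: float) -> None:
    """Stamp the row as vended and release its lock."""
    row.last_vended_at = now
    row.lock.release()


rows = [Row("k1", 10.0), Row("k2", 20.0), Row("k3", 30.0)]
first = select_for_update_skip_locked(rows)   # takes k1, the LRU row
second = select_for_update_skip_locked(rows)  # k1 is locked, so this takes k2
commit(first, 40.0)
commit(second, 41.0)
```

Without the `blocking=False` skip, the second caller would wait for `k1` and then hand out the same key, which is the race described above.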
The knock-on effect is that the vault degrades gracefully under load. If ten concurrent vends come in for a group with eight keys, eight of them get keys immediately and two get "no available keys." That's the correct behavior — the alternative is queuing, and queuing on the credential-fetch path adds latency to every downstream API call. Better to fail fast and let the client's retry logic pick up.
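On the client side, "fail fast and retry" could look something like this sketch. Everything here is hypothetical: the `call_with_rotation` name, the injected `vend` / `call_provider` / `report` callables, the `RateLimited` exception, and the attempt count and backoff values are assumptions chosen for illustration, not part of CipherStack's documented client.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Raised by the provider call when the upstream API answers 429."""


def call_with_rotation(
    vend: Callable[[], Optional[dict]],
    call_provider: Callable[[dict], T],
    report: Callable[[str, Optional[str]], None],
    attempts: int = 3,
    backoff_seconds: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Vend a key, use it, and rotate to another key on a 429."""
    for attempt in range(attempts):
        vended = vend()
        if vended is None:
            # Vault answered "no available keys": back off briefly, then retry.
            sleep(backoff_seconds * (2 ** attempt))
            continue
        try:
            result = call_provider(vended)
        except RateLimited:
            # Tell the vault so this key goes into cooldown, then try another.
            report(vended["key_id"], "429_rate_limited")
            continue
        report(vended["key_id"], None)  # optional success report
        return result
    raise RuntimeError(f"no usable key after {attempts} attempts")
```

Injecting the three callables keeps the loop independent of any HTTP library and lets the retry behaviour be checked with fakes.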
The auth story
CipherStack has two auth paths and they're for different threat models.
Service tokens are long-lived bearer tokens (csk_...) I paste into my project .env files. They're scoped to a set of groups and can be revoked from the dashboard. This is the personal-use path — the tokens sit on my own boxes, the blast radius of a leak is bounded by dashboard-level revocation, and I optimize for ergonomics.
Certificates are the path for anything I deploy publicly. Each certificate is an HMAC secret. The client computes an HMAC-SHA256 over {timestamp}:{group} with that secret, sends the signature along with the timestamp, and the server verifies it. The timestamp has to be within a 5-minute window of server time, so a leaked signature is only replayable for five minutes. For a vault whose whole job is handing out keys, that is the difference between "an attacker got one vend" and "an attacker got everything."
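A minimal sketch of that signing scheme, using Python's standard `hmac` module. The function names, the hex encoding of the signature, and the use of Unix-epoch seconds for the timestamp are assumptions; only the `{timestamp}:{group}` message format, SHA-256, and the 5-minute window come from the description above.

```python
import hashlib
import hmac
import time
from typing import Optional

MAX_SKEW_SECONDS = 5 * 60  # the 5-minute window from the post


def sign_vend(secret: bytes, group: str, timestamp: int) -> str:
    """HMAC-SHA256 over "{timestamp}:{group}", hex-encoded (encoding assumed)."""
    message = f"{timestamp}:{group}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_vend(
    secret: bytes,
    group: str,
    timestamp: int,
    signature: str,
    now: Optional[float] = None,
) -> bool:
    """Reject stale timestamps first, then compare signatures in constant time."""
    now = time.time() if now is None else now
    if abs(now - timestamp) > MAX_SKEW_SECONDS:
        return False
    expected = sign_vend(secret, group, timestamp)
    return hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8"))
```

Because the group is part of the signed message, a signature captured for one group does not verify for another, and `hmac.compare_digest` avoids leaking how many leading characters matched.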
The rest of the API surface is deliberately narrow. Vend. Report. List groups. Dashboard endpoints behind session auth. No listing of keys, no bulk export, no way to enumerate what's in a group from a service token. If you compromise a token, you can vend from the groups it's scoped to — you can't dump the vault.
The numbers
The vault has been in production for about ten months now. Rough shape of the traffic:
- ~200 keys across 15 groups. The distribution is long-tailed: Gemini has 8, ElevenLabs has 4, some groups have 1.
- ~5,000 vends/day across side projects. That's a mix of the CV site itself, half a dozen dashboards, a Discord bot, and a couple of hackathon projects that never got turned off.
- 0 rate-limit downtime since deploy. That's the whole point. In the .env era, at least one project a week would go dark for a few hours because someone else's project had burned the shared key.
- ~2ms vend latency at p50, ~8ms at p95. It's Postgres and a single query. There's no more headroom to optimize.
What I'd do differently at 10x
If I ever have 2,000 keys and 50,000 vends a day, the bottleneck stops being the query and starts being the row-level lock contention on hot groups. Two changes that would probably need to happen:
- Horizontal partitioning by group. Right now every key lives in the same api_keys table. That's fine at 200 rows. At 2,000, with skewed access patterns (Gemini gets vended 100x more than Resend), the hot rows sit on the same page and I'd want to shard by group_slug: probably native Postgres partitioning, one partition per group, so SKIP LOCKED scans a smaller working set per vend.
- Cooldown state in Redis. The cooldown TTL is fine as a Postgres column at low scale, but at 10x I'd want cooldown lookups off the hot table entirely: a Redis sorted set per group, keyed by key_id, scored by cooldown_until_epoch. The vend query becomes "get me the LRU key from Postgres where the ID isn't in the Redis cooldown zset." That decouples the read path from any locking on cooldowns.
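To make the second idea concrete, here is an in-memory stand-in for the cooldown sorted set. It does not talk to Redis; the `CooldownSet` class and `pick_lru` helper are hypothetical, and the `cooldown:{group}` key naming in the docstring is an assumption. The Redis commands named there are the ones such a design would most likely map to.

```python
from typing import Dict, Iterable, Optional, Set, Tuple


class CooldownSet:
    """In-memory stand-in for one Redis sorted set per group.

    add()    ~ ZADD cooldown:{group} <cooldown_until_epoch> <key_id>
    active() ~ ZRANGEBYSCORE cooldown:{group} (<now> +inf
    prune()  ~ ZREMRANGEBYSCORE cooldown:{group} -inf <now>
    """

    def __init__(self) -> None:
        self._scores: Dict[str, float] = {}

    def add(self, key_id: str, cooldown_until_epoch: float) -> None:
        self._scores[key_id] = cooldown_until_epoch

    def active(self, now: float) -> Set[str]:
        """IDs whose cooldown has not expired yet."""
        return {k for k, until in self._scores.items() if until > now}

    def prune(self, now: float) -> None:
        """Drop expired entries so the set does not grow without bound."""
        self._scores = {k: u for k, u in self._scores.items() if u > now}


def pick_lru(rows: Iterable[Tuple[str, float]], cooling: Set[str]) -> Optional[str]:
    """rows are (key_id, last_vended_at); return the LRU id not in cooldown."""
    eligible = [(ts, key_id) for key_id, ts in rows if key_id not in cooling]
    return min(eligible)[1] if eligible else None
```

The trade-off is that cooldown state and LRU state now live in two stores, so a vend has to tolerate the two being briefly out of step.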
Neither of those is worth doing today. That's the discipline I keep trying to internalize: pick the architecture for the size you're at, not the size you might be at.
See also
- HNSW or IVF-PQ? What I actually chose at 2M documents — a different flavor of the same "pick for the regime you're in" lesson, this time in vector search.
CipherStack is live at cipherstack.kaushik.cv. The dashboard is behind auth but the docs and the LLM-readable llms.txt are open.