Who is Kaushik Saravanan?

Kaushik Saravanan is an AI/ML engineer and MS in Artificial Intelligence Engineering candidate at Carnegie Mellon University (ECE, expected December 2027), based in Pittsburgh, PA. He was previously an Associate Application Engineer at SAP Labs India (2024–2026), where he shipped production GDPR-compliant RAG and LLM systems to 400+ users. He is also an IEEE-published researcher and a Smart India Hackathon 2022 winner.

Is Kaushik Saravanan open to new AI/ML roles?

Yes. Kaushik is open to Summer 2027 AI/ML and RAG internships in the US, and full-time AI engineering roles starting January 2028 after his CMU MS-AIE graduation. Reach out via LinkedIn (linkedin.com/in/kaushiksss) or X (@Kaushiks0).

Does Kaushik need visa sponsorship?

Kaushik is an F-1 international student at Carnegie Mellon University. He has 3-year STEM OPT eligibility after his December 2027 graduation, and is open to employers who sponsor H-1B afterward.

What did Kaushik build at SAP Labs India?

At SAP Labs India (2024–2026) he engineered a GDPR-compliant, privacy-first RAG platform for SAP's internal chatbot. He scaled it to 2M+ documents and 400+ users with <2s p95 end-to-end latency, fine-tuned DeBERTa for Germany-specific PII detection (94% recall@10, MRR@10=0.82), and rewrote a credential-fetch client in dependency-free Go for 9,000+ Linux servers.

What are Kaushik's IEEE publications?

Two IEEE papers: 'Swarm Intelligence-Based Cooperative Intelligent Transportation System' (ICCIES 2025) and 'Cognitive Intrusion Detection System in Autonomous Vehicles Using Machine Learning' (ICPECTS 2024).

What is Kaushik's tech stack?

Python, Go, FastAPI, PyTorch, TensorFlow, Hugging Face Transformers, LangChain, PostgreSQL, Docker, Kubernetes, NVIDIA CUDA, Google Cloud Platform, and Microsoft Azure. Specializes in RAG pipelines, LLM fine-tuning (DeBERTa, QLoRA), and cloud observability.

An LRU Key-Rotation State Machine for a Personal Credential Vault

The problem

I run a lot of side projects. Some of them are hackathon detritus, some are things I actually use, and a few sit behind kaushik.cv doing real work. They all need API keys: Gemini for LLM calls, ElevenLabs for TTS, Groq for the fast path when latency matters, a Supabase URL, a Modal token, the usual grab bag.

For about two years I did what everyone does. One .env file per project. One key per provider. Copy-paste from a Notion page that had slowly turned into a security incident waiting to happen.

Two things broke that.

The first was quotas. Google's free tier for Gemini is generous until you have five projects hitting it, and then it isn't. One of my projects would burn through the daily quota by lunchtime and every other project would start returning 429s until UTC midnight. There was no fault isolation because there was no fleet — there was one key doing the work of eight.

The second was rotation. When I did rotate a key (because I'd accidentally leaked it to a public repo, or because the free tier had reset, or because I'd generated a fresh one and forgotten which project was using the old one), I had to grep across every project's .env and hope I got them all. I never got them all.

The insight

The naive answer to "I need API keys in my projects" is: put them in a .env file. The next-least-naive answer is: put them in a secrets manager and inject at deploy. That's better, but it still treats each key as a single point of failure. You have one Gemini key. When it hits its quota, you're done.

The insight I'd been avoiding is that provider keys aren't credentials — they're a fleet. If I have eight Gemini keys, the right primitive isn't "which key does this project use?" It's "give me an available Gemini key, and if this one gets rate-limited, tell me and I'll rotate you to another." The vault stops being a filing cabinet and becomes a scheduler.

That's the entire idea behind CipherStack. It holds about 200 keys across 15 provider groups (Gemini, OpenRouter, Groq, HuggingFace, Mistral, ElevenLabs, Cloudflare AI, GitHub Models, Vercel, Clerk, Supabase, Qdrant, Kaggle, Resend, Modal, and a misc bucket for the long tail). Every key sits in a Postgres row, encrypted at rest with AES-256-GCM, and every vend picks the least-recently-used active key in the requested group.

The rest of this post is the state machine that makes that work.

The state machine

Each key lives in one of four states.

        ┌─────────────┐
        │  AVAILABLE  │◄─────────────┐
        └──────┬──────┘              │
               │ vend                 │ report success
               ▼                      │  (or TTL expiry)
        ┌─────────────┐               │
        │  IN-FLIGHT  │───────────────┤
        └──────┬──────┘               │
               │ report 429           │
               ▼                      │
        ┌─────────────┐               │
        │  COOLDOWN   │───────────────┘
        │  (60s TTL)  │
        └──────┬──────┘
               │ quota exhausted
               │ (repeated 429)
               ▼
        ┌─────────────┐
        │  EXHAUSTED  │
        │ (until UTC  │
        │  midnight)  │
        └─────────────┘

Figure: the four-state lifecycle of a CipherStack key. AVAILABLE (status = active, cooldown_until IS NULL) moves to IN-FLIGHT (last_vended_at = NOW()) on vend. IN-FLIGHT returns to AVAILABLE on a successful /report, or drops to COOLDOWN (cooldown_until = NOW() + 60s) on a 429_rate_limited report. COOLDOWN returns to AVAILABLE by TTL expiry (implicit in the WHERE clause) or moves to EXHAUSTED (exhausted_until = tomorrow 00:00Z) after three or more 429s in a five-minute window. EXHAUSTED returns to AVAILABLE at the daily UTC quota reset. The state machine is a fiction; the WHERE clause is the truth: every transition collapses into one UPDATE with FOR UPDATE SKIP LOCKED.

The transitions are:

available → in-flight: a client hits /api/v1/vend/{group}, the vault picks the least-recently-used available key, stamps last_vended_at = now(), and returns the plaintext key.
in-flight → available: the client succeeds and (optionally) calls /api/v1/report with input/output token counts. The key returns to the available pool with a fresh last_vended_at, so the LRU ordering naturally spreads load across the fleet.
in-flight → cooldown: the client hits a rate limit, calls /report with error: "429_rate_limited", and the key gets cooldown_until = now() + 60s. The next vend query skips any row where cooldown_until > now().
cooldown → available: TTL expires. There's no cron job for this — the WHERE cooldown_until IS NULL OR cooldown_until < now() clause in the vend query does the work implicitly.
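cooldown → exhausted: three or more 429s within five minutes suggest an exhausted daily quota, not a temporary spike. The key gets exhausted_until set to the next UTC midnight and stays out of rotation until then.
exhausted → available: the provider's daily quota resets at UTC midnight, and the exhausted_until < now() check in the vend query lets the key back in. Again, no cron job.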
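The transitions above can be modelled in a few lines of Python. This is an illustrative sketch, not the CipherStack code: the Key class, the recent_429s list, and the "inactive" label are hypothetical, while the 60-second cooldown and the three-429s-in-five-minutes threshold come from the figure description. IN-FLIGHT is left out because it leaves no trace other than last_vended_at.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import List, Optional

COOLDOWN = timedelta(seconds=60)          # cooldown TTL from the diagram
ESCALATION_WINDOW = timedelta(minutes=5)  # window from the figure description
ESCALATION_THRESHOLD = 3                  # 429s in the window before "exhausted"


@dataclass
class Key:
    key_id: str
    status: str = "active"
    cooldown_until: Optional[datetime] = None
    exhausted_until: Optional[datetime] = None
    # Hypothetical bookkeeping: timestamps of recent 429 reports.
    recent_429s: List[datetime] = field(default_factory=list)


def next_utc_midnight(now: datetime) -> datetime:
    """Return 00:00Z of the day after `now`."""
    tomorrow = now.astimezone(timezone.utc).date() + timedelta(days=1)
    return datetime(tomorrow.year, tomorrow.month, tomorrow.day, tzinfo=timezone.utc)


def state_of(key: Key, now: datetime) -> str:
    """Derive the state from timestamps; nothing has to flip a flag on expiry."""
    if key.status != "active":
        return "inactive"  # hypothetical label for revoked or disabled keys
    if key.exhausted_until is not None and key.exhausted_until >= now:
        return "exhausted"
    if key.cooldown_until is not None and key.cooldown_until >= now:
        return "cooldown"
    return "available"


def report_429(key: Key, now: datetime) -> None:
    """Apply a 429 report: start a cooldown, escalate on repeated reports."""
    key.recent_429s = [t for t in key.recent_429s if now - t <= ESCALATION_WINDOW]
    key.recent_429s.append(now)
    key.cooldown_until = now + COOLDOWN
    if len(key.recent_429s) >= ESCALATION_THRESHOLD:
        key.exhausted_until = next_utc_midnight(now)
```

Because state_of only compares timestamps against the current time, a key drifts back to "available" without any background job, which is the same trick the vend query plays with its WHERE clause.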

The "in-flight" state is more of a bookkeeping fiction than a real state — I don't actually wait for a report before vending the same key again. LRU ordering + a small pool size means a key almost never gets vended twice in the same second, and even if it does, the downstream provider is the source of truth on whether a request is legal.

The vend query

The whole state machine collapses into one SQL statement. This is the query behind /api/v1/vend/{group}:

UPDATE api_keys
SET last_vended_at = NOW(),
    vend_count = vend_count + 1
WHERE id = (
  SELECT id FROM api_keys
  WHERE group_slug = $1
    AND status = 'active'
    AND (cooldown_until IS NULL OR cooldown_until < NOW())
    AND (exhausted_until IS NULL OR exhausted_until < NOW())
  ORDER BY last_vended_at ASC NULLS FIRST
  LIMIT 1
  FOR UPDATE SKIP LOCKED
)
RETURNING id, encrypted_key, provider, base_url;

The two lines that matter are ORDER BY last_vended_at ASC (that's the LRU) and FOR UPDATE SKIP LOCKED (that's what saved me under concurrency).
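To make the selection rule concrete, here is a rough in-memory mirror of the query's WHERE and ORDER BY clauses in Python. It assumes rows are plain dicts with the same column names as the SQL; the pick_lru function is hypothetical, and it ignores locking entirely, so it only illustrates the ordering.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional


def pick_lru(rows: List[dict], group_slug: str, now: datetime) -> Optional[dict]:
    """Return the row the vend query would choose, or None if none qualify."""

    def eligible(r: dict) -> bool:
        # Same four predicates as the WHERE clause of the subquery.
        return (
            r["group_slug"] == group_slug
            and r["status"] == "active"
            and (r["cooldown_until"] is None or r["cooldown_until"] < now)
            and (r["exhausted_until"] is None or r["exhausted_until"] < now)
        )

    def lru_key(r: dict):
        # ORDER BY last_vended_at ASC NULLS FIRST: never-vended rows sort first.
        last = r["last_vended_at"]
        return (last is not None, last if last is not None else now)

    candidates = [r for r in rows if eligible(r)]
    if not candidates:
        return None
    chosen = min(candidates, key=lru_key)
    # Same effect as the SET clause of the UPDATE.
    chosen["last_vended_at"] = now
    chosen["vend_count"] += 1
    return chosen


if __name__ == "__main__":
    t = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
    base = {"group_slug": "gemini", "status": "active",
            "cooldown_until": None, "exhausted_until": None, "vend_count": 0}
    rows = [
        dict(base, id="a", last_vended_at=t - timedelta(hours=2)),
        dict(base, id="b", last_vended_at=None),
        dict(base, id="c", last_vended_at=t - timedelta(hours=3)),
    ]
    for i in range(4):
        print(pick_lru(rows, "gemini", t + timedelta(seconds=i))["id"])
```

Stamping last_vended_at on every pick is what turns a plain sort into a rotation: the key just handed out moves to the back of the line.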

Client-side, a vend looks like this:

curl -H "Authorization: Bearer csk_..." \
  https://cipherstack.kaushik.cv/api/v1/vend/gemini
# {"key":"AIza...","key_id":"abc123","provider":"google",
#  "group_slug":"gemini","base_url":"https://..."}

The encrypted column gets decrypted in-process before the response is serialized — the plaintext key exists in memory for the duration of the HTTP handler and never on disk.
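On the client side, the vend and report calls combine into a small rotate-on-429 loop. The sketch below is a guess at how a project might wrap them, not the actual client: call_with_rotation, RateLimited, and the (key_id, error) shape of the report call are all assumptions, and the three callables are injected so the loop stays independent of any HTTP library.

```python
from typing import Any, Callable, Optional


class RateLimited(Exception):
    """Raised by the caller's provider code when the provider answers 429."""


def call_with_rotation(
    vend: Callable[[str], dict],                     # wraps GET /api/v1/vend/{group}
    report: Callable[[str, Optional[str]], None],    # wraps POST /api/v1/report
    call_provider: Callable[[str], Any],             # does the real provider request
    group: str,
    max_attempts: int = 3,
) -> Any:
    """Vend a key, use it, and rotate to another key if it is rate-limited."""
    for _ in range(max_attempts):
        lease = vend(group)  # expected to contain "key" and "key_id"
        try:
            result = call_provider(lease["key"])
        except RateLimited:
            # Tell the vault so this key goes into cooldown, then try another.
            report(lease["key_id"], "429_rate_limited")
            continue
        report(lease["key_id"], None)  # success report (optional in the post)
        return result
    raise RuntimeError(f"all {max_attempts} vended keys for {group!r} were rate-limited")
```

Capping the attempts keeps a fully exhausted group from turning into an unbounded retry loop against the vault.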

What surprised me

I'd braced for the cooldown mechanism to be the tricky bit. Cooldown was two lines of SQL. What actually bit me was the vend race.

The first version of the query didn't have FOR UPDATE SKIP LOCKED. It just did ORDER BY last_vended_at LIMIT 1. Under any concurrency at all, two simultaneous vends would read the same row, both update it, and both hand the same key to two different clients. In the single-user case this was fine — it just meant two projects were briefly sharing a key. In the "I let a friend hit the API from their hackathon project at the same time I was demoing mine" case, it meant we were both hammering the same Gemini key and hitting the same quota wall in half the expected time.

The fix was FOR UPDATE SKIP LOCKED, which is one of those Postgres features I'd read about, filed under "job queues," and never expected to use. When the SELECT ... FOR UPDATE runs and the row it wants is already locked by another transaction, it skips that row and takes the next one in the ordering instead of waiting. So two concurrent vends lock different rows and hand out different keys. Because the scan still follows the LRU ordering, between them they get the two least-recently-used keys, which is exactly what you want anyway.
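The skip-instead-of-wait behavior is easy to imitate outside the database. In this toy Python model, a non-blocking threading.Lock per row stands in for the Postgres row lock; Row and select_skip_locked are hypothetical names, and real row locks are released at transaction commit rather than by hand.

```python
import threading
from typing import List, Optional


class Row:
    def __init__(self, key_id: str, last_vended_at: float) -> None:
        self.key_id = key_id
        self.last_vended_at = last_vended_at
        self.lock = threading.Lock()  # stands in for the Postgres row lock


def select_skip_locked(rows: List[Row]) -> Optional[Row]:
    """Walk rows in LRU order and lock the first one nobody else holds."""
    for row in sorted(rows, key=lambda r: r.last_vended_at):
        # blocking=False: if another "transaction" holds the row, move on
        # to the next candidate rather than waiting for it.
        if row.lock.acquire(blocking=False):
            return row
    return None  # every candidate is locked: "no available keys"


if __name__ == "__main__":
    rows = [Row("a", 1.0), Row("b", 2.0)]
    first = select_skip_locked(rows)    # locks "a"
    second = select_skip_locked(rows)   # "a" is held, so this gets "b"
    third = select_skip_locked(rows)    # nothing left to lock
    print(first.key_id, second.key_id, third)
    first.lock.release()                # the "commit" frees the row again
```

The third caller getting None rather than blocking is the fail-fast behavior: a caller never queues behind someone else's lock.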

The knock-on effect is that the vault degrades gracefully under load. If ten concurrent vends come in for a group with eight keys, eight of them get keys immediately and two get "no available keys." That's the correct behavior — the alternative is queuing, and queuing on the credential-fetch path adds latency to every downstream API call. Better to fail fast and let the client's retry logic pick up.

The auth story

CipherStack has two auth paths and they're for different threat models.

Service tokens are long-lived bearer tokens (csk_...) I paste into my project .env files. They're scoped to a set of groups and can be revoked from the dashboard. This is the personal-use path — the tokens sit on my own boxes, the blast radius of a leak is bounded by dashboard-level revocation, and I optimize for ergonomics.

Certificates are the path for anything I deploy publicly. Each certificate is an HMAC secret. The client computes an HMAC-SHA256 over {timestamp}:{group} with that secret, sends the signature along with the timestamp, and the server recomputes and verifies it. The timestamp has to be within a 5-minute window of server time, so a leaked signature is only replayable for five minutes. For a vault whose whole job is handing out keys, that is the difference between "an attacker got one vend" and "an attacker got everything."
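A minimal sketch of that signing scheme, using only Python's standard library. The {timestamp}:{group} message and the 5-minute window come from the description above; the hex encoding, the Unix-seconds timestamp, and the function names are assumptions.

```python
import hashlib
import hmac
import time
from typing import Optional

MAX_SKEW_SECONDS = 300  # the 5-minute window


def sign(secret: bytes, group: str, timestamp: int) -> str:
    """HMAC-SHA256 over "{timestamp}:{group}", hex-encoded (encoding assumed)."""
    message = f"{timestamp}:{group}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify(secret: bytes, group: str, timestamp: int, signature: str,
           now: Optional[int] = None) -> bool:
    """Server-side check: fresh timestamp first, then a constant-time compare."""
    now = int(time.time()) if now is None else now
    if abs(now - timestamp) > MAX_SKEW_SECONDS:
        return False  # too old or too far in the future: reject replays
    expected = sign(secret, group, timestamp)
    return hmac.compare_digest(expected, signature)


if __name__ == "__main__":
    secret = b"example-certificate-secret"  # placeholder value
    ts = int(time.time())
    sig = sign(secret, "gemini", ts)
    print(verify(secret, "gemini", ts, sig))  # fresh and matching
    print(verify(secret, "groq", ts, sig))    # signature is bound to the group
```

Since the group is part of the signed message, a captured signature is also useless for vending from any other group.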

The rest of the API surface is deliberately narrow. Vend. Report. List groups. Dashboard endpoints behind session auth. No listing of keys, no bulk export, no way to enumerate what's in a group from a service token. If you compromise a token, you can vend from the groups it's scoped to — you can't dump the vault.

The numbers

The vault has been in production for about ten months now. Rough shape of the traffic:

~200 keys across 15 groups. The distribution is long-tailed: Gemini has 8, ElevenLabs has 4, some groups have 1.
~5,000 vends/day across side projects. That's a mix of the CV site itself, half a dozen dashboards, a Discord bot, and a couple of hackathon projects that never got turned off.
0 rate-limit downtime since deploy. That's the whole point. In the .env era, at least one project a week would go dark for a few hours because someone else's project had burned the shared key.
~2ms vend latency at p50, ~8ms at p95. It's Postgres and a single query, so there's little left to optimize at this scale.

What I'd do differently at 10x

If I ever have 2,000 keys and 50,000 vends a day, the bottleneck stops being the query and starts being the row-level lock contention on hot groups. Two changes that would probably need to happen:

Horizontal partitioning by group. Right now every key lives in the same api_keys table. That's fine at 200 rows. At 2,000, with skewed access patterns (Gemini gets vended 100x more than Resend), the hot rows sit on the same pages and I'd want to split the table by group_slug, probably with native Postgres partitioning, one partition per group, so SKIP LOCKED scans a smaller working set per vend.
Cooldown state in Redis. The cooldown TTL is fine as a Postgres column at low scale, but at 10x I'd want cooldown lookups off the hot table entirely — a Redis sorted set per group, keyed by key_id, scored by cooldown_until_epoch. The vend query becomes "get me the LRU key from Postgres where the ID isn't in the Redis cooldown zset." That decouples the read path from any locking on cooldowns.
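The Redis idea can be prototyped without Redis. The sketch below uses a dict as an in-memory stand-in for one sorted set per group; CooldownIndex and pick_lru_excluding are hypothetical names, and the comments name the Redis commands each method is meant to approximate.

```python
from typing import Dict, List, Optional, Set


class CooldownIndex:
    """In-memory stand-in for one Redis sorted set per group."""

    def __init__(self) -> None:
        self._scores: Dict[str, Dict[str, float]] = {}

    def add(self, group: str, key_id: str, until_epoch: float) -> None:
        # Roughly: ZADD cooldown:{group} {until_epoch} {key_id}
        self._scores.setdefault(group, {})[key_id] = until_epoch

    def cooling(self, group: str, now_epoch: float) -> Set[str]:
        # Roughly: ZRANGEBYSCORE cooldown:{group} {now_epoch} +inf
        members = self._scores.get(group, {})
        return {k for k, until in members.items() if until >= now_epoch}


def pick_lru_excluding(rows: List[dict], cooling_ids: Set[str]) -> Optional[dict]:
    """LRU pick over rows whose id is not currently cooling down."""
    candidates = [r for r in rows if r["id"] not in cooling_ids]
    if not candidates:
        return None
    return min(candidates, key=lambda r: r["last_vended_at"])


if __name__ == "__main__":
    index = CooldownIndex()
    index.add("gemini", "a", until_epoch=160.0)
    rows = [{"id": "a", "last_vended_at": 1.0}, {"id": "b", "last_vended_at": 2.0}]
    print(pick_lru_excluding(rows, index.cooling("gemini", 100.0))["id"])  # "a" is cooling
    print(pick_lru_excluding(rows, index.cooling("gemini", 161.0))["id"])  # cooldown expired
```

Expired members never have to be deleted for correctness, since the score filter ignores them; a periodic trim would only be housekeeping.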

Neither of those is worth doing today. That's the discipline I keep trying to internalize: pick the architecture for the size you're at, not the size you might be at.