Every snippet earned. Every dedupe semantic.
SSR-first code quiz across 10 languages with AI-generated snippets, pushed into the live Postgres through an admin content-push API instead of exposing a DB port, pgvector HNSW dedup on insert, per-request CSP nonce, token-bucket before auth. Next.js 16 · Drizzle · Postgres + pgvector · NextAuth v5 · Coolify on Hetzner.
Why this exists.
Code-reading is its own discipline. Anki and Quizlet are tuned for vocab and prose; throwing a snippet onto a flashcard turns a structural skill into a memory exercise. Capypad starts from a different premise: show short, real-world snippets in 10 languages — TS, JS, Python, Go, Rust, Java, C, C++, C#, SQL — and ask one focused question per snippet. The end-user surface is small on purpose.
The interesting engineering isn't the quiz. It's the content pipeline behind it: a four-gate insert path (topic-coverage steering, a cross-vendor LLM-as-Judge, a byte-identical-code reject, and pgvector semantic dedup) that every AI-generated row must pass before it lands; an operator push API that replaces network access to Postgres with one rate-limited HTTPS surface; and a state machine backed by the runs table that survives an anonymous → signed-in claim mid-quiz. The smart parts are invisible at the edge but obvious in the code.
The box it had to fit in.
x-default, hreflang on every public route. No half-translated edges; the bilingual surface is a contract, not a stretch goal.
unsafe-inline.
proxy.ts mints a 16-byte nonce, propagates it via x-nonce, and JSON-LD scripts carry it through to the response.
How it runs.
Six deliberate choices.
Drizzle pg-core + Postgres + pgvector instead of Prisma + SQLite + a separate search system.
drizzle-orm/pg-core against Postgres 18 with pgvector; embeddings live on the same row as the snippet, queried with HNSW.
Admin Content-Push API instead of SSH tunnel or a public Postgres port.
POST /api/admin/content over HTTPS with a streaming bounded body, SHA-256 admin key checked via timingSafeEqual, and a dedicated rate-limit bucket.
In-memory token bucket behind a RateLimiter interface.
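As a rough illustration of this choice, here is a minimal token bucket hidden behind an interface. Only the name RateLimiter comes from the text; the method name tryConsume, the class name, and the constructor parameters are assumptions made for this sketch, not the project's actual code.

```typescript
// Hypothetical interface shape: the source names RateLimiter but not its methods.
interface RateLimiter {
  /** Returns true if the caller identified by `key` may proceed. */
  tryConsume(key: string, nowMs?: number): boolean;
}

interface Bucket {
  tokens: number;
  lastRefillMs: number;
}

// In-memory implementation; a shared-store version could implement the same interface.
class InMemoryTokenBucket implements RateLimiter {
  private readonly buckets = new Map<string, Bucket>();
  private readonly capacity: number;
  private readonly refillPerSecond: number;

  constructor(capacity: number, refillPerSecond: number) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
  }

  tryConsume(key: string, nowMs: number = Date.now()): boolean {
    const bucket =
      this.buckets.get(key) ?? { tokens: this.capacity, lastRefillMs: nowMs };
    // Refill in proportion to elapsed time, capped at the bucket capacity.
    const elapsedSeconds = Math.max(0, nowMs - bucket.lastRefillMs) / 1000;
    bucket.tokens = Math.min(
      this.capacity,
      bucket.tokens + elapsedSeconds * this.refillPerSecond,
    );
    bucket.lastRefillMs = nowMs;
    this.buckets.set(key, bucket);
    if (bucket.tokens < 1) return false; // empty bucket: caller should answer 429
    bucket.tokens -= 1;
    return true;
  }
}
```

Callers depend only on the interface, which is what keeps a later swap to a different backing store confined to one file.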
RATE_LIMIT_TRUST_PROXY, exposed through a RateLimiter interface so the implementation is a one-file swap.
RATE_LIMIT_TRUST_PROXY says the upstream is the edge we trust.
Per-request CSP nonce instead of unsafe-inline relaxation.
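A minimal sketch of the nonce flow this choice describes, assuming a Node.js runtime. The function names and the exact directive list are illustrative assumptions; only the 16-byte base64 nonce, strict-dynamic, and the nonce on JSON-LD scripts come from the text.

```typescript
import { randomBytes } from "node:crypto";

// 16 random bytes, base64-encoded, minted once per request.
function mintNonce(): string {
  return randomBytes(16).toString("base64");
}

// Illustrative policy: the real directive list is not given in the source.
function buildCsp(nonce: string): string {
  return [
    "default-src 'self'",
    `script-src 'self' 'nonce-${nonce}' 'strict-dynamic'`,
    "object-src 'none'",
    "base-uri 'self'",
  ].join("; ");
}

// Every inline script, JSON-LD included, has to carry the same nonce.
function jsonLdScript(nonce: string, data: unknown): string {
  // Escape "<" so the payload cannot close the script element early.
  const json = JSON.stringify(data).replace(/</g, "\\u003c");
  return `<script type="application/ld+json" nonce="${nonce}">${json}</script>`;
}
```

In a request handler the same nonce value would go into both the Content-Security-Policy header and the x-nonce header that server components read.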
proxy.ts mints a 16-byte base64 nonce per request, attaches it to x-nonce, and server components read it via headers(); every inline script, including the JSON-LD blocks, carries the nonce attribute.
unsafe-inline is the policy that makes a CSP look green in the header and do nothing against XSS. A nonce-based (or hash-based) policy with strict-dynamic is what meaningfully constrains injected script. The Next.js 16 proxy path is the same shape as the one the docs ship with; the cost is exactly the discipline of remembering to thread the nonce through every new <script>, including JSON-LD, which is the place I forgot first and noticed within 30 seconds because the report-only run-up caught it.
runs table + capypad_run_id HttpOnly cookie instead of URL params or localStorage.
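To make the four-case matrix in this choice concrete, here is one possible resolution function. The two axes (anon vs signed-in, claimed vs unclaimed) come from the text; the row shape, the function name, and the decision taken in each cell are assumptions for illustration and may differ from the real server logic.

```typescript
// Hypothetical row shape: ownerUserId === null means the run is unclaimed.
interface RunRow {
  id: string;
  ownerUserId: string | null;
}

type Decision = "allow" | "claim" | "deny";

function resolveRunAccess(
  run: RunRow,
  cookieRunId: string | null, // value of the capypad_run_id cookie, if any
  viewerUserId: string | null, // null means anonymous
): Decision {
  const claimed = run.ownerUserId !== null;
  const signedIn = viewerUserId !== null;
  const holdsCookie = cookieRunId === run.id;

  // anon × unclaimed: the cookie is the only proof of ownership.
  if (!signedIn && !claimed) return holdsCookie ? "allow" : "deny";
  // signed-in × unclaimed: the cookie holder may claim the run mid-quiz.
  if (signedIn && !claimed) return holdsCookie ? "claim" : "deny";
  // anon × claimed: a claimed run is never served to an anonymous caller.
  if (!signedIn && claimed) return "deny";
  // signed-in × claimed: only the owner gets in, whatever the cookie says.
  return run.ownerUserId === viewerUserId ? "allow" : "deny";
}
```

Resolving all four cells in one place on every action is what stops a guessed run id from being enough on its own.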
runs row created by startRun, settled by finishRun, keyed by an HttpOnly cookie that client-side script cannot read or tamper with; auth state is a 4-case matrix the server resolves on every action.
runs row is the only form where the client cannot lie about who owns the run. The 4-case matrix (anon vs signed-in × claimed vs unclaimed) closes the runId-guess TOCTOU that a single boolean would leak. State machines don't get smaller than this; they get wronger.
LLM-as-Judge with a cross-vendor model instead of trusting the generator's own output.
instanceof claims, security antipatterns taught as patterns. It also proved the judge is only as good as the model behind it: one provider hit a 52% false-positive rate, confidently flagging correct code on stale framework knowledge. So the judge gates new inserts automatically, but a backlog flagged by the sweep gets a human (or better-model) review before anything is deleted. Defense in depth, never an auto-purge.
Things that were not obvious.
HNSW for O(log n) nearest-neighbour search
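For orientation, a sketch of what the index and the dedup query in this section could look like. The operator class vector_cosine_ops and the columns embedding_code, language, and quizLanguage come from the text; the table name snippets, the snake_case column quiz_language, the index name, and the threshold parameter are assumptions. The SQL uses pgvector's documented hnsw index type and its cosine distance operator (`<=>`).

```typescript
// Assumed table and index names; the opclass and embedding column are from the text.
const CREATE_INDEX_SQL = `
  CREATE INDEX IF NOT EXISTS snippets_embedding_code_hnsw
  ON snippets USING hnsw (embedding_code vector_cosine_ops)`;

// One approximate nearest-neighbour lookup per candidate insert.
const NEAREST_SQL = `
  SELECT id, embedding_code <=> $1::vector AS distance
  FROM snippets
  WHERE language = $2 AND quiz_language = $3
  ORDER BY embedding_code <=> $1::vector
  LIMIT 1`;

// pgvector accepts vectors in the text form "[x1,x2,...]".
function toVectorLiteral(embedding: number[]): string {
  return `[${embedding.join(",")}]`;
}

// Reference implementation of cosine distance, i.e. 1 minus cosine similarity.
function cosineDistance(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("vectors must have the same non-zero length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical threshold check; `distance` is null when no neighbour exists yet.
function isSemanticDuplicate(distance: number | null, maxDistance: number): boolean {
  return distance !== null && distance <= maxDistance;
}
```

The naive loop calls something like cosineDistance against every row; the indexed version issues NEAREST_SQL once and applies the threshold to the single row it returns.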
(language, quizLanguage): pick a candidate snippet, cosine-compare against every existing row, throw a soft-flag past a threshold. At 60+60 rows per language that's fine. At 10× that, it's not.
Real: vector_cosine_ops HNSW indexes on embedding_code and embedding_text. The insert loop runs an approximate-NN query in milliseconds regardless of corpus size, the dedup decision stays a single query, and the index keeps log-scaling as the language fills up. The interesting part is that the index has to exist before the first row: building it later means the first N inserts dedup against nothing and silently land duplicates.
Streaming readBoundedBody
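A minimal sketch of the technique this section names: read the stream chunk by chunk and cancel as soon as the running total passes the cap. The function name is from the heading; the signature, the error class, and the use of a web ReadableStream are assumptions.

```typescript
class BodyTooLargeError extends Error {}

async function readBoundedBody(
  body: ReadableStream<Uint8Array>,
  maxBytes: number,
): Promise<Uint8Array> {
  const reader = body.getReader();
  const chunks: Uint8Array[] = [];
  let total = 0;

  for (;;) {
    const result = await reader.read();
    if (result.done) break;
    const chunk = result.value;
    total += chunk.byteLength;
    if (total > maxBytes) {
      // Stop pulling from the network right away; Content-Length is never consulted.
      await reader.cancel("body too large");
      throw new BodyTooLargeError(`body exceeds ${maxBytes} bytes`);
    }
    chunks.push(chunk);
  }

  // Concatenate the accepted chunks into one buffer.
  const out = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    out.set(chunk, offset);
    offset += chunk.byteLength;
  }
  return out;
}
```

Memory held by this function is bounded by maxBytes plus one chunk, whatever the client declared in its headers.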
Content-Length, reject if too big. The header can lie: an attacker who declares 10 KB and streams 10 GB will exhaust memory before the parser notices.
Fix: stream the body, count bytes per chunk, call reader.cancel() the moment the cumulative size crosses MAX_BODY_BYTES. Memory stays bounded even when the header under-reports; the cap is on actual bytes consumed, not on the value the client claims. Same shape as the standard "reject early" pattern, but the reject point is the chunk loop, not the header parse.
Rate-limit before auth, not after
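The ordering this section argues for can be sketched as below. The names verifyAdminKey and timingSafeEqual and the 10-request budget come from the text; comparing SHA-256 digests, the handler signature, and the simplified non-refilling limiter are assumptions made to keep the example short.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

function sha256(value: string): Buffer {
  return createHash("sha256").update(value, "utf8").digest();
}

// Hashing first gives both buffers equal length, which timingSafeEqual requires.
function verifyAdminKey(presented: string, expectedHash: Buffer): boolean {
  return timingSafeEqual(sha256(presented), expectedHash);
}

interface Limiter {
  tryConsume(key: string): boolean;
}

// Simplified stand-in for the admin bucket: a fixed budget per key, no refill.
function makeFixedBudgetLimiter(budget: number): Limiter {
  const used = new Map<string, number>();
  return {
    tryConsume(key: string): boolean {
      const count = used.get(key) ?? 0;
      if (count >= budget) return false;
      used.set(key, count + 1);
      return true;
    },
  };
}

function handleAdminPush(
  clientKey: string,
  presentedKey: string,
  limiter: Limiter,
  expectedHash: Buffer,
): 200 | 401 | 429 {
  // The limiter runs first, so rejected requests never reach the verifier.
  if (!limiter.tryConsume(clientKey)) return 429;
  if (!verifyAdminKey(presentedKey, expectedHash)) return 401;
  return 200;
}
```

Once the budget is spent, even a correct key gets a 429 until the bucket refills, which is the point: guesses are capped regardless of what the verifier would have said.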
timingSafeEqual path, and there's no upper bound on how many guesses a bot can make before noticing.
Fix: a dedicated adminLimiter bucket at 10 req/min runs BEFORE verifyAdminKey. Brute-force attempts hit 429 long before they exercise the verifier. The auth path stays timing-safe; the bucket makes timing-safety load-bearing instead of decorative.
Optimistic star toggle, serialised commit
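A framework-free sketch of the serialised commit this section describes. The three names starRequestSeq, confirmedStarred, and starInFlight come from the text, where they are React refs; here they are closure variables, starInFlight is modelled as a promise chain rather than a flag, and the commit and render callbacks are hypothetical.

```typescript
// Hypothetical commit callback: resolves to the state the server confirmed.
type Commit = (starred: boolean) => Promise<boolean>;

function createStarToggle(
  initial: boolean,
  commit: Commit,
  render: (starred: boolean) => void,
) {
  let starRequestSeq = 0; // id of the newest click
  let confirmedStarred = initial; // last state the server confirmed
  let starInFlight: Promise<void> = Promise.resolve(); // tail of the commit chain
  let optimistic = initial;

  function toggle(): Promise<void> {
    const seq = ++starRequestSeq;
    optimistic = !optimistic;
    const desired = optimistic;
    render(desired); // the UI flips immediately

    // Each commit is chained onto the previous one, so requests cannot reorder.
    starInFlight = starInFlight.then(async () => {
      try {
        confirmedStarred = await commit(desired);
      } catch {
        // On failure keep the last confirmed state.
      }
      // Only the newest click reconciles the UI with the server's answer.
      if (seq === starRequestSeq) {
        optimistic = confirmedStarred;
        render(confirmedStarred);
      }
    });
    return starInFlight;
  }

  return { toggle, getConfirmed: () => confirmedStarred };
}
```

Because the second request is not issued until the first settles, a slow first response can no longer land after a fast second one.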
fetch and trusts the database to settle. With fast clicks the responses arrive out of order and the DB row ends up in whatever state the slowest request landed last (last-write-wins, inverted from what the user did).
Fix: three refs (starRequestSeq, confirmedStarred, starInFlight) serialise each click off the previous settle. The optimistic UI flips immediately; the network commit waits for the prior request to confirm before issuing. Closes the inversion at the client without making the user wait, and without a server-side lock.
Generation gates don't protect the backlog
Fix: a retroactive audit that keyset-paginates the whole corpus through the same judge layer, crash-safe via an after-cursor, writing factual-bug verdicts to a judge_flag column. A 200-concept random sample read 5%; the expert-difficulty sweep read ~10%. The non-obvious part is that the sweep needs its own write-back path: the generation gates reject, but a backlog audit has to flag in place so a human reviews before a delete. Insert-time and audit-time turned out to be two different problems wearing the same judge.
What's running.
What I learned.
Insert-path gates beat a cleanup cron — for new rows.
The reflex on bad content is a nightly cleanup job — scan the table, group by similarity, merge or delete. That job needs logic and a migration path and a story for live traffic during the sweep, and the duplicates are already in user-facing runs by the time it fires. With the four-gate insert path the pre-check is milliseconds; the cron job never gets written because the bad rows never land.
The honest caveat: insert gates only protect what's written after they ship. The pre-gate backlog still needed exactly one retroactive sweep — and that's a different tool. Insert-time rejects; audit-time flags in place for review. Same judge, two problems. Build the gate first; budget for the one sweep that catches up the history.
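A sketch of what that one-off sweep could look like: keyset pagination with a persisted cursor, flagging in place instead of deleting. The judge_flag column and the after-cursor idea come from the text; the store interface, its method names, the judge signature, and the page size are assumptions.

```typescript
interface Snippet {
  id: number;
  code: string;
}

// Hypothetical judge: resolves to true when a factual bug is suspected.
type Judge = (snippet: Snippet) => Promise<boolean>;

// Hypothetical storage interface standing in for the real queries.
interface BacklogStore {
  /** Keyset page: rows with id > afterId, ordered by id, at most `limit`. */
  pageAfter(afterId: number, limit: number): Promise<Snippet[]>;
  /** Writes the verdict to judge_flag; nothing is ever deleted here. */
  setJudgeFlag(id: number, flagged: boolean): Promise<void>;
  /** Persists the cursor so a crashed run can resume where it stopped. */
  saveCursor(afterId: number): Promise<void>;
}

async function sweepBacklog(
  store: BacklogStore,
  judge: Judge,
  startAfter = 0,
  pageSize = 100,
): Promise<number> {
  let after = startAfter;
  let flagged = 0;
  for (;;) {
    const page = await store.pageAfter(after, pageSize);
    if (page.length === 0) return flagged;
    for (const row of page) {
      const suspected = await judge(row);
      await store.setJudgeFlag(row.id, suspected); // flag in place for human review
      if (suspected) flagged += 1;
      after = row.id;
    }
    await store.saveCursor(after); // checkpoint once per page
  }
}
```

A restart passes the saved cursor as startAfter; rows are re-judged at most one page back, and no page is skipped.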
One HTTPS surface beats three network holes.
The Mac → Coolify push API isn't a comfort feature — it's a reduction in attack surface. SSH tunnels and toggled Postgres ports are both forgot-it-once-and-it's-dead: the tunnel that wasn't closed, the firewall rule that survived a server rebuild, the operator key that lives on the laptop that isn't backed up. A single rate-limited HTTPS route with a streaming body cap and a timing-safe auth check is fewer moving parts and a better audit trail — and it's the only ingest path that the rest of the security model gets to assume. Boring beats clever at the network edge.