CASE · 102026 — LIVESOLO · END-TO-ENDSAAS · POSTGRES · pgvector
◦ CAPYPAD · capypad.com

Every snippet earned. Every dedupe semantic.

SSR-first code quiz across 10 languages with AI-generated snippets, pushed into the live Postgres through an admin content-push API instead of exposing a DB port, pgvector HNSW dedup on insert, per-request CSP nonce, token-bucket before auth. Next.js 16 · Drizzle · Postgres + pgvector · NextAuth v5 · Coolify on Hetzner.

§ 01Problem · motivation

Why this exists.

Anki and Quizlet treat learning as flashcards over text. Neither models code-reading as the distinct skill it is — and the interesting engineering doesn't live in the quiz UI anyway.

Code-reading is its own discipline. Anki and Quizlet are tuned for vocab and prose; throwing a snippet onto a flashcard turns a structural skill into a memory exercise. Capypad starts from a different premise: show short, real-world snippets in 10 languages — TS, JS, Python, Go, Rust, Java, C, C++, C#, SQL — and ask one focused question per snippet. The end-user surface is small on purpose.

The interesting engineering isn't the quiz. It's the content pipeline behind it: a four-gate insert path — topic-coverage steering, a cross-vendor LLM-as-Judge, a byte-identical-code reject, and pgvector semantic dedup — that no AI-generated row lands without passing; an operator push API that replaces network access to Postgres with one rate-limited HTTPS surface; and a runs table backed state machine that survives anonymous → signed-in claim mid-quiz. The smart parts are invisible at the edge but obvious in the code.

§ 02Constraints · operating box

The box it had to fit in.

Solo dev, real users, hardened by default. No microservices, no event-bus choreography, no DB ports on the public internet.
C/01 · OPS
Solo dev, no team. Every architectural decision has to be maintainable without one — no microservices, no event-bus choreographies, no distributed-tracing stage set.
C/02 · HOSTING
Coolify on a dedicated Hetzner server — app and Postgres run as Docker services in one compose topology, Cloudflare edge in front (TLS, caching). No cloud lock-in, predictable monthly cost.
C/03 · CONTENT-OPS
AI generates snippets locally on a Mac. They reach the live DB through an admin content-push API over HTTPS — the Postgres port is never exposed and there is no SSH tunnel to forget to close.
C/04 · CONTENT-INTEGRITY
60+60 rows per language and growing, all AI-generated. Two failure modes: duplication — hash matching loses to whitespace, renames, reordering, so dedup has to be semantic — and factual bugs the model emits with full confidence. Both have to be caught on the insert path, cheap enough to run on every write.
C/05 · AUTH
NextAuth v5 with the Drizzle adapter, Google + GitHub. An anonymous user can start a quiz, sign in mid-run, and claim the run without losing progress — the auth model carries this 4-case matrix, not the URL.
C/06 · I18N FLOOR
DE/EN at equal quality, EN as x-default, hreflang on every public route. No half-translated edges; the bilingual surface is a contract, not a stretch goal.
C/07 · CSP
Strict CSP via per-request nonce, no unsafe-inline.proxy.ts mints a 16-byte nonce, propagates it via x-nonce, and JSON-LD scripts carry it through to the response.
C/08 · SUPPLY-CHAIN
SHA-pinned base images, Trivy in CI on image and filesystem, third-party GitHub Actions pinned to commit SHAs. The deploy artefact is reproducible; nothing floats on a moving tag.
§ 03Architecture · wizard to push

How it runs.

Three lanes: top is the user request path (Wizard → run-cookie → auth → CSP), middle is the content path (embed → HNSW dedup → admin API → bounded body), bottom is cross-cutting (stars, rate-limit, audit, deploy).
capypad.com·langs 10·rows 600+·pgvector HNSW·refill 30 RPS·embed 1536d
inserts/h 0·nonces/min 0
WIZARD · 01
SSR step
language → source → settings
flow3-step · server
RUN · 02
capypad_run_id
HttpOnly · cookie binding
statestartRun → finishRun
AUTH · 03
NextAuth v5
Drizzle adapter · 4-case finishRun
modelanon → signed-in
CSP · 04
per-request nonce
strict-dynamic · 16-byte b64
nonces/min0
EMBED · 05
text-embedding-3-small
1536-dim · OpenRouter
embed/min0
HNSW · 06
pgvector index
cosine · O(log n)
dedupinsert-path · soft-flag
ADMIN · 07
POST /api/admin/content
HTTPS · timing-safe key
inserts/h0
BODYCAP · 08
readBoundedBody
1 MB cap · stream cancel
guardper-chunk counter
STARS · 09
optimistic ref
seq · serialised settle
RATE · 10
token-bucket
60 burst · 30 RPS refill
AUDIT · 11
[admin-content] log
event=inserted · JSON-flat
formatflat · grep-ready
DEPLOY · 12
Coolify · Hetzner
SHA-pinned · push-to-deploy
edgeCloudflare · Traefik
EVENT LOG · /api/admin/content · pgvector HNSW · CSP nonce
21:14:08embedtext-embedding-3-small · 1536-dim · ~142ms
21:14:07hnswNN cosine 0.94 · flagged-dup soft-flag
21:14:07adminPOST /api/admin/content · HTTPS · rate-limit ok
21:14:06bodyreadBoundedBody · 1 MB cap · ok
21:14:05cspnonce ok · script-src 'strict-dynamic'
21:14:02starseq 14 · serialized · optimistic ok
21:13:58admin401 · timing-safe · no length oracle
21:13:54audit[admin-content] event=inserted lang=sql n=30
21:14:08embedtext-embedding-3-small · 1536-dim · ~142ms
21:14:07hnswNN cosine 0.94 · flagged-dup soft-flag
21:14:07adminPOST /api/admin/content · HTTPS · rate-limit ok
21:14:06bodyreadBoundedBody · 1 MB cap · ok
21:14:05cspnonce ok · script-src 'strict-dynamic'
21:14:02starseq 14 · serialized · optimistic ok
21:13:58admin401 · timing-safe · no length oracle
21:13:54audit[admin-content] event=inserted lang=sql n=30
§ 04Decisions · trade-offs

Six deliberate choices.

Per decision: what was chosen, instead of what, and why.
D/01

Drizzle pg-core + Postgres + pgvector instead of Prisma + SQLite + a separate search system.

chosen
drizzle-orm/pg-core against Postgres 18 with pgvector; embeddings live on the same row as the snippet, queried with HNSW.
instead of
Prisma over SQLite for the app, plus a sidecar search service (Meilisearch / Typesense) for similarity
reason
pgvector dedup is one gate of the insert path, not a feature bolted on top — the moment dedup lives in a second system, every write becomes a two-phase commit with its own outbox and a backfill story for whenever the search index drifts. One Postgres, one source of truth. Drizzle gives me TS-native query building without the Prisma shadow-DB tax. SQLite was tempting for "simple solo SaaS," but it can't run pgvector — and the moment I admit I need embeddings, the choice closes itself.
D/02

Admin Content-Push API instead of SSH tunnel or a public Postgres port.

chosen
One POST /api/admin/content over HTTPS with a streaming bounded body, SHA-256 admin key checked via timingSafeEqual, and a dedicated rate-limit bucket.
instead of
A toggled-on Postgres port behind a firewall rule, or an SSH tunnel from the operator's Mac into the server
reason
A toggled port is forgotten on once and the box is owned. SSH tunnels rot — keys rotate, the tunnel script lives on the laptop that wasn't backed up, and the audit trail is whatever the shell history happens to keep. One HTTPS surface collapses all of that into a single, rate-limited, audit-logged route with a streaming body cap. The Mac never opens a network path to the database; it sends an authenticated request to the application, which validates, dedupes, and inserts through audit-logged code paths. Attack surface measured in routes, not ports.
D/03

In-memory token bucket behind a RateLimiter interface.

chosen
A single-process token bucket with right-most XFF handling behind RATE_LIMIT_TRUST_PROXY, exposed through a RateLimiter interface so the implementation is a one-file swap.
instead of
Upstash Redis or @upstash/ratelimit from day one as the only rate-limit backend
reason
On a single-replica Coolify deploy, in-memory is correct, not a workaround — there is no second process to coordinate with, the hot path stays allocation-free, and the cost of a rate-limit check is a map lookup. Putting it behind an interface keeps the Upstash door open: when (if) the app goes multi-replica, the swap is one file and a feature flag, not a refactor. The XFF handling is the part that bites you in production — right-most, not left-most, and only when RATE_LIMIT_TRUST_PROXY says the upstream is the edge we trust.
D/04

Per-request CSP nonce instead of unsafe-inline relaxation.

chosen
Next.js 16 proxy.ts mints a 16-byte base64 nonce per request, attaches it to x-nonce, and server components read it via headers(); every inline script — including the JSON-LD blocks — carries the nonce attribute.
instead of
Allowing 'unsafe-inline' in script-src so legacy inline tags and analytics snippets just work
reason
unsafe-inline is the policy that makes a CSP look green in the header and do nothing against XSS. A nonce-based policy with strict-dynamic is the only CSP that meaningfully constrains injected script. The Next.js 16 proxy path is the same shape as the one the docs ship with; the cost is exactly the discipline of remembering to thread the nonce through every new <script> — including JSON-LD, which is the place I forgot first and noticed in 30 seconds because the report-only run-up caught it.
D/05

runs table + capypad_run_id HttpOnly cookie instead of URL params or localStorage.

chosen
A runs row created by startRun, settled by finishRun, keyed by an HttpOnly cookie that the client cannot forge or read; auth state is a 4-case matrix the server resolves on every action.
instead of
Encoding run state in the URL, or trusting localStorage for the run identifier and progress
reason
URLs are shareable and bookmarkable — great for pages, fatal for run identity: anyone with a link can replay or claim. localStorage is readable by every script that the page ever loads. An HttpOnly cookie keyed to a server-owned runs row is the only form where the client cannot lie about who owns the run. The 4-case matrix (anon vs signed-in × claimed vs unclaimed) closes the runId-guess TOCTOU that a single boolean would leak. State machines don't get smaller than this; they get wronger.
D/06

LLM-as-Judge with a cross-vendor model instead of trusting the generator's own output.

chosen
After AI generation, a second CLI provider — never the one that generated — fact-checks every item: is the marked answer right, do the distractors actually fail, does the code do what the question claims. High-confidence failures are dropped pre-insert; the gate degrades to pass-through on timeout or unparseable output so a flaky judge never blocks the pipeline.
instead of
Trusting the generator model's output as-is, or self-reviewing with the same model that produced it
reason
A model grading its own work inherits its own blind spots — the misconception that produced a wrong answer waves it through on review. A different vendor as judge breaks that correlation. The retroactive sweep proved the stakes: ~10% of expert-level snippets in the live corpus carried factual bugs — missing imports, wrong instanceof claims, security antipatterns taught as patterns. It also proved the judge is only as good as the model behind it — one provider hit a 52% false-positive rate, confidently flagging correct code on stale framework knowledge. So the judge gates new inserts automatically, but a backlog flagged by the sweep gets a human (or better-model) review before anything is deleted. Defense in depth, never an auto-purge.
§ 05Highlights · interesting bits

Things that were not obvious.

Edge cases and details that only became clear during build.

HNSW for O(log n) nearest-neighbour search

H/01
The naive dedup is a sequential scan over (language, quizLanguage): pick a candidate snippet, cosine-compare against every existing row, throw a soft-flag past a threshold. At 60+60 rows per language that's fine. At 10× that, it's not.

Real: vector_cosine_ops HNSW indexes on embedding_code and embedding_text. The insert loop runs an approximate-NN query in milliseconds regardless of corpus size, the dedup decision stays a single query, and the index keeps log-scaling as the language fills up. The interesting part is that the index has to exist before the first row — building it later means the first N inserts dedup against nothing and silently land duplicates.

Streaming readBoundedBody

H/02
A bounded body sounds like a one-liner: read Content-Length, reject if too big. The header can lie — an attacker who declares 10 KB and streams 10 GB will exhaust memory before the parser notices.

Fix: stream the body, count bytes per chunk, call reader.cancel() the moment the cumulative size crosses MAX_BODY_BYTES. Memory stays bounded even when the header under-reports; the cap is on actual bytes consumed, not on the value the client claims. Same shape as the standard "reject early" pattern, but the reject point is the chunk loop, not the header parse.

Rate-limit before auth, not after

H/03
The instinct is to put auth first: validate the admin key, then rate-limit the authenticated caller. The instinct is wrong for a shared-secret endpoint. If auth runs first, every wrong key triggers the (cheap, but real) SHA + timingSafeEqual path, and there's no upper bound on how many guesses a bot can make before noticing.

Fix: a dedicated adminLimiter bucket at 10 req/min runs BEFORE verifyAdminKey. Brute-force attempts hit 429 long before they exercise the verifier. The auth path stays timing-safe; the bucket makes timing-safety load-bearing instead of decorative.

Optimistic star toggle, serialised commit

H/04
Stars feel instant — click, the icon flips, the request goes out. The naive version drops the request payload through a single fetch and trusts the database to settle. With fast clicks the responses arrive out of order and the DB row ends up in whatever state the slowest request landed last — last-write-wins, inverted from what the user did.

Fix: three refs — starRequestSeq, confirmedStarred, starInFlight — serialise each click off the previous settle. The optimistic UI flips immediately; the network commit waits for the prior request to confirm before issuing. Closes the inversion at the client without making the user wait, and without a server-side lock.

Generation gates don't protect the backlog

H/05
The judge, the topic-coverage steering, the byte-identical-code reject — all of them run at insert time. They protect every row written after they shipped, and nothing already in the table.

Fix: a retroactive audit that keyset-paginates the whole corpus through the same judge layer, crash-safe via an after-cursor, writing factual-bug verdicts to a judge_flag column. A 200-concept random sample read 5%; the expert-difficulty sweep read ~10%. The non-obvious part: the sweep needs its own write-back path — the generation gates reject, but a backlog audit has to flag in place so a human reviews before a delete. Insert-time and audit-time turned out to be two different problems wearing the same judge.
§ 06Stack · in production

What's running.

Working toolchain — nothing theoretical.
Next.js 16 · App RouterTypeScript 5Drizzle pg-corePostgres 18 + pgvectorNextAuth v5 · Drizzle adapterTailwind 4Gemini · Claude · Opencode · Codex CLILLM-as-Judge · cross-vendor gateOpenRouter · text-embedding-3-small · 1536dToken-bucket · RateLimiter ifaceCSP nonce · proxy.tsTrivy CI · image + fsCoolify · dedicated HetznerCloudflare · Traefik · push-to-deployDocker · SHA-pinned
§ 07Reflection · takeaways

What I learned.

Live as a SaaS at capypad.com. These are the things I'm taking forward.

Insert-path gates beat a cleanup cron — for new rows.

The reflex on bad content is a nightly cleanup job — scan the table, group by similarity, merge or delete. That job needs logic and a migration path and a story for live traffic during the sweep, and the duplicates are already in user-facing runs by the time it fires. With the four-gate insert path the pre-check is milliseconds; the cron job never gets written because the bad rows never land.

The honest caveat: insert gates only protect what's written after they ship. The pre-gate backlog still needed exactly one retroactive sweep — and that's a different tool. Insert-time rejects; audit-time flags in place for review. Same judge, two problems. Build the gate first; budget for the one sweep that catches up the history.

One HTTPS surface beats three network holes.

The Mac → Coolify push API isn't a comfort feature — it's a reduction in attack surface. SSH tunnels and toggled Postgres ports are both forgot-it-once-and-it's-dead: the tunnel that wasn't closed, the firewall rule that survived a server rebuild, the operator key that lives on the laptop that isn't backed up. A single rate-limited HTTPS route with a streaming body cap and a timing-safe auth check is fewer moving parts and a better audit trail — and it's the only ingest path that the rest of the security model gets to assume. Boring beats clever at the network edge.

◦ NEXT CASE · 11 / 11
cvmake
← all projects