Manifesto

Spec-Driven SDLC: A new paradigm for AI-first Agile product teams

Why agile teams should treat specs as the source of truth for AI collaborators, and how I run this site as a working demo. Two live examples + a spec generator you can use right now.

Astro
Cursor
Claude
Gemini
Markdown specs
Vitest

Outcomes

  • 12+ specs in this repo (draft → approved → implemented)
  • 100% of this site's features implemented by AI
  • 2 live example slices: ETA estimator + comment moderation

Three pillars

01

Specs are the contract

Humans write intent. AI writes code. Specs are how the two negotiate without misunderstanding.

02

Evals are the acceptance

Acceptance criteria become test specs. 'Done' means the eval is green and the spec moved to implemented.

03

The roadmap is the backlog

ROADMAP.md is the living kanban. Every spec lands there before it's coded, and stays there after it ships.

The thesis

Agile worked when teams shipped what humans typed.

In 2026, the typing is happening somewhere else. A senior PM defines an outcome. An AI agent (Cursor, Claude Code, Codex, whatever’s hot this quarter) produces a working diff in fifteen minutes. The bottleneck is no longer how fast engineers can pattern-match. It’s how precisely the rest of us can describe what we want.

User stories don’t survive that handoff. “As a user, I want to filter restaurants by dietary preference, so that I can order without scrolling” is fine for a sprint planning meeting. It’s a disaster as input to an LLM that will write the filter, the index, the schema, the tests, and the analytics events all at once. The model fills in the gaps with whatever’s most common in its training data, not whatever’s right for your product.

The fix isn’t to give the AI a bigger story. The fix is to give it a spec.

A spec is what a story looks like once you’ve removed the ambiguity. It states the schema. It freezes the prompt. It encodes acceptance criteria as binary checks. It says explicitly what is not in scope. When you hand a good spec to a capable AI agent, it produces code that matches what you asked for. When you hand a vague story to the same agent, it produces code that matches what someone else’s product needed.

That difference compounds. Over a quarter, teams that ship from specs out-execute teams that ship from stories, not because their AI is better, but because their intent is better-encoded.

This essay is a working demo of that thesis. Three pillars, two product examples, and a spec generator you can use right now.

What lives behind the pillars

Specs are the contract. Look in the specs/ directory of this repo. Every feature on this website (Ask AI, voice mode, the local-only WebLLM path, the project page you’re reading) has a spec. The specs are written first, reviewed (sometimes by me alone, sometimes with another LLM critic), and only then implemented. The implementation is unsurprising because the surprises were resolved in the spec.

Evals are the acceptance. A spec without acceptance criteria is a wish. Acceptance criteria written as English (“the page should be fast”) are also wishes. Acceptance criteria written as binary, machine-checkable assertions (“Lighthouse perf ≥ 90 on a Pixel 5 throttled 4G connection”) are contracts. The two example slices below take this further: the eval test files are direct translations of the spec’s acceptance criteria. If the eval passes, the spec is satisfied. If the eval fails, the next move is not to “fix the test”, it’s to revisit the spec.

The roadmap is the backlog. specs/ROADMAP.md is one file. Every feature, partial or done, has a row. Every row has a status emoji. Every row links to its spec. There is no Jira. There is no separate backlog. When the spec moves to approved, the roadmap row moves to 🔄. When the eval is green and the code is merged, both move to ✅. The single source of truth eliminates the “wait, what’s the actual state of this?” round-trips that eat senior PM time.

Demo 1, Swiggy-style ETA estimator

The first example is intentionally non-AI code: a pure function, deterministic, two hundred lines including types. The point is to show that spec-driven SDLC isn’t only useful when an LLM is in the loop at runtime. It’s useful any time the rules of the system are non-obvious and the consequences of getting them wrong are real.

The spec lives at examples/spec-driven-swiggy-eta/specs/feature_eta_spec.md. It freezes:

  1. The formula (prep × load factor × item factor + travel + traffic buffer + handoff)
  2. The output shape (etaMinutes, plus a transparent breakdown)
  3. The acceptance threshold (≥9 of 10 fixtures within ±3 minutes; median absolute error ≤2 minutes)

The eval at evals/eta.test.ts is a one-to-one translation of those acceptance criteria. The fixture set (fixtures/orders.json) is hand-built across four scenarios (quick combo, family meal, slammed kitchen, long ride), so the eval exercises real-world edge cases, not just the happy path.
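
As a sketch of what that one-to-one translation can look like (simplified, with assumed fixture field names, not the literal contents of evals/eta.test.ts):

import { describe, it, expect } from "vitest";
import { computeEta } from "../src/eta";
import fixtures from "../fixtures/orders.json";

// Acceptance criteria from the spec, restated as binary checks.
// Assumed fixture shape per entry: { order, expectedEtaMinutes }.
describe("computeEta against labelled fixtures", () => {
  const errors = fixtures.map(
    (f: any) => Math.abs(computeEta(f.order).etaMinutes - f.expectedEtaMinutes)
  );

  it(">= 9 of 10 fixtures land within ±3 minutes", () => {
    expect(errors.filter((e) => e <= 3).length).toBeGreaterThanOrEqual(9);
  });

  it("median absolute error is <= 2 minutes", () => {
    const sorted = [...errors].sort((a, b) => a - b);
    const mid = Math.floor(sorted.length / 2);
    const median = sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
    expect(median).toBeLessThanOrEqual(2);
  });
});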

The widget below is the same code path the eval runs against. The function imported into the React island is the function that the spec governs. There is no separate “demo formula”.

Live: Swiggy ETA

The pure function from examples/spec-driven-swiggy-eta/src/eta.ts, running here in your browser.

[Interactive widget: for the default inputs it shows 21 min, computed by the spec's formula as prep 12.3m + travel 5.9m + handoff 2m.]

Drag the sliders. Notice that Slammed kitchen (load factor 1.6) more than doubles the prep time; that’s the formula working. Notice that doubling the distance at a constant rider speed produces a near-linear increase in travel time; that’s also the formula. The handoff stays a flat two minutes because the spec says it does.

If a future contributor decides handoff should be three minutes? They have to update feature_eta_spec.md first. Then the test fails until the fixtures or the threshold are reconciled. Then we have a real conversation about whether three minutes is the right call. The spec is the forcing function.
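
For reference, the whole function fits on one screen. A minimal sketch of the formula the spec freezes (illustrative field names, no Zod validation; the governed implementation lives in src/eta.ts):

interface EtaInput {
  itemCount: number;
  distanceKm: number;
  restaurant: { avgPrepMinutes: number; currentLoadFactor: number };
  rider: { avgKmh: number; trafficBufferMinutes: number };
}

// Mirrors the formula in feature_eta_spec.md: prep × load factor × item factor
// + travel + traffic buffer + a fixed 2-minute handoff, ceiled to whole minutes.
export function computeEta(order: EtaInput) {
  const itemCountFactor = 1 + 0.1 * Math.max(0, order.itemCount - 1);
  const prep =
    order.restaurant.avgPrepMinutes * (1 + order.restaurant.currentLoadFactor) * itemCountFactor;
  const travel = (order.distanceKm / order.rider.avgKmh) * 60 + order.rider.trafficBufferMinutes;
  const handoff = 2.0;
  const round1 = (n: number) => Math.round(n * 10) / 10; // breakdown reported to 1 dp

  return {
    etaMinutes: Math.ceil(prep + travel + handoff),
    breakdown: { prep: round1(prep), travel: round1(travel), handoff },
  };
}

Change the handoff to three minutes here without touching the spec and the eval above goes red, which is exactly the point.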

Demo 2, YouTube-style comment moderation

The second example is the harder problem: the rules of the system live partly in an LLM. How do you spec that? How do you test it?

The answer in spec-driven-youtube-mod/ is to push as much determinism as possible into the spec, leaving the LLM with only the work it’s actually good at: fuzzy classification.

Three things move out of the model and into the spec:

  • The prompt. Frozen by feature_moderate_spec.md. Changing the prompt is a spec version bump, not a one-line code change. The prompt template lives in src/prompt.ts and is built by a pure function, easy to inspect, easy to diff in a PR.
  • The output schema. A Verdict is { label, confidence, reason, rulesTriggered[] }, validated by Zod at the boundary (a minimal sketch follows this list). If the LLM returns 1.7 for confidence, the wrapper clamps it to 1. If it cites a rule that doesn’t exist in the policy, the wrapper drops it. Hallucinations bounce off the schema; they don’t propagate into the rest of the product.
  • The eval thresholds. eval_moderate_spec.md demands precision ≥ 0.8 and recall ≥ 0.7 against a 20-comment hand-labelled fixture set. The CI eval uses a deterministic mock LLM, intentionally ~85% accurate, so the thresholds matter and can’t be gamed. Real-LLM evaluation happens manually, off the critical path, before the spec moves to implemented.
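
A minimal sketch of that schema boundary, assuming Zod and illustrative names (the repo's actual VerdictSchema may differ in detail):

import { z } from "zod";

// Verdict shape from the spec: { label, confidence, reason, rulesTriggered[] }.
export const VerdictSchema = z.object({
  label: z.enum(["safe", "spam", "toxic"]),
  // Clamp out-of-range confidences (e.g. 1.7 → 1) instead of rejecting the verdict.
  confidence: z.number().transform((c) => Math.min(1, Math.max(0, c))),
  reason: z.string(),
  rulesTriggered: z.array(z.string()),
});

export type Verdict = z.infer<typeof VerdictSchema>;

// Drop any cited rule that does not exist in the policy, so hallucinated
// citations never reach the rest of the product.
export function sanitizeVerdict(raw: unknown, policyRules: string[]): Verdict {
  const verdict = VerdictSchema.parse(raw);
  return {
    ...verdict,
    rulesTriggered: verdict.rulesTriggered.filter((rule) => policyRules.includes(rule)),
  };
}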

What’s left for the LLM? The classification itself. Is this comment toxic, spam, or safe? That’s the actual hard part; the rest is mechanism.

The widget below runs against your active LLM. Pick a sample, click Classify, and watch a real verdict come back: labelled, with a reason, and with the rules that triggered it.

Live: YouTube Comment Moderation

Real call to your active LLM, governed by examples/spec-driven-youtube-mod/specs/feature_moderate_spec.md.

Active provider: read from your Ask AI settings.

This is what spec-driven SDLC for LLM-shaped code looks like in practice. The spec governs everything except the model’s judgement, and the model’s judgement is the only thing left to evaluate.
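
Put together, the wrapper around the model is small. A sketch under the same assumptions (buildPrompt, Comment, and Policy are stand-ins for the repo's real types; the governed implementation lives in the example package):

import { VerdictSchema, type Verdict } from "./verdict"; // the sketch above
import { buildPrompt, type Comment, type Policy } from "./prompt"; // assumed helpers

type CompletionFn = (prompt: string) => Promise<string>;

// The LLM is injected, so CI can pass a deterministic mock and production can
// pass Gemini / Anthropic / WebLLM without changing this code path.
export async function moderate(
  comment: Comment,
  policy: Policy,
  llm: CompletionFn
): Promise<Verdict> {
  const raw = await llm(buildPrompt(comment, policy)); // prompt frozen by the spec
  const verdict = VerdictSchema.parse(JSON.parse(raw)); // schema violations throw here
  return {
    ...verdict,
    // Keep only rule citations that exist in the policy.
    rulesTriggered: verdict.rulesTriggered.filter((r) => policy.rules.includes(r)),
  };
}

In CI, llm can be a deterministic function keyed off the fixture id; in the widget above it is whatever provider your Ask AI settings point at.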

Try it yourself

The pillars are easy to nod at. The hard part is producing a spec for your idea on a Wednesday afternoon when your sprint review is in twenty minutes. So here’s a generator.

Tell it what you want to build. Tell it which kind of spec you need. It drafts the rest in this repo’s house style: frontmatter, sections, Zod types, binary acceptance criteria, an explicit out-of-scope list, open questions. You take what’s useful, you fill the <FILL> placeholders, and you walk into your sprint review with a spec instead of a story.

It uses the LLM you’ve already configured for Ask AI on this site. No new keys. No new accounts.


Spec Generator

Describe a feature you want to build. The generator drafts a spec in this repo's house style — frontmatter, sections, Zod types, acceptance criteria. Uses your active LLM (Gemini / Anthropic / Local).

Copy the output. Drop it into your repo’s specs/ folder. Open a draft PR. Hand it to whichever AI coding agent you’re using this quarter. Ship the diff. Move the row.

That’s the loop.

Anti-patterns to watch for

  • Vibes coding. “I’ll just describe it to Claude and iterate.” The first iteration looks fine. The third iteration is a mess of accumulated drift because nothing is the source of truth except the most recent prompt. Symptoms: PR descriptions that say “implements the thing we discussed in Slack.”
  • Prompt as spec. Treating a single LLM prompt as if it were a spec. A spec has frontmatter, section structure, acceptance criteria, scope, and open questions. A prompt has none of that, and an LLM will happily write whatever fills the silence.
  • AI as stenographer. Letting the agent dictate the spec back to you. The spec is the place where human judgement about scope, trade-offs, and acceptance lives. The AI implements; the human specifies.
  • “We’ll write the spec after.” This means there is no spec. The post-hoc document is a description, not a contract.
  • Acceptance criteria written as English. “Should be reasonably fast.” “Should look polished.” Every adjective is a future argument.

What’s next

A few things I’m working on, all of which will land as their own specs in specs/projects/ and rows on the roadmap:

  1. An eval harness that runs on every PR. Today, examples/*/evals/*.test.ts runs on pnpm test. The next step is to surface eval deltas in PR comments, so a regression in precision is as visible as a regression in latency.
  2. Spec Generator v2 with RAG over your own corpus. Today the generator is zero-shot with a single style anchor. v2 will retrieve from your existing specs/ directory so the generator’s output reads like your specs, not generic ones.
  3. Auto-PR-from-spec. When a spec moves to approved in main, a PR opens automatically with the agent’s first-draft implementation. Spec → PR, no human key-presses.
  4. Two more example slices. Swiggy-style restaurant search (with a live UI) and YouTube-style “Up Next” recommendations are queued up as examples/spec-driven-swiggy-search/ and examples/spec-driven-youtube-recs/. Both will follow the same spec-first pattern. Both will be linked here when they land.

If any of this resonates and you want to compare notes, or push back hard on something, find me at the Ask AI page or via email. I read everything.

Specs & docs from the repo

Rendered straight from demo.highlights. Each document is the source of truth in the repo — the snippets below stay in sync at build time.

Feature Eta Spec

examples/spec-driven-swiggy-eta/specs/feature_eta_spec.md
View on GitHub

computeEta — Order ETA Estimator

Purpose

Given a parsed Order, compute the estimated time-to-doorstep in whole minutes plus a transparent breakdown the UI can render. This function is the unit of work the spec controls. It is pure, synchronous, and has no I/O, so the test fixtures fully define correctness.

Inputs

Name | Type | Required | Description | Example
order | Order (see schema_order_spec.md) | YES | Validated order fixture | ORD-001

Outputs

Field | Type | Description
etaMinutes | number (integer) | Total ETA, ceiling of components
breakdown.prep | number (1 dp) | Prep minutes incl. load factor
breakdown.travel | number (1 dp) | Travel minutes incl. traffic buffer
breakdown.handoff | number (1 dp) | Fixed 2-minute handoff

Formula

prepMinutes    = restaurant.avgPrepMinutes * (1 + restaurant.currentLoadFactor) * itemCountFactor
                 where itemCountFactor = 1 + 0.1 * max(0, itemCount - 1)
travelMinutes  = (distanceKm / rider.avgKmh) * 60 + rider.trafficBufferMinutes
handoff        = 2.0
etaMinutes     = ceil(prepMinutes + travelMinutes + handoff)

Rationale for itemCountFactor: the first item costs full prep; each additional item adds 10% (parallel cooking, not serial). 2 minutes handoff is a fixed buffer for "rider arrives, hands over, customer accepts" — derived from Swiggy public engineering posts and not contentious for v1.

State Model

No client-side state — pure function.

Acceptance Criteria

  • computeEta(order) returns an integer etaMinutes >= 1 for every valid order
  • breakdown.prep + breakdown.travel + breakdown.handoff rounded up equals etaMinutes
  • The function throws ZodError for inputs that fail OrderSchema.parse
  • >= 9 of 10 fixture orders predict within ±3 minutes of the labelled ground-truth ETA (see eval_eta_spec.md)
  • Bundle size of eta.ts (excluding Zod) is < 1 KB minified

Out of Scope

  • Probabilistic ETAs / confidence intervals (v2)
  • Multi-leg routing or batched orders
  • Dynamic re-estimation as the rider moves
  • Calling external services

Open Questions

  • Should itemCountFactor decay at scale (e.g., 20 items)? Defer until we have real data.

Feature Moderate Spec

examples/spec-driven-youtube-mod/specs/feature_moderate_spec.md
View on GitHub

moderate() — LLM-driven Comment Classifier

Purpose

Given a comment and a policy, produce a Verdict (safe / spam / toxic) with a confidence score and a citation of which policy rules triggered. The function is LLM-callable: the caller passes a (prompt: string) => Promise<string> so the same code runs in CI against a deterministic mock and in production against Gemini / Anthropic / WebLLM.

This is the spec-driven SDLC's answer to "how do you test code that calls an LLM?". The answer is: the prompt is part of the spec, the expected output schema is part of the spec, the eval thresholds are part of the spec — and the LLM itself is replaceable.

Inputs

Name | Type | Required | Description
comment | Comment (see schema_comment_spec.md) | YES | Validated comment
policy | Policy | YES | Frozen policy text + rules
llm | (prompt: string) => Promise<string> | YES | Caller-supplied completion fn

Outputs

Verdict (validated via VerdictSchema).

Prompt Contract

The prompt is constructed from the constants in src/prompt.ts. It is frozen by this spec — changes require a spec version bump.

You are a content moderator. Output a single JSON object matching this schema:
{ "label": "safe" | "spam" | "toxic", "confidence": number 0..1, "reason": string, "rulesTriggered": string[] }

Policy rules (cite by exact string):
{policy.rules joined with newlines}

Author signals (use as priors, do not over-weight):
- account age: {comment.authorAgeDays} days
- prior violations: {comment.authorPriorViolations}

Comment to classify:
"""
{comment.text}
"""

Output JSON only. No prose.

Acceptance Criteria

  • On 20-comment fixture set: precision ≥ 0.8 across the toxic and spam labels (combined positives)
  • On 20-comment fixture set: recall ≥ 0.7 across the toxic and spam labels
  • All produced Verdicts pass VerdictSchema.parse
  • Function throws ZodError if LLM returns malformed JSON
  • Function caps confidence to [0, 1] defensively even if LLM returns 1.5

Out of Scope

  • Streaming output — moderation is one-shot
  • Multi-rule weighting / scoring — flat triggered list
  • Human-in-the-loop appeals
  • Logging / persistence

Open Questions

  • Should we add a language field to fail-fast on non-English? Defer until eval shows recall drop.

Feature Spec

specs/templates/feature_spec.md
View on GitHub

<FILL: Feature Name>

Purpose

<FILL: 2-4 sentences. What user problem does this feature solve? Who is the audience and what does "done" look like?>

Pages / Routes

Route | Astro file | Description
<FILL> | src/pages/<FILL>.astro |

User Interactions

Trigger | Behaviour | Notes
<FILL: e.g. "User clicks tag badge"> | <FILL: e.g. "Navigates to /tags/[tag]"> |

State Model

<FILL: Describe any client-side state managed by React hooks or Astro view transitions. If the feature is purely static, write "No client-side state — static Astro page.">

State field | Type | Initial value | Description

Layout & Components

Component | File | Role
<FILL> | src/components/<FILL> | <FILL>

Acceptance Criteria

  • <FILL: specific, binary, verifiable check>
  • Page renders with no hydration errors in the browser console
  • All links resolve to valid routes (no 404s)
  • Page is included in sitemap.xml
  • OG image is generated correctly for social sharing

Out of Scope

  • <FILL: list what this feature does NOT include>

Open Questions

  • <FILL: any unresolved decisions>

ROADMAP

specs/ROADMAP.md
View on GitHub

Development Roadmap — astro-citrus

Central tracker for all features and their implementation status. Update this file whenever a spec moves to a new status or a new feature is planned.

Status Key

Symbol | Status | Meaning
✅ | implemented | Live on the site, spec (if any) matches current code
⚠️ | partial | Code exists but incomplete, buggy, or needs polish
🔄 | in-progress | Actively being built
📋 | planned | Decided but not started
💡 | idea | Under consideration, not committed

Feature Tracker

# Feature Status Spec Notes
1 Home page home_redesign_spec.md Revamped hero with embedded Ask AI launchpad, metrics strip, featured series, post + note grids
2 About page home_redesign_spec.md Two-column "What I do / How I work" grids; inline Ask AI CTA; placeholder Instagram removed
3 Blog (posts) 37 posts, pagination, tags, OG images
4 Field Notes 30 notes, 10 flagged ai: true for the index
5 Learn / Series Curated learning tracks with ordered posts
6 Resume page home_redesign_spec.md Sticky right-rail TOC on lg+, Download PDF slot, inline Ask AI CTA
7 Tags index /tags/ + per-tag paginated listing
8 Full-text search (Pagefind) Production only; / shortcut to open
9 OG image generation Satori-based per-page social images
10 RSS feed /rss.xml
11 Sitemap Auto-generated by @astrojs/sitemap
12 Analytics Google Tag Manager + PostHog
13 Ask AI — Gemini RAG ask_ai_spec.md Fully working with PUBLIC_GEMINI_API_KEY
14 Ask AI — Anthropic provider ⚠️ ask_ai_spec.md Code & UI ready; add PUBLIC_ANTHROPIC_API_KEY to .env and astro.config.ts
15 Ask AI — WebLLM (local) ⚠️ ask_ai_spec.md UI ready; user must opt-in to ~1.5–4.6 GB model download
16 Ask AI — Discoverability home_redesign_spec.md Nav link with sparkle, hero launchpad with ?q= deep-link, inline CTAs on About/Resume
17 Reading time in content schema ⚠️ remark-reading-time plugin exists; add readingTime to Zod schema in content.config.ts and Masthead.astro
18 Background gradient performance home_redesign_spec.md Replaced ~24 absolute blurred divs with single .site-mesh radial gradient at body level
19 Webmentions 📋 Env schema ready (WEBMENTION_API_KEY, WEBMENTION_URL); integration not wired into components
20 Ask AI — Readiness & errors Provider readiness chip, queued questions, recovery card, local-model onboarding, 7-day auto-eviction with day-6 warning
21 Voice mode — Gemini Live Bidirectional realtime audio (WebSocket), native barge-in, RAG via searchSiteContent tool calls, streaming captions, sessionResumption, sliding-window context compression, goAway auto-reconnect
22 Voice mode — Local engine Fully on-device: Whisper-base.en STT + Silero VAD + Kokoro-82M TTS + WebLLM, sentence-streamed playback, RMS-based barge-in, voice-pack onboarding with 7-day eviction shared with chat path
23 Projects module — v1 projects/spec-driven-sdlc.md New project collection + /projects/ index + detail layout. Inaugural piece: Spec-Driven SDLC, with two embedded live example slices (Swiggy ETA + YouTube Comment Mod) and a Spec Generator. PostHog telemetry + featured strip on home + nav link. Pagefind-indexed, RAG-indexed (11 chunks).
24 Spec Generator components/spec_generator_spec.md Visitors describe an idea, choose a spec type, and stream a full draft spec in this repo's house style via the active LLM (localStorage("ai_llm_provider")). Copy / Download .md / Stop / Regenerate toolbar. NoProviderError recovery card.
25 Spec-driven examples (v2) 📋 Add Swiggy Search + YouTube Recs to examples/ as the methodology library grows
26 LLM streaming for Gemini & Anthropic src/lib/llm-complete.ts wraps streamGenerateContent (Gemini SSE) + messages?stream (Anthropic SSE) + WebLLM passthrough. Typed NoProviderError. Used by SpecGenerator + ModWidget.
27 Cookie Consent & GDPR Analytics cookie-consent-spec.md First-visit consent banner; PostHog full vs. privacy mode based on localStorage; /privacy page
28 Academy — Learning Hub + Enrolment 💡 academy-spec.md Rebrand Learn → Academy; track grid; email enrolment (Resend); localStorage progress tracking; 7 open questions pending
29 Voice UX Polish — copy & icon voice-ux-polish-spec.md "Chat with my work" → "Talk with my work"; MicIcon → AudioWaveIcon (waveform bars) across HeroAskAI, AIAssistant, VoiceMode
30 GitHub profile — README & fields 📋 github-profile-spec.md Create hoshank/hoshank README, fill bio/location/company/website, repin original repos
31 Product Teardowns 📋 teardowns-spec.md New teardown + product collections; /teardowns/[product]/[slug]/; deep-first; home strip Day 1; dedicated RSS; YouTube, Claude, Cursor, Instagram at launch

Backlog

Add new ideas here before writing a spec. Move to the tracker above once a spec is created and status is confirmed.

Idea | Notes
Dark / light mode per-page persistence | Currently uses system preference
Newsletter / email capture | Possible integration with ConvertKit or Buttondown
Guestbook / comments | Static comments via Utterances or Giscus

Spec Index

All specs live in specs/. Use this index to find or create a spec for any feature.

Spec file | Type | Status | Feature
examples/ask_ai_spec.md | ai-feature | implemented | Ask AI — RAG chat assistant
home_redesign_spec.md | feature | implemented | Home & site revamp + Ask AI discoverability
projects/spec-driven-sdlc.md | project | implemented | Spec-Driven SDLC project page (#23)
components/spec_generator_spec.md | component | implemented | Spec Generator React island (#24)
cookie-consent-spec.md | feature | implemented | Cookie Consent & GDPR Analytics (#27)
academy-spec.md | feature | draft | Academy — Learning Hub + Enrolment (#28)
teardowns-spec.md | feature | approved | Product Teardowns (#31)
voice-ux-polish-spec.md | feature | implemented | Voice UX Polish — copy & icon (#29)
github-profile-spec.md | feature | planned | GitHub Profile — README & fields (#30)

Templates (copy to create a new spec)

Template | Use for
templates/content_schema_spec.md | New Astro content collections
templates/ai_feature_spec.md | AI / RAG / LLM features
templates/feature_spec.md | Pages, routes, site sections
templates/component_spec.md | Astro or React components
templates/project_spec.md | Project / portfolio entries

How to Update This File

  1. New feature planned — add a row to the Backlog table.
  2. Spec created — move from Backlog to Feature Tracker, add spec link, set status 📋 planned.
  3. Work starts — change status to 🔄 in-progress.
  4. Feature ships — change status to ✅ implemented (or ⚠️ partial if incomplete). Update spec frontmatter status: to match.
  5. Spec superseded — mark old spec deprecated in its frontmatter; add a note in the Spec Index row.

Related reading