Spec-Driven SDLC: A new paradigm for AI-first Agile product teams
Why agile teams should treat specs as the source of truth for AI collaborators, and how I run this site as a working demo. Two live examples + a spec generator you can use right now.
Three pillars
Specs are the contract
Humans write intent. AI writes code. Specs are how the two negotiate without misunderstanding.
Evals are the acceptance
Acceptance criteria become test specs. 'Done' means the eval is green and the spec moved to implemented.
The roadmap is the backlog
ROADMAP.md is the living kanban. Every spec lands there before it's coded, and stays there after it ships.
The thesis
Agile worked when teams shipped what humans typed.
In 2026, the typing is happening somewhere else. A senior PM defines an outcome. An AI agent (Cursor, Claude Code, Codex, whatever's hot this quarter) produces a working diff in fifteen minutes. The bottleneck is no longer how fast engineers can pattern-match. It's how precisely the rest of us can describe what we want.
User stories don’t survive that handoff. “As a user, I want to filter restaurants by dietary preference, so that I can order without scrolling” is fine for a sprint planning meeting. It’s a disaster as input to an LLM that will write the filter, the index, the schema, the tests, and the analytics events all at once. The model fills in the gaps with whatever’s most common in its training data, not whatever’s right for your product.
The fix isn’t to give the AI a bigger story. The fix is to give it a spec.
A spec is what a story looks like once you’ve removed the ambiguity. It states the schema. It freezes the prompt. It encodes acceptance criteria as binary checks. It says explicitly what is not in scope. When you hand a good spec to a capable AI agent, it produces code that matches what you asked for. When you hand a vague story to the same agent, it produces code that matches what someone else’s product needed.
That difference compounds. Over a quarter, teams that ship from specs out-execute teams that ship from stories, not because their AI is better, but because their intent is better-encoded.
This essay is a working demo of that thesis. Three pillars, two product examples, and a spec generator you can use right now.
What lives behind the pillars
Specs are the contract. Look in specs/ of this repo. Every feature on this website (Ask AI, voice mode, the local-only WebLLM path, the project page you're reading) has a spec. The specs are written first, reviewed (sometimes by me alone, sometimes with another LLM critic), and only then implemented. The implementation is unsurprising because the surprises were resolved in the spec.
Evals are the acceptance. A spec without acceptance criteria is a wish. Acceptance criteria written as English (“the page should be fast”) are also wishes. Acceptance criteria written as binary, machine-checkable assertions (“Lighthouse perf ≥ 90 on a Pixel 5 over throttled 4G”) are contracts. The two example slices below take this further: the eval test files are direct translations of the spec's acceptance criteria. If the eval passes, the spec is satisfied. If the eval fails, the next move is not to “fix the test”, it's to revisit the spec.
The roadmap is the backlog. specs/ROADMAP.md is one file. Every feature, partial or done, has a row. Every row has a status emoji. Every row links to its spec. There is no Jira. There is no separate backlog. When the spec moves to approved, the roadmap row moves to 🔄. When the eval is green and the code is merged, both move to ✅. The single source of truth eliminates the “wait, what’s the actual state of this?” round-trips that eat senior PM time.
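A roadmap row needs only three things: the feature, a link to its spec, and a status. Something like this (illustrative only; these rows and paths are placeholders, not the actual contents of specs/ROADMAP.md):

```md
| Feature    | Spec                          | Status |
| ---------- | ----------------------------- | ------ |
| Ask AI     | specs/projects/ask-ai.md      | ✅     |
| Voice mode | specs/projects/voice-mode.md  | 🔄     |
```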
Demo 1: Swiggy-style ETA estimator
The first example is intentionally non-AI code: a pure function, deterministic, two hundred lines including types. The point is to show that spec-driven SDLC isn't only useful when an LLM is in the loop at runtime. It's useful any time the rules of the system are non-obvious and the consequences of getting them wrong are real.
The spec lives at examples/spec-driven-swiggy-eta/specs/feature_eta_spec.md. It freezes:
- The formula (prep × load factor × item factor + travel + traffic buffer + handoff)
- The output shape (etaMinutes, plus a transparent breakdown)
- The acceptance threshold (≥9 of 10 fixtures within ±3 minutes; median absolute error ≤2 minutes)
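To make the contract concrete, here is a minimal sketch of a function that satisfies those three freezes. The type and field names are assumptions for illustration; the governed implementation is examples/spec-driven-swiggy-eta/src/eta.ts.

```ts
// Sketch only: field names are illustrative, not the repo's actual types.
export interface EtaInput {
  basePrepMinutes: number;   // restaurant's quoted prep time
  kitchenLoadFactor: number; // 1.0 = normal, 1.6 = slammed
  itemFactor: number;        // multiplier for order size
  distanceKm: number;
  riderSpeedKmh: number;
  trafficBufferMinutes: number;
}

export interface EtaResult {
  etaMinutes: number;
  breakdown: { prep: number; travel: number; handoff: number };
}

const HANDOFF_MINUTES = 2; // flat, frozen by the spec

export function estimateEta(input: EtaInput): EtaResult {
  // prep × load factor × item factor
  const prep = input.basePrepMinutes * input.kitchenLoadFactor * input.itemFactor;
  // travel + traffic buffer
  const travel = (input.distanceKm / input.riderSpeedKmh) * 60 + input.trafficBufferMinutes;
  const handoff = HANDOFF_MINUTES;
  const etaMinutes = Math.round(prep + travel + handoff);
  return { etaMinutes, breakdown: { prep, travel, handoff } };
}
```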
The eval at evals/eta.test.ts is a one-to-one translation of those acceptance criteria. The fixture set (fixtures/orders.json) is hand-built across four traffic profiles (quick combo, family meal, slammed kitchen, long ride), so the eval exercises real-world edge cases, not just the happy path.
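The translation is mechanical enough to sketch. Assuming a fixture shape of { input, expectedEtaMinutes } (an assumption; the real fixtures/orders.json defines its own fields), the eval reads roughly like this:

```ts
// Sketch of the acceptance criteria as a test; not the repo's actual eval file.
import { describe, it, expect } from "vitest";
import { estimateEta, type EtaInput } from "../src/eta";
import rawFixtures from "../fixtures/orders.json";

type Fixture = { input: EtaInput; expectedEtaMinutes: number };
const fixtures = rawFixtures as Fixture[];

describe("feature_eta_spec acceptance criteria", () => {
  const errors = fixtures.map(
    (f) => Math.abs(estimateEta(f.input).etaMinutes - f.expectedEtaMinutes)
  );

  it("≥ 9 of 10 fixtures land within ±3 minutes", () => {
    const withinTolerance = errors.filter((e) => e <= 3).length;
    expect(withinTolerance).toBeGreaterThanOrEqual(9);
  });

  it("median absolute error ≤ 2 minutes", () => {
    const sorted = [...errors].sort((a, b) => a - b);
    const mid = Math.floor(sorted.length / 2);
    const median =
      sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
    expect(median).toBeLessThanOrEqual(2);
  });
});
```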
The widget below is the same code path the eval runs against. The function imported into the React island is the function that the spec governs. There is no separate “demo formula”.
Live: Swiggy ETA
The pure function from examples/spec-driven-swiggy-eta/src/eta.ts, running here in your browser.
Computed by the spec's formula: Prep 12.3m, Travel 5.9m, Handoff 2m.
Drag the sliders. Notice that Slammed kitchen (load factor 1.6) more than doubles the prep time; that's the formula working. Notice that doubling distance with a constant rider speed produces a near-linear travel increase; that's also the formula. The handoff stays a flat two minutes because the spec says it does.
If a future contributor decides handoff should be three minutes? They have to update feature_eta_spec.md first. Then the test fails until the fixtures or the threshold are reconciled. Then we have a real conversation about whether three minutes is the right call. The spec is the forcing function.
Demo 2: YouTube-style comment moderation
The second example is the harder problem: the rules of the system live partly in an LLM. How do you spec that? How do you test it?
The answer in spec-driven-youtube-mod/ is to push as much determinism as possible into the spec, leaving the LLM with only the work it's actually good at: fuzzy classification.
Three things move out of the model and into the spec:
- The prompt. Frozen by feature_moderate_spec.md. Changing the prompt is a spec version bump, not a one-line code change. The prompt template lives in src/prompt.ts and is built by a pure function, easy to inspect, easy to diff in a PR.
- The output schema. A Verdict is { label, confidence, reason, rulesTriggered[] }, validated by Zod at the boundary. If the LLM returns 1.7 for confidence, the wrapper clamps it to 1. If it cites a rule that doesn't exist in the policy, the wrapper drops it. Hallucinations bounce off the schema; they don't propagate into the rest of the product. (See the sketch after this list.)
- The eval thresholds. eval_moderate_spec.md demands precision ≥ 0.8 and recall ≥ 0.7 against a 20-comment hand-labelled fixture set. The CI eval uses a deterministic mock LLM, intentionally ~85% accurate, so the thresholds matter and can't be gamed. Real-LLM evaluation happens manually, off the critical path, before the spec moves to implemented.
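A minimal sketch of that schema boundary, assuming Zod and a hypothetical list of policy rule ids (the real schema in the example's src/ is the source of truth):

```ts
import { z } from "zod";

// Hypothetical policy rule ids; the real policy defines its own.
const POLICY_RULE_IDS = ["harassment", "spam-links", "hate-speech"] as const;

export const VerdictSchema = z.object({
  label: z.enum(["toxic", "spam", "safe"]),
  // Clamp out-of-range confidence instead of letting 1.7 propagate downstream.
  confidence: z.number().transform((c) => Math.min(Math.max(c, 0), 1)),
  reason: z.string(),
  // Drop any cited rule that does not exist in the policy.
  rulesTriggered: z
    .array(z.string())
    .transform((rules) =>
      rules.filter((r) => (POLICY_RULE_IDS as readonly string[]).includes(r))
    ),
});

export type Verdict = z.infer<typeof VerdictSchema>;

// Usage at the boundary: VerdictSchema.parse(JSON.parse(llmResponseText))
```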
What's left for the LLM? The classification itself. Is this comment toxic, spam, or safe? That's the actual hard part; the rest is mechanism.
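Part of that mechanism is the eval gate itself. A sketch of the precision/recall check, assuming a moderate(comment, llm) wrapper, a deterministic mockLLM, and a { comment, expectedLabel } fixture shape (all three are assumptions about file and function names, not the repo's actual layout):

```ts
// Sketch of the eval_moderate_spec thresholds; not the repo's actual eval file.
import { it, expect } from "vitest";
import { moderate } from "../src/moderate";
import { mockLLM } from "./mockLLM";
import fixtures from "../fixtures/comments.json";

it("precision ≥ 0.8 and recall ≥ 0.7 on the 20-comment fixture set", async () => {
  let truePositives = 0;
  let falsePositives = 0;
  let falseNegatives = 0;

  for (const { comment, expectedLabel } of fixtures) {
    const verdict = await moderate(comment, mockLLM);
    const predictedFlag = verdict.label !== "safe"; // toxic or spam counts as flagged
    const actualFlag = expectedLabel !== "safe";
    if (predictedFlag && actualFlag) truePositives++;
    else if (predictedFlag && !actualFlag) falsePositives++;
    else if (!predictedFlag && actualFlag) falseNegatives++;
  }

  const precision = truePositives / (truePositives + falsePositives);
  const recall = truePositives / (truePositives + falseNegatives);
  expect(precision).toBeGreaterThanOrEqual(0.8);
  expect(recall).toBeGreaterThanOrEqual(0.7);
});
```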
The widget below runs against your active LLM. Pick a sample, click classify, watch a real verdict come back, labelled, with a reason, with the rules that triggered.
Live: YouTube Comment Moderation
Real call to your active LLM, governed by examples/spec-driven-youtube-mod/specs/feature_moderate_spec.md.
Active provider: read from your Ask AI settings.
This is what spec-driven SDLC for LLM-shaped code looks like in practice. The spec governs everything except the model’s judgement, and the model’s judgement is the only thing left to evaluate.
Try it yourself
The pillars are easy to nod at. The hard part is producing a spec for your idea on a Wednesday afternoon when your sprint review is in twenty minutes. So here’s a generator.
Tell it what you want to build. Tell it which kind of spec you need. It drafts the rest in this repo's house style: frontmatter, sections, Zod types, binary acceptance criteria, an explicit out-of-scope list, open questions. You take what's useful, you fill the <FILL> placeholders, and you walk into your sprint review with a spec instead of a story.
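A stripped-down sketch of that shape (the sections and frontmatter fields here are indicative, not the generator's literal output):

```md
---
title: <FILL>
status: draft            # draft → approved → implemented
owner: <FILL>
---

## Problem
<FILL: one paragraph of intent, not implementation>

## Behaviour
<FILL: the rules, frozen. Formulas, prompt templates, schemas.>

## Types
<FILL: Zod schemas for every boundary>

## Acceptance criteria (binary, machine-checkable)
- [ ] <FILL>
- [ ] <FILL>

## Out of scope
- <FILL>

## Open questions
- <FILL>
```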
It uses the LLM you’ve already configured for Ask AI on this site. No new keys. No new accounts.
Spec Generator
Describe a feature you want to build. The generator drafts a spec in this repo's house style — frontmatter, sections, Zod types, acceptance criteria. Uses your active LLM (Gemini / Anthropic / Local).
Copy the output. Drop it into your repo’s specs/ folder. Open a draft PR. Hand it to whichever AI coding agent you’re using this quarter. Ship the diff. Move the row.
That’s the loop.
Anti-patterns to watch for
- Vibes coding. “I’ll just describe it to Claude and iterate.” The first iteration looks fine. The third iteration is a mess of accumulated drift because nothing is the source of truth except the most recent prompt. Symptoms: PR descriptions that say “implements the thing we discussed in Slack.”
- Prompt as spec. Treating a single LLM prompt as if it were a spec. A spec has frontmatter, section structure, acceptance criteria, scope, and open questions. A prompt has none of that, and an LLM will happily write whatever fills the silence.
- AI as stenographer. Letting the agent dictate the spec back to you. The spec is the place where human judgement about scope, trade-offs, and acceptance lives. The AI implements; the human specifies.
- “We’ll write the spec after.” This means there is no spec. The post-hoc document is a description, not a contract.
- Acceptance criteria written as English. “Should be reasonably fast.” “Should look polished.” Every adjective is a future argument.
What’s next
A few things I’m working on, all of which will land as their own specs in specs/projects/ and rows on the roadmap:
- An eval harness that runs on every PR. Today, examples/*/evals/*.test.ts runs on pnpm test. The next step is to surface eval deltas in PR comments, so a regression in precision is as visible as a regression in latency.
- Spec Generator v2 with RAG over your own corpus. Today the generator is zero-shot with a single style anchor. v2 will retrieve from your existing specs/ directory so the generator's output reads like your specs, not generic ones.
- Auto-PR-from-spec. When a spec moves to approved in main, a PR opens automatically with the agent's first-draft implementation. Spec → PR, no human key-presses.
- Two more example slices. Swiggy-style restaurant search (with a live UI) and YouTube-style "Up Next" recommendations are queued up as examples/spec-driven-swiggy-search/ and examples/spec-driven-youtube-recs/. Both will follow the same spec-first pattern. Both will be linked here when they land.
If any of this resonates and you want to compare notes, or push back hard on something, find me at the Ask AI page or via email. I read everything.
Specs & docs from the repo
Rendered straight from demo.highlights. Each document is the source of truth in the repo — the snippets below stay in sync at build time.
Related reading
AI Strategy: From Feature to Platform
A capstone frame for PMs: AI as bolt-on feature, integrated capability, and platform infrastructure—roadmaps that compound, the data flywheel, org readiness, and what to prepare for next.
Building AI Features: What PMs Need to Know
How AI delivery differs from classic software—and how PMs define 'done' when the system is never perfect.
Data as a Product Requirement
How to treat data readiness as a product decision—not an engineering side quest.
Designing AI User Experiences
How AI reshapes UX: progressive disclosure, expectation-setting, failure design, trust calibration, human-in-the-loop—and why making confidence legible is the core product skill.