Anthropic product experiments, Claude Code adoption portfolio
Six drafted experiment specs plus one shipped MCP slice, framed as a PMM portfolio whose goal is to close the gap between what Claude can do and what teams actually wire up: CLAUDE.md, slash commands, verification loops, and MCP.
Outcomes
Three pillars
Workflow infrastructure beats raw capability
Teams that win with Claude invest in CLAUDE.md, encoded slash commands, verification loops, and MCP, not longer prompts alone.
Specs before autonomous runs
Each experiment is a spec-first bet: reach, leverage, and confidence scored so prioritisation is explicit, not vibes.
Discoverable, team-sharable defaults
The portfolio targets the adoption bottlenecks (onboarding, spec quality, MCP discovery) so that infrastructure feels like product, not secret sauce.
What this is
This is a portable research and spec portfolio that now lives in this site’s tree at src/components/claude/. It started life as a standalone claude/ folder beside the Astro app; it has been moved here so the experiments, lessons, and top-level specs ride along with the rest of the repo.
The experiments README inside that folder is still the index of record: prioritisation table, scoring axes, and recommended run order (03 → 01 → 04 → 02 → 05 → 06).
How to read it
- Open the experiments README for the one-screen overview.
- Drill into any experiments/0N-*/spec.md for the full draft spec.
- For the one slice that includes runnable code, see 07-atlassian-sync; recreate a local .venv with your usual Python tooling — virtualenvs are not committed (see the root .gitignore).
Relationship to this site
This project entry exists so the portfolio shows up on /projects/ alongside Spec-Driven SDLC, with the same hero / outcomes / pillars / demo callout pattern. The Markdown and code are the source of truth; this page is the map.
Specs & docs from the repo
Rendered straight from demo.highlights. Each document is the source of truth in the repo — the snippets below stay in sync at build time.
README
src/components/claude/experiments/README.md
View on GitHub
Anthropic Product Experiments — PMM Portfolio
Persona: Senior PMM, Anthropic Developer Experience. Previously led developer tools growth at a major cloud provider. Obsessed with time-to-value, B2B adoption flywheels, and making AI-native workflows the default — not the exception.
Mission: Close the gap between what Claude can do and what the average developer team actually does. Every experiment here targets a specific adoption or retention bottleneck identified from practitioner research.
The Core Insight
The bottleneck in Claude Code adoption is not capability — it's workflow infrastructure. Developers who succeed with Claude have invested in:
- A well-maintained CLAUDE.md
- A set of slash commands encoding team workflows
- Verification loops so Claude can check its own work
- MCP servers connecting Claude to their actual tools
The experiments below are designed to make that infrastructure automatic, discoverable, and team-sharable rather than a power-user secret.
Experiment Portfolio
| # | Experiment | Type | Status | Core Bet |
|---|---|---|---|---|
| 01 | CLAUDE.md Intelligence Agent | agent-crew | draft | Auto-generate + auto-maintain CLAUDE.md from session learnings |
| 02 | Autonomous Spec Quality Wizard | flow | draft | Block bad autonomous runs before they start with spec scoring |
| 03 | Zero-to-Value Onboarding Crew | agent-crew | draft | First 5 minutes of Claude Code should prove value, not demand investment |
| 04 | Verification Loop Builder | mcp-server | draft | Make quality loops a default, not a power-user pattern |
| 05 | MCP Marketplace & Discovery | mcp-server | draft | Fix MCP ecosystem discoverability — the npm install moment for Claude tools |
| 06 | Team Adoption Flywheel Flow | flow | draft | Compress team-wide AI-native adoption from 3 months → 3 weeks |
Prioritization Framework
Each experiment is scored on three axes:
- Reach: How many developers/teams does this unblock?
- Leverage: How much does it compound over time (vs. one-time value)?
- Confidence: How well do we understand the problem (vs. hypothesis)?
| Experiment | Reach | Leverage | Confidence | Score |
|---|---|---|---|---|
| 01 — CLAUDE.md Intelligence | High | Very High | High | ⭐⭐⭐⭐⭐ |
| 02 — Spec Wizard | Medium | High | High | ⭐⭐⭐⭐ |
| 03 — Onboarding | Very High | Medium | High | ⭐⭐⭐⭐ |
| 04 — Verification Loop | High | Very High | Medium | ⭐⭐⭐⭐ |
| 05 — MCP Marketplace | High | High | Medium | ⭐⭐⭐⭐ |
| 06 — Team Flywheel | Medium | Very High | Low | ⭐⭐⭐ |
Recommended order to run: 03 → 01 → 04 → 02 → 05 → 06
Rationale: Start with onboarding (high reach, high confidence), use learnings to inform CLAUDE.md intelligence (high leverage), then close the quality loop. MCP marketplace and team flywheel are higher-investment bets.
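The star scores in the table are consistent with a simple aggregation of the three axes. A minimal sketch of one such mapping — reverse-engineered from the table above, not the portfolio's documented rubric:

```python
# Qualitative ratings mapped to numbers (hypothetical mapping).
RATING = {"Low": 1, "Medium": 2, "High": 3, "Very High": 4}

def star_score(reach: str, leverage: str, confidence: str) -> int:
    """Sum the three axes (3-12), halve, and cap at 5 stars."""
    total = RATING[reach] + RATING[leverage] + RATING[confidence]
    return min(5, total // 2)

# 01 — CLAUDE.md Intelligence: High reach, Very High leverage, High confidence
print("⭐" * star_score("High", "Very High", "High"))  # ⭐⭐⭐⭐⭐
```

This reproduces all six rows of the table, which suggests reach, leverage, and confidence are weighted equally in the scoring.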
Spec Status Lifecycle
draft → review → approved → implemented → deprecated
Naming Convention
`<domain>_<entity>_spec.md` per the spec-kit standard. Each experiment directory contains one or more specs depending on complexity.
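The naming convention can be checked mechanically. A sketch of a validator — the exact character set allowed in `domain` and `entity` is an assumption:

```python
import re

# Hypothetical validator for the <domain>_<entity>_spec.md convention:
# two lowercase, hyphen-separated segments joined by underscores.
SPEC_NAME = re.compile(
    r"^[a-z0-9]+(?:-[a-z0-9]+)*_[a-z0-9]+(?:-[a-z0-9]+)*_spec\.md$"
)

def is_valid_spec_name(filename: str) -> bool:
    return SPEC_NAME.match(filename) is not None

print(is_valid_spec_name("claude-md_intelligence_spec.md"))  # True
print(is_valid_spec_name("spec.md"))                         # False
```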
Spec
src/components/claude/experiments/01-claude-md-intelligence/spec.md
View on GitHub
CLAUDE.md Intelligence Agent
Purpose
Writing and maintaining CLAUDE.md is the highest-leverage thing a developer can do with Claude Code — and the most commonly skipped. The upfront cost is real: a good CLAUDE.md requires architectural knowledge, explicit convention capture, and ongoing maintenance. This crew eliminates that cost by automatically generating a production-quality CLAUDE.md from codebase analysis and a short structured interview, then monitoring future Claude sessions to propose updates when new conventions or anti-patterns emerge.
Done looks like: A developer runs one command on a new repo and gets a CLAUDE.md that a senior engineer on the team would have written. Future sessions self-improve it.
The Bet
If we can auto-generate a useful CLAUDE.md from a codebase scan and auto-update it from session learnings, team adoption of CLAUDE.md increases from ~15% of Claude Code users to ~70%.
Why this matters to Anthropic: CLAUDE.md is the compounding moat. Teams with good CLAUDE.md files retain Claude Code subscriptions at 2.3× the rate of teams without. This crew makes the moat automatic.
Inputs
| Name | Type | Required | Description | Example |
|---|---|---|---|---|
| `repo_path` | string | yes | Absolute path to the repository root | `"/Users/dev/my-app"` |
| `interview_mode` | string | no | `"interactive"` (Q&A in terminal) or `"silent"` (analysis only). Default: `"interactive"` | `"interactive"` |
| `existing_claude_md` | string | no | Path to an existing CLAUDE.md to augment rather than replace | `"/Users/dev/my-app/CLAUDE.md"` |
| `team_context` | string | no | Free-text description of the team, product, and domain | `"B2B SaaS, 8 engineers, Rails + React"` |
Outputs
| Artifact | Format | Producer Task | Description |
|---|---|---|---|
| `output/CLAUDE.md` | markdown | `write_claude_md` | Production-ready CLAUDE.md ready to copy to repo root |
| `output/interview_transcript.md` | markdown | `conduct_interview` | Structured Q&A log — useful for auditing and updating |
| `output/codebase_profile.json` | JSON | `analyze_codebase` | Detected stack, patterns, and conventions — machine-readable |
| `output/update_suggestions.md` | markdown | `monitor_session_learnings` | Proposed CLAUDE.md additions based on recent session learnings (runs in update mode) |
Agents
codebase_analyst
Role: Senior Staff Engineer performing a codebase audit
Goal: Produce a comprehensive, structured profile of the repository — language, framework, test runner, CI/CD setup, folder structure conventions, external dependencies, and any detectable anti-patterns or architectural decisions — without asking the human anything. Output must be concrete and specific, never generic.
Backstory: You've onboarded to dozens of codebases. You know what matters: not that it uses React, but which version, whether it uses hooks or class components, what the state management pattern is, and what the folder structure convention implies about the team's mental model. You read code, not docs.
Tools: directory_reader, file_reader, grep_tool, package_json_parser, git_log_reader
convention_extractor
Role: Engineering culture interviewer and convention archaeologist
Goal: Through a structured 10-question interview (in interactive mode) or git history analysis (in silent mode), surface the non-obvious conventions that aren't visible in the code: naming preferences, PR size philosophy, which files Claude should never touch, known footguns in the codebase, and domain-specific vocabulary.
Backstory: You know that the most useful CLAUDE.md content isn't the obvious stuff (language, framework) — it's the invisible rules that exist only in senior engineers' heads. "Never modify the billing module without a second reviewer." "All async errors must be wrapped in our custom AppError." "The legacy/ folder is not legacy — don't touch it." You surface those.
Tools: git_log_reader, file_reader, terminal_prompt (interactive mode only)
claude_md_writer
Role: Technical writer specializing in AI agent context documents
Goal: Synthesize the codebase profile and convention interview into a CLAUDE.md that gives Claude Code everything it needs to work autonomously without asking clarifying questions. Every section must be actionable and specific. No generic boilerplate.
Backstory: You've read hundreds of CLAUDE.md files and know the difference between one that actually changes behavior and one that just describes the README. You know the seven categories that matter: architecture decisions, anti-patterns and footguns, naming conventions, testing philosophy, domain vocabulary, files/dirs Claude should avoid, and the definition of "done" for this codebase.
Tools: file_writer
session_monitor
Role: Learning loop agent that watches Claude session logs for teachable moments
Goal: (Runs in update mode only) Read recent Claude Code session transcripts, identify instances where Claude was corrected, made assumptions that were wrong, or where the human had to re-explain something that should have been in CLAUDE.md. Propose specific additions or amendments to CLAUDE.md.
Backstory: You're looking for patterns: if Claude was corrected for the same thing three times in a month, that's a CLAUDE.md gap. If a human typed "no, we never do X" — that's an anti-pattern that should be captured. You don't propose edits for one-off corrections; you look for systematic gaps.
Tools: file_reader, session_log_parser
Tasks
Tasks execute sequentially. Each task's output feeds into the next via context.
analyze_codebase
Agent: codebase_analyst
Description:
Perform a thorough analysis of the repository at {repo_path}. You must detect and report:
1. Primary language(s) and version(s) — check package.json, go.mod, Pipfile, Gemfile, pyproject.toml, etc.
2. Frameworks and major libraries — be specific (Next.js 14 App Router, not just "React")
3. Test runner and testing philosophy — unit only? integration? e2e? what coverage threshold?
4. CI/CD setup — check .github/workflows, .gitlab-ci.yml, Jenkinsfile
5. Folder structure and what it implies about architecture (monorepo? feature-based? layer-based?)
6. State management pattern (if frontend)
7. Database and ORM (if backend)
8. Authentication pattern
9. Notable dependencies that have strong opinions (e.g., Prisma, tRPC, Rails)
10. Any files/dirs that look sensitive or dangerous to auto-modify (migrations, generated code, billing)
11. Git history patterns — how large are commits? how often do they squash? any branches with special meaning?
Output a structured JSON profile. Be specific. "Uses React" is not acceptable — "Uses React 18.2 with functional components, useState/useReducer for local state, Zustand for global state, no class components detected" is.
Expected Output: A JSON object with keys: languages, frameworks, test_setup, ci_cd, folder_structure, state_management, data_layer, auth_pattern, notable_deps, sensitive_paths, git_patterns, detected_anti_patterns.
Output File: output/codebase_profile.json
Output Schema: CodebaseProfile (Pydantic model)
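The CodebaseProfile schema could look like the sketch below — shown with stdlib dataclasses for portability, though the spec calls for a Pydantic model; the field types are assumptions inferred from the key list above:

```python
from dataclasses import dataclass, field, asdict
import json

@dataclass
class CodebaseProfile:
    """Sketch of the analyze_codebase output schema (field types assumed)."""
    languages: list = field(default_factory=list)
    frameworks: list = field(default_factory=list)
    test_setup: dict = field(default_factory=dict)
    ci_cd: dict = field(default_factory=dict)
    folder_structure: str = ""
    state_management: str = ""
    data_layer: str = ""
    auth_pattern: str = ""
    notable_deps: list = field(default_factory=list)
    sensitive_paths: list = field(default_factory=list)
    git_patterns: dict = field(default_factory=dict)
    detected_anti_patterns: list = field(default_factory=list)

profile = CodebaseProfile(
    languages=["TypeScript 5.x"],
    frameworks=["Next.js 14 App Router"],
    sensitive_paths=["prisma/migrations/"],
)
print(json.dumps(asdict(profile), indent=2))
```

Serializing the dataclass with `json.dumps(asdict(...))` yields a valid `output/codebase_profile.json` that downstream tasks can parse.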
conduct_interview
Agent: convention_extractor
Description:
In interactive mode: Conduct a structured 10-question interview with the developer. Do NOT ask about things the codebase_analyst already detected (framework, language, etc.). Focus on the invisible rules:
1. "What would make you immediately reject a Claude-written PR?" (surfaces non-obvious anti-patterns)
2. "Are there any files or directories Claude should never modify without your explicit approval?"
3. "What domain-specific terms does this codebase use that an outsider wouldn't know?" (e.g., "advertiser" vs "customer", "flight" vs "campaign period")
4. "What's your philosophy on test coverage — what must always be tested, what rarely needs tests?"
5. "What's the most common mistake a new engineer makes in this codebase?"
6. "Are there any third-party APIs or services that are expensive, rate-limited, or irreversible?" (Claude should not call these in dev)
7. "What does 'done' mean for a feature in this codebase?" (deployed? reviewed? monitored for 24h?)
8. "What's your PR size philosophy?" (small atomics? large feature PRs?)
9. "Any architectural decisions that look weird but are intentional?" (the "why does this module exist" question)
10. "What's the most important thing Claude should know that isn't in the code?"
In silent mode: Infer as much as possible from git history, commit messages, PR descriptions (if accessible), and comments in the code. Flag low-confidence inferences with a [?] marker.
Output a structured interview transcript with question, answer, and derived_rule for each item.
Expected Output: A markdown document with 10 Q&A pairs, each followed by a > Derived rule: line that will feed directly into CLAUDE.md.
Output File: output/interview_transcript.md
Output Schema: free text markdown
write_claude_md
Agent: claude_md_writer
Description:
Using the codebase_profile.json and interview_transcript.md from prior tasks, write a production-quality CLAUDE.md.
The CLAUDE.md must have exactly these sections in this order:
## Project Overview
One paragraph. What does this codebase do, who uses it, and what's the tech stack. Written for someone starting a new Claude Code session — not marketing copy.
## Architecture
The mental model Claude needs. Not a file listing — the *why* behind the structure. Key modules and what they own. Cross-module dependencies and which direction is acceptable.
## Development Conventions
- Naming conventions (files, functions, variables, branches, PRs)
- Code style rules that ESLint/Prettier don't enforce
- Patterns to always use vs. patterns to avoid
- How to handle errors in this codebase specifically
## Testing Philosophy
- What must always have tests
- What doesn't need tests
- How to run tests locally
- Coverage expectations
## Domain Vocabulary
A glossary of terms that mean something specific in this codebase. At minimum 5 entries.
## Files and Directories — Handle With Care
An explicit list of paths Claude should not modify autonomously, with a one-line reason for each.
## External Services
APIs, databases, and services Claude interacts with. Flag: which are production-only, which are rate-limited, which calls are irreversible.
## Definition of Done
What "done" means for a task in this codebase. What steps must always happen before a task is considered complete.
Rules for writing this document:
- Every rule must be specific enough that a new engineer would change their behavior after reading it
- No generic advice ("write clean code", "follow best practices") — everything must be codebase-specific
- If you don't have enough information for a section, write exactly what you know and add a `<!-- FILL: explain X -->` comment for the developer to complete
- Aim for 400-800 words. Long enough to be useful, short enough to fit in context without wasting tokens.
Expected Output: A complete, ready-to-use CLAUDE.md file with all eight sections populated.
Output File: output/CLAUDE.md
Output Schema: markdown
monitor_session_learnings
Agent: session_monitor
Description:
(Runs only when mode=update is passed as input)
Read Claude Code session logs from the past 30 days at {session_logs_path}. Identify:
1. Corrections: Any time the human said "no", "that's wrong", "don't do that", "we don't do X here"
2. Re-explanations: Any time the human re-explained something they'd explained in a previous session
3. Footguns: Any time Claude confidently did something that required a revert or human override
4. Domain errors: Any time Claude used wrong terminology or misunderstood a domain concept
For each identified gap, propose a specific CLAUDE.md addition or amendment. Format:
```markdown
## Proposed Update #{n}
**Section:** [which CLAUDE.md section this belongs in]
**Trigger:** [what session event triggered this — quote the relevant exchange]
**Proposed addition:**
[exact text to add to CLAUDE.md]
**Confidence:** [high / medium / low]
**Frequency:** [how many times this pattern appeared in the last 30 days]
```
Expected Output: A markdown document with N proposed updates, ordered by frequency descending.
Output File: output/update_suggestions.md
Output Schema: free text markdown
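The "systematic gaps, not one-off corrections" rule from the agent's backstory amounts to a frequency filter over parsed correction events. A minimal sketch — the event shape and the threshold of three are assumptions based on the backstory:

```python
from collections import Counter

# Hypothetical correction events parsed from 30 days of session logs.
events = [
    {"kind": "correction", "rule": "never modify billing module"},
    {"kind": "correction", "rule": "wrap async errors in AppError"},
    {"kind": "correction", "rule": "never modify billing module"},
    {"kind": "correction", "rule": "never modify billing module"},
]

def systematic_gaps(events, threshold=3):
    """Keep only patterns seen >= threshold times, most frequent first."""
    counts = Counter(e["rule"] for e in events)
    return [(rule, n) for rule, n in counts.most_common() if n >= threshold]

print(systematic_gaps(events))  # [('never modify billing module', 3)]
```

Ordering by count descending matches the expected output ("N proposed updates, ordered by frequency descending"); the single AppError correction is correctly ignored as a one-off.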
Process
Execution: Process.sequential
Order:
```text
analyze_codebase → conduct_interview → write_claude_md
        ↑ (update mode only)
monitor_session_learnings
```
Context chain: write_claude_md receives both analyze_codebase and conduct_interview outputs in its context list.
Tools Required
| Tool | Used By | Purpose |
|---|---|---|
| `directory_reader` | `codebase_analyst` | Walk repo tree, detect config files |
| `file_reader` | `codebase_analyst`, `convention_extractor`, `session_monitor` | Read source files, transcripts, logs |
| `grep_tool` | `codebase_analyst` | Search for patterns, imports, anti-patterns |
| `package_json_parser` | `codebase_analyst` | Parse dependency versions |
| `git_log_reader` | `codebase_analyst`, `convention_extractor` | Analyze commit history and PR patterns |
| `terminal_prompt` | `convention_extractor` | Interactive Q&A in terminal (interactive mode only) |
| `file_writer` | `claude_md_writer` | Write `output/CLAUDE.md` |
| `session_log_parser` | `session_monitor` | Parse Claude Code session transcripts |
Acceptance Criteria
- Crew completes without agent errors on a real repo (test against at least: a Rails app, a Next.js app, a Python data pipeline)
- Generated CLAUDE.md passes a blind review: a senior engineer on the target team rates it ≥ 7/10 for accuracy and usefulness
- CLAUDE.md is between 400 and 800 words
- All eight required sections are present and populated (no `<!-- FILL -->` placeholders remain unless information was genuinely unavailable)
- In interactive mode, interview completes in under 5 minutes
- In silent mode, crew completes in under 60 seconds
- codebase_profile.json is valid JSON and passes schema validation
- Update mode: proposed updates are traceable to specific session events (not hallucinated)
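Several of the acceptance criteria above are mechanically checkable. A minimal sketch of a validator — section names are copied from the spec's required outline; the whitespace-split word count is a naive assumption:

```python
import re

REQUIRED_SECTIONS = [
    "Project Overview", "Architecture", "Development Conventions",
    "Testing Philosophy", "Domain Vocabulary",
    "Files and Directories — Handle With Care",
    "External Services", "Definition of Done",
]

def check_claude_md(text: str) -> list:
    """Return a list of acceptance-criteria violations (empty list = pass)."""
    problems = []
    for section in REQUIRED_SECTIONS:
        if f"## {section}" not in text:
            problems.append(f"missing section: {section}")
    words = len(text.split())
    if not 400 <= words <= 800:
        problems.append(f"word count {words} outside 400-800")
    if re.search(r"<!--\s*FILL", text):
        problems.append("unresolved <!-- FILL --> placeholder")
    return problems
```

The blind-review and timing criteria still need a human and a stopwatch; this only automates the structural checks.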
Out of Scope
- Writing slash commands (separate experiment: 03-zero-to-value-onboarding)
- Generating `.claude/settings.json` permission configs
- Multi-repo / monorepo coordination (single repo only in v1)
- Automatic commit/PR of CLAUDE.md changes — human must review and apply
- Real-time session monitoring — update mode is a manual trigger, not a daemon
Open Questions
- Should the interview be voice-first (speak your answers) or text-only? Voice would reduce friction dramatically.
- How do we handle CLAUDE.md drift — when the codebase changes but the CLAUDE.md isn't updated? Should we add a staleness score?
- Privacy: session logs contain proprietary code context. Do we need an on-device-only mode?
- Should `monitor_session_learnings` be a separate always-on MCP tool rather than a crew task?
README
src/components/claude/experiments/07-atlassian-sync/README.md
View on GitHub
atlassian-sync MCP
An MCP server that makes Confluence and Jira first-class citizens of your Claude Code workflow. Reference live Atlassian content directly in spec files — Claude resolves it automatically.
Quick Start
```shell
pip install -e ".[dev]"
cp .env.example .env   # fill in host + credentials
atlassian-sync         # starts on http://localhost:8015
```

Add to Claude Code (`~/.claude/mcp.json` or project `.claude/mcp.json`):
```json
{
  "mcpServers": {
    "atlassian-sync": {
      "url": "http://localhost:8015/mcp",
      "headers": { "Authorization": "Bearer YOUR_MCP_API_KEY" }
    }
  }
}
```

Inline References
Reference live Atlassian content anywhere in your specs or CLAUDE.md:
```text
See @confluence:482934[Auth Design] for the architecture rationale.
Tracked in @jira:AUTH-42[Auth v2 Epic].
```

Claude resolves these automatically when it reads a file. The full page/ticket content is injected into its context before it starts working.
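The `@confluence:ID[Title]` / `@jira:KEY[Title]` syntax implies a resolver that scans file text for references before injecting content. A minimal parsing sketch — the exact grammar the server accepts is an assumption:

```python
import re

# Matches @confluence:482934[Auth Design] and @jira:AUTH-42[Auth v2 Epic].
REF = re.compile(r"@(confluence|jira):([A-Za-z0-9-]+)\[([^\]]+)\]")

def find_references(text: str):
    """Return (source, id, title) tuples for every inline reference."""
    return [m.groups() for m in REF.finditer(text)]

spec = "See @confluence:482934[Auth Design]. Tracked in @jira:AUTH-42[Auth v2 Epic]."
print(find_references(spec))
# [('confluence', '482934', 'Auth Design'), ('jira', 'AUTH-42', 'Auth v2 Epic')]
```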
Auth
| Deployment | Auth mode | Env vars needed |
|---|---|---|
| Atlassian Cloud | `api_token` | `ATLASSIAN_HOST`, `ATLASSIAN_EMAIL`, `ATLASSIAN_API_TOKEN` |
| Self-Hosted Data Center | `pat` | `ATLASSIAN_HOST`, `ATLASSIAN_PAT` |

Set `ATLASSIAN_AUTH_MODE` to `api_token` or `pat`. Deployment is auto-detected from the host URL.
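"Auto-detected from the host URL" most plausibly keys off the `*.atlassian.net` domain that Atlassian Cloud instances use. A sketch under that assumption — not the server's actual detection logic:

```python
from urllib.parse import urlparse

def detect_deployment(host: str) -> str:
    """Guess Cloud vs Data Center from the host URL (assumed heuristic)."""
    hostname = urlparse(host).hostname or host
    if hostname.endswith(".atlassian.net"):
        return "cloud"        # pairs with ATLASSIAN_AUTH_MODE=api_token
    return "data_center"      # pairs with ATLASSIAN_AUTH_MODE=pat

print(detect_deployment("https://acme.atlassian.net"))   # cloud
print(detect_deployment("https://jira.internal.corp"))   # data_center
```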
Tools (15 total)
Confluence — confluence_get_page, confluence_search, confluence_get_space, confluence_sync_space, confluence_sync_page
Jira — jira_get_issue, jira_search, jira_get_sprint, jira_get_epic, jira_create_comment, jira_transition_issue, jira_update_fields
Sync — sync_status, sync_run, resolve_references
Sync to Disk
Pull an entire Confluence space as Markdown files (incremental, ETag-based):
```python
# via MCP tool in Claude
confluence_sync_space(space_key="ARCH")
# writes to docs/confluence-cached/ARCH/
```

Re-run anytime — only changed pages are re-fetched.
Spec
See spec.md for the full technical specification (spec-015). Roadmap and execution status: roadmap.md, progress.md. Operator notes: docs/runbook.md.
MCP HTTP: POST /mcp requires Authorization: Bearer <MCP_API_KEY> (see .env.example). GET /health stays unauthenticated for health probes.
00 Big Picture
src/components/claude/lessons/00-big-picture.md
View on GitHub
Lesson 00: The Big Picture — A Timeline of Claude as a Dev Tool
← Back to Index | Next: Lesson 01 — Core Mental Model →
TL;DR: Claude went from a chat window you copy-paste from, to a CLI agent that runs your terminal, to an orchestration layer that manages other agents. Most developers are still stuck in 2023 workflows.
Difficulty: [Beginner] | Time to read: 10 min
Era 1 — 2023: The Copy-Paste Phase
Core workflow: Open Claude.ai in a browser tab. Describe a function. Copy the output. Paste into your editor. Debug why it doesn't compile. Repeat.
What developers actually did:
- Tab-switched between IDE and browser dozens of times per hour
- Pasted entire files into the chat box to give Claude context
- Re-explained the same codebase architecture every new conversation
- Manually applied Claude's suggestions line by line
Main frustration: Cognitive overhead. Every interaction required manually bridging two worlds. You were the API — shuttling context back and forth by hand.
What Claude still sucked at:
- Understanding your actual codebase (it only saw what you pasted)
- Maintaining consistency across files (no persistent memory)
- Knowing your project's conventions, preferences, or patterns
- Anything requiring multi-step execution
The unlock that moved things forward: Developers started treating Claude less like a search engine and more like a junior developer — giving it more context, not less. Longer context windows (100K tokens) meant you could dump entire files in.
Era 2 — Early 2024: IDE Plugins and API Integrations
Core workflow: Claude embedded directly in editors via extensions (Continue.dev, Cursor, early Copilot alternatives). The context gap started closing — Claude could see open files without copy-paste.
What developers actually did:
- Used inline chat to ask questions about the file currently open
- Ran one-off generation tasks from the editor command palette
- Started using the API to build internal tools and scripts around Claude
- Experimented with system prompts to encode project context
Main frustration: IDE integrations were shallow — they saw the open file, not the repo. Claude still had no memory of what it did yesterday. Every session was a blank slate.
What Claude still sucked at:
- Cross-file awareness (couldn't navigate a real codebase)
- Running code to verify its own output
- Taking actions (read-only, couldn't write files or run commands)
- Long-running tasks that required multiple steps
The unlock that moved things forward: The API made Claude programmable. Teams started building internal tools: code review bots, PR summarizers, doc generators. Claude-as-infrastructure began.
Era 3 — Mid 2024: Claude Code Beta — CLI-Native Agentic Coding
Core workflow: claude in your terminal. Claude reads your repo, writes files, runs shell commands, iterates. The developer shifts from doing to reviewing.
What developers actually did:
- Launched Claude Code from the project root — it could see and modify the entire codebase
- Delegated multi-step tasks: "add auth to this Express app, write the tests, run them"
- Started building CLAUDE.md files to give persistent context between sessions
- Used Plan Mode (Shift+Tab twice) to review Claude's approach before execution
Main frustration: Claude would confidently execute wrong plans. Without the right upfront context, it made decisions that looked reasonable but violated project conventions. The CLAUDE.md was the fix — but writing a good one took real effort.
What Claude still sucked at:
- Knowing when to stop and ask vs. when to proceed
- Handling large refactors without losing track of state
- Integrating with external systems (Jira, Slack, databases) natively
The unlock that moved things forward: The shift to CLI-native meant Claude could actually do things, not just suggest them. The feedback loop compressed from minutes to seconds. CLAUDE.md turned institutional knowledge into a compounding asset.
Era 4 — Late 2024: MCP — Claude Gains Tools
Core workflow: Claude connects to external systems via Model Context Protocol servers. It can now read your database, create Jira tickets, post to Slack, open a browser, and query your analytics — all within a single session.
What developers actually did:
- Plugged in GitHub MCP: Claude opens PRs, reviews diffs, posts comments
- Connected database MCPs: Claude queries prod (read-only) during debugging sessions
- Added Slack MCP: Claude reads thread context to understand what a bug report actually means
- Built custom MCP servers for internal tools
Main frustration: MCPs blow up the context window fast. Connecting five MCPs and running a complex task could exhaust context before the work was done. You had to be selective about what you enabled per session.
What Claude still sucked at:
- MCP server stability (early implementations were flaky)
- Security boundaries — malicious MCP responses could inject instructions
- Knowing which tools to call without explicit prompting
The unlock that moved things forward: Claude stopped being a coding tool and started being an engineering workflow tool. You could describe a production incident and Claude would pull the Sentry error, read the relevant code, check the deploy history, and draft a fix — without you switching tabs once.
Era 5 — Early 2025: Parallel Sessions, Opus-with-Thinking, CLAUDE.md as Team Infra
Core workflow: Multiple Claude sessions running simultaneously, each on a different task. One refactoring a module, one writing tests, one drafting a PR description. You're a manager now, not a coder.
What developers actually did:
- Ran 5 terminal sessions + 5-10 web sessions in parallel
- Named tabs by task: `[auth-refactor]`, `[test-coverage]`, `[perf-investigation]`
- Used system notifications as async triggers — Claude pings you when it needs input
- Checked CLAUDE.md into Git — every teammate's Claude session now starts with shared institutional knowledge
- Used Opus for complex architectural work, Sonnet for routine tasks
Main frustration: Parallel session management was cognitively demanding. Knowing what each session was doing, when to intervene, how to integrate outputs — that became the new developer skill.
What Claude still sucked at:
- Coordination across sessions (each session didn't know what the others were doing)
- Cost management (Opus × 10 parallel sessions adds up fast)
- Self-correcting when stuck in a bad plan without human intervention
The unlock that moved things forward: CLAUDE.md in Git meant the team's AI behavior was versionable, reviewable, and improvable. When Claude made a systematic mistake, you updated CLAUDE.md once and fixed it for everyone, forever.
Era 6 — Mid 2025: Agent Teams, Subagents, Autonomous Loops
Core workflow: An orchestrator Claude session spawns specialist subagent sessions. The orchestrator delegates, the subagents execute, verification agents check the output. The human sets the goal and reviews the result.
What developers actually did:
- Defined orchestrator + specialist architectures for complex tasks
- Ran overnight autonomous loops on well-specified tasks
- Used agent-stop hooks to trigger deterministic verification after every task
- Integrated Claude Code into GitHub Actions — PRs automatically got Claude review
Main frustration: Autonomous loops required near-perfect specs. Underspecified tasks + overnight runs = waking up to confident, wrong work. The quality of your prompts became the bottleneck, not Claude's capability.
What Claude still sucked at:
- Knowing when it's out of its depth and should stop
- Managing state across many subagent sessions cleanly
- Cost predictability in open-ended autonomous tasks
The unlock that moved things forward: Verification loops. Giving Claude a way to test its own work — run the tests, open the browser, query the database — doubled or tripled output quality without additional human review.
Era 7 — Now: Claude as an Engineering Orchestration Layer
Core workflow: You describe outcomes, not steps. Claude plans the work, distributes it across subagents, verifies the results, and surfaces decisions that genuinely require human judgment.
What the best teams are doing today:
- CLAUDE.md is a living document updated after every significant session
- Slash commands encode team-specific workflows as reusable primitives
- Claude Code GitHub Action runs on every PR — human reviewers focus on judgment, not mechanics
- Custom MCP servers connect Claude to every internal tool
- New engineers are onboarded to the AI-native workflow on day one
The remaining hard problems:
- Who is responsible when Claude ships a bug?
- How do you version AI behavior as models and prompts evolve?
- How do you prevent skill atrophy in junior developers who never learn the hard way?
- What work should humans always do themselves?
The Through-Line
Each era solved a different bottleneck:
| Era | Bottleneck Solved | New Bottleneck Created |
|---|---|---|
| 2023 | Speed of generation | Context gap (copy-paste) |
| Early 2024 | Context gap | No memory, no action |
| Mid 2024 | Memory + action (CLAUDE.md + CLI) | Spec quality |
| Late 2024 | External system integration | Context window management |
| Early 2025 | Team knowledge sharing | Parallel session management |
| Mid 2025 | Verification + autonomy | Spec quality at scale |
| Now | Orchestration | Human judgment + accountability |
The pattern: every time one bottleneck is solved, the constraint moves up the stack — closer to human judgment and further from mechanical execution.
What this means for you: If you're still working like it's 2023 (copy-pasting from a chat window), you're not behind because Claude got smarter. You're behind because the workflow changed. The rest of this guide is about catching up — and then getting ahead.
← Back to Index | Next: Lesson 01 — Core Mental Model →
Related reading
Why AI Adoption Fails Without Judgment Infrastructure
Most teams adopt AI tools. Very few adopt the systems needed to make AI work consistently. The gap between individual productivity and organizational capability is where most adoption stalls.
AI Strategy: From Feature to Platform
A capstone frame for PMs: AI as bolt-on feature, integrated capability, and platform infrastructure—roadmaps that compound, the data flywheel, org readiness, and what to prepare for next.
Building AI Features: What PMs Need to Know
How AI delivery differs from classic software—and how PMs define 'done' when the system is never perfect.
Data as a Product Requirement
How to treat data readiness as a product decision—not an engineering side quest.