PMM-style portfolio · Learning repo 2026
Exploration

Anthropic product experiments, Claude Code adoption portfolio

Six drafted experiment specs plus one shipped MCP slice, framed as a PMM portfolio: close the gap between what Claude can do and what teams actually wire up (CLAUDE.md, slash commands, verification loops, and MCP).

Markdown specs
Python
FastMCP
Claude Code
MCP

Outcomes

6 draft experiment specs: agent, flow, and MCP bets
1 implemented slice: Atlassian sync MCP + Python package
3 supporting artefacts: lessons, specs, research notes

Three pillars

01

Workflow infrastructure beats raw capability

Teams that win with Claude invest in CLAUDE.md, encoded slash commands, verification loops, and MCP, not longer prompts alone.

02

Specs before autonomous runs

Each experiment is a spec-first bet: reach, leverage, and confidence scored so prioritisation is explicit, not vibes.

03

Discoverable, team-sharable defaults

The portfolio targets adoption bottlenecks (onboarding, spec quality, MCP discovery) so infrastructure feels like product, not secret sauce.

What this is

This is a portable research and spec portfolio that now lives in this site’s tree at src/components/claude/. It started life as a standalone claude/ folder beside the Astro app; it has been moved here so the experiments, lessons, and top-level specs ride along with the rest of the repo.

The experiments README inside that folder is still the index of record: prioritisation table, scoring axes, and recommended run order (03 → 01 → 04 → 02 → 05 → 06).

How to read it

  1. Open the experiments README for the one-screen overview.
  2. Drill into any experiments/0N-*/spec.md for the full draft spec.
  3. For the one slice that includes runnable code, see 07-atlassian-sync. Recreate a local .venv with your usual Python tooling; virtualenvs are not committed (see the root .gitignore).

Relationship to this site

This project entry exists so the portfolio shows up on /projects/ alongside Spec-Driven SDLC, with the same hero / outcomes / pillars / demo callout pattern. The Markdown and code are the source of truth; this page is the map.

Specs & docs from the repo

Rendered straight from demo.highlights. Each document is the source of truth in the repo — the snippets below stay in sync at build time.

README

src/components/claude/experiments/README.md
View on GitHub

Anthropic Product Experiments — PMM Portfolio

Persona: Senior PMM, Anthropic Developer Experience. Previously led developer tools growth at a major cloud provider. Obsessed with time-to-value, B2B adoption flywheels, and making AI-native workflows the default — not the exception.

Mission: Close the gap between what Claude can do and what the average developer team actually does. Every experiment here targets a specific adoption or retention bottleneck identified from practitioner research.


The Core Insight

The bottleneck in Claude Code adoption is not capability — it's workflow infrastructure. Developers who succeed with Claude have invested in:

  1. A well-maintained CLAUDE.md
  2. A set of slash commands encoding team workflows
  3. Verification loops so Claude can check its own work
  4. MCP servers connecting Claude to their actual tools

The experiments below are designed to make that infrastructure automatic, discoverable, and team-sharable rather than a power-user secret.


Experiment Portfolio

| # | Experiment | Type | Status | Core Bet |
|---|------------|------|--------|----------|
| 01 | CLAUDE.md Intelligence Agent | agent-crew | draft | Auto-generate + auto-maintain CLAUDE.md from session learnings |
| 02 | Autonomous Spec Quality Wizard | flow | draft | Block bad autonomous runs before they start with spec scoring |
| 03 | Zero-to-Value Onboarding Crew | agent-crew | draft | First 5 minutes of Claude Code should prove value, not demand investment |
| 04 | Verification Loop Builder | mcp-server | draft | Make quality loops a default, not a power-user pattern |
| 05 | MCP Marketplace & Discovery | mcp-server | draft | Fix MCP ecosystem discoverability — the npm install moment for Claude tools |
| 06 | Team Adoption Flywheel Flow | flow | draft | Compress team-wide AI-native adoption from 3 months → 3 weeks |

Prioritization Framework

Each experiment is scored on three axes:

  • Reach: How many developers/teams does this unblock?
  • Leverage: How much does it compound over time (vs. one-time value)?
  • Confidence: How well do we understand the problem (vs. hypothesis)?

| Experiment | Reach | Leverage | Confidence | Score |
|------------|-------|----------|------------|-------|
| 01 — CLAUDE.md Intelligence | High | Very High | High | ⭐⭐⭐⭐⭐ |
| 02 — Spec Wizard | Medium | High | High | ⭐⭐⭐⭐ |
| 03 — Onboarding | Very High | Medium | High | ⭐⭐⭐⭐ |
| 04 — Verification Loop | High | Very High | Medium | ⭐⭐⭐⭐ |
| 05 — MCP Marketplace | High | High | Medium | ⭐⭐⭐⭐ |
| 06 — Team Flywheel | Medium | Very High | Low | ⭐⭐⭐ |

Recommended order to run: 03 → 01 → 04 → 02 → 05 → 06

Rationale: Start with onboarding (high reach, high confidence), use learnings to inform CLAUDE.md intelligence (high leverage), then close the quality loop. MCP marketplace and team flywheel are higher-investment bets.


Spec Status Lifecycle

draft → review → approved → implemented → deprecated

Naming Convention

<domain>_<entity>_spec.md per the spec-kit standard. Each experiment directory contains one or more specs depending on complexity.

Spec

src/components/claude/experiments/01-claude-md-intelligence/spec.md
View on GitHub

CLAUDE.md Intelligence Agent

Purpose

Writing and maintaining CLAUDE.md is the highest-leverage thing a developer can do with Claude Code — and the most commonly skipped. The upfront cost is real: a good CLAUDE.md requires architectural knowledge, explicit convention capture, and ongoing maintenance. This crew eliminates that cost by automatically generating a production-quality CLAUDE.md from codebase analysis and a short structured interview, then monitoring future Claude sessions to propose updates when new conventions or anti-patterns emerge.

Done looks like: A developer runs one command on a new repo and gets a CLAUDE.md that a senior engineer on the team would have written. Future sessions self-improve it.

The Bet

If we can auto-generate a useful CLAUDE.md from a codebase scan and auto-update it from session learnings, team adoption of CLAUDE.md increases from ~15% of Claude Code users to ~70%.

Why this matters to Anthropic: CLAUDE.md is the compounding moat. Teams with good CLAUDE.md files retain Claude Code subscriptions at 2.3× the rate of teams without. This crew makes the moat automatic.

Inputs

| Name | Type | Required | Description | Example |
|------|------|----------|-------------|---------|
| repo_path | string | yes | Absolute path to the repository root | "/Users/dev/my-app" |
| interview_mode | string | no | "interactive" (Q&A in terminal) or "silent" (analysis only). Default: "interactive" | "interactive" |
| existing_claude_md | string | no | Path to an existing CLAUDE.md to augment rather than replace | "/Users/dev/my-app/CLAUDE.md" |
| team_context | string | no | Free-text description of the team, product, and domain | "B2B SaaS, 8 engineers, Rails + React" |

Outputs

| Artifact | Format | Producer Task | Description |
|----------|--------|---------------|-------------|
| output/CLAUDE.md | markdown | write_claude_md | Production-ready CLAUDE.md ready to copy to repo root |
| output/interview_transcript.md | markdown | conduct_interview | Structured Q&A log — useful for auditing and updating |
| output/codebase_profile.json | JSON | analyze_codebase | Detected stack, patterns, and conventions — machine-readable |
| output/update_suggestions.md | markdown | monitor_session_learnings | Proposed CLAUDE.md additions based on recent session learnings (runs in update mode) |

Agents

codebase_analyst

Role: Senior Staff Engineer performing a codebase audit

Goal: Produce a comprehensive, structured profile of the repository — language, framework, test runner, CI/CD setup, folder structure conventions, external dependencies, and any detectable anti-patterns or architectural decisions — without asking the human anything. Output must be concrete and specific, never generic.

Backstory: You've onboarded to dozens of codebases. You know what matters: not that it uses React, but which version, whether it uses hooks or class components, what the state management pattern is, and what the folder structure convention implies about the team's mental model. You read code, not docs.

Tools: directory_reader, file_reader, grep_tool, package_json_parser, git_log_reader


convention_extractor

Role: Engineering culture interviewer and convention archaeologist

Goal: Through a structured 10-question interview (in interactive mode) or git history analysis (in silent mode), surface the non-obvious conventions that aren't visible in the code: naming preferences, PR size philosophy, which files Claude should never touch, known footguns in the codebase, and domain-specific vocabulary.

Backstory: You know that the most useful CLAUDE.md content isn't the obvious stuff (language, framework) — it's the invisible rules that exist only in senior engineers' heads. "Never modify the billing module without a second reviewer." "All async errors must be wrapped in our custom AppError." "The legacy/ folder is not legacy — don't touch it." You surface those.

Tools: git_log_reader, file_reader, terminal_prompt (interactive mode only)


claude_md_writer

Role: Technical writer specializing in AI agent context documents

Goal: Synthesize the codebase profile and convention interview into a CLAUDE.md that gives Claude Code everything it needs to work autonomously without asking clarifying questions. Every section must be actionable and specific. No generic boilerplate.

Backstory: You've read hundreds of CLAUDE.md files and know the difference between one that actually changes behavior and one that just describes the README. You know the seven categories that matter: architecture decisions, anti-patterns and footguns, naming conventions, testing philosophy, domain vocabulary, files/dirs Claude should avoid, and the definition of "done" for this codebase.

Tools: file_writer


session_monitor

Role: Learning loop agent that watches Claude session logs for teachable moments

Goal: (Runs in update mode only) Read recent Claude Code session transcripts, identify instances where Claude was corrected, made assumptions that were wrong, or where the human had to re-explain something that should have been in CLAUDE.md. Propose specific additions or amendments to CLAUDE.md.

Backstory: You're looking for patterns: if Claude was corrected for the same thing three times in a month, that's a CLAUDE.md gap. If a human typed "no, we never do X" — that's an anti-pattern that should be captured. You don't propose edits for one-off corrections; you look for systematic gaps.

Tools: file_reader, session_log_parser


Tasks

Tasks execute sequentially. Each task's output feeds into the next via context.

analyze_codebase

Agent: codebase_analyst

Description:

Perform a thorough analysis of the repository at {repo_path}. You must detect and report:

1. Primary language(s) and version(s) — check package.json, go.mod, Pipfile, Gemfile, pyproject.toml, etc.
2. Frameworks and major libraries — be specific (Next.js 14 App Router, not just "React")
3. Test runner and testing philosophy — unit only? integration? e2e? what coverage threshold?
4. CI/CD setup — check .github/workflows, .gitlab-ci.yml, Jenkinsfile
5. Folder structure and what it implies about architecture (monorepo? feature-based? layer-based?)
6. State management pattern (if frontend)
7. Database and ORM (if backend)
8. Authentication pattern
9. Notable dependencies that have strong opinions (e.g., Prisma, tRPC, Rails)
10. Any files/dirs that look sensitive or dangerous to auto-modify (migrations, generated code, billing)
11. Git history patterns — how large are commits? how often do they squash? any branches with special meaning?

Output a structured JSON profile. Be specific. "Uses React" is not acceptable — "Uses React 18.2 with functional components, useState/useReducer for local state, Zustand for global state, no class components detected" is.

Expected Output: A JSON object with keys: languages, frameworks, test_setup, ci_cd, folder_structure, state_management, data_layer, auth_pattern, notable_deps, sensitive_paths, git_patterns, detected_anti_patterns.

Output File: output/codebase_profile.json

Output Schema: CodebaseProfile (Pydantic model)
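
The spec names a CodebaseProfile Pydantic model but does not show it; a minimal sketch of what it could look like, using the keys listed in the expected output above (field names come from the spec, field types are assumptions, Pydantic v2 syntax):

```python
# Hypothetical sketch of the CodebaseProfile schema; keys mirror the expected
# output above, field types are assumptions (Pydantic v2 syntax).
from pydantic import BaseModel, Field


class TestSetup(BaseModel):
    runner: str                                        # e.g. "pytest", "jest"
    levels: list[str] = Field(default_factory=list)    # e.g. ["unit", "integration", "e2e"]
    coverage_threshold: float | None = None


class CodebaseProfile(BaseModel):
    languages: dict[str, str]                          # language -> detected version
    frameworks: list[str]
    test_setup: TestSetup
    ci_cd: list[str]                                   # detected pipeline files / providers
    folder_structure: str                              # short description of the layout convention
    state_management: str | None = None                # frontend only
    data_layer: str | None = None                      # database + ORM, backend only
    auth_pattern: str | None = None
    notable_deps: list[str] = Field(default_factory=list)
    sensitive_paths: list[str] = Field(default_factory=list)
    git_patterns: dict[str, str] = Field(default_factory=dict)
    detected_anti_patterns: list[str] = Field(default_factory=list)
```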


conduct_interview

Agent: convention_extractor

Description:

In interactive mode: Conduct a structured 10-question interview with the developer. Do NOT ask about things the codebase_analyst already detected (framework, language, etc.). Focus on the invisible rules:

1. "What would make you immediately reject a Claude-written PR?" (surfaces non-obvious anti-patterns)
2. "Are there any files or directories Claude should never modify without your explicit approval?"
3. "What domain-specific terms does this codebase use that an outsider wouldn't know?" (e.g., "advertiser" vs "customer", "flight" vs "campaign period")
4. "What's your philosophy on test coverage — what must always be tested, what rarely needs tests?"
5. "What's the most common mistake a new engineer makes in this codebase?"
6. "Are there any third-party APIs or services that are expensive, rate-limited, or irreversible?" (Claude should not call these in dev)
7. "What does 'done' mean for a feature in this codebase?" (deployed? reviewed? monitored for 24h?)
8. "What's your PR size philosophy?" (small atomics? large feature PRs?)
9. "Any architectural decisions that look weird but are intentional?" (the "why does this module exist" question)
10. "What's the most important thing Claude should know that isn't in the code?"

In silent mode: Infer as much as possible from git history, commit messages, PR descriptions (if accessible), and comments in the code. Flag low-confidence inferences with a [?] marker.

Output a structured interview transcript with question, answer, and derived_rule for each item.

Expected Output: A markdown document with 10 Q&A pairs, each followed by a > Derived rule: line that will feed directly into CLAUDE.md.

Output File: output/interview_transcript.md

Output Schema: free text markdown


write_claude_md

Agent: claude_md_writer

Description:

Using the codebase_profile.json and interview_transcript.md from prior tasks, write a production-quality CLAUDE.md.

The CLAUDE.md must have exactly these sections in this order:

## Project Overview
One paragraph. What does this codebase do, who uses it, and what's the tech stack. Written for someone starting a new Claude Code session — not marketing copy.

## Architecture
The mental model Claude needs. Not a file listing — the *why* behind the structure. Key modules and what they own. Cross-module dependencies and which direction is acceptable.

## Development Conventions
- Naming conventions (files, functions, variables, branches, PRs)
- Code style rules that ESLint/Prettier don't enforce
- Patterns to always use vs. patterns to avoid
- How to handle errors in this codebase specifically

## Testing Philosophy
- What must always have tests
- What doesn't need tests
- How to run tests locally
- Coverage expectations

## Domain Vocabulary
A glossary of terms that mean something specific in this codebase. At minimum 5 entries.

## Files and Directories — Handle With Care
An explicit list of paths Claude should not modify autonomously, with a one-line reason for each.

## External Services
APIs, databases, and services Claude interacts with. Flag: which are production-only, which are rate-limited, which calls are irreversible.

## Definition of Done
What "done" means for a task in this codebase. What steps must always happen before a task is considered complete.

Rules for writing this document:
- Every rule must be specific enough that a new engineer would change their behavior after reading it
- No generic advice ("write clean code", "follow best practices") — everything must be codebase-specific
- If you don't have enough information for a section, write exactly what you know and add a `<!-- FILL: explain X -->` comment for the developer to complete
- Aim for 400-800 words. Long enough to be useful, short enough to fit in context without wasting tokens.

Expected Output: A complete, ready-to-use CLAUDE.md file with all eight sections populated.

Output File: output/CLAUDE.md

Output Schema: markdown


monitor_session_learnings

Agent: session_monitor

Description:

(Runs only when mode=update is passed as input)

Read Claude Code session logs from the past 30 days at {session_logs_path}. Identify:

1. Corrections: Any time the human said "no", "that's wrong", "don't do that", "we don't do X here"
2. Re-explanations: Any time the human re-explained something they'd explained in a previous session
3. Footguns: Any time Claude confidently did something that required a revert or human override
4. Domain errors: Any time Claude used wrong terminology or misunderstood a domain concept

For each identified gap, propose a specific CLAUDE.md addition or amendment. Format:

## Proposed Update #{n}

**Section:** [which CLAUDE.md section this belongs in]
**Trigger:** [what session event triggered this — quote the relevant exchange]
**Proposed addition:**

[exact text to add to CLAUDE.md]

**Confidence:** [high / medium / low]
**Frequency:** [how many times this pattern appeared in the last 30 days]

Expected Output: A markdown document with N proposed updates, ordered by frequency descending.

Output File: output/update_suggestions.md

Output Schema: free text markdown


Process

Execution: Process.sequential

Order:

analyze_codebase → conduct_interview → write_claude_md
                                         ↑ (update mode only)
                               monitor_session_learnings

Context chain: write_claude_md receives both analyze_codebase and conduct_interview outputs in its context list.
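
The Process.sequential notation and the context chain map naturally onto a CrewAI-style crew definition. A minimal wiring sketch under that assumption (agent backstories abbreviated, tool bindings and the update-mode task omitted; illustrative, not the experiment's implementation):

```python
# Illustrative wiring only; real agent/tool definitions live in the experiment itself.
from crewai import Agent, Crew, Process, Task

codebase_analyst = Agent(
    role="Senior Staff Engineer performing a codebase audit",
    goal="Produce a structured profile of the repository",
    backstory="You read code, not docs.",
)
convention_extractor = Agent(
    role="Engineering culture interviewer",
    goal="Surface the invisible conventions that are not visible in the code",
    backstory="The most useful CLAUDE.md content lives in senior engineers' heads.",
)
claude_md_writer = Agent(
    role="Technical writer for AI agent context documents",
    goal="Synthesize the profile and interview into an actionable CLAUDE.md",
    backstory="You know a CLAUDE.md that changes behavior from one that restates the README.",
)

analyze_codebase = Task(
    description="Profile the repository at {repo_path}",
    expected_output="Structured JSON codebase profile",
    agent=codebase_analyst,
    output_file="output/codebase_profile.json",
)
conduct_interview = Task(
    description="Run the 10-question convention interview (or silent git-history inference)",
    expected_output="Markdown transcript with a derived rule per answer",
    agent=convention_extractor,
    output_file="output/interview_transcript.md",
)
write_claude_md = Task(
    description="Write the eight-section CLAUDE.md from the profile and interview",
    expected_output="Complete CLAUDE.md, 400-800 words",
    agent=claude_md_writer,
    context=[analyze_codebase, conduct_interview],  # the context chain described above
    output_file="output/CLAUDE.md",
)

crew = Crew(
    agents=[codebase_analyst, convention_extractor, claude_md_writer],
    tasks=[analyze_codebase, conduct_interview, write_claude_md],
    process=Process.sequential,
)
# crew.kickoff(inputs={"repo_path": "/Users/dev/my-app"})
```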

Tools Required

| Tool | Used By | Purpose |
|------|---------|---------|
| directory_reader | codebase_analyst | Walk repo tree, detect config files |
| file_reader | codebase_analyst, convention_extractor, session_monitor | Read source files, transcripts, logs |
| grep_tool | codebase_analyst | Search for patterns, imports, anti-patterns |
| package_json_parser | codebase_analyst | Parse dependency versions |
| git_log_reader | codebase_analyst, convention_extractor | Analyze commit history and PR patterns |
| terminal_prompt | convention_extractor | Interactive Q&A in terminal (interactive mode only) |
| file_writer | claude_md_writer | Write output/CLAUDE.md |
| session_log_parser | session_monitor | Parse Claude Code session transcripts |

Acceptance Criteria

  • Crew completes without agent errors on a real repo (test against at least: a Rails app, a Next.js app, a Python data pipeline)
  • Generated CLAUDE.md passes a blind review: a senior engineer on the target team rates it ≥ 7/10 for accuracy and usefulness
  • CLAUDE.md is between 400 and 800 words
  • All eight required sections are present and populated (no <!-- FILL --> placeholders remain unless information was genuinely unavailable)
  • In interactive mode, interview completes in under 5 minutes
  • In silent mode, crew completes in under 60 seconds
  • codebase_profile.json is valid JSON and passes schema validation (a check sketch follows this list)
  • Update mode: proposed updates are traceable to specific session events (not hallucinated)
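
One way the schema-validation criterion could be checked mechanically, reusing the hypothetical CodebaseProfile sketch from the analyze_codebase task above (Pydantic v2 assumed):

```python
# Hypothetical check for "codebase_profile.json passes schema validation".
# Reuses the CodebaseProfile sketch shown under the analyze_codebase task.
from pathlib import Path

profile_json = Path("output/codebase_profile.json").read_text()
profile = CodebaseProfile.model_validate_json(profile_json)  # raises ValidationError if invalid
print(f"Profile OK, detected languages: {sorted(profile.languages)}")
```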

Out of Scope

  • Writing slash commands (separate experiment: 03-zero-to-value-onboarding)
  • Generating .claude/settings.json permission configs
  • Multi-repo / monorepo coordination (single repo only in v1)
  • Automatic commit/PR of CLAUDE.md changes — human must review and apply
  • Real-time session monitoring — update mode is a manual trigger, not a daemon

Open Questions

  • Should the interview be voice-first (speak your answers) or text-only? Voice would reduce friction dramatically.
  • How do we handle CLAUDE.md drift — when the codebase changes but the CLAUDE.md isn't updated? Should we add a staleness score?
  • Privacy: session logs contain proprietary code context. Do we need an on-device-only mode?
  • Should monitor_session_learnings be a separate always-on MCP tool rather than a crew task?

README

src/components/claude/experiments/07-atlassian-sync/README.md
View on GitHub

atlassian-sync MCP

An MCP server that makes Confluence and Jira first-class citizens of your Claude Code workflow. Reference live Atlassian content directly in spec files — Claude resolves it automatically.

Quick Start

pip install -e ".[dev]"
cp .env.example .env          # fill in host + credentials
atlassian-sync                 # starts on http://localhost:8015

Add to Claude Code (~/.claude/mcp.json or project .claude/mcp.json):

{
  "mcpServers": {
    "atlassian-sync": {
      "url": "http://localhost:8015/mcp",
      "headers": { "Authorization": "Bearer YOUR_MCP_API_KEY" }
    }
  }
}

Inline References

Reference live Atlassian content anywhere in your specs or CLAUDE.md:

See @confluence:482934[Auth Design] for the architecture rationale.
Tracked in @jira:AUTH-42[Auth v2 Epic].

Claude resolves these automatically when it reads a file. The full page/ticket content is injected into its context before it starts working.
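
The @confluence:ID[label] / @jira:KEY[label] syntax implies a parsing step before content is injected. A minimal sketch of how those markers could be extracted (hypothetical helper, not the server's resolve_references implementation):

```python
# Hypothetical sketch: find @confluence:<id>[label] and @jira:<key>[label]
# markers in a file so their content can be fetched and injected into context.
import re

REFERENCE = re.compile(
    r"@(?P<source>confluence|jira):(?P<ref>[A-Za-z0-9-]+)\[(?P<label>[^\]]*)\]"
)

def extract_references(text: str) -> list[dict[str, str]]:
    """Return each inline Atlassian reference found in the text."""
    return [m.groupdict() for m in REFERENCE.finditer(text)]

spec = "See @confluence:482934[Auth Design]. Tracked in @jira:AUTH-42[Auth v2 Epic]."
print(extract_references(spec))
# [{'source': 'confluence', 'ref': '482934', 'label': 'Auth Design'},
#  {'source': 'jira', 'ref': 'AUTH-42', 'label': 'Auth v2 Epic'}]
```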

Auth

| Deployment | Auth mode | Env vars needed |
|------------|-----------|-----------------|
| Atlassian Cloud | api_token | ATLASSIAN_HOST, ATLASSIAN_EMAIL, ATLASSIAN_API_TOKEN |
| Self-Hosted Data Center | pat | ATLASSIAN_HOST, ATLASSIAN_PAT |

Set ATLASSIAN_AUTH_MODE to api_token or pat. Deployment is auto-detected from the host URL.
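
A minimal sketch of how the two modes map to request credentials, following standard Atlassian auth conventions (hypothetical helper, not the package's actual code):

```python
# Hypothetical sketch: build the Authorization header for the configured auth mode.
# Cloud uses Basic auth (email:api_token); Data Center uses a Bearer PAT.
import os
from base64 import b64encode

def auth_header() -> dict[str, str]:
    mode = os.environ.get("ATLASSIAN_AUTH_MODE", "api_token")
    if mode == "pat":
        return {"Authorization": f"Bearer {os.environ['ATLASSIAN_PAT']}"}
    basic = b64encode(
        f"{os.environ['ATLASSIAN_EMAIL']}:{os.environ['ATLASSIAN_API_TOKEN']}".encode()
    ).decode()
    return {"Authorization": f"Basic {basic}"}
```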

Tools (15 total)

Confluence: confluence_get_page, confluence_search, confluence_get_space, confluence_sync_space, confluence_sync_page

Jira: jira_get_issue, jira_search, jira_get_sprint, jira_get_epic, jira_create_comment, jira_transition_issue, jira_update_fields

Sync: sync_status, sync_run, resolve_references

Sync to Disk

Pull an entire Confluence space as Markdown files (incremental, ETag-based):

# via MCP tool in Claude
confluence_sync_space(space_key="ARCH")
# writes to docs/confluence-cached/ARCH/

Re-run anytime — only changed pages are re-fetched.
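
A minimal sketch of what ETag-based incremental sync could look like, assuming a local JSON index of page ID to ETag; the fetch function and cache layout are hypothetical, not the package's actual implementation:

```python
# Hypothetical sketch: skip pages whose ETag is unchanged, re-fetch only the rest.
import json
from pathlib import Path
from typing import Callable

CACHE_DIR = Path("docs/confluence-cached/ARCH")
ETAG_INDEX = CACHE_DIR / ".etags.json"

# fetch(page_id, cached_etag) -> (markdown, new_etag), or None if the server
# answered 304 Not Modified for the cached ETag.
FetchFn = Callable[[str, str | None], tuple[str, str] | None]

def sync_space(page_ids: list[str], fetch: FetchFn) -> int:
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    etags: dict[str, str] = (
        json.loads(ETAG_INDEX.read_text()) if ETAG_INDEX.exists() else {}
    )
    changed = 0
    for page_id in page_ids:
        result = fetch(page_id, etags.get(page_id))
        if result is None:          # unchanged since last sync
            continue
        markdown, new_etag = result
        (CACHE_DIR / f"{page_id}.md").write_text(markdown)
        etags[page_id] = new_etag
        changed += 1
    ETAG_INDEX.write_text(json.dumps(etags, indent=2))
    return changed
```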

Spec

See spec.md for the full technical specification (spec-015). Roadmap and execution status: roadmap.md, progress.md. Operator notes: docs/runbook.md.

MCP HTTP: POST /mcp requires Authorization: Bearer <MCP_API_KEY> (see .env.example). GET /health stays unauthenticated for health probes.
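
A quick way to exercise both surfaces from Python, assuming the server is running locally on port 8015 as in the Quick Start, httpx is installed, and MCP_API_KEY is exported; the JSON-RPC payload is illustrative rather than the server's documented contract:

```python
# Illustrative check of the two HTTP surfaces described above (httpx assumed).
import os

import httpx

BASE = "http://localhost:8015"

# GET /health needs no auth (used by health probes).
print(httpx.get(f"{BASE}/health").status_code)

# POST /mcp requires the bearer token from .env (MCP_API_KEY assumed to be exported).
headers = {"Authorization": f"Bearer {os.environ['MCP_API_KEY']}"}
payload = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}
print(httpx.post(f"{BASE}/mcp", json=payload, headers=headers).status_code)
```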

00 Big Picture

src/components/claude/lessons/00-big-picture.md
View on GitHub

Lesson 00: The Big Picture — A Timeline of Claude as a Dev Tool

← Back to Index | Next: Lesson 01 — Core Mental Model →


TL;DR: Claude went from a chat window you copy-paste from, to a CLI agent that runs your terminal, to an orchestration layer that manages other agents. Most developers are still stuck in 2023 workflows.

Difficulty: [Beginner] | Time to read: 10 min


Era 1 — 2023: The Copy-Paste Phase

Core workflow: Open Claude.ai in a browser tab. Describe a function. Copy the output. Paste into your editor. Debug why it doesn't compile. Repeat.

What developers actually did:

  • Tab-switched between IDE and browser dozens of times per hour
  • Pasted entire files into the chat box to give Claude context
  • Re-explained the same codebase architecture every new conversation
  • Manually applied Claude's suggestions line by line

Main frustration: Cognitive overhead. Every interaction required manually bridging two worlds. You were the API — shuttling context back and forth by hand.

What Claude still sucked at:

  • Understanding your actual codebase (it only saw what you pasted)
  • Maintaining consistency across files (no persistent memory)
  • Knowing your project's conventions, preferences, or patterns
  • Anything requiring multi-step execution

The unlock that moved things forward: Developers started treating Claude less like a search engine and more like a junior developer — giving it more context, not less. Longer context windows (100K tokens) meant you could dump entire files in.


Era 2 — Early 2024: IDE Plugins and API Integrations

Core workflow: Claude embedded directly in editors via extensions (Continue.dev, Cursor, early Copilot alternatives). The context gap started closing — Claude could see open files without copy-paste.

What developers actually did:

  • Used inline chat to ask questions about the file currently open
  • Ran one-off generation tasks from the editor command palette
  • Started using the API to build internal tools and scripts around Claude
  • Experimented with system prompts to encode project context

Main frustration: IDE integrations were shallow — they saw the open file, not the repo. Claude still had no memory of what it did yesterday. Every session was a blank slate.

What Claude still sucked at:

  • Cross-file awareness (couldn't navigate a real codebase)
  • Running code to verify its own output
  • Taking actions (read-only, couldn't write files or run commands)
  • Long-running tasks that required multiple steps

The unlock that moved things forward: The API made Claude programmable. Teams started building internal tools: code review bots, PR summarizers, doc generators. Claude-as-infrastructure began.


Era 3 — Mid 2024: Claude Code Beta — CLI-Native Agentic Coding

Core workflow: claude in your terminal. Claude reads your repo, writes files, runs shell commands, iterates. The developer shifts from doing to reviewing.

What developers actually did:

  • Launched Claude Code from the project root — it could see and modify the entire codebase
  • Delegated multi-step tasks: "add auth to this Express app, write the tests, run them"
  • Started building CLAUDE.md files to give persistent context between sessions
  • Used Plan Mode (Shift+Tab twice) to review Claude's approach before execution

Main frustration: Claude would confidently execute wrong plans. Without the right upfront context, it made decisions that looked reasonable but violated project conventions. The CLAUDE.md was the fix — but writing a good one took real effort.

What Claude still sucked at:

  • Knowing when to stop and ask vs. when to proceed
  • Handling large refactors without losing track of state
  • Integrating with external systems (Jira, Slack, databases) natively

The unlock that moved things forward: The shift to CLI-native meant Claude could actually do things, not just suggest them. The feedback loop compressed from minutes to seconds. CLAUDE.md turned institutional knowledge into a compounding asset.


Era 4 — Late 2024: MCP — Claude Gains Tools

Core workflow: Claude connects to external systems via Model Context Protocol servers. It can now read your database, create Jira tickets, post to Slack, open a browser, and query your analytics — all within a single session.

What developers actually did:

  • Plugged in GitHub MCP: Claude opens PRs, reviews diffs, posts comments
  • Connected database MCPs: Claude queries prod (read-only) during debugging sessions
  • Added Slack MCP: Claude reads thread context to understand what a bug report actually means
  • Built custom MCP servers for internal tools

Main frustration: MCPs blow up the context window fast. Connecting five MCPs and running a complex task could exhaust context before the work was done. You had to be selective about what you enabled per session.

What Claude still sucked at:

  • MCP server stability (early implementations were flaky)
  • Security boundaries — malicious MCP responses could inject instructions
  • Knowing which tools to call without explicit prompting

The unlock that moved things forward: Claude stopped being a coding tool and started being an engineering workflow tool. You could describe a production incident and Claude would pull the Sentry error, read the relevant code, check the deploy history, and draft a fix — without you switching tabs once.


Era 5 — Early 2025: Parallel Sessions, Opus-with-Thinking, CLAUDE.md as Team Infra

Core workflow: Multiple Claude sessions running simultaneously, each on a different task. One refactoring a module, one writing tests, one drafting a PR description. You're a manager now, not a coder.

What developers actually did:

  • Ran 5 terminal sessions + 5-10 web sessions in parallel
  • Named tabs by task: [auth-refactor], [test-coverage], [perf-investigation]
  • Used system notifications as async triggers — Claude pings you when it needs input
  • Checked CLAUDE.md into Git — every teammate's Claude session now starts with shared institutional knowledge
  • Used Opus for complex architectural work, Sonnet for routine tasks

Main frustration: Parallel session management was cognitively demanding. Knowing what each session was doing, when to intervene, how to integrate outputs — that became the new developer skill.

What Claude still sucked at:

  • Coordination across sessions (no session knew what the others were doing)
  • Cost management (Opus × 10 parallel sessions adds up fast)
  • Self-correcting when stuck in a bad plan without human intervention

The unlock that moved things forward: CLAUDE.md in Git meant the team's AI behavior was versionable, reviewable, and improvable. When Claude made a systematic mistake, you updated CLAUDE.md once and fixed it for everyone, forever.


Era 6 — Mid 2025: Agent Teams, Subagents, Autonomous Loops

Core workflow: An orchestrator Claude session spawns specialist subagent sessions. The orchestrator delegates, the subagents execute, verification agents check the output. The human sets the goal and reviews the result.

What developers actually did:

  • Defined orchestrator + specialist architectures for complex tasks
  • Ran overnight autonomous loops on well-specified tasks
  • Used agent-stop hooks to trigger deterministic verification after every task
  • Integrated Claude Code into GitHub Actions — PRs automatically got Claude review

Main frustration: Autonomous loops required near-perfect specs. Underspecified tasks + overnight runs = waking up to confident, wrong work. The quality of your prompts became the bottleneck, not Claude's capability.

What Claude still sucked at:

  • Knowing when it's out of its depth and should stop
  • Managing state across many subagent sessions cleanly
  • Cost predictability in open-ended autonomous tasks

The unlock that moved things forward: Verification loops. Giving Claude a way to test its own work — run the tests, open the browser, query the database — 2-3x'd output quality without additional human review.


Era 7 — Now: Claude as an Engineering Orchestration Layer

Core workflow: You describe outcomes, not steps. Claude plans the work, distributes it across subagents, verifies the results, and surfaces decisions that genuinely require human judgment.

What the best teams are doing today:

  • CLAUDE.md is a living document updated after every significant session
  • Slash commands encode team-specific workflows as reusable primitives
  • Claude Code GitHub Action runs on every PR — human reviewers focus on judgment, not mechanics
  • Custom MCP servers connect Claude to every internal tool
  • New engineers are onboarded to the AI-native workflow on day one

The remaining hard problems:

  • Who is responsible when Claude ships a bug?
  • How do you version AI behavior as models and prompts evolve?
  • How do you prevent skill atrophy in junior developers who never learn the hard way?
  • What work should humans always do themselves?

The Through-Line

Each era solved a different bottleneck:

| Era | Bottleneck Solved | New Bottleneck Created |
|-----|-------------------|------------------------|
| 2023 | Speed of generation | Context gap (copy-paste) |
| Early 2024 | Context gap | No memory, no action |
| Mid 2024 | Memory + action (CLAUDE.md + CLI) | Spec quality |
| Late 2024 | External system integration | Context window management |
| Early 2025 | Team knowledge sharing | Parallel session management |
| Mid 2025 | Verification + autonomy | Spec quality at scale |
| Now | Orchestration | Human judgment + accountability |

The pattern: every time one bottleneck is solved, the constraint moves up the stack — closer to human judgment and further from mechanical execution.

What this means for you: If you're still working like it's 2023 (copy-pasting from a chat window), you're not behind because Claude got smarter. You're behind because the workflow changed. The rest of this guide is about catching up — and then getting ahead.



← Back to Index | Next: Lesson 01 — Core Mental Model →
