A 12-Step Progression from Beginner to Expert
This roadmap is built from current (May 2026) best practices in the Claude Code ecosystem, including Anthropic’s official guidance, the patterns shared by Boris Cherny (creator of Claude Code), and community-validated workflows. It’s structured as a progression — each step builds on the previous one. Don’t try to adopt them all on day one.
The Three Tiers
TIER 3 — ECOSYSTEM Steps 9–12: Skills, Agents, Plugins, Multi-agent
(You're now extending the platform itself)
──────────────────────────────────────────────
TIER 2 — AUTOMATION Steps 5–8: Code maps, Hooks, MCP, Memory tools
(You're now augmenting Claude with real tools)
──────────────────────────────────────────────
TIER 1 — FOUNDATION Steps 1–4: CLAUDE.md, specs, plans, decisions
(You're working with Claude correctly)
Most users live in Tier 1 forever and watch their context bloat and costs climb. Power users push exploration and specialized work to Tiers 2 and 3, keeping Tier 1 for orchestration and final decisions.
TIER 1 — FOUNDATION
The non-negotiables. Everything else fails without these.
Step 1 — Introducing CLAUDE.md — Your Project’s Memory for Claude
(Already covered in detail.)
The persistent context file Claude auto-loads every session. Root + per-repo. Keep it under 500 lines, ideally under 200 — Claude reliably attends to only ~150 instructions. Use the WHAT/WHY/HOW framework: what to do, why it matters, how to verify.
New since first tutorial: Use CLAUDE.local.md (gitignored) for personal shortcuts, WIP notes, and sensitive paths. Use @imports like @docs/git.md for modularity instead of one giant file.
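For instance, one rule written in the WHAT/WHY/HOW shape (the rule itself is illustrative, not from any real project):

```markdown
## Database migrations
- WHAT: Never edit existing files under migrations/; always create a new migration.
- WHY: Edited migrations silently desync environments that already applied them.
- HOW TO VERIFY: git diff --name-only shows no modified files under migrations/.
```

Each rule is one claim Claude can act on, one reason it can weigh, and one check it can run.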
Step 2 — Writing Spec Files — Your Feature’s Memory for Claude
(Already covered in detail.)
Durable feature requirements in docs/features/<name>/spec.md. Spec first, plan second, code third.
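A minimal spec.md skeleton consistent with that layout (the section names are a suggestion, not a fixed schema):

```markdown
# Feature: <name>

## Problem
One paragraph on the user-visible problem this solves.

## Requirements
- [ ] Testable requirement 1
- [ ] Testable requirement 2

## Out of scope
What this feature deliberately does not do.

## Verification
How we will know it works: tests, manual checks, metrics.
```

The "Out of scope" section matters most: it is the cheapest way to stop Claude from gold-plating.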
Step 3 — Plan-First Workflow — Making Claude Plan Before It Codes
(Already covered in detail.)
Always demand a written plan before code. Plan Mode (Shift+Tab) enforces this structurally. The math: if Claude makes the right call 80% of the time on any single decision, a 20-decision feature succeeds end-to-end only 0.8²⁰ ≈ 1% of the time. Planning surfaces those 20 calls up front, where you can review and correct them before any code exists.
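The arithmetic is worth checking directly — a one-liner with no assumptions beyond the stated 80% per-decision rate:

```shell
# Chance that all 20 independent calls land right when each is 80% likely:
awk 'BEGIN { printf "0.8^20 = %.3f (about 1%%)\n", 0.8 ^ 20 }'
```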
Step 4 — Capture Decisions and Session Notes
(Already covered in detail.)
The “why” preserved across sessions. Manual but durable. docs/decisions/ and docs/sessions/.
TIER 2 — AUTOMATION
Now you start augmenting Claude with real tools. This is where productivity multiplies.
Step 5 — Codebase Knowledge Graphs (Graphify-class)
What it is: A tool like Graphify that parses your code with Tree-sitter and produces a queryable knowledge graph of your codebase — files, functions, classes, how modules connect. Claude reads a ~9KB markdown summary instead of scanning hundreds of files each session.
Why it matters: Without this, Claude burns ~20,000 tokens per session just orienting itself in a 40-file project. With it, a 9KB report replaces megabytes of source scanning. Real measurements show ~71× token reduction on large codebases.
When to adopt: Within 1–2 weeks of starting a multi-repo project. The value is immediate; the setup is small.
How to set it up:
- uv tool install graphifyy (or pipx install graphifyy)
- cd ~/projects/my-app && graphify install (configures Claude Code)
- /graphify . in each repo to build the graph
- Set up Git hooks: graphify hook install (auto-rebuilds on commit)
Trade-offs: Graphs go stale if code changes faster than you regenerate. Git hook integration mitigates this. Use --watch mode for live rebuilds during heavy refactoring sessions.
Step 6 — Hooks — Deterministic Enforcement
What hooks are: Scripts that fire automatically at specific lifecycle events — PreToolUse, PostToolUse, UserPromptSubmit, SessionStart, SessionEnd, PreCompact, and 19 more. Hooks close the gap between CLAUDE.md (advisory, ~70% followed) and 100% enforcement.
Why they matter: CLAUDE.md says “please run the linter.” Hooks make running the linter structurally impossible to skip. Use them for anything that must happen every time.
Common patterns:
- PostToolUse on Write → run linter, return errors to Claude automatically
- PreToolUse on Bash → block dangerous commands (rm -rf, force pushes)
- PreToolUse on Edit → block writes to src/legacy/ or migrations/
- SessionEnd → auto-commit work-in-progress to a branch
- Stop → trigger test suite, force Claude to continue if it fails
- PreCompact → back up the conversation transcript before context resets
Setup: Edit .claude/settings.json directly, or run /hooks interactively. Claude can write hooks for you — try “Write a hook that runs eslint after every file edit.”
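As a sketch, a PostToolUse lint hook in .claude/settings.json — the matcher/command structure follows the standard hooks schema, but the script path is illustrative; the hook script receives the tool input as JSON on stdin:

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Write|Edit",
        "hooks": [
          {
            "type": "command",
            "command": ".claude/hooks/lint-changed.sh"
          }
        ]
      }
    ]
  }
}
```

The matcher is a regex over tool names, so one entry can cover both Write and Edit.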
Critical rule: Never block file writes mid-plan. It breaks multi-step reasoning because Claude loses track of where it is. Let it finish, then validate via after-action hooks.
When to adopt: Week 2–4. After CLAUDE.md is stable, before you scale to a team.
Step 7 — MCP (Model Context Protocol) — External Tools
What MCP is: A standardized protocol for connecting Claude to external services — databases, GitHub, Jira/Linear, Slack, Sentry, Figma, Notion, and 3,000+ integrations. Each MCP server exposes tools Claude can call.
Why it matters: Without MCP, Claude works against your local filesystem and bash. With MCP, Claude can query your production DB, read Sentry errors, pull Jira ticket details, post to Slack, and integrate Figma designs — all within a single workflow.
High-value MCP servers for coding:
- GitHub — read PRs, file issues, manage branches programmatically
- Linear/Jira — pull ticket details, update status, link commits
- Postgres/MySQL — query the database to understand schemas and data shapes
- Sentry — read recent errors when debugging
- Filesystem (advanced) — access to directories outside the project root
- Playwright — control a browser for testing
Setup: claude mcp add <server-name>. Configurations live in .claude/settings.json or ~/.claude/settings.json.
When to adopt: Week 3–6. Pick one or two MCP servers that match your daily workflow first. Don’t install 20 servers — each one occupies context budget.
Step 8 — AI Memory Tools (agentmemory-class)
What they are: Persistent memory layers (agentmemory, engram, cognee, memsearch) that auto-capture session history via hooks and inject relevant context at the start of the next session. Different from Step 4 (manual notes) — these are automatic.
Why they matter: By month 3 of a project, you’ve made hundreds of decisions across dozens of sessions. Manual notes can’t keep up. Vector search over auto-captured history can.
Top options as of May 2026:
- agentmemory — most-featured MCP server, 51 tools, BM25 + vector + graph hybrid retrieval
- engram — single Go binary, SQLite + FTS5, no embedding costs, syncs across machines
- cognee — knowledge-graph approach, integrates with Claude Code lifecycle hooks
- memsearch — markdown as source of truth, Milvus as shadow index
- Anthropic Managed Agents Memory — official, public beta since April 2026
When to adopt: Month 2–3, after Step 4’s pain becomes obvious. Pick based on actual pain: vector search needed? agentmemory. Zero cost / local-first? engram. Cross-tool sharing (Claude + Cursor + Codex)? memsearch.
Critical: Keep doing Step 4 manually for important decisions. The memory tool captures the firehose; the markdown captures the milestones. Both layers serve different purposes.
TIER 3 — ECOSYSTEM EXTENSIONS
Now you’re shaping Claude Code itself to fit your workflow.
Step 9 — Skills — Reusable Workflows and Domain Expertise
What skills are: Folders with a SKILL.md file (plus optional helper scripts) that define reusable behaviors. Frontmatter controls invocation — auto-invoke based on task context, manual /skill-name, or both. Progressive disclosure means Claude scans only ~100 tokens per skill until it determines one is relevant, then loads the full instructions.
Two types:
- Capability Uplift skills — give Claude new abilities (Firecrawl for web scraping, Document skills for real .docx/.xlsx files, Webapp Testing for browser automation)
- Encoded Preference skills — make Claude execute the way you want (Frontend Design, React Best Practices, your team’s code review checklist)
Why they matter: If you ever wrote the same instructions to Claude twice, that should have been a skill the first time. Skills kill prompt duplication and stay token-efficient via progressive disclosure.
Setup:
.claude/skills/code-review/
├── SKILL.md ← YAML frontmatter + instructions
└── checklist.md ← supporting file, loaded only when needed
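As a sketch, the SKILL.md for the code-review folder above — the frontmatter fields assume the standard name/description schema, and the instructions themselves are illustrative:

```markdown
---
name: code-review
description: Reviews a diff against the team checklist. Use after writing or editing code.
---

1. Run git diff to collect the changes under review.
2. Work through every item in checklist.md (loaded only when this skill fires).
3. Report findings grouped by severity, with file and line references.
```

The description doubles as the auto-invocation trigger, so write it as "what this does + when to use it."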
Quick wins to install:
- Superpowers (obra/superpowers) — most complete multi-agent dev workflow as a skill, 40.9k GitHub stars
- Anthropic Document skills — real .docx/.xlsx/.pptx/.pdf manipulation
- Skill Creator — interactive Q&A to build your own custom skills
When to adopt: Month 1–2. Start with one official skill that matches your daily work; build custom skills as you notice repetition.
Step 10 — Subagents — Isolated, Specialized Workers
What subagents are: Specialized AI instances that run in their own context window with their own tool permissions, system prompt, and (optionally) a different model. They do focused work and return only a summary to your main conversation. Stored as markdown files in .claude/agents/.
Why they matter: Two big benefits — context isolation (the subagent’s 50 file reads don’t bloat your main session) and specialization (a code-reviewer can have read-only tools and a senior-engineer persona).
Mental model:
Skills are knowledge. Subagents are workers. Your main session is the orchestrator.
Common subagents:
- code-reviewer — read-only, reviews diffs against best practices
- explore — heavy file-reading agent, returns a summary
- test-runner — runs the test suite, returns only failing tests
- db-reader — read-only DB queries with hook-enforced SQL validation
- security-reviewer — auto-invoked on PRs touching auth or payments
Setup:
---
name: code-reviewer
description: Reviews code for quality, security, maintainability. Use after writing code.
tools: Read, Grep, Glob, Bash
model: sonnet
---
You are a senior code reviewer ensuring high standards...
Critical-fixer pattern: Two subagents in a loop — one critic (read-only, no fix incentive), one fixer (can edit). The critic re-audits after each fix. Prevents Claude’s “looks good to me” problem.
When to adopt: Month 1–2 alongside Skills. They solve different problems but compose well.
Step 11 — Multi-Agent Orchestration (Agent Teams)
What it is: Released February 2026 — describe roles and tasks in a prompt, and a lead agent orchestrates a team of teammates that inherit permissions and MCP connections. Configured through the prompt itself, not YAML.
Why it matters: For genuinely parallel work — three agents arguing different methodologies on a design problem, parallel refactors across modules, simultaneous research on multiple topics. The lead synthesizes their outputs.
Common patterns:
- Methodological debate — Agent A argues for approach X, Agent B for approach Y, lead synthesizes
- Parallel exploration — One agent per repo in a multi-repo system, lead merges findings
- Producer/Critic — Agent generates, critic audits, lead arbitrates
When to adopt: Month 3+, after you’ve mastered single-subagent patterns. Don’t reach for agent teams to solve problems a single well-scoped subagent can handle.
Step 12 — Plugins — Packaging and Sharing
What plugins are: Released October 2025 — packaged bundles that can include MCP servers, skills, subagents, and hooks. Distribute via /plugin marketplace add <github-repo> and /plugin install <name>.
Why they matter: Once your team has a working setup, plugins are how you standardize it. New teammates install one plugin and inherit the entire workflow — same skills, same hooks, same subagents, same MCP connections.
High-value community plugins:
- dev-workflows (shinpr) — end-to-end plan/execute/verify workflows
- Superpowers (obra) — TDD-enforced multi-agent dev lifecycle
- Anthropic official skills marketplace — document skills, webapp testing, etc.
When to build your own: Once your .claude/ folder has 3+ skills, 2+ subagents, and a few hooks that work well — package them so the rest of your team can install with one command.
When to adopt: Month 6+, once your patterns have stabilized. Building a plugin too early packages bad habits.
The Layering Principle
This is the most important meta-lesson from the entire ecosystem. From the Claude Code team’s own guidance:
Skills define what to do. MCP provides the data. Subagents and Agent Teams handle delegation. Hooks enforce the rules. Plugins package it all.
Most people use one or two features without seeing how they stack. The real power is composing them:
Example — a security review workflow:
- Hook (PostToolUse) detects an edit to src/auth/
- Subagent (security-reviewer) is auto-invoked with read-only tools
- Skill (security-checklist) loads the team’s specific audit rules
- MCP (Sentry) pulls recent auth-related errors as context
- Memory tool injects past auth decisions from previous sessions
- Plugin packages all of this for the rest of the team
That’s all six pieces composing into one automated workflow. None of them alone would catch a real security issue. Together, they make it nearly impossible to miss one.
Adoption Timeline
Realistic, no-rush:
| Time | Focus |
|---|---|
| Day 1 | Step 1 (CLAUDE.md) + Step 2 (spec.md) |
| Week 1 | Step 3 (Plan-First) + Step 4 (Decisions/Sessions) |
| Week 2 | Step 5 (Graphify) — only if multi-repo |
| Week 3–4 | Step 6 (Hooks) — start with one PostToolUse for linting |
| Week 4–6 | Step 7 (MCP) — install GitHub or your issue tracker |
| Month 2 | Step 9 (Skills) — install Superpowers or build your first custom skill |
| Month 2 | Step 10 (Subagents) — create code-reviewer and test-runner |
| Month 3 | Step 8 (AI memory tool) — only if pain is real |
| Month 3+ | Step 11 (Agent Teams) — when single subagents aren’t enough |
| Month 6+ | Step 12 (Plugins) — only after your patterns stabilize |
Anti-pattern: Adopting Tier 3 before Tier 1 is solid. A skill referencing a sloppy CLAUDE.md just makes the sloppiness reusable.
What Separates Experts from Beginners
After reading dozens of practitioner reports and Anthropic’s own guidance, here’s what consistently separates the top 1% of Claude Code users from everyone else:
- They treat planning as 50% of the work, not a preamble. Two hours on a spec saves 6–10 hours of implementation time on a 12-step feature.
- They keep CLAUDE.md under 200 lines and update it monthly. Every time Claude makes the same mistake twice, a new rule goes in. Boris Cherny calls this “compound engineering.”
- They use hooks for anything that must always happen. Linting, formatting, security checks, branch policies — all hooks, never CLAUDE.md instructions.
- They push exploration to subagents to keep main context clean. A 30-file investigation in a subagent returns a 200-word summary, not 30 files of noise.
- They write tests before, not after. Tests are an external source of truth that stays accurate as context fills up. Without them, Claude’s self-assessment degrades.
- They break PRDs into vertical slices, not horizontal phases. AI defaults to horizontal (DB phase, then API phase, then frontend phase) which delays end-to-end feedback. Vertical slices ship a thin user-visible feature at every step.
- They challenge Claude’s work. “Grill me on these changes — don’t make a PR until I pass your test.” and “Knowing everything you know now, scrap this and implement the elegant solution” are go-to prompts.
- They run multiple Claude sessions in parallel via git worktrees for isolated experiments without edit collisions.
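The worktree pattern, as a self-contained sketch (a throwaway repo stands in for your project; the branch names are illustrative):

```shell
set -e
# Throwaway repo standing in for your project checkout.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "claude@example.com"
git config user.name "Claude"
git commit -q --allow-empty -m "init"
# One worktree + branch per parallel Claude session; edits cannot collide.
git worktree add "$repo-exp-a" -b claude/exp-a
git worktree add "$repo-exp-b" -b claude/exp-b
git worktree list  # main checkout plus two isolated experiment dirs
```

Each session runs in its own directory; merge the winning branch back and git worktree remove the rest.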
Closing Note — On What I Got Wrong Earlier
Earlier in this conversation, I gave you Steps 1–4 as if that were the whole picture. It wasn’t. Those are the foundation — necessary but not sufficient for what you asked for, which is deep expertise.
The full picture is what’s above. Steps 1–4 are real and important; they’re just the entry to a much bigger ecosystem of skills, subagents, hooks, MCP servers, memory tools, code graphs, and plugin distribution. The Claude Code platform now spans 25 hook lifecycle events, supports plugins, agent teams, and an MCP ecosystem of 3,000+ integrations. Treating it as just “write good prompts” is missing 80% of the leverage.
Apologies for not framing it this way from the start. This roadmap is the complete answer to your original question.
Next Steps for You
Don’t try to do all 12 at once. The honest advice:
- This week — finish Tier 1 (you’re on Step 4 already). Get the foundation solid.
- Next 2 weeks — pick one Tier 2 tool that addresses your biggest pain. For multi-repo, that’s Graphify (Step 5). For “Claude keeps doing the wrong thing despite my CLAUDE.md,” that’s Hooks (Step 6).
- Next month — add Skills (Step 9) and Subagents (Step 10). These are the highest-leverage Tier 3 additions.
- Defer the rest until you’ve felt the specific pain each one solves.
Want me to write detailed 2-page tutorials (like Steps 1–4) for any specific later step? Just tell me which one you want next.