AI Champion Roadmap

A structured learning plan to become more effective with AI coding agents — Claude Code, Gemini CLI, and GitHub Copilot CLI.

Level 0

Foundation: AI Resources Repository

One git repo that survives machine changes and holds every config, skill, hook, and learning.

  • Every config, skill, hook, and learned preference lives here. Without a canonical repo, improvements stay trapped on one machine — a wipe or new laptop means starting from scratch.

  • The setup script reads keys from `.env` and injects them into MCP configs via envsubst. Without this, none of the tools that require API access will connect.
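
The injection step can be sketched in a few lines of POSIX shell. The key name and file names below are illustrative, and `envsubst` comes from GNU gettext:

```shell
# Demo in a throwaway directory; in the real repo, .env and the template already exist.
dir=$(mktemp -d) && cd "$dir"
printf 'EXA_API_KEY=sk-demo-123\n' > .env
printf '{ "exa": { "apiKey": "${EXA_API_KEY}" } }\n' > mcp.template.json

set -a; . ./.env; set +a                 # export every variable defined in .env
envsubst < mcp.template.json > mcp.json  # replace ${VAR} placeholders with real values
```

Only the template is committed; the rendered `mcp.json` stays in `.gitignore`, so keys never touch version control.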

  • The defaults are generic. Tailoring your context files to your actual workflow is what separates 'agent following generic instructions' from 'agent following your instructions'.

  • Manual symlinking across machines is error-prone and slow. The idempotent setup script means any machine — new laptop, remote server — goes from zero to your full configuration in under a minute.
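
What makes such a script idempotent is that every operation is safe to repeat: `mkdir -p` and `ln -sf` succeed whether or not the target already exists. A minimal sketch, with a simulated home directory and illustrative paths:

```shell
H=$(mktemp -d)                 # stand-in for $HOME so the demo touches nothing real
KIT="$H/ai-kit"                # the canonical repo from the previous step
mkdir -p "$KIT" "$H/.claude"   # no error if the directories already exist
printf '# my rules\n' > "$KIT/CLAUDE.md"

ln -sf "$KIT/CLAUDE.md" "$H/.claude/CLAUDE.md"  # -f replaces any stale link
ln -sf "$KIT/CLAUDE.md" "$H/.claude/CLAUDE.md"  # second run: still no error
```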

  • A repo that only lives locally is a backup risk. The whole point of this setup is that a new machine becomes productive in minutes — verify it now while the steps are fresh.

Level 1

Context Engineering

Each tool's default behavior matches your preferences without prompting on every session.

  • Seeing the format before writing your own prevents common mistakes — wrong heading levels, missing frontmatter, rules that contradict each other.

  • Every line is injected into every session. A 200-line CLAUDE.md consumes ~4,000 tokens before you type a word. Length is actively counterproductive.
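
For scale, a global file in the spirit of this hypothetical one stays under ten lines and well under a hundred tokens; every rule shown is a placeholder for your own:

```shell
dir=$(mktemp -d) && cd "$dir"
cat > CLAUDE.md <<'EOF'
# Global rules
- No filler: skip preambles, summaries, and apologies.
- TypeScript strict mode; never use `any`.
- Run the linter before declaring a task done.
EOF
wc -l CLAUDE.md   # a handful of lines, not two hundred
```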

  • A monorepo's frontend and backend rarely share the same conventions. Subdirectory files scope rules to where they're actually relevant, keeping context lean everywhere else.

  • Copilot doesn't lazy-load rules — the full file is sent on every message. Every sentence that isn't directly useful is dead weight in every single conversation.

  • Without this setting, Copilot ignores your agent profile files. One flag enables the entire scoped-agent system.

  • Without a project-level file, Gemini starts each session with no knowledge of your conventions. The global file prevents you from repeating the same preferences in every chat.

  • The single highest-leverage rule. 'No filler' eliminates preamble, summaries, and apologies — the agent gets to the point faster on every single response.

  • Corrections you type once should never need to be typed again. Each rule added is a permanent improvement to every future session.

  • Rules that never fire are noise that costs tokens. Pruning keeps the file lean and ensures every remaining rule is earning its place.

Level 2

Compounding Engineering

Each cycle of work improves future cycles. 80% planning and review, 20% execution.

  • The plugin is what makes the /ce commands available. Without it, /ce:plan is just text with no behavior attached.

  • Same capability as the Claude install — compound engineering works across tools. Don't skip it because you also use Claude.

  • Gives Copilot the same Plan → Delegate → Assess → Codify loop. Cross-tool consistency means you don't have to think differently depending on which agent you're using.

  • Jumping straight to /ce:work without a plan is the most common source of mid-task pivots. Five minutes of planning eliminates hours of correction.

  • Worktree isolation means a broken implementation doesn't contaminate your main branch. Task tracking means you can hand off mid-session without losing context.

  • The agent that implemented the code will rationalize its own decisions. A review pass from a fresh context catches what the implementing session missed.

  • This is what separates compound engineering from regular AI use. Without /ce:compound, each task is isolated. With it, the lesson from today becomes a rule that prevents the same mistake tomorrow.

  • Learnings that stay only in your local session are lost the next time you open a new chat. Committed to the repo, they travel to every machine and persist forever.

  • The loop compounds in value — the third task benefits from two rounds of learning. Running it once gives you a taste; running it three times shows you the actual leverage.

Level 3

Skills and Extensions

The agent gains specialized capabilities without permanently consuming context.

  • A skill you write is a procedure you never have to explain again. The description field is critical — Claude uses it to decide whether to load the skill, so precision there is routing logic, not documentation.

  • A skill that only exists in one project is inaccessible from all other projects. The ai-kit repo is what makes a skill globally available.

  • All four major tools recognize AGENTS.md. Writing one file gives every agent the same baseline context without duplicating effort across tool-specific files.
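
One way to keep the file canonical, assuming each tool reads its default file name, is to symlink the tool-specific names to AGENTS.md (Gemini can also be pointed at AGENTS.md directly via its `contextFileName` setting):

```shell
dir=$(mktemp -d) && cd "$dir"
printf '# Conventions\n- pnpm, not npm\n- conventional commits\n' > AGENTS.md
ln -sf AGENTS.md CLAUDE.md   # Claude Code reads CLAUDE.md by default
ln -sf AGENTS.md GEMINI.md   # Gemini CLI reads GEMINI.md by default
```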

  • Writing a good SKILL.md is non-obvious — the description routing, the trigger conditions, the success criteria all have specific patterns. skill-creator teaches the format by example.

  • The single fastest way to see what Level 5 agent behavior looks like in practice. Watching the brainstorm → plan → implement → test loop run on a real task is more instructive than reading about it.

  • Skills already written by practitioners who solved the same problems you have. Finding two or three relevant ones on day one is faster than writing them from scratch.

  • Without a baseline score, you don't know which capability gaps matter most. /harness-audit tells you where you are on the maturity curve before you start adding more.

  • The same planning and execution discipline from Superpowers, applied to Gemini CLI sessions. Same methodology, different runtime.

  • 175+ agents and 208+ skills the Copilot community has already built. Browsing it before writing your own prevents reinventing what already exists.

  • Agent profiles let you create named specialists — a 'code reviewer', a 'test writer', a 'PR describer' — each with their own scoped system prompt. Without them, every Copilot conversation starts from the same generic baseline.

  • Extensions are how Gemini gets persistent, reusable behavior. Without them, every Gemini session is stateless and context-free beyond what GEMINI.md provides.

  • Sentry's PR review skill is the clearest real-world example of the orchestrator → parallel subagents → aggregation pattern. Reading the implementation is faster than discovering the pattern yourself.

Level 4

MCP Servers

The agent can search the web and reach external systems. MCP context cost is managed.

  • Without search, the agent works from training data alone — no awareness of libraries released after its cutoff, no ability to look up current documentation, no real-time context. Search is the single biggest capability upgrade per minute of setup time.

  • Pulling a library's docs into context manually is slow and consumes tokens. DeepWiki provides structured access to any open-source repo's documentation on demand, with no manual copying.

  • For anything where X/Twitter signal matters — trending libraries, community sentiment, real-time debates — Grok has unique access. The free tier, worth roughly $175/month in credits, makes it effectively zero cost.

  • Keys hardcoded in config files get committed to git. The template + envsubst pattern keeps keys out of version control while making configs reproducible across machines.

  • Verifying search works before relying on it in a real task prevents the frustrating experience of discovering a broken MCP mid-session.

  • Most practitioners are surprised the first time they run /context — 30–40% consumed before typing a word is common. You cannot optimize what you cannot see.

  • Every mounted MCP injects its full tool schemas on every turn whether Claude uses them or not. Three idle MCPs can consume 10–15% of your context window before any code appears.

  • Gemini's extension system cannot run arbitrary hooks, but it can block dangerous operations. Excluding destructive commands is the closest equivalent to Claude's PostToolUse safety checks.

  • Knowing that Copilot's MCP support is VS Code-only prevents wasted time trying to configure it from the CLI. Use VS Code settings for Copilot MCP, CLI for Claude and Gemini.

  • Every new session starts blank — the agent has no memory of your past decisions, preferences, or architectural choices. Persistent memory closes this gap and compounds over time.

Level 5

Prompting Discipline

Better first responses and fewer correction loops.

  • Context transforms output. 'Refactor this function' produces a generic refactor. 'Refactor this function because it's being called from three places with inconsistent error handling' produces a targeted, correct one. The word 'because' forces the relevant context into the prompt.

  • Agents jump to implementation by default and bypass the exploratory thinking that surfaces real constraints. Plan Mode enforces a deliberate pause before any file is touched.

  • Conversational corrections ('no, I meant...') are imprecise and add conversation history the agent must reason through. Inline annotations in the spec file are precise and unambiguous.

  • The assumptions the agent made during a long session are invisible unless you ask. The assumptions section of a summary surfaces things you'd otherwise discover three days later when something breaks.

  • The implementing agent rationalizes its own decisions. A fresh context — even the same model — will find issues the implementing session glossed over. This is not optional for anything going to production.

  • The Pandya power prompt demonstrates that a single paragraph of intent can orchestrate parallel agents, update CLAUDE.md, and create a reusable skill — all from one message. Seeing it run on your own codebase makes the capability concrete.

Level 6

Token Management and Observability

Know what the context window contains. Stop burning tokens on noise.

  • Context exhaustion sneaks up on you. Without visibility, you discover the window is full after the agent has already started a complex task and has to restart. claude-hud makes this visible before it becomes a problem.

  • The agent and todo lines are off by default but are the most useful for agentic sessions — they show which subagent is running, what it's doing, and how many tasks remain. The context bar alone is only half the picture.

  • Knowing how much context is pre-consumed before any task starts is essential for planning session length. If 30% is gone before you type, you have a different budget than if 5% is gone.

  • A full context window causes the model to lose track of early instructions. /compact preserves the essential state while freeing up room — like a mid-session memory consolidation.

  • Gemini has a large context window, but it's not infinite. Treating it as unlimited leads to context bloat that degrades attention to early instructions. Starting fresh sessions for new topics is the discipline that prevents this.

  • Without a context bar, the only signal that a session is overloaded is degraded output quality — which you only notice after the damage is done. Proactive session hygiene prevents this.

  • Token costs compound quickly in agentic sessions. RTK can reduce per-session token usage by 60–90% on shell output alone — which directly reduces cost and extends effective context budget.

  • The hook must be registered for RTK to intercept Claude Code's shell commands. Installation without the hook does nothing.

  • Same as the Claude hook — Gemini CLI uses a different hook registration path. Both must be set up separately.

  • The Codex hook target covers Copilot CLI sessions. Without this step RTK is installed but not active.

  • The savings number is usually surprising and motivating. Seeing the actual token reduction makes the tradeoff real and encourages keeping the tool active.

  • Training data has a cutoff. Last30days gives the agent real-time community intelligence — what practitioners are actually saying, what tools are winning, what's breaking — from the past 30 days.

  • The Polymarket section is the part you won't get from a search engine: real-money probability estimates on tech outcomes. It's a unique signal that sits alongside community sentiment.

Level 7

Hooks and Persistent Memory

The agent detects its own errors without prompting. Sessions start with context from previous ones.

Backpressure / Hooks
  • Without a feedback signal, the agent writes broken code, you point it out, it fixes it — three turns per error. A lint hook fires automatically and the agent self-corrects in one turn. This is the difference between an assistant and an autonomous worker.
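
A sketch of the Claude Code hook configuration that closes this loop. The lint command is a placeholder for whatever your project actually runs; the matcher restricts the hook to file-modifying tools:

```shell
dir=$(mktemp -d) && cd "$dir" && mkdir -p .claude
cat > .claude/settings.json <<'EOF'
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": "npx eslint --quiet ." }
        ]
      }
    ]
  }
}
EOF
```

A failing command surfaces its output back into the session, which is the feedback signal the agent self-corrects from.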

  • Setting up a hook that silently fails is worse than no hook — you think you have backpressure when you don't. Verification is the only way to confirm the feedback loop is actually closed.

  • A 10-second hook runs on every single tool call. In a 50-call session that's 8 minutes of waiting. Slow hooks don't just feel bad — they change the economics of agentic sessions.

  • You cannot improve what you cannot measure. Rudel makes token usage, session patterns, and model costs visible across sessions — turning a gut feeling about AI productivity into data.

  • Gemini has no PostToolUse hook system. Wrapping your linter as an explicit tool is the closest equivalent — the agent calls it deliberately instead of automatically, but the feedback loop is still there.

  • An agent with unrestricted shell access can run rm -rf with no confirmation. excludeTools is a hard block, not a prompt-level request — it prevents the action before it reaches the model.
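
A sketch of the corresponding project settings. The blocked commands are examples; note this is a blocklist, so anything not listed remains allowed, which makes it a complement to sandboxing rather than a replacement:

```shell
dir=$(mktemp -d) && cd "$dir" && mkdir -p .gemini
cat > .gemini/settings.json <<'EOF'
{
  "excludeTools": [
    "run_shell_command(rm -rf)",
    "run_shell_command(sudo)",
    "run_shell_command(git push --force)"
  ]
}
EOF
```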

  • Copilot has no in-session hooks. Git-level pre-commit hooks are the next best thing — they catch errors before they're committed, even if they don't close the loop mid-session.
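
A minimal hook of that kind. The check commands are placeholders for your project's real scripts:

```shell
dir=$(mktemp -d) && cd "$dir" && git init -q
cat > .git/hooks/pre-commit <<'EOF'
#!/bin/sh
# Any nonzero exit here aborts the commit.
npx eslint --quiet . && npm test --silent
EOF
chmod +x .git/hooks/pre-commit
```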

Persistent Memory
  • Every correction you give the agent teaches it something — but only for that session. pi-self-learning captures those corrections automatically and builds a corpus of durable preferences across sessions.

  • A month's worth of captured corrections shows you your most common failure modes with this agent. The top 3 almost always belong in CLAUDE.md.

  • CORE.md committed to your repo travels to every machine. Without this step, the memory stays on one machine and is gone after a wipe.

  • pi-self-learning gives you distilled lessons. mem0 gives you raw context retrieval — 'what did I decide about the auth architecture last month?' Combining both covers the full memory problem.

  • mem0 doesn't save automatically unless instructed. This prompt at the end of a session is the trigger that makes the memory layer useful. Without it, sessions remain isolated.

  • Memory that's stored but never retrieved is worthless. This prompt at the start of a relevant session is what makes past context actually influence the current one.

  • Without an automatic capture hook, memory must be maintained by hand. This item acknowledges that the discipline of updating context files after each session is the substitute for automation.

  • Verification confirms the memory layer is actually working — not just installed. If the agent can't answer 'what do I usually prefer for X?' after two weeks, something in the pipeline is broken.

Sandbox Safety
  • Real incidents motivate jai — agents running rm -rf, overwriting uncommitted work, making destructive changes in the wrong directory. A sandbox contains the blast radius before it becomes a real loss.

  • A new skill runs arbitrary shell commands. Running it unsandboxed the first time is accepting unknown risk. jai casual mode is a 10-second safety wrapper with no overhead.

  • Long autonomous tasks accumulate risk — the agent makes many decisions without review. Strict mode means even if something goes wrong deep in the task, your actual home directory is unaffected.

  • Without jai, a git worktree is the next best isolation. A broken implementation in an isolated worktree cannot contaminate main, and the worktree can be deleted without consequence.
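
The isolation pattern in three commands; branch and path names are illustrative:

```shell
dir=$(mktemp -d) && cd "$dir" && git init -q
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "init"

git worktree add -b feature-attempt "$dir-scratch"  # isolated checkout on a new branch
# ...the agent works inside $dir-scratch; the main branch is never touched...
git worktree remove "$dir-scratch"                  # discard the attempt cleanly
```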

Level 8

Remote Setup

An always-on agent server reachable from anywhere. No laptop dependency.

  • A €4/month server is the cost of one coffee. The return is an always-on agent that can run multi-hour tasks while you sleep, accessible from any device, with no laptop dependency.

  • A default Ubuntu install with password auth enabled is a credential stuffing target within hours. These three hardening steps eliminate the most common attack vectors before you put API keys on the machine.
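
The three steps aren't enumerated in this item; a common trio is key-only SSH, no root login, and a firewall. On the sshd side, assuming key-based login is already confirmed working, the fragment looks like:

```
# /etc/ssh/sshd_config (verify key login first, then: sudo systemctl restart ssh)
PasswordAuthentication no
PermitRootLogin no
PubkeyAuthentication yes
```

Pairing it with `ufw allow OpenSSH && ufw enable` (or your distro's firewall) covers the network side.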

  • OpenCode runs on any model provider — Claude, Gemini, local Ollama models. Installing it on the server makes the server model-agnostic, so switching providers doesn't require reprovisioning.

  • Keys in the environment file are readable only by the dev user. This is more secure than passing them as arguments or hardcoding them in config, and simpler than a secrets manager for a personal server.

  • Without tmux, an SSH disconnect kills every running agent. Persistent sessions mean you can start a 3-hour research task, close your laptop, and reconnect hours later to review the results.

  • Headless mode with systemd means the agent restarts automatically after a server reboot, without manual SSH intervention. This is what makes the server genuinely always-on rather than on-until-the-next-restart.
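
A sketch of such a unit; the user, paths, and ExecStart command are assumptions to adapt to your own runner:

```
# /etc/systemd/system/agent.service (hypothetical)
[Unit]
Description=Always-on coding agent
After=network-online.target

[Service]
User=dev
WorkingDirectory=/home/dev/ai-kit
EnvironmentFile=/home/dev/.agent-env
ExecStart=/usr/local/bin/opencode serve
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

`sudo systemctl enable --now agent` then starts it immediately and on every future boot.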

  • Your skills, CLAUDE.md, and MCP config on the server should be identical to your local setup. Cloning ai-kit and running setup.sh on the server makes this a one-command operation.

  • A public-facing SSH port is a credential brute-force target. Tailscale puts the server on a private network — only devices you own can connect, with no firewall rules to maintain.

  • New machines and server reprovisioning happen. Having setup-remote.sh and update-remote.sh in the repo means future setup is a single command, not a reconstruction from memory.

  • Kicking off a task from your phone proves the setup actually works end-to-end: SSH from a different device, persistent session, agent running independently. Until you do this, you have a server — not an autonomous agent.

Level 9

Parallel Agents

Multiple agents working on independent tasks simultaneously.

  • Single-agent sessions serialize everything. Agent teams break the serialization — independent tasks run in parallel, and the total wall-clock time for a multi-task sprint drops proportionally.

  • Superset handles the orchestration overhead — worktree creation, agent coordination, diff viewing — that makes parallel agents practical rather than just theoretically possible.

  • Prompting for a complex feature produces a complex context that grows until attention degrades. The FD system uses a written spec instead — the agent works from a document, not from conversation history, which scales.

  • Four parallel Opus agents on a hard problem produce a solution that reflects four independent lines of reasoning, then converges. The result is qualitatively different from a single agent working the same problem serially.

  • The Dispatch skill gives the orchestrator pattern without any additional tooling — the skill itself handles task decomposition, parallel delegation, and result aggregation from within a single session.

  • Different models have measurably different strengths. Routing implementation to the model with the best reasoning and review to the model with the best error-finding produces better combined output than using one model for both.

  • Every agent added beyond 4–6 adds coordination overhead that can exceed the parallelism gain. Knowing the ceiling prevents the mistake of assuming more agents always means faster output.

Path A — Long-horizon tasks (DeerFlow)
  • DeerFlow handles the orchestration layer so you don't have to — sub-agent spawning, context scoping, result synthesis. Deploying it once gives you a long-horizon research and execution capability on demand.

  • A 30-minute task you'd do manually is the right calibration — concrete enough to evaluate quality, long enough to see the multi-agent orchestration working.

  • Using the output without rewriting it is the signal that DeerFlow is producing real value, not just activity. If you rewrite everything, the bottleneck is still you.

Path B — Persistent personal agent (Hermes)
  • Hermes builds a model of your preferences across sessions. The learning loop only has value if it runs for long enough to accumulate meaningful signal — installing it is the prerequisite.

  • A messaging gateway means you can delegate to Hermes from your phone, get results in Telegram, and never need to open a terminal. The channel is what makes it a daily driver rather than a dev tool.

  • 10 sessions is the minimum before the skill accumulation becomes visible. The first few sessions feel like a regular agent — the difference emerges with repeated use.

  • A correctly anticipated unstated preference is the proof that the agent model is working — it extracted a pattern from your behavior that you never explicitly stated.

Path C — Team-scale async agent (Open SWE)
  • Open SWE is not a tool you install in an afternoon — it's infrastructure. Confirming the prerequisites exist before starting prevents a half-deployed system.

  • The deployment step is where most teams stall. Modal for the sandbox and LangGraph Cloud for orchestration are both services you need accounts for. Plan a day for initial setup.

  • The first real issue is the proof of concept. Seeing the agent work asynchronously — you file an issue, go do other things, come back to a draft PR — demonstrates the async value proposition concretely.

  • A meaningful draft PR requiring minimal rework means the agent understood the requirement, implemented it correctly, and handled edge cases. This is the bar that distinguishes a useful async agent from an expensive autocomplete.

Reference

Anti-Patterns Checklist

Review periodically. These are the failure modes.

  • Comprehension debt is different from technical debt — it breeds false confidence. You think you understand the system because it works, but when it breaks you have no mental model to debug from.

  • Every token in your context file fights for attention against every other token. The longer it gets, the less any individual rule is weighted. Density beats length every time.

  • The session that implemented the code rationalized every decision it made. A fresh context — same model, new session — will find issues that the implementing session had every incentive to overlook.

  • Zero overrides sounds like AI success. It's often AI capture — the humans stopped checking. What matters is not the rate but the reason. Are you not overriding because everything is correct, or because you stopped looking?

  • Mo Bitar vibe-coded for two years and went back to writing code by hand. The long-term costs — inability to debug, loss of system intuition, increasing rework — emerge slowly and are expensive to reverse.