AI Champion Roadmap
A structured learning plan to become more effective with AI coding agents — Claude Code, Gemini CLI, and GitHub Copilot CLI.
-
Every config, skill, hook, and learned preference lives here. Without a canonical repo, improvements stay trapped on one machine — a wipe or new laptop means starting from scratch.
```shell
git clone https://github.com/urbanisierung/ai-kit ~/github.com/urbanisierung/ai-kit
# Or fork it first on GitHub, then clone your fork
```
-
The setup script reads keys from `.env` and injects them into MCP configs via envsubst. Without this, none of the tools that require API access will connect.
```shell
cp .env.example .env
# Edit .env — fill in:
# ANTHROPIC_API_KEY=sk-ant-...
# BRAVE_API_KEY=BSA_...
# XAI_API_KEY=xai-...
# MEM0_API_KEY=m0-...
```
-
The defaults are generic. Tailoring your context files to your actual workflow is what separates 'agent following generic instructions' from 'agent following your instructions'.
Edit `claude/CLAUDE.md.global`, `copilot/copilot-instructions.md`, and `gemini/GEMINI.md.global`. Start with communication style (terse vs. detailed) and any non-obvious rules about your stack. Keep each file under 100 lines.
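A terse starting point might look like this (every rule shown is an illustrative placeholder, not a recommendation from the repo):

```markdown
## Communication
- Terse. No preamble, no summaries, no apologies.

## Stack
- TypeScript strict mode; pnpm, never npm.
- Validate at system boundaries only; no blanket try/catch.
```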
-
Manual symlinking across machines is error-prone and slow. The idempotent setup script means any machine — new laptop, remote server — goes from zero to your full configuration in under a minute.
```shell
bash tools/setup.sh
# What it does:
# → Links claude/CLAUDE.md.global → ~/.claude/CLAUDE.md
# → Links claude/skills/ → ~/.claude/skills/
# → Writes ~/.claude/mcp.json from template (keys injected from .env)
# → Links gemini/GEMINI.md.global → ~/.gemini/GEMINI.md
# → Sources dotfiles/.zshrc.ai from ~/.zshrc
```
-
A repo that only lives locally is a backup risk. The whole point of this setup is that a new machine becomes productive in minutes — verify it now while the steps are fresh.
```shell
# On a second machine (or after wiping ~/.claude):
git clone git@github.com:yourname/ai-kit.git
cp ai-kit/.env.example ai-kit/.env   # fill in real keys
bash ai-kit/tools/setup.sh
```
-
Seeing the format before writing your own prevents common mistakes — wrong heading levels, missing frontmatter, rules that contradict each other.
-
Every line is injected into every session. A 200-line CLAUDE.md consumes ~4,000 tokens before you type a word. Length is actively counterproductive.
-
A monorepo's frontend and backend rarely share the same conventions. Subdirectory files scope rules to where they're actually relevant, keeping context lean everywhere else.
-
Copilot doesn't lazy-load rules — the full file is sent on every message. Every sentence that isn't directly useful is dead weight in every single conversation.
-
Without this setting, Copilot ignores your agent profile files. One flag enables the entire scoped-agent system.
-
Without a project-level file, Gemini starts each session with no knowledge of your conventions. The global file prevents you from repeating the same preferences in every chat.
-
The single highest-leverage rule. 'No filler' eliminates preamble, summaries, and apologies — the agent gets to the point faster on every single response.
-
Corrections you type once should never need to be typed again. Each rule added is a permanent improvement to every future session.
When the agent does something wrong and you correct it mid-session, that correction is ephemeral — it only affects the current context window. The moment the session ends, the agent reverts. The fix: immediately after correcting the agent, ask it to write the rule into your context file. Example: you told the agent 'don't wrap every function in a try/catch, only validate at system boundaries.' That's three sessions of repeated corrections or one line in CLAUDE.md.
```shell
# Example: you just corrected the agent for over-wrapping in try/catch
# 1. Ask the agent to encode the rule:
#    "Add a rule to CLAUDE.md: never wrap internal function calls in try/catch.
#     Only validate at system boundaries (user input, external APIs)."
# 2. The agent appends to CLAUDE.md:
#    ## Rules
#    - Never wrap internal function calls in try/catch. Only validate at
#      system boundaries (user input, external APIs).
# 3. Every future session starts with that rule already loaded.
#    You will never type that correction again.
```
-
Rules that never fire are noise that costs tokens. Pruning keeps the file lean and ensures every remaining rule is earning its place.
-
The plugin is what makes the /ce commands available. Without it, /ce:plan is just text with no behavior attached.
```shell
/plugin marketplace add EveryInc/compound-engineering-plugin
/plugin install compound-engineering
```
-
Same capability as the Claude install — compound engineering works across tools. Don't skip it because you also use Claude.
```shell
bunx @every-env/compound-plugin install compound-engineering --to gemini
```
-
Gives Copilot the same Plan → Delegate → Assess → Codify loop. Cross-tool consistency means you don't have to think differently depending on which agent you're using.
```shell
bunx @every-env/compound-plugin install compound-engineering --to copilot
```
-
Jumping straight to /ce:work without a plan is the most common source of mid-task pivots. Five minutes of planning eliminates hours of correction.
-
Worktree isolation means a broken implementation doesn't contaminate your main branch. Task tracking means you can hand off mid-session without losing context.
-
The agent that implemented the code will rationalize its own decisions. A review pass from a fresh context catches what the implementing session missed.
-
This is what separates compound engineering from regular AI use. Without /ce:compound, each task is isolated. With it, the lesson from today becomes a rule that prevents the same mistake tomorrow.
-
Learnings that stay only in your local session are lost the next time you open a new chat. Committed to the repo, they travel to every machine and persist forever.
-
The loop compounds in value — the third task benefits from two rounds of learning. Running it once gives you a taste; running it three times shows you the actual leverage.
-
A skill you write is a procedure you never have to explain again. The description field is critical — Claude uses it to decide whether to load the skill, so precision there is routing logic, not documentation.
-
A skill that only exists in one project is inaccessible from all other projects. The ai-kit repo is what makes a skill globally available.
-
All four major tools recognize AGENTS.md. Writing one file gives every agent the same baseline context without duplicating effort across tool-specific files.
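A minimal AGENTS.md might look like this (project layout and commands are illustrative placeholders):

```markdown
# AGENTS.md

## Project
Monorepo: `apps/web` (Next.js), `apps/api` (Fastify), shared code in `packages/`.

## Conventions
- pnpm workspaces; run checks with `pnpm -r lint && pnpm -r test`.
- Never commit directly to main; all changes go through a branch.
```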
-
Writing a good SKILL.md is non-obvious — the description routing, the trigger conditions, the success criteria all have specific patterns. skill-creator teaches the format by example.
```shell
/plugin install skill-creator
```
-
The single fastest way to see what Level 5 agent behavior looks like in practice. Watching the brainstorm → plan → implement → test loop run on a real task is more instructive than reading about it.
```shell
/plugin install superpowers@claude-plugins-official
```
-
Skills already written by practitioners who solved the same problems you have. Finding two or three relevant ones on day one is faster than writing them from scratch.
-
Without a baseline score, you don't know which capability gaps matter most. /harness-audit tells you where you are on the maturity curve before you start adding more.
```shell
/harness-audit
```
-
The same planning and execution discipline from Superpowers, applied to Gemini CLI sessions. Same methodology, different runtime.
-
175+ agents and 208+ skills the Copilot community has already built. Browsing it before writing your own prevents reinventing what already exists.
-
Agent profiles let you create named specialists — a 'code reviewer', a 'test writer', a 'PR describer' — each with their own scoped system prompt. Without them, every Copilot conversation starts from the same generic baseline.
-
Extensions are how Gemini gets persistent, reusable behavior. Without them, every Gemini session is stateless and context-free beyond what GEMINI.md provides.
-
Sentry's PR review skill is the clearest real-world example of the orchestrator → parallel subagents → aggregation pattern. Reading the implementation is faster than discovering the pattern yourself.
Sentry's internal skills cover code review, commit generation, security audits, Django performance, and brand guidelines. The PR review skill is particularly worth studying: one orchestrating skill spawns multiple focused subagents (database safety, complexity analysis, prompt health, linting) and aggregates their results.
-
Without search, the agent works from training data alone — no awareness of libraries released after its cutoff, no ability to look up current documentation, no real-time context. Search is the single biggest capability upgrade per minute of setup time.
```shell
# Claude Code
claude mcp add brave-search -e BRAVE_API_KEY=BSA_YOUR_KEY -- \
  npx -y @modelcontextprotocol/server-brave-search
```

Gemini CLI — add to `~/.gemini/settings.json`:

```json
{
  "mcpServers": {
    "brave-search": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-brave-search"],
      "env": { "BRAVE_API_KEY": "BSA_..." }
    }
  }
}
```
-
Pulling a library's docs into context manually is slow and consumes tokens. DeepWiki provides structured access to any open-source repo's documentation on demand, with no manual copying.
```json
{
  "mcpServers": {
    "deepwiki": { "command": "npx", "args": ["-y", "@deepwiki/mcp"] }
  }
}
```
-
For anything where X/Twitter signal matters — trending libraries, community sentiment, real-time debates — Grok has unique access. The $175/month of free credits from the data-sharing program makes it effectively zero cost.
Opt into the data-sharing program for $175/month of free credits.
```shell
# Get the MCP from github.com/merterbak/Grok-MCP
# Add XAI_API_KEY to your .env
```
-
Keys hardcoded in config files get committed to git. The template + envsubst pattern keeps keys out of version control while making configs reproducible across machines.
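As a sketch of the pattern (file names and the demo key are illustrative; `envsubst` ships with GNU gettext):

```shell
# The template lives in git with placeholders; the rendered config does not.
cat > mcp.json.template <<'EOF'
{
  "mcpServers": {
    "brave-search": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-brave-search"],
      "env": { "BRAVE_API_KEY": "${BRAVE_API_KEY}" }
    }
  }
}
EOF

# At setup time, load the real key from the environment and render the config:
export BRAVE_API_KEY="BSA_demo_key"
envsubst < mcp.json.template > mcp.json
```

Commit `mcp.json.template`, gitignore `mcp.json`, and the real key never touches version control.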
-
Verifying search works before relying on it in a real task prevents the frustrating experience of discovering a broken MCP mid-session.
-
Most practitioners are surprised the first time they run /context — 30–40% consumed before typing a word is common. You cannot optimize what you cannot see.
-
Every mounted MCP injects its full tool schemas on every turn whether Claude uses them or not. Three idle MCPs can consume 10–15% of your context window before any code appears.
-
Gemini's extension system cannot run arbitrary hooks, but it can block dangerous operations. Excluding destructive commands is the closest equivalent to Claude's PostToolUse safety checks.
-
Knowing that Copilot's MCP support is VS Code-only prevents wasted time trying to configure it from the CLI. Use VS Code settings for Copilot MCP, CLI for Claude and Gemini.
-
Every new session starts blank — the agent has no memory of your past decisions, preferences, or architectural choices. Persistent memory closes this gap and compounds over time.
Free cloud tier: 10K memories, 1K retrieval calls/month — no credit card needed. The cloud MCP at mcp.mem0.ai requires an API key. A fully local setup (no API key, no Docker) is also possible using Ollama + local Qdrant or ChromaDB.
```shell
# Cloud (easiest — free tier available)
npx mcp-add --name mem0-mcp --type http --url "https://mcp.mem0.ai/mcp"
claude mcp add mem0-mcp --scope global
# Add to .env: MEM0_API_KEY=m0-...

# Local (no API key, no Docker)
pip install mem0ai ollama chromadb
# Then configure with Ollama LLM + ChromaDB vector store
```
-
Context transforms output. 'Refactor this function' produces a generic refactor. 'Refactor this function because it's being called from three places with inconsistent error handling' produces a targeted, correct one. The word 'because' forces the relevant context into the prompt.
-
Agents jump to implementation by default and bypass the exploratory thinking that surfaces real constraints. Plan Mode enforces a deliberate pause before any file is touched.
```shell
# Press Shift+Tab in Claude Code to enter Plan Mode
```
-
Conversational corrections ('no, I meant...') are imprecise and add conversation history the agent must reason through. Inline annotations in the spec file are precise and unambiguous.
-
The assumptions the agent made during a long session are invisible unless you ask. The assumptions section of a summary surfaces things you'd otherwise discover three days later when something breaks.
-
The implementing agent rationalizes its own decisions. A fresh context — even the same model — will find issues the implementing session glossed over. This is not optional for anything going to production.
-
The Pandya power prompt demonstrates that a single paragraph of intent can orchestrate parallel agents, update CLAUDE.md, and create a reusable skill — all from one message. Seeing it run on your own codebase makes the capability concrete.
-
Context exhaustion sneaks up on you. Without visibility, you discover the window is full after the agent has already started a complex task and has to restart. claude-hud makes this visible before it becomes a problem.
```shell
/plugin marketplace add jarrodwatts/claude-hud
/plugin install claude-hud
/claude-hud:setup
```
-
The agent and todo lines are off by default but are the most useful for agentic sessions — they show which subagent is running, what it's doing, and how many tasks remain. The context bar alone is only half the picture.
```shell
/claude-hud:configure
```
-
Knowing how much context is pre-consumed before any task starts is essential for planning session length. If 30% is gone before you type, you have a different budget than if 5% is gone.
-
A full context window causes the model to lose track of early instructions. /compact preserves the essential state while freeing up room — like a mid-session memory consolidation.
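Usage is a single slash command; the optional free-text argument (illustrative wording here) steers what the summary keeps:

```
# When the context bar runs high, mid-session:
/compact

# Optionally steer what survives the consolidation:
/compact keep the API design decisions and the open TODO list
```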
-
Gemini has a large context window, but it's not infinite. Treating it as unlimited leads to context bloat that degrades attention to early instructions. Starting fresh sessions for new topics is the discipline that prevents this.
-
Without a context bar, the only signal that a session is overloaded is degraded output quality — which you only notice after the damage is done. Proactive session hygiene prevents this.
-
Token costs compound quickly in agentic sessions. RTK can reduce per-session token usage by 60–90% on shell output alone — which directly reduces cost and extends effective context budget.
```shell
# macOS
brew install rtk
# Linux
curl -fsSL https://raw.githubusercontent.com/rtk-ai/rtk/refs/heads/master/install.sh | sh
```
-
The hook must be registered for RTK to intercept Claude Code's shell commands. Installation without the hook does nothing.
```shell
rtk init -g
```
-
Same as the Claude hook — Gemini CLI uses a different hook registration path. Both must be set up separately.
```shell
rtk init -g --gemini
```
-
The Codex hook target covers Copilot CLI sessions. Without this step RTK is installed but not active.
```shell
rtk init -g --codex
```
-
The savings number is usually surprising and motivating. Seeing the actual token reduction makes the tradeoff real and encourages keeping the tool active.
```shell
rtk gain
```
-
Training data has a cutoff. Last30days gives the agent real-time community intelligence — what practitioners are actually saying, what tools are winning, what's breaking — from the past 30 days.
```shell
# Claude Code
/plugin marketplace add mvanhorn/last30days-skill
/plugin install last30days@last30days-skill

# Gemini CLI
gemini extensions install https://github.com/mvanhorn/last30days-skill.git
```
-
The Polymarket section is the part you won't get from a search engine: real-money probability estimates on tech outcomes. It's a unique signal that sits alongside community sentiment.
```shell
/last30days <your topic>
/last30days <your topic> --quick   # faster, less depth
/last30days topic1 vs topic2       # comparative mode
```
-
Without a feedback signal, the agent writes broken code, you point it out, it fixes it — three turns per error. A lint hook fires automatically and the agent self-corrects in one turn. This is the difference between an assistant and an autonomous worker.
```jsonc
// In claude/settings.json
{
  "hooks": {
    "PostToolUse": [
      { "command": "pnpm lint 2>&1 | tail -20" }
    ]
  }
}
```
-
Setting up a hook that silently fails is worse than no hook — you think you have backpressure when you don't. Verification is the only way to confirm the feedback loop is actually closed.
-
A 10-second hook runs on every single tool call. In a 50-call session that's 8 minutes of waiting. Slow hooks don't just feel bad — they change the economics of agentic sessions.
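One way to keep hooks cheap is to hard-cap both runtime and output before they reach the agent. A sketch (`run_hook` is a helper invented here, and the caps are arbitrary; tune them to your linter):

```shell
# run_hook: run a hook command with a runtime cap (seconds) and an output
# cap (last 20 lines). Slow or chatty hooks get truncated, not waited on.
run_hook() {
  timeout "$1" sh -c "$2" 2>&1 | tail -20
}

run_hook 5 "echo lint ok"        # fast hook: output passes through
run_hook 1 "sleep 3; echo done"  # slow hook: killed at the cap, prints nothing
```

Wrapping the command in your `PostToolUse` hook this way keeps the worst case bounded no matter what the linter does.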
-
You cannot improve what you cannot measure. Rudel makes token usage, session patterns, and model costs visible across sessions — turning a gut feeling about AI productivity into data.
```shell
npm install -g rudel
rudel login
rudel enable
```
-
Gemini has no PostToolUse hook system. Wrapping your linter as an explicit tool is the closest equivalent — the agent calls it deliberately instead of automatically, but the feedback loop is still there.
-
An agent with unrestricted shell access can run rm -rf with no confirmation. excludeTools is a hard block, not a prompt-level request — it prevents the action before it reaches the model.
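In Gemini CLI this lives in `~/.gemini/settings.json`. A sketch — the exact tool-name syntax and which commands to block are assumptions to verify against the Gemini CLI docs:

```json
{
  "excludeTools": [
    "run_shell_command(rm -rf)",
    "run_shell_command(git push --force)",
    "run_shell_command(sudo)"
  ]
}
```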
-
Copilot has no in-session hooks. Git-level pre-commit hooks are the next best thing — they catch errors before they're committed, even if they don't close the loop mid-session.
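A sketch of such a hook (`LINT_CMD` is a placeholder; point it at your real linter, e.g. `pnpm lint`):

```shell
# Install a pre-commit hook that blocks the commit when the linter fails.
mkdir -p .git/hooks
cat > .git/hooks/pre-commit <<'EOF'
#!/bin/sh
LINT_CMD="${LINT_CMD:-true}"
if ! sh -c "$LINT_CMD" >/dev/null 2>&1; then
  echo "lint failed - commit aborted" >&2
  exit 1
fi
EOF
chmod +x .git/hooks/pre-commit
```

Any `git commit` — whether typed by you or by Copilot — now has to pass the linter first.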
-
Every correction you give the agent teaches it something — but only for that session. pi-self-learning captures those corrections automatically and builds a corpus of durable preferences across sessions.
```shell
npm install -g @pi-labs/cli
pi install npm:pi-self-learning
```
-
A month's worth of captured corrections shows you your most common failure modes with this agent. The top 3 almost always belong in CLAUDE.md.
```shell
/learning-month
```
-
CORE.md committed to your repo travels to every machine. Without this step, the memory stays on one machine and is gone after a wipe.
-
pi-self-learning gives you distilled lessons. mem0 gives you raw context retrieval — 'what did I decide about the auth architecture last month?' Combining both covers the full memory problem.
Two paths:

**Cloud (easiest):** Free tier gives 10K memories and 1K retrieval calls/month. API key required. The official MCP at mcp.mem0.ai is cloud-only. Graph memory (knowledge graph relationships between memories) is Pro-only ($249/mo) in cloud.

**Local (no API key, no Docker):** Install mem0ai + Ollama. Uses SQLite for history, local Qdrant or ChromaDB for vectors, and Ollama for LLM/embeddings. Graph memory is free in the local OSS version. Caveat: the default Qdrant path (`/tmp/qdrant`) is not persistent across reboots — configure a stable path explicitly. The official MCP does not support local; use the community alternative at github.com/Hroerkr/mem0mcp.
```shell
# Cloud path
# 1. Sign up at app.mem0.ai, get API key
# 2. Add to .env: MEM0_API_KEY=m0-...
# 3. Add MCP:
npx mcp-add --name mem0-mcp --type http --url "https://mcp.mem0.ai/mcp"

# Local path (fully offline, no API keys)
pip install mem0ai ollama
# Pull Ollama models:
ollama pull llama3.1
ollama pull nomic-embed-text
# Then configure mem0 with Ollama + local Qdrant path (not /tmp)
```
-
mem0 doesn't save automatically unless instructed. This prompt at the end of a session is the trigger that makes the memory layer useful. Without it, sessions remain isolated.
-
Memory that's stored but never retrieved is worthless. This prompt at the start of a relevant session is what makes past context actually influence the current one.
-
Without an automatic capture hook, memory must be maintained manually. Updating your context files after each session is the discipline that substitutes for automation.
-
Verification confirms the memory layer is actually working — not just installed. If the agent can't answer 'what do I usually prefer for X?' after two weeks, something in the pipeline is broken.
-
Real incidents motivate jai — agents running rm -rf, overwriting uncommitted work, making destructive changes in the wrong directory. A sandbox contains the blast radius before it becomes a real loss.
```shell
# Arch (AUR)
yay -S jai

# From source
git clone https://github.com/stanford-scs/jai.git
cd jai && ./autogen.sh && ./configure && make && sudo make install

jai --init
```
-
A new skill runs arbitrary shell commands. Running it unsandboxed the first time is accepting unknown risk. jai casual mode is a 10-second safety wrapper with no overhead.
```shell
jai claude
```
-
Long autonomous tasks accumulate risk — the agent makes many decisions without review. Strict mode means even if something goes wrong deep in the task, your actual home directory is unaffected.
```shell
jai --mode strict claude
```
-
Without jai, a git worktree is the next best isolation. A broken implementation in an isolated worktree cannot contaminate main, and the worktree can be deleted without consequence.
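The pattern, sketched on a throwaway demo repo (repo and branch names are illustrative):

```shell
# Set up a demo repo so the commands are self-contained.
git init -q demo-repo && cd demo-repo
git -c user.email=demo@local -c user.name=demo commit -q --allow-empty -m init

# Give the agent its own checkout on its own branch:
git worktree add -q ../demo-task -b agent/task

# ...the agent works in ../demo-task; this checkout stays untouched...

# Broken implementation? Delete it without consequence:
git worktree remove ../demo-task
git branch -q -D agent/task
```

Main never sees the experiment; throwing it away is two commands.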
-
A €4/month server is the cost of one coffee. The return is an always-on agent that can run multi-hour tasks while you sleep, accessible from any device, with no laptop dependency.
-
A default Ubuntu install with password auth enabled is a credential stuffing target within hours. These three hardening steps eliminate the most common attack vectors before you put API keys on the machine.
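One plausible version of those steps (a sketch, assuming the trio is key-only SSH, a firewall, and fail2ban; commands are Ubuntu-flavored, adapt to your distro):

```
# 1. SSH: keys only, no root (in /etc/ssh/sshd_config, then restart sshd)
#    PasswordAuthentication no
#    PermitRootLogin no

# 2. Firewall: allow SSH inbound, deny everything else
sudo ufw allow OpenSSH
sudo ufw enable

# 3. Ban repeated failed logins
sudo apt install -y fail2ban
sudo systemctl enable --now fail2ban
```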
-
OpenCode runs on any model provider — Claude, Gemini, local Ollama models. Installing it on the server makes the server model-agnostic, so switching providers doesn't require reprovisioning.
```shell
# Install nvm
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.1/install.sh | bash
nvm install 22 && nvm use 22

# Install opencode
npm install -g opencode-ai
```
-
Keys in the environment file are readable only by the dev user. This is more secure than passing them as arguments or hardcoding them in config, and simpler than a secrets manager for a personal server.
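Concretely, that means owner-only permissions on the file (here `.env.demo` stands in for the server's real env file):

```shell
# Create the key file and restrict it so only its owner can read or write it.
printf 'ANTHROPIC_API_KEY=sk-ant-placeholder\n' > .env.demo
chmod 600 .env.demo
stat -c '%a' .env.demo
```

With mode 600, other users on the box (and any process running as them) cannot read the keys.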
-
Without tmux, an SSH disconnect kills every running agent. Persistent sessions mean you can start a 3-hour research task, close your laptop, and reconnect hours later to review the results.
```shell
tmux new-session -s main
# Detach: Ctrl+B D
# Reconnect: tmux attach -t main
```
-
Headless mode with systemd means the agent restarts automatically after a server reboot, without manual SSH intervention. This is what makes the server genuinely always-on rather than on-until-the-next-restart.
```ini
[Unit]
Description=OpenCode AI Agent
After=network.target

[Service]
Type=simple
User=dev
WorkingDirectory=/home/dev/projects
EnvironmentFile=/home/dev/.env
ExecStart=/usr/local/bin/opencode --headless --port 3000
Restart=on-failure

[Install]
WantedBy=multi-user.target
```
-
Your skills, CLAUDE.md, and MCP config on the server should be identical to your local setup. Cloning ai-kit and running setup.sh on the server makes this a one-command operation.
-
A public-facing SSH port is a credential brute-force target. Tailscale puts the server on a private network — only devices you own can connect, with no firewall rules to maintain.
```shell
curl -fsSL https://tailscale.com/install.sh | sh
tailscale up
```
-
New machines and server reprovisioning happen. Having setup-remote.sh and update-remote.sh in the repo means future setup is a single command, not a reconstruction from memory.
-
Kicking off a task from your phone proves the setup actually works end-to-end: SSH from a different device, persistent session, agent running independently. Until you do this, you have a server — not an autonomous agent.
-
Single-agent sessions serialize everything. Agent teams break the serialization — independent tasks run in parallel, and the total wall-clock time for a multi-task sprint drops proportionally.
```json
{ "CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS": true }
```
-
Superset handles the orchestration overhead — worktree creation, agent coordination, diff viewing — that makes parallel agents practical rather than just theoretically possible.
-
Prompting for a complex feature produces a complex context that grows until attention degrades. The FD system uses a written spec instead — the agent works from a document, not from conversation history, which scales.
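The shape of such a spec might look like this (section names are illustrative; the point is that the agent re-reads the document instead of replaying chat history):

```markdown
# feature-rate-limiting.md

## Goal
Add per-user rate limiting to the public API.

## Constraints
- No new infrastructure; use the existing Redis instance.
- 429 responses must include a Retry-After header.

## Open questions
- Sliding window or token bucket?

## Status
- [ ] Limiter middleware
- [ ] Tests for burst traffic
```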
-
Four parallel Opus agents on a hard problem produce a solution that reflects four independent lines of reasoning, then converges. The quality is qualitatively different from a single agent working the same problem serially.
-
The Dispatch skill gives the orchestrator pattern without any additional tooling — the skill itself handles task decomposition, parallel delegation, and result aggregation from within a single session.
-
Different models have measurably different strengths. Routing implementation to the model with the best reasoning and review to the model with the best error-finding produces better combined output than using one model for both.
-
Every agent added beyond 4–6 adds coordination overhead that can exceed the parallelism gain. Knowing the ceiling prevents the mistake of assuming more agents always means faster output.
-
DeerFlow handles the orchestration layer so you don't have to — sub-agent spawning, context scoping, result synthesis. Deploying it once gives you a long-horizon research and execution capability on demand.
```shell
git clone https://github.com/bytedance/deer-flow && cd deer-flow
make config   # generates config.yaml, fill in your model provider
docker compose up
```
-
A 30-minute task you'd do manually is the right calibration — concrete enough to evaluate quality, long enough to see the multi-agent orchestration working.
-
Using the output without rewriting it is the signal that DeerFlow is producing real value, not just activity. If you rewrite everything, the bottleneck is still you.
-
Hermes builds a model of your preferences across sessions. The learning loop only has value if it runs for long enough to accumulate meaningful signal — installing it is the prerequisite.
```shell
curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
source ~/.bashrc
hermes setup
```
-
A messaging gateway means you can delegate to Hermes from your phone, get results in Telegram, and never need to open a terminal. The channel is what makes it a daily driver rather than a dev tool.
-
10 sessions is the minimum before the skill accumulation becomes visible. The first few sessions feel like a regular agent — the difference emerges with repeated use.
-
A correctly anticipated unstated preference is the proof that the agent model is working — it extracted a pattern from your behavior that you never explicitly stated.
-
Open SWE is not a tool you install in an afternoon — it's infrastructure. Confirming the prerequisites exist before starting prevents a half-deployed system.
-
The deployment step is where most teams stall. Modal for the sandbox and LangGraph Cloud for orchestration are both services you need accounts for. Plan a day for initial setup.
-
The first real issue is the proof of concept. Seeing the agent work asynchronously — you file an issue, go do other things, come back to a draft PR — demonstrates the async value proposition concretely.
-
A meaningful draft PR requiring minimal rework means the agent understood the requirement, implemented it correctly, and handled edge cases. This is the bar that distinguishes a useful async agent from an expensive autocomplete.
-
Comprehension debt is different from technical debt — it breeds false confidence. You think you understand the system because it works, but when it breaks you have no mental model to debug from.
-
Every token in your context file fights for attention against every other token. The longer it gets, the less any individual rule is weighted. Density beats length every time.
-
The session that implemented the code rationalized every decision it made. A fresh context — same model, new session — will find issues that the implementing session had every incentive to overlook.
-
Zero overrides sounds like AI success. It's often AI capture — the humans stopped checking. What matters is not the rate but the reason. Are you not overriding because everything is correct, or because you stopped looking?
-
Mo Bitar vibe-coded for two years and went back to writing by hand. The long-term costs — inability to debug, loss of system intuition, increasing rework — emerge slowly and are expensive to reverse.