AI Champion Roadmap
A structured learning plan to become more effective with AI coding agents — Claude Code, Gemini CLI, and GitHub Copilot CLI.
-
Every config, skill, hook, and learned preference lives here. Without a canonical repo, improvements stay trapped on one machine — a wipe or new laptop means starting from scratch.
```shell
git clone https://github.com/urbanisierung/ai-kit ~/github.com/urbanisierung/ai-kit
# Or fork it first on GitHub, then clone your fork
```
-
The setup script reads keys from `.env` and injects them into MCP configs via envsubst. Without this, none of the tools that require API access will connect.
```shell
cp .env.example .env
# Edit .env — fill in:
# ANTHROPIC_API_KEY=sk-ant-...
# BRAVE_API_KEY=BSA_...
# XAI_API_KEY=xai-...
# MEM0_API_KEY=m0-...
```
-
The defaults are generic. Tailoring your context files to your actual workflow is what separates 'agent following generic instructions' from 'agent following your instructions'.
Edit `claude/CLAUDE.md.global`, `copilot/copilot-instructions.md`, and `gemini/GEMINI.md.global`. Start with communication style (terse vs. detailed) and any non-obvious rules about your stack. Keep each file under 100 lines.
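A terse starting point might look like this (every rule shown is an illustrative placeholder, not a recommendation from the repo):

```markdown
## Communication
- Terse. No preamble, no summaries, no apologies.

## Stack
- TypeScript strict mode; pnpm, never npm.
- Validate at system boundaries only; no blanket try/catch.
```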
-
Manual symlinking across machines is error-prone and slow. The idempotent setup script means any machine — new laptop, remote server — goes from zero to your full configuration in under a minute.
```shell
bash tools/setup.sh
# What it does:
# → Links claude/CLAUDE.md.global → ~/.claude/CLAUDE.md
# → Links claude/skills/ → ~/.claude/skills/
# → Writes ~/.claude/mcp.json from template (keys injected from .env)
# → Links gemini/GEMINI.md.global → ~/.gemini/GEMINI.md
# → Sources dotfiles/.zshrc.ai from ~/.zshrc
```
-
A repo that only lives locally is a backup risk. The whole point of this setup is that a new machine becomes productive in minutes — verify it now while the steps are fresh.
```shell
# On a second machine (or after wiping ~/.claude):
git clone git@github.com:yourname/ai-kit.git
cp ai-kit/.env.example ai-kit/.env   # fill in real keys
bash ai-kit/tools/setup.sh
```
-
Seeing the format before writing your own prevents common mistakes — wrong heading levels, missing frontmatter, rules that contradict each other.
-
Every line is injected into every session. A 200-line CLAUDE.md consumes ~4,000 tokens before you type a word. Length is actively counterproductive.
-
A monorepo's frontend and backend rarely share the same conventions. Subdirectory files scope rules to where they're actually relevant, keeping context lean everywhere else.
-
Copilot doesn't lazy-load rules — the full file is sent on every message. Every sentence that isn't directly useful is dead weight in every single conversation.
-
Without this setting, Copilot ignores your agent profile files. One flag enables the entire scoped-agent system.
-
Without a project-level file, Gemini starts each session with no knowledge of your conventions. The global file prevents you from repeating the same preferences in every chat.
-
The single highest-leverage rule. 'No filler' eliminates preamble, summaries, and apologies — the agent gets to the point faster on every single response.
-
Corrections you type once should never need to be typed again. Each rule added is a permanent improvement to every future session.
When the agent does something wrong and you correct it mid-session, that correction is ephemeral — it only affects the current context window. The moment the session ends, the agent reverts. The fix: immediately after correcting the agent, ask it to write the rule into your context file. Example: you told the agent 'don't wrap every function in a try/catch, only validate at system boundaries.' That's three sessions of repeated corrections or one line in CLAUDE.md.
```shell
# Example: you just corrected the agent for over-wrapping in try/catch
# 1. Ask the agent to encode the rule:
#    "Add a rule to CLAUDE.md: never wrap internal function calls in try/catch.
#     Only validate at system boundaries (user input, external APIs)."
# 2. The agent appends to CLAUDE.md:
#    ## Rules
#    - Never wrap internal function calls in try/catch. Only validate at
#      system boundaries (user input, external APIs).
# 3. Every future session starts with that rule already loaded.
#    You will never type that correction again.
```
-
Rules that never fire are noise that costs tokens. Pruning keeps the file lean and ensures every remaining rule is earning its place.
-
The plugin is what makes the /ce commands available. Without it, /ce:plan is just text with no behavior attached.
```shell
/plugin marketplace add EveryInc/compound-engineering-plugin
/plugin install compound-engineering
```
-
Same capability as the Claude install — compound engineering works across tools. Don't skip it because you also use Claude.
```shell
bunx @every-env/compound-plugin install compound-engineering --to gemini
```
-
Gives Copilot the same Plan → Delegate → Assess → Codify loop. Cross-tool consistency means you don't have to think differently depending on which agent you're using.
```shell
bunx @every-env/compound-plugin install compound-engineering --to copilot
```
-
Jumping straight to /ce:work without a plan is the most common source of mid-task pivots. Five minutes of planning eliminates hours of correction.
-
Worktree isolation means a broken implementation doesn't contaminate your main branch. Task tracking means you can hand off mid-session without losing context.
-
The agent that implemented the code will rationalize its own decisions. A review pass from a fresh context catches what the implementing session missed.
-
This is what separates compound engineering from regular AI use. Without /ce:compound, each task is isolated. With it, the lesson from today becomes a rule that prevents the same mistake tomorrow.
-
Learnings that stay only in your local session are lost the next time you open a new chat. Committed to the repo, they travel to every machine and persist forever.
-
The loop compounds in value — the third task benefits from two rounds of learning. Running it once gives you a taste; running it three times shows you the actual leverage.
-
A skill you write is a procedure you never have to explain again. The description field is critical — Claude uses it to decide whether to load the skill, so precision there is routing logic, not documentation.
-
A skill that only exists in one project is inaccessible from all other projects. The ai-kit repo is what makes a skill globally available.
-
All four major tools recognize AGENTS.md. Writing one file gives every agent the same baseline context without duplicating effort across tool-specific files.
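A minimal AGENTS.md might look like this (project layout and commands are illustrative placeholders):

```markdown
# AGENTS.md

## Project
Monorepo: `apps/web` (Next.js), `apps/api` (Fastify), shared code in `packages/`.

## Conventions
- pnpm workspaces; run checks with `pnpm -r lint && pnpm -r test`.
- Never commit directly to main; all changes go through a branch.
```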
-
Writing a good SKILL.md is non-obvious — the description routing, the trigger conditions, the success criteria all have specific patterns. skill-creator teaches the format by example.
```shell
/plugin install skill-creator
```
-
The single fastest way to see what Level 5 agent behavior looks like in practice. Watching the brainstorm → plan → implement → test loop run on a real task is more instructive than reading about it.
```shell
/plugin install superpowers@claude-plugins-official
```
-
Skills already written by practitioners who solved the same problems you have. Finding two or three relevant ones on day one is faster than writing them from scratch.
-
Without a baseline score, you don't know which capability gaps matter most. /harness-audit tells you where you are on the maturity curve before you start adding more.
```shell
/harness-audit
```
-
The same planning and execution discipline from Superpowers, applied to Gemini CLI sessions. Same methodology, different runtime.
-
175+ agents and 208+ skills the Copilot community has already built. Browsing it before writing your own prevents reinventing what already exists.
-
Agent profiles let you create named specialists — a 'code reviewer', a 'test writer', a 'PR describer' — each with their own scoped system prompt. Without them, every Copilot conversation starts from the same generic baseline.
-
Extensions are how Gemini gets persistent, reusable behavior. Without them, every Gemini session is stateless and context-free beyond what GEMINI.md provides.
-
Sentry's PR review skill is the clearest real-world example of the orchestrator → parallel subagents → aggregation pattern. Reading the implementation is faster than discovering the pattern yourself.
Sentry's internal skills cover code review, commit generation, security audits, Django performance, and brand guidelines. The PR review skill is particularly worth studying: one orchestrating skill spawns multiple focused subagents (database safety, complexity analysis, prompt health, linting) and aggregates their results.
-
Without search, the agent works from training data alone — no awareness of libraries released after its cutoff, no ability to look up current documentation, no real-time context. Search is the single biggest capability upgrade per minute of setup time.
```shell
# Claude Code
claude mcp add brave-search -e BRAVE_API_KEY=BSA_YOUR_KEY -- \
  npx -y @modelcontextprotocol/server-brave-search
```

Gemini CLI — add to `~/.gemini/settings.json`:

```json
{
  "mcpServers": {
    "brave-search": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-brave-search"],
      "env": { "BRAVE_API_KEY": "BSA_..." }
    }
  }
}
```
-
Pulling a library's docs into context manually is slow and consumes tokens. DeepWiki provides structured access to any open-source repo's documentation on demand, with no manual copying.
```json
{
  "mcpServers": {
    "deepwiki": { "command": "npx", "args": ["-y", "@deepwiki/mcp"] }
  }
}
```
-
For anything where X/Twitter signal matters — trending libraries, community sentiment, real-time debates — Grok has unique access. The $175/month of free credits from the data-sharing program makes it effectively zero cost.
Opt into the data-sharing program for $175/month of free credits.
```shell
# Get the MCP from github.com/merterbak/Grok-MCP
# Add XAI_API_KEY to your .env
```
-
Keys hardcoded in config files get committed to git. The template + envsubst pattern keeps keys out of version control while making configs reproducible across machines.
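As a sketch of the pattern (file names and the demo key are illustrative; `envsubst` ships with GNU gettext):

```shell
# The template lives in git with placeholders; the rendered config does not.
cat > mcp.json.template <<'EOF'
{
  "mcpServers": {
    "brave-search": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-brave-search"],
      "env": { "BRAVE_API_KEY": "${BRAVE_API_KEY}" }
    }
  }
}
EOF

# At setup time, load the real key from the environment and render the config:
export BRAVE_API_KEY="BSA_demo_key"
envsubst < mcp.json.template > mcp.json
```

Commit `mcp.json.template`, gitignore `mcp.json`, and the real key never touches version control.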
-
Verifying search works before relying on it in a real task prevents the frustrating experience of discovering a broken MCP mid-session.
-
Most practitioners are surprised the first time they run /context — 30–40% consumed before typing a word is common. You cannot optimize what you cannot see.
-
Every mounted MCP injects its full tool schemas on every turn whether Claude uses them or not. Three idle MCPs can consume 10–15% of your context window before any code appears.
-
Gemini's extension system cannot run arbitrary hooks, but it can block dangerous operations. Excluding destructive commands is the closest equivalent to Claude's PostToolUse safety checks.
-
Knowing that Copilot's MCP support is VS Code-only prevents wasted time trying to configure it from the CLI. Use VS Code settings for Copilot MCP, CLI for Claude and Gemini.
-
Every new session starts blank — the agent has no memory of your past decisions, preferences, or architectural choices. Persistent memory closes this gap and compounds over time.
Free cloud tier: 10K memories, 1K retrieval calls/month — no credit card needed. The cloud MCP at mcp.mem0.ai requires an API key. A fully local setup (no API key, no Docker) is also possible using Ollama + local Qdrant or ChromaDB.
```shell
# Cloud (easiest — free tier available)
npx mcp-add --name mem0-mcp --type http --url "https://mcp.mem0.ai/mcp"
claude mcp add mem0-mcp --scope global
# Add to .env: MEM0_API_KEY=m0-...

# Local (no API key, no Docker)
pip install mem0ai ollama chromadb
# Then configure with Ollama LLM + ChromaDB vector store
```
-
Context transforms output. 'Refactor this function' produces a generic refactor. 'Refactor this function because it's being called from three places with inconsistent error handling' produces a targeted, correct one. The word 'because' forces the relevant context into the prompt.
-
Agents jump to implementation by default and bypass the exploratory thinking that surfaces real constraints. Plan Mode enforces a deliberate pause before any file is touched.
```shell
# Press Shift+Tab in Claude Code to enter Plan Mode
```
-
Conversational corrections ('no, I meant...') are imprecise and add conversation history the agent must reason through. Inline annotations in the spec file are precise and unambiguous.
-
The assumptions the agent made during a long session are invisible unless you ask. The assumptions section of a summary surfaces things you'd otherwise discover three days later when something breaks.
-
The implementing agent rationalizes its own decisions. A fresh context — even the same model — will find issues the implementing session glossed over. This is not optional for anything going to production.
-
The Pandya power prompt demonstrates that a single paragraph of intent can orchestrate parallel agents, update CLAUDE.md, and create a reusable skill — all from one message. Seeing it run on your own codebase makes the capability concrete.
-
Context exhaustion sneaks up on you. Without visibility, you discover the window is full after the agent has already started a complex task and has to restart. claude-hud makes this visible before it becomes a problem.
```shell
/plugin marketplace add jarrodwatts/claude-hud
/plugin install claude-hud
/claude-hud:setup
```
-
The agent and todo lines are off by default but are the most useful for agentic sessions — they show which subagent is running, what it's doing, and how many tasks remain. The context bar alone is only half the picture.
```shell
/claude-hud:configure
```
-
Knowing how much context is pre-consumed before any task starts is essential for planning session length. If 30% is gone before you type, you have a different budget than if 5% is gone.
-
A full context window causes the model to lose track of early instructions. /compact preserves the essential state while freeing up room — like a mid-session memory consolidation.
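Usage is a single slash command; the optional free-text argument (illustrative wording here) steers what the summary keeps:

```
# When the context bar runs high, mid-session:
/compact

# Optionally steer what survives the consolidation:
/compact keep the API design decisions and the open TODO list
```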
-
Gemini has a large context window, but it's not infinite. Treating it as unlimited leads to context bloat that degrades attention to early instructions. Starting fresh sessions for new topics is the discipline that prevents this.
-
Without a context bar, the only signal that a session is overloaded is degraded output quality — which you only notice after the damage is done. Proactive session hygiene prevents this.
-
Token costs compound quickly in agentic sessions. RTK can reduce per-session token usage by 60–90% on shell output alone — which directly reduces cost and extends effective context budget.
```shell
# macOS
brew install rtk
# Linux
curl -fsSL https://raw.githubusercontent.com/rtk-ai/rtk/refs/heads/master/install.sh | sh
```
-
The hook must be registered for RTK to intercept Claude Code's shell commands. Installation without the hook does nothing.
```shell
rtk init -g
```
-
Same as the Claude hook — Gemini CLI uses a different hook registration path. Both must be set up separately.
```shell
rtk init -g --gemini
```
-
The Codex hook target covers Copilot CLI sessions. Without this step RTK is installed but not active.
```shell
rtk init -g --codex
```
-
The savings number is usually surprising and motivating. Seeing the actual token reduction makes the tradeoff real and encourages keeping the tool active.
```shell
rtk gain
```
-
Training data has a cutoff. Last30days gives the agent real-time community intelligence — what practitioners are actually saying, what tools are winning, what's breaking — from the past 30 days.
```shell
# Claude Code
/plugin marketplace add mvanhorn/last30days-skill
/plugin install last30days@last30days-skill

# Gemini CLI
gemini extensions install https://github.com/mvanhorn/last30days-skill.git
```
-
The Polymarket section is the part you won't get from a search engine: real-money probability estimates on tech outcomes. It's a unique signal that sits alongside community sentiment.
```shell
/last30days <your topic>
/last30days <your topic> --quick   # faster, less depth
/last30days topic1 vs topic2       # comparative mode
```
-
Without a feedback signal, the agent writes broken code, you point it out, it fixes it — three turns per error. A lint hook fires automatically and the agent self-corrects in one turn. This is the difference between an assistant and an autonomous worker.
```jsonc
// In claude/settings.json
{
  "hooks": {
    "PostToolUse": [
      { "command": "pnpm lint 2>&1 | tail -20" }
    ]
  }
}
```
-
Setting up a hook that silently fails is worse than no hook — you think you have backpressure when you don't. Verification is the only way to confirm the feedback loop is actually closed.
-
A 10-second hook runs on every single tool call. In a 50-call session that's 8 minutes of waiting. Slow hooks don't just feel bad — they change the economics of agentic sessions.
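One way to keep hooks cheap is to hard-cap both runtime and output before they reach the agent. A sketch (`run_hook` is a helper invented here, and the caps are arbitrary; tune them to your linter):

```shell
# run_hook: run a hook command with a runtime cap (seconds) and an output
# cap (last 20 lines). Slow or chatty hooks get truncated, not waited on.
run_hook() {
  timeout "$1" sh -c "$2" 2>&1 | tail -20
}

run_hook 5 "echo lint ok"        # fast hook: output passes through
run_hook 1 "sleep 3; echo done"  # slow hook: killed at the cap, prints nothing
```

Wrapping the command in your `PostToolUse` hook this way keeps the worst case bounded no matter what the linter does.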
-
You cannot improve what you cannot measure. Rudel makes token usage, session patterns, and model costs visible across sessions — turning a gut feeling about AI productivity into data.
```shell
npm install -g rudel
rudel login
rudel enable
```
-
Gemini has no PostToolUse hook system. Wrapping your linter as an explicit tool is the closest equivalent — the agent calls it deliberately instead of automatically, but the feedback loop is still there.
-
An agent with unrestricted shell access can run rm -rf with no confirmation. excludeTools is a hard block, not a prompt-level request — it prevents the action before it reaches the model.
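In Gemini CLI this lives in `~/.gemini/settings.json`. A sketch — the exact tool-name syntax and which commands to block are assumptions to verify against the Gemini CLI docs:

```json
{
  "excludeTools": [
    "run_shell_command(rm -rf)",
    "run_shell_command(git push --force)",
    "run_shell_command(sudo)"
  ]
}
```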
-
Copilot has no in-session hooks. Git-level pre-commit hooks are the next best thing — they catch errors before they're committed, even if they don't close the loop mid-session.
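A sketch of such a hook (`LINT_CMD` is a placeholder; point it at your real linter, e.g. `pnpm lint`):

```shell
# Install a pre-commit hook that blocks the commit when the linter fails.
mkdir -p .git/hooks
cat > .git/hooks/pre-commit <<'EOF'
#!/bin/sh
LINT_CMD="${LINT_CMD:-true}"
if ! sh -c "$LINT_CMD" >/dev/null 2>&1; then
  echo "lint failed - commit aborted" >&2
  exit 1
fi
EOF
chmod +x .git/hooks/pre-commit
```

Any `git commit` — whether typed by you or by Copilot — now has to pass the linter first.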
-
Every correction you give the agent teaches it something — but only for that session. pi-self-learning captures those corrections automatically and builds a corpus of durable preferences across sessions.
```shell
npm install -g @pi-labs/cli
pi install npm:pi-self-learning
```
-
A month's worth of captured corrections shows you your most common failure modes with this agent. The top 3 almost always belong in CLAUDE.md.
```shell
/learning-month
```
-
CORE.md committed to your repo travels to every machine. Without this step, the memory stays on one machine and is gone after a wipe.
-
pi-self-learning gives you distilled lessons. mem0 gives you raw context retrieval — 'what did I decide about the auth architecture last month?' Combining both covers the full memory problem.
Two paths:

**Cloud (easiest):** Free tier gives 10K memories and 1K retrieval calls/month. API key required. The official MCP at mcp.mem0.ai is cloud-only. Graph memory (knowledge graph relationships between memories) is Pro-only ($249/mo) in cloud.

**Local (no API key, no Docker):** Install mem0ai + Ollama. Uses SQLite for history, local Qdrant or ChromaDB for vectors, and Ollama for LLM/embeddings. Graph memory is free in the local OSS version. Caveat: the default Qdrant path (`/tmp/qdrant`) is not persistent across reboots — configure a stable path explicitly. The official MCP does not support local; use the community alternative at github.com/Hroerkr/mem0mcp.
```shell
# Cloud path
# 1. Sign up at app.mem0.ai, get API key
# 2. Add to .env: MEM0_API_KEY=m0-...
# 3. Add MCP:
npx mcp-add --name mem0-mcp --type http --url "https://mcp.mem0.ai/mcp"

# Local path (fully offline, no API keys)
pip install mem0ai ollama
# Pull Ollama models:
ollama pull llama3.1
ollama pull nomic-embed-text
# Then configure mem0 with Ollama + local Qdrant path (not /tmp)
```
-
mem0 doesn't save automatically unless instructed. This prompt at the end of a session is the trigger that makes the memory layer useful. Without it, sessions remain isolated.
-
Memory that's stored but never retrieved is worthless. This prompt at the start of a relevant session is what makes past context actually influence the current one.
-
Without an automatic capture hook, memory must be maintained manually. Updating your context files after each session is the discipline that substitutes for automation.
-
Verification confirms the memory layer is actually working — not just installed. If the agent can't answer 'what do I usually prefer for X?' after two weeks, something in the pipeline is broken.
-
Real incidents motivate jai — agents running rm -rf, overwriting uncommitted work, making destructive changes in the wrong directory. A sandbox contains the blast radius before it becomes a real loss.
```shell
# Arch (AUR)
yay -S jai

# From source
git clone https://github.com/stanford-scs/jai.git
cd jai && ./autogen.sh && ./configure && make && sudo make install

jai --init
```
-
A new skill runs arbitrary shell commands. Running it unsandboxed the first time is accepting unknown risk. jai casual mode is a 10-second safety wrapper with no overhead.
```shell
jai claude
```
-
Long autonomous tasks accumulate risk — the agent makes many decisions without review. Strict mode means even if something goes wrong deep in the task, your actual home directory is unaffected.
```shell
jai --mode strict claude
```
-
Without jai, a git worktree is the next best isolation. A broken implementation in an isolated worktree cannot contaminate main, and the worktree can be deleted without consequence.
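The pattern, sketched on a throwaway demo repo (repo and branch names are illustrative):

```shell
# Set up a demo repo so the commands are self-contained.
git init -q demo-repo && cd demo-repo
git -c user.email=demo@local -c user.name=demo commit -q --allow-empty -m init

# Give the agent its own checkout on its own branch:
git worktree add -q ../demo-task -b agent/task

# ...the agent works in ../demo-task; this checkout stays untouched...

# Broken implementation? Delete it without consequence:
git worktree remove ../demo-task
git branch -q -D agent/task
```

Main never sees the experiment; throwing it away is two commands.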
-
A €4/month server is the cost of one coffee. The return is an always-on agent that can run multi-hour tasks while you sleep, accessible from any device, with no laptop dependency.
-
A default Ubuntu install with password auth enabled is a credential stuffing target within hours. These three hardening steps eliminate the most common attack vectors before you put API keys on the machine.
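One plausible version of those steps (a sketch, assuming the trio is key-only SSH, a firewall, and fail2ban; commands are Ubuntu-flavored, adapt to your distro):

```
# 1. SSH: keys only, no root (in /etc/ssh/sshd_config, then restart sshd)
#    PasswordAuthentication no
#    PermitRootLogin no

# 2. Firewall: allow SSH inbound, deny everything else
sudo ufw allow OpenSSH
sudo ufw enable

# 3. Ban repeated failed logins
sudo apt install -y fail2ban
sudo systemctl enable --now fail2ban
```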
-
OpenCode runs on any model provider — Claude, Gemini, local Ollama models. Installing it on the server makes the server model-agnostic, so switching providers doesn't require reprovisioning.
```shell
# Install nvm
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.1/install.sh | bash
nvm install 22 && nvm use 22

# Install opencode
npm install -g opencode-ai
```
-
Keys in the environment file are readable only by the dev user. This is more secure than passing them as arguments or hardcoding them in config, and simpler than a secrets manager for a personal server.
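Concretely, that means owner-only permissions on the file (here `.env.demo` stands in for the server's real env file):

```shell
# Create the key file and restrict it so only its owner can read or write it.
printf 'ANTHROPIC_API_KEY=sk-ant-placeholder\n' > .env.demo
chmod 600 .env.demo
stat -c '%a' .env.demo
```

With mode 600, other users on the box (and any process running as them) cannot read the keys.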
-
Without tmux, an SSH disconnect kills every running agent. Persistent sessions mean you can start a 3-hour research task, close your laptop, and reconnect hours later to review the results.
```shell
tmux new-session -s main
# Detach: Ctrl+B D
# Reconnect: tmux attach -t main
```
-
Headless mode with systemd means the agent restarts automatically after a server reboot, without manual SSH intervention. This is what makes the server genuinely always-on rather than on-until-the-next-restart.
```ini
[Unit]
Description=OpenCode AI Agent
After=network.target

[Service]
Type=simple
User=dev
WorkingDirectory=/home/dev/projects
EnvironmentFile=/home/dev/.env
ExecStart=/usr/local/bin/opencode --headless --port 3000
Restart=on-failure

[Install]
WantedBy=multi-user.target
```
-
Your skills, CLAUDE.md, and MCP config on the server should be identical to your local setup. Cloning ai-kit and running setup.sh on the server makes this a one-command operation.
-
A public-facing SSH port is a credential brute-force target. Tailscale puts the server on a private network — only devices you own can connect, with no firewall rules to maintain.
```shell
curl -fsSL https://tailscale.com/install.sh | sh
tailscale up
```
-
New machines and server reprovisioning happen. Having setup-remote.sh and update-remote.sh in the repo means future setup is a single command, not a reconstruction from memory.
-
Kicking off a task from your phone proves the setup actually works end-to-end: SSH from a different device, persistent session, agent running independently. Until you do this, you have a server — not an autonomous agent.
-
Single-agent sessions serialize everything. Agent teams break the serialization — independent tasks run in parallel, and the total wall-clock time for a multi-task sprint drops proportionally.
```json
{ "CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS": true }
```
-
Superset handles the orchestration overhead — worktree creation, agent coordination, diff viewing — that makes parallel agents practical rather than just theoretically possible.
-
Prompting for a complex feature produces a complex context that grows until attention degrades. The FD system uses a written spec instead — the agent works from a document, not from conversation history, which scales.
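The shape of such a spec might look like this (section names are illustrative; the point is that the agent re-reads the document instead of replaying chat history):

```markdown
# feature-rate-limiting.md

## Goal
Add per-user rate limiting to the public API.

## Constraints
- No new infrastructure; use the existing Redis instance.
- 429 responses must include a Retry-After header.

## Open questions
- Sliding window or token bucket?

## Status
- [ ] Limiter middleware
- [ ] Tests for burst traffic
```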
-
Four parallel Opus agents on a hard problem produce a solution that reflects four independent lines of reasoning, then converges. The quality is qualitatively different from a single agent working the same problem serially.
-
The Dispatch skill gives the orchestrator pattern without any additional tooling — the skill itself handles task decomposition, parallel delegation, and result aggregation from within a single session.
-
Different models have measurably different strengths. Routing implementation to the model with the best reasoning and review to the model with the best error-finding produces better combined output than using one model for both.
-
Every agent added beyond 4–6 adds coordination overhead that can exceed the parallelism gain. Knowing the ceiling prevents the mistake of assuming more agents always means faster output.
-
DeerFlow handles the orchestration layer so you don't have to — sub-agent spawning, context scoping, result synthesis. Deploying it once gives you a long-horizon research and execution capability on demand.
```shell
git clone https://github.com/bytedance/deer-flow && cd deer-flow
make config   # generates config.yaml, fill in your model provider
docker compose up
```
-
A 30-minute task you'd do manually is the right calibration — concrete enough to evaluate quality, long enough to see the multi-agent orchestration working.
-
Using the output without rewriting it is the signal that DeerFlow is producing real value, not just activity. If you rewrite everything, the bottleneck is still you.
-
Hermes builds a model of your preferences across sessions. The learning loop only has value if it runs for long enough to accumulate meaningful signal — installing it is the prerequisite.
```shell
curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
source ~/.bashrc
hermes setup
```
-
A messaging gateway means you can delegate to Hermes from your phone, get results in Telegram, and never need to open a terminal. The channel is what makes it a daily driver rather than a dev tool.
-
10 sessions is the minimum before the skill accumulation becomes visible. The first few sessions feel like a regular agent — the difference emerges with repeated use.
-
A correctly anticipated unstated preference is the proof that the agent model is working — it extracted a pattern from your behavior that you never explicitly stated.
-
Open SWE is not a tool you install in an afternoon — it's infrastructure. Confirming the prerequisites exist before starting prevents a half-deployed system.
-
The deployment step is where most teams stall. Modal for the sandbox and LangGraph Cloud for orchestration are both services you need accounts for. Plan a day for initial setup.
-
The first real issue is the proof of concept. Seeing the agent work asynchronously — you file an issue, go do other things, come back to a draft PR — demonstrates the async value proposition concretely.
-
A meaningful draft PR requiring minimal rework means the agent understood the requirement, implemented it correctly, and handled edge cases. This is the bar that distinguishes a useful async agent from an expensive autocomplete.
-
Comprehension debt is different from technical debt — it breeds false confidence. You think you understand the system because it works, but when it breaks you have no mental model to debug from.
-
Every token in your context file fights for attention against every other token. The longer it gets, the less any individual rule is weighted. Density beats length every time.
-
The session that implemented the code rationalized every decision it made. A fresh context — same model, new session — will find issues that the implementing session had every incentive to overlook.
-
Zero overrides sounds like AI success. It's often AI capture — the humans stopped checking. What matters is not the rate but the reason. Are you not overriding because everything is correct, or because you stopped looking?
-
Mo Bitar vibe-coded for two years and went back to writing by hand. The long-term costs — inability to debug, loss of system intuition, increasing rework — emerge slowly and are expensive to reverse.