
Convergence

A consolidated toolkit of agents, skills, and workflows for Claude Code — distilled from 7 open-source repos and validated against findings from the Coding Agents Summit 2026.

15 Focused Tools · 10 Skills · 4 Agents · 5 Workflows · 280+ Patterns Analyzed · 7 Source Repos
$ claude plugin install github:c-sonnier/convergence

Design Principles

Every item in this toolkit was designed against these constraints. They come from the intersection of what worked across repos and what the summit validated.

Instruction Budget

Every skill stays under 35 instructions. LLMs reliably follow ~150-200 total; at 35 per skill you can stack 4-5 active skills without degradation.

Summit: Dex Horthy | Validated by: gstack preamble bloat issues

Human In The Loop

Workflows force the agent to surface assumptions for human correction before writing code. No automated decision pipelines. The engineer makes design choices.

Summit: "Do not outsource the thinking" | Validated by: superpowers brainstorming gates

Control Flow, Not Prompts

Multi-phase workflows use actual routing (classify the input, call a focused skill), not monolithic prompts with 85 instructions hoping the model follows every phase.

Summit: Dex on CRISPY split | 12 Factor Agents paper

📄 Static Artifacts

Every workflow phase writes output to a file. This survives compaction, enables session resumption, and allows human review without context window dependency.

Summit: Dex on static assets | superpowers spec files | gstack session persistence

🔎 Verify The Code

Review actual code, not plans. Plans have surprises; code is truth. Verification means running commands and reading output, not claiming success.

Summit: "Please read the code" | superpowers verification-before-completion | agency-agents Reality Checker

🔒 Security By Default

Three layers: scoped credentials before starting, audit logging during, vulnerability scanning after. Agents inherit your access — treat them like an intern on day one.

Summit: Milan Williams (Semgrep) | gstack safety hooks | counselors env security

📚 Codebase First

Always scan existing patterns before proposing changes. Research is objective compression of truth, never opinion. Separate the "what are we building" from "what exists."

Summit: Dex on research contamination | rails-conventions codebase-first | palkan analyze

Vertical Slicing

Plans and implementations follow vertical slices (end-to-end with checkpoints) not horizontal layers (all DB, then all API, then all frontend). Testable at each phase.

Summit: Dex on vertical vs horizontal | Validated by: "models cannot stop writing horizontal plans"

Convergence Analysis

Patterns that appeared in 3+ repos AND were validated by summit findings. These are the highest-confidence patterns — independently discovered by multiple teams and confirmed by real-world data from thousands of engineers.

Structured Process Before Code

5 repos + 4 summit speakers

Every high-performing repo gates on some form of planning before implementation. Summit refined this: the structure should be lightweight alignment (design + outline), not heavyweight plans. But the principle — think before you build — is universal.

superpowers-ruby gstack agency-agents counselors palkan Dex (CRISPY) Mihail (RePPIT)

Evidence-Based Verification

4 repos + 3 summit speakers

Never claim work is done without running verification. "Confidence is not evidence." This pattern independently emerged as superpowers' Iron Law, agency-agents' Reality Checker, gstack's QA health scores, and Dex's "read the code."

superpowers-ruby gstack agency-agents counselors Dex Databricks Semgrep

Code Review Over Plan Review

5 repos + 2 summit speakers

5 of 6 repos built dedicated review capabilities. Summit validated this but redirected the target: review code, not plans. Plans are approximations; code is what ships.

superpowers-ruby gstack agency-agents counselors palkan Dex Databricks

Systematic Debugging

3 repos + summit validated

Root cause first, always. superpowers' 4-phase investigation, gstack's /investigate, and agency-agents' Workflow Architect all enforce "understand before fixing." Summit's CRISPY separates research from implementation to prevent premature solutions.

superpowers-ruby gstack agency-agents Dex (research separation)

Test-Driven Quality

4 repos + 2 summit speakers

Tests aren't optional. superpowers enforces strict TDD, gstack's QA generates regression tests, palkan uses specification tests as design tools, rails-conventions mandates minitest. Summit: "verification is king with LLMs."

superpowers-ruby gstack palkan rails-conventions Databricks Dex

Security Scanning

4 repos + 2 summit speakers

Multiple approaches, universal concern. Summit elevated this from "nice to have" to "three-layer framework": scoped credentials, audit logging, and code scanning.

superpowers-ruby gstack counselors rails-conventions Semgrep Databricks

Codebase-First Research

3 repos + summit validated

Scan existing patterns before proposing changes. rails-conventions mandates inspection, palkan analyzes architecture layers, superpowers researches codebase. Summit adds: research context must be objective — hide the ticket to prevent opinion contamination.

superpowers-ruby palkan rails-conventions Dex (ticket-blind research)

Architecture Quality Gates

3 repos + summit compatible

Quantified thresholds for code quality. palkan's callback scoring (1-5), rails-conventions' size limits (class <200 lines), superpowers' Sandi Metz rules (class <100 LOC). Different numbers, same idea: measurable, enforceable quality boundaries.

superpowers-ruby palkan rails-conventions

The Consolidated Workflow

Five phases, each a separate focused skill with its own context window. Inspired by Dex's CRISPY evolution but simplified and incorporating the strongest patterns from all repos. Each phase writes a static artifact — you can resume from any point.

Phase | Name | Focus | Budget
1 | Research | Objective codebase exploration. Ticket-blind. | <30 instructions
2 | Design | Alignment discussion. ~200 lines. Human decides. | <25 instructions
3 | Outline | Vertical structure. Phases + signatures. ~2 pages. | <20 instructions
4 | Implement | Work through outline. Verify each slice. | <30 instructions
5 | Review | Read the actual code. Verify. Ship. | <35 instructions
Summit principle: Each phase runs in a fresh or minimal context window. The only shared state between phases is the static artifact files. This prevents context bloat, enables session resumption after compaction, and keeps instruction count low. "Don't use prompts for control flow if you can use control flow for control flow." — Dex Horthy

Skills

10 focused skills. Each is self-contained, under 35 instructions, and sourced from patterns that appeared in 3+ repos. Organized into workflow skills (sequential phases) and utility skills (standalone, invoked when needed).

Workflow Skills

🔎

/convergence-research

Objective codebase exploration — ticket-blind, fact-only

Workflow
superpowers-ruby rails-conventions palkan Summit: Dex

Explores the codebase to produce a compressed, objective research document. The critical innovation from the summit: the research context never sees the ticket or feature description. A separate context generates questions from the ticket; this skill executes those questions against the codebase and records only facts.

Core Instructions 18 instructions

  1. Accept a list of questions (never the ticket/feature description itself)
  2. For each question, launch a sub-agent that traces a vertical slice through the codebase
  3. Record only facts: file paths, function signatures, data flow, existing patterns, config
  4. No opinions, no suggestions, no implementation ideas — if you catch yourself writing "should" or "could," delete it
  5. Note existing patterns — how does similar functionality work today?
  6. Note code quality signals — file sizes, method counts, recent churn
  7. Scan for relevant tests — existing test patterns, test framework, fixtures vs factories
  8. Check recent git history for the touched files
  9. Write output to docs/research/YYYY-MM-DD-<topic>.md
  10. Keep document under 500 lines — compress, don't dump
Summit insight (Dex): "If you tell the model what you're building, you get opinions. Good research is all facts. Research == compression of truth." The ticket is hidden deterministically, not via prompt instruction.
Rationale & Anti-Patterns
Why Ticket-Blind?

When the model knows the goal, it cherry-picks facts that support a preferred approach and ignores inconvenient patterns. "Helpful assistants are trained to confirm our biases."

Anti-Patterns
  • "Research this: we need to add X" — goal contamination. Generate questions separately, then research.
  • Raw file dumps — not research. Compress to the relevant facts.
  • "I recommend..." in research output — opinion contamination.
💬

/convergence-design

Collaborative alignment discussion — human decides, agent surfaces

Workflow
superpowers-ruby gstack Summit: Dex (CRISPY) Summit: Mihail (RePPIT)

Produces a ~200-line design discussion document. This is the highest-leverage point in the workflow: the agent brain-dumps everything it found, everything it wants to do, and everything it doesn't know. The human does "brain surgery" before any code is written.

Core Instructions 22 instructions

  1. Load the research document and the ticket/feature description
  2. Present the current state — what exists today, relevant to this change
  3. Present the desired end state — what the solution should look like
  4. List patterns found — ask the human: "Are these the right patterns to follow?"
  5. Propose 2-3 approaches with trade-offs and your recommendation
  6. Ask open questions one at a time — things you don't know, ambiguities
  7. Record resolved decisions as they're made
  8. Keep the document under 200 lines
  9. Write output to docs/design/YYYY-MM-DD-<topic>-design.md
  10. Do NOT proceed to implementation until the human approves the design
Summit insight (Dex): "You're forcing the agent to brain dump out all the things it found, all the things it wants to do, all the things it thinks you want, and ask you questions about things it doesn't know. So you can do brain surgery on the agent before you proceed downstream."
Design Document Template
# Design: [Feature Name]

## Current State
[What exists today. Relevant code, patterns, constraints.]

## Desired End State
[What the solution looks like when done.]

## Patterns to Follow
- Pattern A (found in path/to/file.rb)
- Pattern B (found in path/to/other.rb)
- ~~Pattern C~~ [FLAGGED: outdated]

## Approaches
### Option 1: [Recommended] ...
### Option 2: ...

## Resolved Decisions
- [Decision 1]: [Choice made] (reason)

## Open Questions
- [ ] [Question the agent needs answered]
📝

/convergence-outline

Vertical structure outline — phases, signatures, checkpoints

Workflow
Summit: Dex (CRISPY) superpowers-ruby gstack

Produces a ~2-page structure outline — the "C header file" for the implementation. Not the exact code, but the phases, new types, function signatures, and verification checkpoints. The critical constraint: vertical slicing.

Core Instructions 16 instructions

  1. Load the design document and research
  2. Break work into vertical phases — each phase produces something testable end-to-end
  3. NEVER write horizontal phases (all DB, then all services, then all frontend)
  4. For each phase: list files to change, new types/signatures, verification command
  5. Include testing checkpoints between phases
  6. Keep under 80 lines — this is an outline, not a plan
  7. Write output to docs/outline/YYYY-MM-DD-<topic>-outline.md
  8. Human reviews outline before implementation begins
Summit insight (Dex): "If the plan is the implementation, the outline is the C header files. Just the signatures and the new types." Also: "Models love horizontal plans. We cannot prompt this out. Vertical structure must be a hard constraint."
Vertical vs Horizontal Example
Horizontal (BAD) — models default to this
Phase 1: Database migration (add tables, columns)
Phase 2: Service layer (all business logic)
Phase 3: API endpoints (all routes)
Phase 4: Frontend (all views)
Phase 5: Tests (all tests)
= 1,200 lines with nothing testable until Phase 5
Vertical (GOOD) — must be explicitly enforced
Phase 1: Mock API endpoint + wire frontend for happy path
  Verify: page renders with mock data
Phase 2: Database migration + real service for happy path
  Verify: page renders with real data
Phase 3: Error handling + edge cases
  Verify: error states render correctly
Phase 4: Authorization + security
  Verify: unauthorized access blocked
= 4 testable checkpoints, catch problems early

/convergence-implement

Execute outline phases with verification checkpoints

Workflow
superpowers-ruby gstack Summit: Dex

Works through the outline phase by phase. After each phase, runs the verification checkpoint. If it fails, stops and fixes before proceeding.

Core Instructions 24 instructions

  1. Load the outline — work through phases in order
  2. For each phase: write tests first (red), then implementation (green), then refactor
  3. Run verification checkpoint after each phase — read the actual output
  4. If checkpoint fails: fix before proceeding. Do not start next phase with broken state
  5. If 3+ fix attempts fail: stop and ask the human
  6. Write minimal code — no "while I'm here" improvements
  7. Follow existing patterns found during research
  8. No completion claims without fresh verification evidence
  9. Commit after each successful phase
Verification Gate (from superpowers-ruby)
The Gate Function
BEFORE claiming any status:
1. IDENTIFY: What command proves this claim?
2. RUN: Execute the FULL command (fresh, complete)
3. READ: Full output, check exit code, count failures
4. VERIFY: Does output confirm the claim?
   - If NO: State actual status with evidence
   - If YES: State claim WITH evidence
5. ONLY THEN: Make the claim
Red Flags
Claim | Requires | Not Sufficient
Tests pass | Test command output: 0 failures | Previous run, "should pass"
Bug fixed | Test of original symptom: passes | Code changed, assumed fixed
Build succeeds | Build command: exit 0 | Linter passing, logs look good
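The gate is mechanical enough to sketch in Python. A minimal illustration, not part of the skill itself (the command string and the claim wording are illustrative):

```python
import subprocess

def verify_claim(command):
    """Run the full verification command fresh; return (ok, evidence)."""
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    evidence = (result.stdout + result.stderr).strip()
    # VERIFY: the exit code is the evidence. A previous run or "should pass"
    # never reaches this point, because the command is re-executed every time.
    return result.returncode == 0, evidence

# ONLY THEN make the claim, and make it with the evidence attached.
ok, evidence = verify_claim("echo '0 failures'; true")
claim = "tests pass" if ok else "tests FAIL"
```

The point of the sketch: the claim variable cannot be assigned until the command has actually run and its output has been read.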
🔍

/convergence-review

Code review — actual diff, not plans

Workflow
superpowers-ruby gstack counselors palkan Summit: Dex

Reviews the actual code diff against the base branch. Combines superpowers' two-stage review (spec compliance + code quality), gstack's structural checks, palkan's layer violation detection, and counselors' multi-perspective analysis. Defaults to "needs work" until evidence proves readiness.

Core Instructions 28 instructions

  1. Read the full diff against base branch
  2. Stage 1 — Correctness: Does this implement what the design described?
  3. Stage 2 — Quality: Layer violations, SQL safety, security, code quality, test coverage, pattern consistency
  4. Default to "NEEDS WORK" until evidence proves readiness
  5. Categorize findings: must-fix (blocks merge), should-fix (tech debt), nit (style)
  6. Run tests — all tests must pass before approving
  7. Output a review summary with findings, verdict, and blocking items
Review Checklist Detail
From palkan — Layer Violation Detection
Violation | Example | Fix
Model uses Current | Current.user in a model | Pass user as a parameter
Service accepts request | param :request | Extract a value object
Controller has business logic | Pricing in an action | Extract to a service/model
From gstack — Structural Checks
  • SQL safety: raw SQL in migrations, missing indexes on foreign keys, N+1 in new queries
  • LLM trust boundaries: user input flowing to prompts without sanitization
  • Conditional side effects: side effects hidden in conditional branches
From agency-agents — Reality Checker

"Default to NEEDS WORK until overwhelming evidence proves readiness. Demand screenshots, test results, actual verification."

Utility Skills

🐛

/convergence-debug

Systematic root cause investigation — no fixes without understanding

Utility
superpowers-ruby gstack compound-engineering

Four-phase systematic debugging with causal chain gating, smart escalation, and defense-in-depth checks (enhanced from compound-engineering's ce:debug). Dispatches the Learnings Researcher before investigation to surface past solutions. No fixes without completing Phase 1.

Core Instructions 26 instructions

  1. Phase 0 — Past Learnings: Dispatch Learnings Researcher with error/symptom description before investigating
  2. Phase 1 — Root Cause: Read error messages completely. Reproduce consistently. Check recent changes. Trace data flow
  3. In multi-component systems: Add diagnostic logging at each component boundary before attempting fixes
  4. Phase 2 — Pattern Analysis: Find working examples. Compare working vs broken. Identify every difference
  5. Causal Chain Gate: Explain full chain from trigger → symptom with no gaps. If a link is uncertain, state a prediction and test it
  6. Phase 3 — Hypothesis: State clearly: "I think X because Y." Make the smallest change to test
  7. Phase 4 — Fix: Create failing test first. Implement single fix. Verify all tests pass
  8. If 3+ fixes fail: STOP. Classify: different subsystems → architecture problem; contradictory evidence → wrong mental model; works locally → environment problem; fix works but prediction wrong → symptom fix only
  9. Defense-in-depth: After confirming fix, grep for same pattern in other files. Flag if found
  10. If root cause was non-obvious: Suggest /convergence-compound to capture the learning

/convergence-tdd

Test-driven development cycle — red-green-refactor

Utility
superpowers-ruby gstack palkan rails-conventions

Strict TDD: write the test, watch it fail, write minimal code, watch it pass, refactor. "If you didn't watch the test fail, you don't know if it tests the right thing."

Core Instructions 14 instructions

  1. RED: Write one minimal test showing what should happen
  2. Verify RED: Run test. Must fail because feature is missing, not typos
  3. GREEN: Write the simplest code to pass the test. YAGNI applies: no speculative code
  4. Verify GREEN: Run test. Must pass. Other tests must still pass
  5. REFACTOR: Clean up. Remove duplication. Improve names. Keep tests green
  6. Repeat for next behavior
  7. If code was written before test: Delete it. Start over. No exceptions
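One turn of the red-green cycle, sketched in Python with a hypothetical slugify function (the function name and framework-free test style are illustrative):

```python
# RED: the test exists before the code. Running it at this point fails
# because slugify does not exist yet, i.e. failure for the right reason.
def test_slugify_lowercases_and_joins_with_hyphens():
    assert slugify("Hello World") == "hello-world"

# GREEN: the simplest code that passes. No options, no flags, nothing speculative.
def slugify(text):
    return text.strip().lower().replace(" ", "-")

test_slugify_lowercases_and_joins_with_hyphens()  # must pass, and stay green
```

REFACTOR would follow only once the test is green, with the test re-run after every change.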
📚

/convergence-compound

Capture learnings after solving non-trivial problems

Utility
compound-engineering Summit: Erin Ahmed Summit: Faye Zhang

The summit identified learning & memory as the biggest missing dimension across all 6 original repos. compound-engineering's ce:compound skill (530 lines, 3 modes, parallel subagent dispatch) provided the reference implementation. We stripped it to convergence's lean style: the agent reads recent git context and session artifacts, drafts a structured learning with searchable YAML frontmatter, and presents it for human correction. The human's effort drops from "write from scratch" to "fix what's wrong." Overlap detection prevents duplicates.

Core Instructions 12 instructions

  1. Gather context: Read recent git log, diff, and convergence session artifacts (review findings, debug output)
  2. Draft learning: Pre-fill title, what happened, root cause, fix, rule, and YAML frontmatter (problem_type, module, severity, tags)
  3. Check overlap: Grep existing learnings for matching tags/module. If match found, ask human: update existing or create new?
  4. Present for correction: Show full draft. Human corrects or approves. The Rule field will often be wrong — that's fine, the correction is the highest-value moment
  5. Write artifact to docs/convergence/learnings/YYYY-MM-DD-<slug>.md
Summit insight (Erin Ahmed, Cleric): "Agent capabilities are commoditized — the next horizon of differentiation is learning." Three principles: make correction easy, reward corrections with visible improvement, absorb context continuously. This skill implements all three.
Why compound-engineering: This was the only gap the summit explicitly validated that none of the original 6 repos addressed. compound-engineering's ce:compound + learnings-researcher agent provided the working pattern. Most of compound-engineering's other patterns (42 skills, 50+ agents, 1200-line plans) contradict summit findings on instruction budgets and plan leverage — but the knowledge compounding loop was the exception.
Artifact Format & Integration Points
Learning Artifact Format
---
problem_type: bug | architecture | performance | pattern | gotcha
module: <which part of the system>
severity: critical | high | medium | low
tags: [searchable, keywords]
---

# Learning: [Title]
Date: YYYY-MM-DD

## What Happened
[Facts only]

## Root Cause
[Why it happened — the actual cause]

## Fix
[What was done — file paths, approach]

## Rule
[Generalizable takeaway — one sentence]
Triggered By
  • /convergence-debug nudges /convergence-compound when root cause required 3+ hypotheses or architectural escalation
  • /convergence-review nudges /convergence-compound when must-fix findings revealed surprising issues
  • Can also be invoked manually after any non-obvious work
Surfacing Learnings
  • The Learnings Researcher agent is dispatched by /convergence-research and /convergence-debug before starting new work
  • Greps docs/convergence/learnings/ by tags, module, problem_type, and body text
  • Returns relevant past learnings under 50 lines — supplementary context, not the main research
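That lookup can be sketched in a few lines, assuming the docs/convergence/learnings/ layout and frontmatter keys described above. The scoring mirrors the agent instructions (strong: same module + problem type; moderate: overlapping tags), while the parsing itself is illustrative:

```python
import pathlib, re

def search_learnings(root, module=None, problem_type=None, tags=()):
    """Scan learning files and score matches against frontmatter fields."""
    results = []
    for path in sorted(pathlib.Path(root).glob("*.md")):
        text = path.read_text()
        m = re.match(r"---\n(.*?)\n---", text, re.DOTALL)
        if not m:
            continue  # no frontmatter, nothing to match on
        front = {}
        for line in m.group(1).splitlines():
            if ":" in line:
                k, v = line.split(":", 1)
                front[k.strip()] = v.strip()
        tag_hits = [t for t in tags if t in front.get("tags", "")]
        if module and front.get("module") == module \
                and front.get("problem_type") == problem_type:
            results.append((str(path), "strong"))    # same module + problem type
        elif tag_hits:
            results.append((str(path), "moderate"))  # overlapping tags only
    return results
```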
🔒

/convergence-security

Three-layer security audit — access + logging + scanning

Utility
gstack superpowers-ruby counselors rails-conventions Summit: Semgrep Summit: Databricks

Combines gstack's OWASP+STRIDE analysis, superpowers' Brakeman integration, counselors' multi-model security review, and the summit's three-layer framework from Milan Williams.

Core Instructions 30 instructions

  1. Layer 1 — Access Control: Check credential scoping, MCP server configs for plain-text secrets, env var allowlist/denylist
  2. Layer 2 — Audit Trail: Verify hooks log agent actions, check session transcript persistence
  3. Layer 3 — Code Scanning: OWASP Top 10, STRIDE threat model, secrets archaeology, dependency supply chain, static analysis
  4. Categorize findings: critical (blocks ship), high, medium, low
  5. Output actionable report with specific file:line references
🏢

/convergence-architecture

Architecture analysis — layers, quality gates, god objects

Utility
palkan rails-conventions superpowers-ruby

Analyzes codebase architecture using palkan's layered design framework, rails-conventions' quantified quality gates, and superpowers' Sandi Metz rules. Detects layer violations, scores callbacks, finds god objects via churn x complexity.

Core Instructions 22 instructions

  1. Layer analysis: Map code to Presentation / Application / Domain / Infrastructure. Flag reverse dependencies
  2. Callback scoring (1-5): 5=Transformer, 4=Maintainer, 3=Timestamp, 2=Background trigger, 1=Operation. Extract anything scoring 1-2
  3. God object detection: High churn + high complexity = candidate. Map responsibility clusters
  4. Quality gates: Class <200 lines (ideal <100), method <10 lines (hard limit 20), <15 public methods per class
  5. Specification test: If a test needs contexts beyond the primary layer's responsibility, the code has misplaced logic
  6. Output: Findings ranked by impact, with specific refactoring recommendations and gradual adoption paths

Agents

4 focused agents with distinct roles. Each runs in its own context window with minimal instructions. These are dispatched by skills, not invoked directly by users — eliminating the "magic words" problem.

🔭

Research Agent

Ticket-blind codebase explorer — dispatched by /convergence-research

Agent
superpowers-ruby Summit: Dex

Runs in a fresh context window that receives only a research question (never the ticket). Traces a single vertical slice through the codebase using native agentic search tools (grep, find, read — not RAG). Returns compressed, fact-only findings. Multiple instances can run in parallel for different questions.

Agent Instructions 10 instructions

  1. You receive a single research question. Answer it with facts from the codebase
  2. Use grep, find, and read to explore. Do not use RAG or vector search
  3. Record: file paths, function signatures, data flow, config, patterns
  4. No opinions, no "should," no implementation suggestions
  5. Follow references: if function A calls B, read B too
  6. Note the health of code you find (size, complexity, test coverage)
  7. Return compressed findings under 100 lines
Summit insight (Jessica Wang, Braintrust): Agentic search (grep, find, read) matched or beat vector search while using 3x fewer tokens and costing 2.8x less. Use native tools.
🔎

Review Agent

Code quality reviewer — dispatched by /convergence-review

Agent
superpowers-ruby gstack agency-agents

Receives a diff and a review context document. Performs focused code review with a "NEEDS WORK" default. Can be dispatched multiple times for different review focuses (correctness, security, architecture).

Agent Instructions 12 instructions

  1. You receive a diff and review focus (correctness, quality, security, or architecture)
  2. Default verdict: NEEDS WORK. Change only with clear evidence of readiness
  3. For each finding: file, line, severity (must-fix / should-fix / nit), description
  4. Check that tests exist for new code paths
  5. Check that existing patterns are followed
  6. Return structured findings sorted by severity
🛡

Security Agent

Vulnerability scanner — dispatched by /convergence-security

Agent
gstack counselors Summit: Semgrep

Runs targeted security analysis on specific files or diffs. Checks OWASP Top 10, searches git history for leaked secrets, validates input sanitization, and runs static analysis tools if available. Separate from the review agent to keep instruction count low.

Agent Instructions 14 instructions

  1. You receive a list of files or a diff to scan
  2. Check for: injection (SQL, command, template), auth bypass, XSS, CSRF, SSRF, IDOR
  3. Search git history for secrets
  4. Run static analysis if available (Brakeman, Semgrep, gosec)
  5. Check dependency versions against known CVEs
  6. Return findings with severity, file:line, and remediation steps
📚

Learnings Researcher

Surfaces past learnings — dispatched by /convergence-research and /convergence-debug

Agent
compound-engineering Summit: Erin Ahmed

The other half of the compounding loop. Before starting new research or debugging, this agent greps docs/convergence/learnings/ for relevant past solutions by matching against YAML frontmatter (tags, module, problem_type) and body text. Returns compressed findings under 50 lines — supplementary context that may shortcut investigation.

Agent Instructions 10 instructions

  1. Search docs/convergence/learnings/ using grep and glob
  2. Match against frontmatter tags, module, problem_type, and body text
  3. For each match: file path, Rule field, one-line relevance summary
  4. Score relevance: strong (same module + problem type), moderate (overlapping tags)
  5. Return findings under 50 lines. If nothing found, say so in one line
  6. No fixes or implementation suggestions — return what was learned before

Infrastructure

Supporting systems that make the skills and agents more effective. These aren't invoked directly — they're the substrate that skills operate on.

📈

Memory System

Corrections persist, knowledge compounds across sessions

Infrastructure
Summit: Erin Ahmed (Cleric) Summit: Faye Zhang (Pinterest) agency-agents

None of the 6 repos had real learning architecture. The summit identified this as the next differentiation frontier. This system implements Erin Ahmed's three lessons and Faye Zhang's 3-tier memory model.

Architecture

  1. Hot Memory (session-level) — Current task context, recent tool outputs, active decisions. Lives in context window. Cleared on session end
  2. Domain Memory (project-level) — Corrections, team preferences, resolved decisions, patterns learned. Stored in .claude/memory/ files. Persists across sessions
  3. Cold Storage (global) — Cross-project learnings, user preferences, tool preferences. Stored in ~/.claude/memory/
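The tiered lookup might be sketched like this, using the file locations above (per-key JSON files are an illustrative storage choice, not a prescribed format):

```python
import json, pathlib

# Tier locations from the architecture above; hot memory is the in-process
# session state and never touches disk.
DOMAIN = pathlib.Path(".claude/memory")           # project-level, persists
COLD = pathlib.Path.home() / ".claude/memory"     # global, cross-project

def recall(key, hot, tiers=(DOMAIN, COLD)):
    """Check hot memory first, then each persistent tier in order."""
    if key in hot:
        return hot[key]
    for tier in tiers:
        f = pathlib.Path(tier) / f"{key}.json"
        if f.exists():
            return json.loads(f.read_text())
    return None  # nothing learned about this key yet
```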
Summit insight (Erin Ahmed): "Easy correction without visible improvement kills trust. Visible improvement without ambient context limits learning. Ambient context without correction compounds errors." All three must work together.
🔀

Safety Hooks

Guardrails that prevent destructive actions and log agent activity

Infrastructure
gstack counselors Summit: Semgrep

Combines gstack's /careful and /freeze hooks with counselors' read-only enforcement and the summit's audit trail recommendation. These are Claude Code hooks (settings.json) that fire on every agent action.

Hooks

  1. Audit Hook: Logs every shell command with timestamp to .claude/audit.jsonl
  2. Destructive Command Warning: Warns before rm -rf, git reset --hard, DROP TABLE, git push --force
  3. Directory Lock: Optionally restrict file edits to a specific directory tree
  4. Env Var Protection: Block agents from reading/setting dangerous env vars (NODE_OPTIONS, LD_PRELOAD, DYLD_INSERT_LIBRARIES)
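Hooks 1 and 2 might share a single script. A sketch only: the stdin/exit-code contract is assumed from Claude Code's hook interface, so verify against the current hooks documentation before wiring it into settings.json:

```python
import json, pathlib, time

DESTRUCTIVE = ("rm -rf", "git reset --hard", "DROP TABLE", "git push --force")

def audit_and_guard(event, log_path=".claude/audit.jsonl"):
    """Append the shell command to the audit log; False if it looks destructive."""
    command = event.get("tool_input", {}).get("command", "")
    log = pathlib.Path(log_path)
    log.parent.mkdir(parents=True, exist_ok=True)
    with log.open("a") as f:
        f.write(json.dumps({"ts": time.time(), "command": command}) + "\n")
    return not any(pattern in command for pattern in DESTRUCTIVE)

# Hook entry point, wired via settings.json (contract assumed, check the docs):
# the tool call arrives as JSON on stdin, and a nonzero exit raises the warning:
#   sys.exit(0 if audit_and_guard(json.load(sys.stdin)) else 2)
```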

Implementation Guide

How to adopt this toolkit. Start with 3 items, not 14. Follow the trust ladder.

1

Start with /convergence-review + safety hooks

Highest immediate impact with zero workflow change. /convergence-review catches problems in code you're already writing. Verification prevents false completion claims. Safety hooks log what the agent does.

2

Add /convergence-design for complex features

When you're about to build something that would take more than a few hours, use /convergence-design to align before coding. This is the single highest-leverage skill — catching wrong assumptions on a 200-line doc instead of 2,000 lines of code.

3

Add /convergence-research + /convergence-outline for larger features

Once /convergence-design feels natural, add objective research before design and vertical outlines after. These prevent biased research (opinion contamination) and horizontal plans (untestable until everything's done).

4

Add /convergence-debug + /convergence-tdd + /convergence-security as needed

Utility skills — invoke when the situation calls for them. /convergence-debug when you hit a bug. /convergence-tdd when writing new features. /convergence-security before shipping anything that touches auth, payments, or user data.

5

Add /convergence-architecture for ongoing health

Periodically run architecture analysis to catch drift. Useful during quarterly reviews or when onboarding to an unfamiliar codebase.

6

Build the memory system incrementally

Start by manually saving corrections to project memory files. As patterns emerge, formalize the hot/domain/cold tier structure. The memory system compounds over months.

Context Budget Calculator

Use this to check if your skill combination fits within the instruction budget.

Skill | Instructions | Typical Combinations
/convergence-research | 18 | Runs alone (fresh context)
/convergence-design | 22 | Runs alone (interactive session)
/convergence-outline | 16 | Runs alone (reads design artifact)
/convergence-implement | 24 | + /convergence-tdd (38 total) or + /convergence-debug (44 total)
/convergence-review | 28 | + /convergence-security (58 total) or alone
/convergence-debug | 20 | + /convergence-tdd (34 total)
/convergence-tdd | 14 | Pairs with any skill
/convergence-compound | 12 | + /convergence-debug (32 total)
/convergence-security | 30 | Runs alone or + /convergence-review (58 total)
/convergence-architecture | 22 | Runs alone

Budget rule: Keep active instruction total under 60 per session. CLAUDE.md, system prompt, and tool definitions consume ~80-100 instructions of the ~200 budget. That leaves ~100-120 for skills, and you want headroom for the actual task instructions.
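The check is simple arithmetic; a throwaway sketch using the counts from the table:

```python
# Instruction counts copied from the Context Budget Calculator table.
SKILLS = {
    "/convergence-research": 18, "/convergence-design": 22,
    "/convergence-outline": 16, "/convergence-implement": 24,
    "/convergence-review": 28, "/convergence-debug": 20,
    "/convergence-tdd": 14, "/convergence-compound": 12,
    "/convergence-security": 30, "/convergence-architecture": 22,
}

BUDGET = 60  # budget rule: active instruction total per session

def check_budget(active):
    """Return (total instructions, whether the combination fits the budget)."""
    total = sum(SKILLS[name] for name in active)
    return total, total <= BUDGET

print(check_budget(["/convergence-review", "/convergence-security"]))
# prints: (58, True)
```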

Repo Overview

Seven repos, seven different approaches to making Claude Code smarter.

superpowers-ruby

Lucian Ghinda · Fork of Jesse Vincent's superpowers
Ruby Rails Workflow Multi-tool

A complete software development lifecycle orchestration system. 28 skills covering brainstorming through integration, with heavy 37signals/TDD philosophy.

28 skills 1 agent 4 platforms

gstack

Garry Tan · YC President/CEO
General Purpose Workflow Browser Review

A "software factory" with 28 slash commands that turn Claude into a virtual engineering team. Think → Plan → Build → Review → Test → Ship → Reflect. Includes a persistent headless browser daemon.

28 skills 7 phases ~100ms browser

agency-agents

msitarzewski · Community-driven
General Purpose Multi-tool Review

150+ specialized agent personas organized into 14 divisions. The NEXUS orchestration framework coordinates agents across 7-phase pipelines with quality gates.

150+ agents 14 divisions 10+ tool formats

counselors

Aaron Francis · Creator of Faster.dev
General Purpose Multi-tool Review

A parallel multi-agent orchestration CLI. Dispatches the same prompt to Claude, Codex, Gemini, and Amp simultaneously for independent perspectives. Read-only by default.

5 adapters 6 presets Read-only default

skills (palkan)

Vladimir Dementyev · AnyCable creator
Ruby Rails Review

Single deep skill: "Layered Rails" architecture based on his book. 34 reference files covering 4-layer architecture, callback scoring (1-5 scale), god object detection.

1 skill 2 agents 34 reference files

rails-conventions

Ethos Link
Ruby Rails

Pure reference skill with 15 topical guides for Rails 8. Codebase-first philosophy: scan existing patterns before proposing changes. Quantified code quality gates.

15 references 1 style guide 0 agents

compound-engineering

Every, Inc. · Production engineering plugin
General Purpose Workflow Multi-tool Review

A massive multi-platform transpiler with 42 skills, 50+ agents, and knowledge compounding. The only repo to implement persistent learning across sessions via a learnings-researcher agent.

42 skills 50+ agents 4 platforms

Feature Comparison Matrix

Side-by-side capabilities across all seven repos.

Capability | superpowers-ruby | gstack | agency-agents | counselors | palkan/skills | rails-conventions | compound-engineering
Focus | Ruby/Rails SDLC | General web dev | Full org simulation | Multi-model review | Rails architecture | Rails 8 conventions | General + learning
Skills/Commands | 28 skills | 28 commands | NEXUS framework | 12+ commands | 6 commands | 1 skill | 42 skills
Agents | 1 (reviewer) | None (roles in skills) | 150+ agents | None (adapters) | 2 agents | None | 50+ agents
Workflow Engine | Brainstorm → Plan → Execute → Review | Think → Plan → Build → Review → Test → Ship → Reflect | NEXUS 7-phase pipeline | Run/Loop dispatch | Analyze → Review → Gradual | Reference only | Multi-stage pipelines + compound loop
Code Review | Two-stage (spec + quality) | Staff engineer mode | Reality Checker pattern | Multi-model comparison | Layer violation detection | Guidelines only | Multiple specialized reviewers
Testing Philosophy | Strict TDD (red-green-refactor) | QA lead + regression | Dev↔QA loops | Bug hunt presets | Specification test | Minitest + fixtures | TDD + spec-flow analysis
Security | Brakeman integration | /cso OWASP + STRIDE | Security Engineer agent | Security preset | — | Checklist (09) | Security sentinel agent
Browser Automation | — | Persistent Chromium daemon | — | — | — | — | —
Multi-Model Support | — | /codex cross-model | — | Claude+Codex+Gemini+Amp | — | — | —
Hooks / Safety | Session-start hook | /careful, /freeze, /guard | — | Read-only enforcement | — | — | Destructive cmd warnings
37signals / DHH Style | Deep (dedicated skill) | — | — | — | Acknowledged | Core philosophy | —
Knowledge Compounding | — | — | — | — | — | — | ce:compound + learnings-researcher
License | MIT | MIT | MIT | MIT | MIT | MIT | MIT

Key Overlaps & Themes

Where these repos converge reveals what matters most in Claude Code configuration.

Structured Development Workflows

superpowers-ruby gstack agency-agents compound-engineering

All four implement phased pipelines where you don't jump straight to coding. The consensus: structured process beats ad-hoc prompting.

Code Review as First-Class Skill

superpowers-ruby gstack agency-agents counselors palkan/skills compound-engineering

6 of 7 repos have dedicated review capabilities. Two-stage review, staff-engineer-level PR review, multi-model comparison, layer violation detection. Clearly essential infrastructure.

37signals / DHH Rails Philosophy

superpowers-ruby palkan/skills rails-conventions

Three repos encode 37signals/Basecamp patterns: thin controllers, rich models, vanilla Rails, Hotwire-first.

Multi-Agent Orchestration

superpowers-ruby gstack agency-agents counselors compound-engineering

Five repos tackle multi-agent coordination differently. Different strategies, same insight: one agent isn't enough.

Evidence-Based Verification

superpowers-ruby gstack agency-agents compound-engineering

All four enforce "prove it, don't claim it." Trust but verify is table stakes.

Testing Discipline

superpowers-ruby gstack palkan/skills rails-conventions compound-engineering

Strict TDD, regression test generation, specification tests as design tools, minitest + fixtures. All agree: tests aren't optional.

Security Scanning

superpowers-ruby gstack counselors rails-conventions compound-engineering

Brakeman, OWASP + STRIDE, multi-model security review, security sentinel agent, manual checklists. Multiple approaches, universal concern.

Knowledge Compounding

compound-engineering

Only one repo addresses persistent learning across sessions. compound-engineering's ce:compound skill captures learnings after work; its learnings-researcher agent surfaces them before new work. The summit validated this as the biggest missing dimension.

Deep Dives

Expand each repo for detailed findings, unique features, and what's worth borrowing.

superpowers-ruby — The Disciplined Rails Workflow

What Makes It Unique

  • Mandatory skill invocation — Skills are process gates, not suggestions
  • Context isolation for subagents — Each subagent gets only the exact context needed
  • Two-stage review — Separates "did we build the right thing?" from "did we build it right?"
  • TDD applied to documentation — Tests docs like code
  • Zero-dependency brainstorm server — Node.js WebSocket server built from scratch

Key Skills Worth Studying

  • brainstorming — Hard-gate before creative work, Socratic dialogue
  • subagent-driven-development — Fresh subagent per task + two-stage review
  • systematic-debugging — "No fixes without root cause first"
  • verification-before-completion — Gate: identify proof → run → read → verify → claim WITH evidence
  • 6 Hotwire Club skills — Deepest Turbo/Stimulus coverage of any repo

Weaknesses

  • 28 interdependent skills = large surface area for conflicts
  • Ruby/Rails-only
  • Mandatory skill invocation relies on instruction compliance, not runtime enforcement
gstack — The Software Factory

What Makes It Unique

  • Persistent browser daemon — Chromium stays alive between commands (~100ms latency)
  • "Boil the Lake" philosophy — When AI makes completeness cheap, always do the complete thing
  • Cross-model analysis — When both /review and /codex run on the same diff, gstack reports overlapping and unique findings
  • Safety hooks — /careful, /freeze, /guard with mid-session activation
  • Contributor mode — Self-improving: rates experience 0-10, auto-files issue reports

Key Commands Worth Studying

  • /office-hours — YC-style diagnostic with 6 forcing questions before building
  • /autoplan — Automated CEO → Design → Eng pipeline
  • /qa — 3 tiers (Quick/Standard/Exhaustive), before/after health scores
  • /cso — OWASP + STRIDE threat modeling, secrets archaeology

Weaknesses

  • Assumes web apps + Git + GitHub
  • Steep learning curve
agency-agents — The Virtual Organization

What Makes It Unique

  • 150+ specialized personas — Each has distinct voice, methodology, and mental model
  • 14 divisions — Engineering, Marketing, Design, Testing, Sales, and more
  • NEXUS orchestration — 3 deployment modes, 7 phases, quality gates
  • Reality Checker agent — Defaults to "NEEDS WORK" until overwhelming evidence
  • 10+ tool format conversions — Cursor, Copilot, Windsurf, Aider, and more

Weaknesses

  • Agents are personality overlays, not autonomous executors
  • No persistent memory across sessions
  • 150+ agents is overwhelming; no clear "start here" path
counselors — The Council of Advisors

What Makes It Unique

  • Multi-model same-prompt dispatch — Independent perspectives from Claude, Codex, Gemini, Amp
  • Read-only by default — Three enforcement tiers: enforced, bestEffort, none
  • Multi-round convergence — Auto-stops when new findings drop below 30% of prior round
  • Environment security — Allowlist + denylist for env vars

Key Presets Worth Studying

  • bughunt — Logic errors, boundary failures, concurrency bugs
  • security — Injection, auth, access control, XSS
  • hotspots — O(n²)+ patterns, N+1 queries
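
The multi-round convergence rule above is simple enough to sketch directly (a minimal illustration; counselors' actual implementation may differ):

```python
# Sketch of counselors' auto-stop rule as described above: a review loop
# halts once a round surfaces fewer than 30% as many new findings as the
# round before it. Function and variable names are illustrative.
def should_stop(prev_new: int, curr_new: int, threshold: float = 0.30) -> bool:
    """True once new findings drop below `threshold` of the prior round."""
    if prev_new == 0:
        return True  # nothing new last round: already converged
    return curr_new < threshold * prev_new


# New (not cumulative) findings per round: 20, then 9, then 2.
rounds = [20, 9, 2]
stopped_at = next(
    i for i in range(1, len(rounds)) if should_stop(rounds[i - 1], rounds[i])
)
print(stopped_at)  # 2: the third round found 2 new issues, under 30% of 9
```
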
palkan/skills — The Architecture Specialist

What Makes It Unique

  • Callback scoring system (1-5) — Removes subjectivity from callback debates
  • Specification test as design tool — Uses test structure to detect layer violations
  • Churn × complexity heuristics — Finds god objects algorithmically
  • 4-layer architecture — Presentation → Application → Domain → Infrastructure
  • "Gradual layerification" — Phased roadmaps with escape hatches
rails-conventions — The Pragmatic Reference

What Makes It Unique

  • Codebase-first scan — Mandatory inspection before proposing changes
  • Smart backend detection — Routes to good_job or solid_queue guide conditionally
  • Quantified code quality gates — Classes <200 lines, methods <10 lines preferred
  • Fail-fast philosophy — Prefer explicit contracts (find_by!, fetch)
compound-engineering — The Knowledge Compounder

What Makes It Unique

  • Knowledge compounding loop — ce:compound captures learnings after work; learnings-researcher surfaces them before new work
  • 42 skills across 4 platforms — Claude Code, Cursor, Windsurf, VS Code transpiler
  • 50+ specialized agents — Reviewers for Rails, Python, TypeScript, frontend races, architecture, security, data integrity
  • Spec-flow analysis — Analyzes specifications for user flow completeness and gap identification
  • Design-to-implementation sync — Figma design comparison agents for visual fidelity verification

Key Patterns Worth Studying

  • ce:compound — 530-line skill with 3 modes, YAML frontmatter for searchable learnings
  • learnings-researcher — Searches past learnings by tags, module, and problem_type before new work
  • Destructive command hooks — Warns before rm -rf, DROP TABLE, git push --force

Weaknesses

  • 42 skills at 100-530 lines each blow the instruction budget (summit finding: <35 instructions per skill)
  • 50+ agents = overwhelming surface area, no clear entry point
  • Heavy plans and multi-agent pipelines contradict summit findings on plan leverage
  • Most patterns already covered by other repos — the unique contribution is knowledge compounding

Relative Strengths

What each repo does best, scored across key dimensions.

  • Rails/Ruby Depth
  • Workflow Orchestration
  • Safety & Guardrails
  • Context Efficiency (new, summit-derived dimension)
  • Ease of Adoption
  • Breadth of Coverage

Key Takeaways

What this means for building YOUR setup.

1. These repos solve different problems

They look similar on the surface but serve distinct purposes. A custom setup would cherry-pick from multiple categories.

2. Lightweight workflow structure is the highest-leverage investment

The repos that deliver the most value enforce a development pipeline: think before you build, align before you code, verify before you ship. But summit findings show heavyweight plans aren't leverage. The real leverage is in short alignment artifacts and keeping the instruction budget lean.

3. Code review is universally valued

6 of 7 repos invest heavily in review capabilities. Build review into your workflow early.

4. Rails-specific: combine palkan + rails-conventions + cherry-pick from superpowers

Callback scoring, god object detection, codebase-first philosophy, backend detection, Hotwire skills, and TDD enforcement. Together they cover more than any single repo.

5. Don't adopt everything at once

Start with: (1) a workflow skill matching your process, (2) a review skill, (3) domain-specific reference material for your stack. Add more only when you feel a gap.

6. Build for yourself, steal the patterns

The most useful thing in these repos isn't the content — it's the patterns. Study the patterns, then write skills that encode YOUR conventions, YOUR workflow, YOUR preferences.

Quick Reference: What to Steal from Each

Repo | Best Ideas to Borrow | Skip If...
superpowers-ruby | Context isolation for subagents, two-stage review, verification-before-completion gate, Hotwire skills | You don't use Ruby/Rails
gstack | Safety hooks (/careful, /freeze), browser daemon pattern, cross-model analysis, "Boil the Lake" philosophy | You prefer lightweight tools
agency-agents | Agent persona format, Reality Checker QA pattern, NEXUS handoff templates, multi-tool conversion scripts | You work solo or find 150 agents impractical
counselors | Multi-model same-prompt dispatch, read-only enforcement tiers, convergence detection, environment allowlist/denylist | You only use Claude
palkan/skills | Callback scoring system (1-5), specification test as design tool, churn × complexity for god objects | You don't use Rails
rails-conventions | Codebase-first scan philosophy, smart backend detection, quantified code quality gates | You don't use Rails 8
compound-engineering | Knowledge compounding loop (ce:compound + learnings-researcher), destructive command hooks, spec-flow analysis | You don't need persistent learning across sessions

Direct Contradictions

Places where the summit findings directly undermine claims or assumptions in our original analysis.

1. Instruction Budget Ignored

Dexter Horthy (HumanLayer) — "Everything We Got Wrong About RPI"

What Our Analysis Said

Praised repos with more skills/agents as higher-value. superpowers-ruby (28 skills), gstack (28 commands), and agency-agents (150+ agents) scored highest. More = better was the implicit assumption.

What the Summit Found

Frontier LLMs reliably follow ~150-200 instructions. Dex's original 85-instruction prompt caused critical steps to be skipped ~50% of the time. He split into prompts with <40 instructions each.

Impact: The repos we rated highest for breadth may actually perform worst in practice. Loading 28 interdependent skills or 150+ agent personas would blow the instruction budget.

2. Plans Aren't Leverage

Dexter Horthy (HumanLayer) — on plan vs code review

What Our Analysis Said

Positioned plan generation as a major value-add across all workflow repos. Praised gstack's 7-phase pipeline, superpowers-ruby's brainstorm-plan-execute-review, and NEXUS's 7-phase pipeline.

What the Summit Found

A 1,000-line plan produces ~1,000 lines of code (within 10%). Reading the plan is the same work as reading the code, so you end up doing double work. "Don't read the plans. Please read the code."

Impact: The real leverage is in shorter alignment artifacts: 200-line design discussions and 2-page structure outlines. None of the six original repos produce this kind of lightweight alignment doc.

3. Automating Decisions Is the Trap

Dexter Horthy, Jake (Netflix) — "Do not outsource the thinking"

What Our Analysis Said

Treated automation depth as a positive. Higher automation = higher score.

What the Summit Found

"Do not outsource the thinking. You the engineer are an important part of this process." The design discussion format forces the agent to brain-dump all assumptions so the human can do "brain surgery" before any code is written.

Impact: The highest-leverage moment is when the engineer corrects the agent's wrong assumptions on a 200-line doc — before 2,000 lines of code get written.

4. Research Must Be Objective

Dexter Horthy — on research contamination

What Our Analysis Said

Did not distinguish between objective and opinionated research. No repo was flagged for mixing implementation intent with codebase research.

What the Summit Found

"Research == Compression of Truth." When you tell the model what you're building during research, it injects opinions instead of facts. Fix: deterministically separate context windows.

Impact: Any research skill that lets the model see the ticket while researching will produce contaminated research. The fix is architectural: separate context windows.

5. Magic Words Are a Design Flaw

Dexter Horthy — on onboarding failures

What Our Analysis Said

Praised superpowers-ruby's "mandatory skill invocation" and gstack's specific slash commands as features.

What the Summit Found

"If you built a tool that requires hours of training to get good results from, go fix the tool." The fix: use deterministic control flow, not prompt-based instructions.

Impact: Repos that require users to know exactly which of 28 commands to invoke exhibit the "magic words" anti-pattern. A well-designed system should route deterministically based on input classification.
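
The deterministic-routing idea can be sketched in ordinary code: classify the request first, then invoke exactly one focused skill. The keyword classifier below is a stand-in (a real router might use a small model call that returns only a label), and the skill names are convergence's own:

```python
# Sketch of "control flow, not prompts": a deterministic router that
# classifies the request and dispatches one focused skill, instead of a
# monolithic prompt the user must know the magic words for.
def classify(request: str) -> str:
    text = request.lower()
    if any(w in text for w in ("bug", "error", "crash", "broken")):
        return "debug"
    if any(w in text for w in ("audit", "vulnerability", "security")):
        return "security"
    return "implement"


ROUTES = {
    "debug": "/convergence-debug",
    "security": "/convergence-security",
    "implement": "/convergence-implement",
}


def route(request: str) -> str:
    """Return the single focused skill to invoke for this request."""
    return ROUTES[classify(request)]


print(route("the login page crashes on submit"))  # /convergence-debug
```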

6. Vertical Plans Over Horizontal Plans

Dexter Horthy — on plan structure

What Our Analysis Said

Evaluated planning capabilities without distinguishing plan structure.

What the Summit Found

Models default to horizontal plans: all database, then all services, then all API, then all frontend. This produces nothing testable until the end. "Despite every single model and trying to prompt this out, we cannot get models to stop writing horizontal plans."

Impact: Planning skills that don't explicitly enforce vertical slicing will produce horizontal plans by default. This requires explicit countermeasures.
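
One explicit countermeasure is to check the plan's structure in code rather than prompt for it. A sketch, where the plan representation (phase name → set of layers touched) is an assumption:

```python
# Sketch of a vertical-slice check: a horizontal plan has phases that
# each touch a single layer (all database, then all services, ...),
# while a vertical plan's phases cut across layers and end in something
# testable. The layer names are illustrative.
LAYERS = {"db", "service", "api", "frontend"}


def is_vertical(plan: dict[str, set[str]], min_layers: int = 2) -> bool:
    """True if every phase spans at least `min_layers` layers."""
    return all(len(layers & LAYERS) >= min_layers for layers in plan.values())


horizontal = {"phase 1": {"db"}, "phase 2": {"service"}, "phase 3": {"api"}}
vertical = {
    "login slice": {"db", "service", "api", "frontend"},
    "signup slice": {"db", "service", "api", "frontend"},
}
print(is_vertical(horizontal), is_vertical(vertical))  # False True
```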

Strong Tensions

Not direct contradictions, but significant friction between our analysis assumptions and summit findings.

Agent Swarms Skepticism

Dexter Horthy — on quality vs speed

What Our Analysis Said

Gave agency-agents 10/10 for multi-agent orchestration. Treated multi-agent coordination as uniformly positive.

What the Summit Found

"Going 10x faster doesn't matter if you're going to throw it all away in 6 months." Target 2-3x with near-human quality, not 10x with slop.

Impact: More agents is not inherently better. 150 agents with no quality guarantees may be worse than 5 well-verified ones.

Adoption Readiness

Scott Breitenother (Kilo Code) — 25T token analysis

What Our Analysis Said

Evaluated repos on capability without considering whether users are ready for that level of complexity.

What the Summit Found

From 25T+ tokens across 1.5M developers: adoption follows a trust ladder (autocomplete → chat → single agents → orchestration). "If autocomplete fails, agents never get a chance." 49% of pro devs don't use AI daily.

Impact: Most repos target the top of the trust ladder. If a developer hasn't built trust at the agent level, giving them a 150-agent framework is counterproductive.

Missing Dimensions

Critical evaluation axes the summit revealed that our analysis completely lacked.

Context Efficiency

Dexter Horthy, Ankit Mathur (Databricks)

Our analysis never evaluated how much context window each repo consumes. Beyond ~40% context utilization, results degrade. The analysis should have included: instructions per skill, total context footprint, and whether the repo uses static artifacts vs in-context state.

New evaluation question: "If I load this repo's skills into my context window, how much budget is left for my actual task?"

Learning & Memory Architecture

Erin Ahmed (Cleric), Faye Zhang (Pinterest)

"Agent capabilities are commoditized — the next horizon of differentiation is learning." Agents that don't persist corrections, compound knowledge, or absorb context continuously won't survive. This was a fundamental architectural gap across all 6 original repos.

New evaluation question: "Does this repo help the agent learn from past mistakes, or does every session start from zero?"
Update: Analysis of a 7th repo — compound-engineering — revealed it was the only repo to address this gap, via its ce:compound skill and learnings-researcher agent. This pattern has been adapted into convergence's /convergence-compound skill.

Security as a First-Class Concern

Milan Williams (Semgrep), Ankit Mathur (Databricks)

Milan Williams outlined a practical security framework: (1) scoped credentials, (2) audit logging via hooks, (3) automated code scanning. Our analysis focused on preventing the agent from breaking things — not credential scoping, audit trails, or vulnerability scanning.

New evaluation question: "Does this repo help me control what the agent can access, log what it did, and verify the output is secure?"

Agentic Search vs Vector Search

Jessica Wang (Braintrust)

Agentic search (grep, find, read) matched or beat vector search while using 3x fewer tokens and costing 2.8x less per task. Vector search returns chunks without "connective tissue."

New evaluation question: "Does this repo use native code exploration or add a RAG layer that may not improve outcomes?"

What Held Up

Analysis findings that the summit validated or reinforced.

Structured Process Beats Ad-Hoc Prompting

Every summit speaker reinforced that having a structured workflow matters. Dex just argues the structure should be lighter (design + outline) rather than heavier (full plan). The principle is validated; the implementation needs refinement.

Code Review Is Universally Valued

Dex ("please read the code"), Databricks ("the author often also sees the code for the first time"), and Semgrep (scan before shipping) all reinforce that 6 of 7 repos having dedicated review is the right instinct.

"Build for Yourself, Steal the Patterns"

Dex: "There is no magic prompt." Mihail Eric: "This is something you should iterate on." The summit unanimously validates that custom workflows beat drop-in solutions.

"Don't Adopt Everything at Once"

Directly supported by the instruction budget finding. If the model can only follow ~200 instructions, loading all 28 skills from any repo is self-defeating.

Evidence-Based Verification

The "trust but verify" theme maps directly to Dex's "please read the code" and Databricks' "verification is king with LLMs." The analysis got the principle right. The summit says apply it to code, not plans.

Codebase-First Philosophy

rails-conventions' "mandatory inspection before proposing changes" aligns with Dex's objective research principle. The summit adds: the scan should happen in a context that doesn't know what you're about to build.

Key Speaker Insights

Summit speakers whose findings are most relevant to skill/agent design.

Dexter Horthy

Founder/CEO, HumanLayer

Evolved RPI into CRISPY. Key insight: split monolithic prompts into focused steps with <40 instructions each. Use lightweight alignment docs, not heavyweight plans.

"2026 is the year of no more slop. Shoot for 2-3x. That's actually better business outcomes than going 10x faster and shipping a bunch of slop."

Scott Breitenother

CEO, Kilo Code

25T+ tokens across 1.5M developers. Adoption follows a trust ladder. "AI doesn't reduce work — it intensifies it."

"Benchmarks tell you what models can do. 25 trillion tokens tell you what developers actually do."

Erin Ahmed

Head of Product, Cleric

Three lessons for learning agents: make correction easy, reward corrections with visible improvement, absorb context continuously.

"Agent capabilities are commoditized. The next horizon of differentiation is learning."

Jessica Wang

DevRel Engineer, Braintrust

Agentic search matched or beat vector search while using 3x fewer tokens. Vector search returns fragments without "connective tissue."

"More searches does not equal better results. Vector used 2.8x more LLM calls for the same or worse accuracy."

Milan Williams

Senior PM, Semgrep

Three practical security steps: scope credentials, set up audit logging, scan code before shipping.

"If you wouldn't give it to an intern, why'd you give it to your agent?"

Mihail Eric

Head of AI, Monaco / Stanford

Proposed RePPIT as alternative to vibe coding. "You should have an active, functional mental model of how you would build your solution."

"Vibe coding is just not good enough to truly build good software."

Faye Zhang

Staff AI Engineer, Pinterest

Four reasons agents fail: spec drift, data imbalance, tool misuse, memory collapse. Proposed 3-tier memory: hot, domain task, cold storage.

"Fix agent orchestration via Agent SDK. Treat memory as a learned policy problem."

Ankit Mathur

Software Engineer, Databricks AI

Built a Coding Agent Gateway for enterprise governance. 2,200+ engineers, 25K+ monthly commits. Bottlenecks shifting to code reviews, CI scaling, and testing.

"Our goal is not to generate a lot of code, but to make sure we ship the best products for our users."

Zach Lloyd

Founder/CEO, Warp

2026 is the year of agent orchestration. Agents need persistent memory, coordination mechanisms, and cloud infrastructure.

"Individuals are hitting limits with laptop capacity. Enterprises want auditability, repeatability, metrics, security."

Yannis He

Founder, SWE-Bench Pro

Launching SWE Atlas: benchmarks beyond issue resolution. Codebase QnA, test writing, refactoring. Leading models score ~30%.

"Coding may become the new scaffolding, where AI self-develops tools that are more intuitive to itself."

Revised Recommendations

How the original analysis takeaways should change in light of summit findings.

1 Evaluate context efficiency, not breadth

Before adopting any repo's skills, count the instructions. If a single skill has 40+ instructions, it's already risky. Prefer repos that produce static artifacts over those that rely on in-context state.

  • Old advice: "Workflow structure is the highest-leverage investment"
  • New advice: "Lightweight workflow structure with minimal instruction footprint is the highest-leverage investment"

2 Replace plan review with design alignment

Instead of 1,000-line plans, generate a ~200-line design discussion and a ~2-page structure outline. Review those. Then review the code.

  • Old advice: "Plan before you code, review the plan"
  • New advice: "Align before you code (design + outline), review the code"

3 Keep the human in the decision loop

Evaluate workflows by how much they force the agent to surface assumptions, not by how much they automate.

  • Old advice: "Higher automation = higher value"
  • New advice: "Higher alignment quality = higher value; automation serves alignment, not the reverse"

4 Use control flow, not prompts, for routing

"Don't use prompts for control flow if you can use control flow for control flow."

  • Old advice: "Process gates in prompts"
  • New advice: "Process gates in code; prompts handle one focused task"

5 Separate research from implementation intent

Generate questions from the ticket in one context, then run objective research in a fresh context.

  • Old advice: Not addressed
  • New advice: "Research context windows should never see the ticket. Research == compression of truth, not opinion."
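
The separation can be made structural rather than prompted. A sketch of the two-context split, with both functions as illustrative stand-ins for single LLM calls:

```python
# Sketch of the two-context split described above. Context A sees the
# ticket and emits neutral questions; context B is a fresh window that
# sees only those questions plus the code, never the ticket itself.
def derive_questions(ticket: str) -> list[str]:
    """Context A: turn the ticket into neutral, factual questions."""
    # Hand-written example output; a real skill would make one LLM call here.
    return [
        "Where is authentication currently handled?",
        "Which session store does the app use?",
    ]


def objective_research(questions: list[str], codebase_paths: list[str]) -> dict:
    """Context B: a research context that structurally excludes the ticket."""
    context = {"questions": questions, "files": codebase_paths}
    assert "ticket" not in context  # enforced by code, not by prompting
    return context


ctx = objective_research(derive_questions("Add OAuth login"), ["app/auth.py"])
print(sorted(ctx))  # ['files', 'questions']
```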

6 Enforce vertical plan structure

Models default to horizontal. This requires explicit countermeasures in any planning skill.

  • Old advice: Not addressed
  • New advice: "Every planning skill should enforce vertical slicing with intermediate verification checkpoints"

7 Add security controls to your agent setup

"If you wouldn't give it to an intern, why'd you give it to your agent?"

  • Old advice: "Safety & Guardrails" evaluated as preventing agent mistakes
  • New advice: "Security is three layers: scoped access before, audit trails during, vulnerability scanning after"

Sources & Credits

Convergence was distilled from these seven open-source repos and validated against findings from the Coding Agents Summit 2026.

Coding Agents Summit 2026

South Bay Summit — talks from Dex Horthy, Mihail Eric, and others on what works (and what doesn't) when building with coding agents at scale. Watch on YouTube.