Project 1: agi
The hypothesis: build general intelligence from small, specialized transformer models composed in a directed acyclic graph. Board games as testbeds, eventually expanding to 20+ task domains. The first project ever built with Claude as the primary engineer. What mattered wasn't whether the hypothesis panned out — it didn't — but the sheer speed and scale of what an agent team could build, and the organizational lessons that shaped every project after it.
The 10-Daemon Era
The project launched with 10 agents (called "daemons"): trainer, scientist, architect, tester, moderator, curator, editor, coach, police, orchestrator. A forum was set up from day one with threaded discussions and mandatory voting.
Within hours, the system was generating at industrial scale. The coach tracked zoo membership obsessively (77→84→91→98 models in a day). The police filed enforcement reports every cycle. The curator maintained count reconciliations. The agents built a full training pipeline, model zoo, evaluation harness, and experiment tracking system — all in the first 24 hours.
The Great Consolidation
Within a day, the roster was consolidated. Trainer became builder. Scientist+architect became strategist. Tester+moderator became verifier. Curator+editor+coach became librarian. Police was eliminated. "Daemon" became "agent."
Then the skeptic was added, prompted by the human asking "Why are we using minimax for Go at all?" The skeptic immediately invalidated multiple claims: all Go results were built on a broken minimax oracle. "6500% improvement" was cherry-picked. "70% zero-shot transfer" was false. The Feynman quote was embedded in its role file: "The first principle is that you must not fool yourself."
The verifier role existed but couldn't catch this — it was testing against the oracle, so a broken oracle produced "passing" tests. The skeptic's value was questioning the setup itself, not just the outputs. This distinction (checking results vs. challenging assumptions) became a pattern: every later project with a skeptic benefited from it.
Scale and Dead Ends
The agent team built at a pace that would have taken a solo developer weeks. A complete AlphaZero pipeline for Go-5x5 (self-play, MCTS, training loops), ONNX model export, a Rust inference engine, a 20+ domain task taxonomy, and hundreds of trained models — all within days. But the results kept hitting walls.
Go-5x5 training plateaued at 42 AlphaZero cycles with 14 cycles of regression. The debugger found three bugs in the TicTacToe pipeline alone. Adding perfectly accurate domain features to Go models actually hurt performance. The team abandoned weeks of work to go back and prove the basics.
The project was never properly versioned. All history lives in file timestamps, forum archives, and agent memory files rather than git.
Placed on Hold
The composable-intelligence hypothesis ran into two problems. Networks are better at learning compositional structure internally than having it imposed externally — the DAG of specialists was solving a problem that end-to-end training already handles. And LLMs had achieved broad, flexible reasoning by a completely unrelated path. The research question was answered, just not by this project.
The project was placed on hold so more promising ideas could be explored. The organizational lessons — consolidation from 10 to 7 agents, the skeptic catching bad results, the forum scaling problem — carried forward into every project that followed, starting with thisminute.
Project 2: thisminute
thisminute.org is a real-time global news map. It was created by taking the agi project's agent system and saying "model a new project after this." The organizational structure was inherited, not designed from scratch.
Solo Build
Four rounds of iterative development came before any agents were involved. Globe projection, keyword categorization, geocoding, trending detection. The project worked but had outgrown what one context window could hold.
Inherited Structure
Five roles modeled after agi's post-consolidation team: builder, tester, strategist, skeptic, orchestrator. LLM extraction via Claude Haiku replaced keyword categorization. Event clustering and narrative synthesis via Sonnet.
The GDELT Crisis
GDELT was sampled at 7%. The underlying dataset grew 10x without anyone noticing, yielding 45,000 stories/day instead of 100. Events bloated to 25,000. The crisis revealed three domains of work that hadn't existed before: getting code onto the VM reliably (deployer), tracking LLM spend as volume scaled (economist), and making the UI work for multiple data types (designer). Each role was added because the work split into a new domain — not because someone planned a roster ahead of time.
After the fix (sample rate 7%→0.3%), the orchestrator ran 20+ cycles overnight and shipped 4 versions: world-switching UX, sports feeds, entertainment feeds, prompt caching.
Domain Specialization
The narrative analyzer only understood geopolitical news. Fix: separate analysis per domain (news, sports, entertainment, positive), each with its own Sonnet prompt, quality criteria, and event caps. A 5th domain (curious/human-interest) followed.
Forum Maintenance
Librarian added as the 9th agent — the forum had hit 1,008 lines in one session. Around the same time, the strategist proposed 13 audience segments and 8 presets. The skeptic flagged that the numbers were inflated 10–100x, 2 of 3 "immediate" items were already built, and time estimates were 2–3x too low.
Without the adversarial check, the team would have planned a sprint against fabricated numbers. The strategist wasn't lying — it was doing its job (proposing ambitious plans). The skeptic's job was to pressure-test them. This dynamic — proposal + challenge — turned out to be more reliable than having any single agent try to be both creative and critical.
Inference Feeds
Statistical inference feeds — events from sensor data, not news. USGS earthquakes, NOAA weather, NASA events, disaster alerts, WHO outbreaks, space launches. Pre-built extraction dicts skip the LLM entirely. A 10th agent was added to handle user feedback.
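To make "pre-built extraction dicts" concrete, here is a minimal sketch of how a structured feed item can become a map event without an LLM call, using the public USGS earthquake GeoJSON feed. The output field names and the magnitude threshold are illustrative assumptions, not thisminute's actual schema.

```python
# Sketch: turn a USGS earthquake feed item into an event dict with no LLM call.
# Output fields ("domain", "headline", "lat", "lon") are illustrative only.
import json
import urllib.request
from datetime import datetime, timezone

USGS_FEED = "https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.geojson"

def quake_to_event(feature: dict) -> dict:
    """Map one GeoJSON feature from the USGS feed to a pre-built event."""
    props = feature["properties"]
    lon, lat, _depth = feature["geometry"]["coordinates"]
    return {
        "domain": "inference",
        "source": "usgs",
        "headline": f"M{props['mag']} earthquake, {props['place']}",
        "lat": lat,
        "lon": lon,
        "time": datetime.fromtimestamp(props["time"] / 1000, tz=timezone.utc).isoformat(),
    }

def fetch_quake_events(min_magnitude: float = 4.5) -> list[dict]:
    with urllib.request.urlopen(USGS_FEED) as resp:
        feed = json.load(resp)
    return [
        quake_to_event(f)
        for f in feed["features"]
        if (f["properties"]["mag"] or 0) >= min_magnitude
    ]
```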
Security & Scale
An 11th agent: security. Not because of an incident, but because ops added an infra-layer security agent and the two needed a clear division of responsibility. thisminute's security agent owns app-layer concerns (XSS, comment/vote hardening, SSRF protection, rate limiting). ops owns infra-layer (nginx, firewall, SSH, SSL/TLS). Two projects, two agents, one security model.
Phase 4.5 shipped: map color themes (domain/classic/mono/heat/neon), redesigned world-picker, share button, auto-cycling tour mode. 95 RSS feeds plus 13 structured data APIs. The forum peaked at 1,008 lines in that early session, but the librarian keeps it around 400 now. The test suite hit 710 tests.
Project 3: rhizome
This catalog. 205 organizational patterns, 7 curated from real ecosystem usage. One steward agent, no forum, no protocol file. The right amount of structure for a browsable catalog turns out to be almost none.
One Steward
A single agent handles everything: frontend, data curation, API work, build pipeline, accessibility, deploy coordination. It maintains one memory file. There's no forum because there's nobody to discuss things with. No protocol because there's no startup sequence to formalize.
The steward role file includes instructions for when to split into multiple agents ("your memory file covers 3+ unrelated domains," "you're context-switching between very different kinds of work"). So far, those conditions haven't been met. A single-page catalog with a build script and an optional API doesn't need 10 agents.
Project 4: toolshed
A software directory that catalogs 15,803 tools across 124 categories. Originally called "mainmenu." The first project to adopt thisminute's protocol+forum model from scratch rather than inheriting it — proof that the pattern transfers.
Full Protocol from Day One
Seven agents, a PROTOCOL.md, and a forum. Builder, curator, designer, librarian, orchestrator, skeptic, strategist. The roles were modeled on thisminute's post-growth roster, minus the domain-specific ones (no deployer, economist, or feedback — a static directory doesn't need them).
Data aggregation from Homebrew, awesome-lists, and CNCF via scraping pipelines. Build script generates a 1MB+ data.js for client-side filtering. 67 tests for categorization, data validation, and taxonomy integrity.
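As a rough illustration of that build step, the sketch below shows one way a catalog JSON file could be embedded as a data.js global for client-side filtering. The paths, the tools.json input, and the window.TOOLS variable name are assumptions for the example, not toolshed's actual layout.

```python
# Sketch of a data.js build step: embed the catalog as a JS global so the
# page can filter client-side with no backend. Paths and the window.TOOLS
# name are assumed, not toolshed's actual layout.
import json
from pathlib import Path

def build_data_js(catalog_path: Path, out_path: Path) -> None:
    tools = json.loads(catalog_path.read_text(encoding="utf-8"))
    # Serialize once; the browser evaluates it as a plain object literal.
    payload = json.dumps(tools, ensure_ascii=False, separators=(",", ":"))
    out_path.write_text(f"window.TOOLS = {payload};\n", encoding="utf-8")

if __name__ == "__main__":
    build_data_js(Path("data/tools.json"), Path("site/data.js"))
```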
The forum is at 91 lines and stable. thisminute's forum hit 1,008 lines in its first big session. toolshed never had that spike because the librarian was there from the start — a lesson carried over from thisminute.
Project 5: sts2
An LLM autopilot mod for Slay the Spire 2. The first checkpoint-model project — where the pattern was invented — and a case study in what happens when the underlying architecture pivots but the agent system doesn't break.
The Checkpoint Model Is Born
Nine agents: orchestrator, mod-builder, bot-builder, mcp-engineer, play-operator, analyst, overlay-dev, cycle, skeptic. The project needed rapid iteration — modify the mod, launch the game, observe behavior, adjust — and a forum-based model would have been too slow. Instead: a checkpoint file as the single source of truth. Each agent spawns, reads it, works, updates it, and shuts down.
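A minimal sketch of that cycle, assuming a JSON checkpoint and a stubbed run_agent in place of a real Claude agent; the field names and roles shown are illustrative only.

```python
# Minimal sketch of one checkpoint cycle. The real project uses Claude agents
# and its own checkpoint format; the JSON layout and run_agent stub here are
# assumptions for illustration.
import json
from pathlib import Path

CHECKPOINT = Path("CHECKPOINT.json")

def read_checkpoint() -> dict:
    return json.loads(CHECKPOINT.read_text()) if CHECKPOINT.exists() else {}

def write_checkpoint(state: dict) -> None:
    CHECKPOINT.write_text(json.dumps(state, indent=2))

def run_agent(role: str, state: dict) -> dict:
    """Stand-in for spawning an agent: read the state, work, return updates."""
    return {"last_agent": role, "notes": state.get("notes", []) + [f"{role} ran"]}

def run_cycle(roles: list[str]) -> None:
    state = read_checkpoint()
    state["cycle"] = state.get("cycle", 0) + 1
    for role in roles:              # each agent spawns, works, and shuts down
        state.update(run_agent(role, state))
        write_checkpoint(state)     # the checkpoint, not the agent, carries context

run_cycle(["mod-builder", "bot-builder", "play-operator", "analyst", "skeptic"])
```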
By cycle 37, the project had invented two patterns not seen elsewhere: a section index (.claude/advisor-manager-index.md) for navigating a 4,100+ line core file, and the checkpoint cycle itself. Both were later extracted into the forge's pattern library.
Architecture Pivot
The original architecture used MCP (Model Context Protocol) for the bot to communicate with the game mod. Mid-project, the team pivoted to a completely different approach: state files + PostMessage clicks via a Python bot (bot_vm/). The mod writes game state to a file, the bot reads it and sends mouse/keyboard commands directly.
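A rough sketch of that loop, assuming pywin32 on Windows: the mod writes state to a file, the bot reads it and posts clicks to the game window without moving the real cursor. The state-file path, JSON fields, and window title are assumptions; the actual protocol lives in bot_vm/.

```python
# Sketch of the state-file + PostMessage loop (assumes pywin32 on Windows).
# The state-file name, its JSON fields, and the window title are illustrative,
# not the actual bot_vm/ protocol.
import json
import time
from pathlib import Path

import win32api, win32con, win32gui  # pywin32

STATE_FILE = Path("game_state.json")   # assumed: written by the mod each frame

def click(hwnd: int, x: int, y: int) -> None:
    """Post a left click at client coords without moving the real cursor."""
    lparam = win32api.MAKELONG(x, y)
    win32gui.PostMessage(hwnd, win32con.WM_LBUTTONDOWN, win32con.MK_LBUTTON, lparam)
    win32gui.PostMessage(hwnd, win32con.WM_LBUTTONUP, 0, lparam)

def main() -> None:
    hwnd = win32gui.FindWindow(None, "Slay the Spire II")  # title assumed
    while True:
        state = json.loads(STATE_FILE.read_text())
        if state.get("awaiting_input"):
            # Decide a move from the state (stubbed), then click the target.
            target = state["choices"][0]["screen_pos"]     # assumed field
            click(hwnd, target["x"], target["y"])
        time.sleep(0.1)

if __name__ == "__main__":
    main()
```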
The agent system didn't need to change. The 9 roles still made sense — mod-builder still builds the mod, bot-builder still builds the bot, the cycle agent still manages iteration. The checkpoint absorbed the architecture change naturally. This is a strength of the model: the coordination structure is decoupled from the technical architecture.
Project 6: balatro
An autopilot mod for the card game Balatro. The project that validated the checkpoint model — disposable per-cycle agents, no forum, no protocol, just a checkpoint file as the source of truth.
Checkpoint Cycles
Nine agents: orchestrator, mod-builder, bot-builder, mcp-engineer, play-operator, analyst, overlay-dev, cycle, skeptic. Each cycle: agents spawn, read the checkpoint, do their work, update the checkpoint, shut down. No forum — the checkpoint IS the coordination mechanism. No protocol — the cycle IS the protocol.
13 cycles in, and the memory system told the real story. The project started with 1 memory file. By cycle 13 it had 8 — scoring math, cost feedback, learning loops, strategy notes. Nobody told the agents to write memory files. They started doing it because the checkpoint kept losing context between cycles.
Project 7: rts
Space Crystals — a real-time strategy game built in Godot 4.6. The project that proved protocol+forum works for game development, not just web products. Seven agents built a playable 1v1 RTS from zero in under a month.
Full Protocol from Day One
Seven agents: orchestrator, builder, verifier, strategist, librarian, debugger, skeptic. The roster was modeled on agi's post-consolidation team plus a debugger — game development needs someone who can track down why a unit is teleporting or why the AI builds Power Plants forever.
Phases 1 through 3 shipped in rapid succession: tile-based map, camera, unit movement, fog of war, resource gathering, building system, production queues, power grid, AI opponent, minimap, visual polish (custom particle system, lighting, weather). 27 source files, 8,100 lines of GDScript.
The Skeptic Catches Real Bugs
Three formal skeptic reviews across Phases 1–3. The third review found three issues: the AI's focus-fire logic used a hardcoded Peacekeeper range for all unit types (Heavy Troopers have shorter range); a new unit had no design doc entry (the team was adding gameplay content without updating the spec); and the builder's own forum post contained inaccurate stats.
Phase 4 is now starting: the Syndicate as a second playable faction. Data-driven refactoring to support multiple factions, AI controller abstraction, faction-specific build modes. The strategist wrote architecture decisions 16–23 covering the full migration plan.
Project 8: ops
Infrastructure management for the entire ecosystem. nginx, deployment queues, healthchecks, SSL. The steward pattern's first real growth event: one agent became two.
Steward Grows a Role
ops started as a single steward — one agent managing deploys, nginx config, and healthchecks. Then thisminute needed app-layer security review, and it became clear that security concerns split cleanly into two layers: application (XSS, SSRF, rate limiting) and infrastructure (firewall, SSH, TLS).
The steward didn't absorb both. Instead, a security agent was added — the first organic role split in the steward model. The steward's role file had always included instructions for when to split ("your memory covers 3+ unrelated domains"). Infrastructure + security hit that threshold.
The Meta-Project: the Forge
Once there were enough projects running, it made sense to have something watching them all. The Forge audits each project's agent system and extracts patterns from what's working.
Audit & Propagate
The forgemaster runs an audit cycle: scan every project's agent system, identify gaps, update the pattern library, and apply targeted upgrades. 25 cycles so far across 11 active projects. It tracks maturity progression — from "minimal" (steward-only) through "established+" (full protocol+forum) — and flags when a project's documentation lags behind its actual architecture.
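A minimal sketch of what one scan pass could look like: walk the project directories, infer the coordination model from which files are present, and flag anything missing. The root path, file names like FORUM.md and CHECKPOINT.md, and the classification rules are assumptions for illustration, not the forgemaster's actual logic.

```python
# Sketch of an audit pass: infer each project's coordination model from
# which files exist and flag gaps. Root path, file names, and classification
# rules are assumptions, not the forgemaster's actual logic.
from pathlib import Path

PROJECTS_ROOT = Path.home() / "projects"   # assumed location

def audit(project: Path) -> dict:
    def has(name: str) -> bool:
        return (project / name).exists()

    if has("PROTOCOL.md") and has("FORUM.md"):
        model = "protocol+forum"
    elif has("CHECKPOINT.md"):
        model = "checkpoint"
    elif has("AGENTS.md"):
        model = "steward"
    else:
        model = "unknown"
    return {
        "project": project.name,
        "model": model,
        "flags": [] if model != "unknown" else ["no agent system found"],
    }

def run_audit_cycle() -> list[dict]:
    return [audit(p) for p in sorted(PROJECTS_ROOT.iterdir()) if p.is_dir()]

if __name__ == "__main__":
    for report in run_audit_cycle():
        print(report)
```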
The pattern library is the main output: 7 patterns extracted from live projects and shipped as reusable templates, among them challenge loops, steward bootstraps, checkpoint cycles, forum maintenance, and section indexes for navigating huge files.
The forge observes but doesn't coordinate. Each project runs its own model. The forge watches what works and makes it available to new projects.
The Second Forge: singularity-forge
What happens when a forge doesn't just audit projects but creates them? singularity-forge scans the toolshed catalog for missing software and builds projects to fill the gaps.
Chained Forge
Four agents: forgemaster, assayer, smith, skeptic. Same forge pattern as thisminute-forge, but pointed at a different problem. Instead of auditing agent systems, it audits the toolshed catalog: what categories have no tools? What real-world software is missing? When it finds a gap, it scaffolds a new project with a steward agent, links it back to the toolshed entry, and moves on.
25 projects generated in its first cycles. Each gets a CLAUDE.md, AGENTS.md, steward role file, and a CRUCIBLE_CONTEXT.md linking it back to the idea that spawned it. The projects live under ~/projects/singularity/ and are independent of the main ecosystem — thisminute-forge monitors singularity-forge itself but doesn't coordinate the generated projects.
The Full Ecosystem
Twelve projects, three coordination models, two forges, ~58 agents. This is what it looks like today.
How Lessons Transfer
Three mechanisms move organizational knowledge between projects:
The human carries experience directly. agi's consolidation from 10 to 7 agents informed thisminute's initial roster of 5. The decision to add a librarian to toolshed on day one came from watching thisminute's forum grow unmanageable without one. These are judgment calls — knowing which roles to start with and which to wait on.
Role files and templates transfer structure. When thisminute was created, its agent roles were modeled on agi's post-consolidation set. toolshed adopted the protocol+forum pattern from thisminute. Each new project starts from a better baseline because the role files encode what previous projects learned.
The forge handles consistency. It audits each project's agent system and flags drift — where the documented structure no longer matches how the agents actually work. A project might add a role informally, or stop using a protocol step, and the documentation doesn't catch up. The forge's audit cycle surfaces these mismatches so they can be resolved rather than accumulating as organizational debt.
None of these mechanisms is automatic. The human has to notice patterns. The role files have to be maintained. The forge has to be run. Organizational knowledge doesn't transfer itself — it transfers because someone (or something) actively moves it.
The Arc
| Project | Agents | Model | What It Taught Us |
|---|---|---|---|
| agi | 10 → 7 | Protocol + Forum | Premature structure costs more than no structure. Police, coach, and editor produced overhead, not output. |
| thisminute | 5 → 11 | Protocol + Forum | Grow into complexity rather than starting with it. The skeptic was the most important addition. |
| rhizome | 1 | Steward | Not every project needs coordination. A catalog is not a news platform. |
| toolshed | 7 | Protocol + Forum | The pattern transfers. A librarian from day one prevents the forum explosion. |
| sts2 | 9 | Checkpoint | Invented the checkpoint model. Agent systems survive architecture pivots — the coordination structure is decoupled from the tech stack. |
| balatro | 9 | Checkpoint | Validated the checkpoint model. Disposable agents + persistent checkpoint = rapid iteration. Memory adoption is organic. |
| rts | 7 | Protocol + Forum | Protocol+forum works for game dev. The skeptic catches spec/code drift that testing misses. |
| ops | 1 → 2 | Steward | The steward pattern's growth triggers work. One agent becomes two when the domains diverge. |
| singularity-forge | 4 | Forge | The forge pattern is reusable and generative. A forge can create work, not just audit it. |
Four Models
Four coordination models emerged across the projects. Each came from a different set of constraints and turned out to be reusable.
| Model | Used By | When It Fits |
|---|---|---|
| Protocol + Forum | thisminute (11), toolshed (7), rts (7), agi (7) | Steady-state products with multiple domains. Agents persist between sessions. Forum is the evidence ledger; protocol defines the startup sequence. Needs a librarian or the forum grows without bound. |
| Checkpoint | balatro (9), sts2 (9), forge.thisminute.org (3) | Rapid build-test-fix iteration. Agents are disposable — they spawn, read the checkpoint, work, update it, and shut down. No forum, no protocol. The cycle IS the protocol. |
| Steward | rhizome (1), ops (2), recipe-scaler (1) | Small scope or bootstrap projects. One agent owns everything and knows when to split. Minimal coordination overhead. The split criteria are baked into the role file. |
| Forge | thisminute-forge (4), singularity-forge (4) | Scan-evaluate-act loop over a registry of projects. Audit agent systems or catalog gaps, propagate improvements. Can be pointed at different problems — quality or coverage. |
What We'd Tell Ourselves at the Start
| Expectation | Reality |
|---|---|
| More agents = more capability | agi started with 10 and consolidated to 7 in a day. The eliminated roles (police, coach, editor) generated noise. thisminute grew from 5 to 11 because each role was needed. rhizome never needed more than 1. |
| The forum is the coordination mechanism | Forums grow without bound. agi: 8,753 lines in a day. thisminute: 1,008 lines in a session. Every shared-context pattern needs a maintenance agent, and the maintenance cost is inseparable from the pattern. toolshed shipped the librarian on day one and never had the spike. |
| The skeptic is a nice-to-have | In agi, the skeptic invalidated ALL Go results (broken oracle). In thisminute, it caught 10–100x inflated numbers. The adversarial dynamic is load-bearing in every project that has one. |
| Organizational structure should be designed up front | agi over-designed and had to consolidate. thisminute inherited and grew. rhizome never needed to grow. The right amount of structure depends on the project, not on a template. |
| There's one right coordination model | There are at least four. Protocol+forum for steady-state products, checkpoint for rapid iteration, steward for small scope, forge for scan-evaluate-act over registries. Each emerged from a project's constraints. Applying the wrong model wastes effort — rhizome doesn't need a forum, balatro doesn't need a protocol. |
| Memory needs to be mandated | balatro went from 1 to 8 memory files in ~10 cycles with zero instruction. Agents that wrote memories performed better, so more agents started writing memories. The incentive is self-reinforcing when the checkpoint keeps losing context. |
| You need a central coordinator | Each project runs its own coordination model. The forge's role is auditing and pattern extraction, not dispatching work. Projects don't share runtime state, so coordinating them centrally would add overhead without benefit. |
| Organizational structure is set-and-forget | Agent systems drift from their documentation. Roles get added informally, protocol steps stop being followed, memory files accumulate without review. The forge's audit cycle exists specifically to catch this drift before it compounds. |
| There's one kind of forge | The first forge audited agent systems. The second forge creates projects to fill catalog gaps. Same 4-role structure, different problem domain. The pattern is "scan a registry, evaluate gaps, act on them" — what you point it at determines what it builds. |
This is a living document. The projects are still running, the agents are still being tuned, and the patterns are still being revised. The main thing we've learned is that there's no shortcut — you figure out how to organize agents by building things with them and paying attention to what goes wrong.
The organizational patterns referenced above are from the Rhizome catalog. The Forge watches these projects and extracts what works into reusable templates. A second forge builds what's missing.