Project 1: agi

The hypothesis: build general intelligence from small, specialized transformer models composed in a directed acyclic graph. Board games as testbeds, eventually expanding to 20+ task domains. The first project ever built with Claude as the primary engineer. What mattered wasn't whether the hypothesis panned out — it didn't — but the sheer speed and scale of what an agent team could build, and the organizational lessons that shaped every project after it.

Feb 1, 2026

The 10-Daemon Era

The project launched with 10 agents (called "daemons"): trainer, scientist, architect, tester, moderator, curator, editor, coach, police, orchestrator. A forum was set up from day one with threaded discussions and mandatory voting.

Within hours, the system was generating at industrial scale. The coach tracked zoo membership obsessively (77→84→91→98 models in a day). The police filed enforcement reports every cycle. The curator maintained count reconciliations. The agents built a full training pipeline, model zoo, evaluation harness, and experiment tracking system — all in the first 24 hours.

Roster: orchestrator, trainer, scientist, architect, tester, moderator, curator, editor, coach, police.
What broke: The four failed roles (police, coach, editor, curator) were all oversight functions — auditing, reviewing, enforcing — added before there was anything to oversee. The lesson: monitoring roles need something to monitor. The forum hit 8,753 lines in one day, mostly these roles talking to each other. The team consolidated to five agents that evening.
10 daemons · 8,753 forum lines (1 day) · 106 zoo members
Feb 2, evening

The Great Consolidation

Trainer became builder. Scientist+architect became strategist. Tester+moderator became verifier. Curator+editor+coach became librarian. Police was eliminated. "Daemon" became "agent."

Roster: orchestrator, builder, strategist, verifier, librarian.

Then the skeptic was added, prompted by the human asking "Why are we using minimax for Go at all?" The skeptic immediately invalidated multiple claims: all Go results were built on a broken minimax oracle. "6500% improvement" was cherry-picked. "70% zero-shot transfer" was false. The Feynman quote was embedded in its role file: "The first principle is that you must not fool yourself."

The verifier role existed but couldn't catch this — it was testing against the oracle, so a broken oracle produced "passing" tests. The skeptic's value was questioning the setup itself, not just the outputs. This distinction (checking results vs. challenging assumptions) became a pattern: every later project with a skeptic benefited from it.

Roster: orchestrator, builder, strategist, verifier, librarian, skeptic, debugger.
Feb 3–10

Scale and Dead Ends

The agent team built at a pace that would have taken a solo developer weeks. A complete AlphaZero pipeline for Go-5x5 (self-play, MCTS, training loops), ONNX model export, a Rust inference engine, a 20+ domain task taxonomy, and hundreds of trained models — all within days. But the results kept hitting walls.

Go-5x5 training plateaued at 42 AlphaZero cycles, with 14 cycles of regression. The debugger found three bugs in the TicTacToe pipeline alone. Adding perfectly accurate domain features to Go models actually hurt performance. The team abandoned that accumulated work and went back to prove the basics.

7 agents · 100+ trained models · 20+ task domains · 1 git commit

The project was never properly versioned. All history lives in file timestamps, forum archives, and agent memory files rather than git.

Feb 10+

Placed on Hold

The composable-intelligence hypothesis ran into two problems. Networks are better at learning compositional structure internally than having it imposed externally — the DAG of specialists was solving a problem that end-to-end training already handles. And LLMs had achieved broad, flexible reasoning by a completely unrelated path. The research question was answered, just not by this project.

The project was placed on hold so more promising ideas could be explored. The organizational lessons — consolidation from 10 to 7 agents, the skeptic catching bad results, the forum scaling problem — carried forward into every project that followed, starting with thisminute.

Rhizome pattern: Meta-Learning Zoo — specialized sub-groups coordinating through shared experiment logs. The adversarial dynamic (skeptic invalidating claims) was load-bearing from the start.

Project 2: thisminute

thisminute.org is a real-time global news map. It was created by taking the agi project's agent system and saying "model a new project after this." The organizational structure was inherited, not designed from scratch.

Early March

Solo Build

Four rounds of iterative development before agents. Globe projection, keyword categorization, geocoding, trending detection. The project worked but had outgrown what one context window could hold.

542 stories · 16 RSS sources · 17 tests
Week 1

Inherited Structure

Five roles modeled after agi's post-consolidation team: builder, tester, strategist, skeptic, orchestrator. LLM extraction via Claude Haiku replaced keyword categorization. Event clustering and narrative synthesis via Sonnet.

Roster: orchestrator, builder, tester, strategist, skeptic.
Week 1, day 5

The GDELT Crisis

GDELT was sampled at 7%. The underlying dataset grew 10x without anyone noticing, yielding 45,000 stories/day instead of the expected ~4,500. Events bloated to 25,000. The crisis revealed three domains of work that hadn't existed before: getting code onto the VM reliably (deployer), tracking LLM spend as volume scaled (economist), and making the UI work for multiple data types (designer). Each role was added because the work split into a new domain — not because someone planned a roster ahead of time.

Roster: orchestrator, builder, tester, strategist, skeptic, deployer, economist, designer.

After the fix (sample rate 7%→0.3%), the orchestrator ran 20+ cycles overnight and shipped 4 versions: world-switching UX, sports feeds, entertainment feeds, prompt caching.
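A deterministic, hash-based sampler is one way a rate change like 7%→0.3% can be made reproducible and auditable; this sketch is illustrative, not the project's actual code.

```python
import hashlib

def sample(record_id: str, rate: float) -> bool:
    """Deterministically keep `rate` fraction of records by hashing their id.

    Unlike random.random(), the same record always gets the same decision,
    so changing the rate (e.g. 0.07 -> 0.003) is reproducible: the kept set
    at 0.3% is a strict subset of the kept set at 7%.
    """
    digest = hashlib.sha256(record_id.encode()).digest()
    # Map the first 8 bytes of the hash to a float in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < rate

kept = [rid for rid in ("a", "b", "c", "d") if sample(rid, 0.5)]
```

The subset property is what makes a silent volume change diagnosable: the same records survive at every rate, so a spike can only come from the upstream dataset growing.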

45K→4.2K stories/day · 60%→98% extraction rate · 74 RSS feeds
Week 2

Domain Specialization

The narrative analyzer only understood geopolitical news. Fix: separate analysis per domain (news, sports, entertainment, positive), each with its own Sonnet prompt, quality criteria, and event caps. A 5th domain (curious/human-interest) followed.
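One way to structure the per-domain split is a config table keyed by domain — each entry carrying its own prompt, quality bar, and event cap. The names, paths, and thresholds below are hypothetical.

```python
# Hypothetical per-domain analysis config: each domain gets its own
# prompt template, quality criterion, and event cap (values illustrative).
DOMAINS = {
    "news":          {"prompt": "prompts/news.txt",          "min_sources": 3, "event_cap": 60},
    "sports":        {"prompt": "prompts/sports.txt",        "min_sources": 2, "event_cap": 40},
    "entertainment": {"prompt": "prompts/entertainment.txt", "min_sources": 2, "event_cap": 30},
    "positive":      {"prompt": "prompts/positive.txt",      "min_sources": 1, "event_cap": 20},
    "curious":       {"prompt": "prompts/curious.txt",       "min_sources": 1, "event_cap": 20},
}

def analyzer_config(domain: str) -> dict:
    """Look up a domain's analysis settings; unknown domains fail loudly
    rather than silently falling back to the geopolitical defaults."""
    if domain not in DOMAINS:
        raise KeyError(f"no analyzer configured for domain {domain!r}")
    return DOMAINS[domain]
```

Failing loudly on an unknown domain is the point: the original bug was one analyzer silently applying geopolitical criteria to everything.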

Week 2, day 3

Forum Maintenance

Librarian added as the 9th agent — the forum had hit 1,008 lines in one session. Around the same time, the strategist proposed 13 audience segments and 8 presets. The skeptic flagged that the numbers were inflated 10–100x, 2 of 3 "immediate" items were already built, and time estimates were 2–3x too low.

Without the adversarial check, the team would have planned a sprint against fabricated numbers. The strategist wasn't lying — it was doing its job (proposing ambitious plans). The skeptic's job was to pressure-test them. This dynamic — proposal + challenge — turned out to be more reliable than having any single agent try to be both creative and critical.

Roster: orchestrator, builder, tester, strategist, skeptic, deployer, economist, designer, librarian.
Week 2, day 5

Inference Feeds

Statistical inference feeds — events from sensor data, not news. USGS earthquakes, NOAA weather, NASA events, disaster alerts, WHO outbreaks, space launches. Pre-built extraction dicts skip the LLM entirely. A 10th agent added for user feedback.
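Because sources like USGS publish structured data, an adapter can map fields straight to an event with no LLM call. A sketch against the USGS GeoJSON earthquake shape — the output schema keys here are assumptions, not thisminute's actual schema.

```python
def earthquake_to_event(feature: dict) -> dict:
    """Map a USGS GeoJSON earthquake feature directly to an event dict.

    No LLM call needed: the source is already structured, so "extraction"
    is a pure field mapping. (Output keys are illustrative.)
    """
    props = feature["properties"]
    lon, lat = feature["geometry"]["coordinates"][:2]  # GeoJSON order: lon, lat
    return {
        "domain": "inference",
        "kind": "earthquake",
        "title": f"M{props['mag']} earthquake, {props['place']}",
        "lat": lat,
        "lon": lon,
        "timestamp": props["time"],  # epoch milliseconds in the USGS feed
    }
```

One such adapter per source (NOAA, NASA, WHO, …) keeps the LLM budget reserved for the feeds that actually need language understanding.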

Roster: orchestrator, builder, tester, strategist, skeptic, deployer, economist, designer, librarian, feedback.
10 agents · 9 data sources · 351 tests · $10.92 per day (LLM)
Rhizome patterns: Orchestrated (hub-and-spoke) + Pipeline (scrape→extract→cluster→analyze) + Federated (domain autonomy, source adapters) + Adversarial (skeptic review loops).
Week 3+

Security & Scale

An 11th agent: security. Not because of an incident — ops had added an infra-layer security agent, and the two needed a clean division of responsibility. thisminute's security agent owns app-layer concerns (XSS, comment/vote hardening, SSRF protection, rate limiting); ops owns the infra layer (nginx, firewall, SSH, SSL/TLS). Two projects, two agents, one security model.

Roster: orchestrator, builder, tester, strategist, skeptic, deployer, economist, designer, librarian, feedback, security.

Phase 4.5 shipped: map color themes (domain/classic/mono/heat/neon), redesigned world-picker, share button, auto-cycling tour mode. 95 RSS feeds plus 13 structured data APIs. The forum peaked at 1,008 lines in that early session, but the librarian keeps it around 400 now. The test suite hit 710.

11 agents · 95 RSS feeds · 710 tests · 444 forum lines (steady)

Project 3: rhizome

This catalog. 205 organizational patterns, 7 curated from real ecosystem usage. One steward agent, no forum, no protocol file. The right amount of structure for a browsable catalog turns out to be almost none.

March, week 2

One Steward

A single agent handles everything: frontend, data curation, API work, build pipeline, accessibility, deploy coordination. It maintains one memory file. There's no forum because there's nobody to discuss things with. No protocol because there's no startup sequence to formalize.

Roster: steward.

The steward role file includes instructions for when to split into multiple agents ("your memory file covers 3+ unrelated domains," "you're context-switching between very different kinds of work"). So far, those conditions haven't been met. A single-page catalog with a build script and an optional API doesn't need 10 agents.

1 agent · 205 patterns · 92 tests · 0 forum lines
No coordination pattern needed. 7 of its catalog entries come from watching the other projects organize themselves.

Project 4: toolshed

A software directory that catalogs 15,803 tools across 124 categories. Originally called "mainmenu." The first project to adopt thisminute's protocol+forum model from scratch rather than inheriting it — proof that the pattern transfers.

March, week 2

Full Protocol from Day One

Seven agents, a PROTOCOL.md, and a forum. Builder, curator, designer, librarian, orchestrator, skeptic, strategist. The roles were modeled on thisminute's post-growth roster, minus the domain-specific ones (no deployer, economist, or feedback — a static directory doesn't need them).

Roster: orchestrator, builder, curator, designer, strategist, skeptic, librarian.

Data aggregation from Homebrew, awesome-lists, and CNCF via scraping pipelines. Build script generates a 1MB+ data.js for client-side filtering. 67 tests for categorization, data validation, and taxonomy integrity.
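A build step like the one described might serialize the catalog into a JS global that a static page loads with a plain script tag. This is a hedged sketch, not toolshed's actual script — `window.TOOLS` is an invented name.

```python
import json

def write_data_js(tools: list[dict], path: str = "data.js") -> None:
    """Serialize the catalog to a JS file that assigns a global, so a
    static page can include it with <script src="data.js"> and filter
    client-side with no server, fetch, or CORS concerns.
    """
    payload = json.dumps(tools, separators=(",", ":"), ensure_ascii=False)
    with open(path, "w", encoding="utf-8") as f:
        f.write("// generated file - do not edit\n")
        f.write(f"window.TOOLS = {payload};\n")
```

Compact separators matter at this scale: at 15,803 entries, whitespace alone can add hundreds of kilobytes to a 1MB+ payload.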

7 agents · 15,803 tools cataloged · 124 categories · 91 forum lines (healthy)

The forum is at 91 lines and stable. thisminute's forum hit 1,008 lines in its first big session. toolshed never had that spike because the librarian was there from the start — a lesson carried over from thisminute.

Rhizome patterns: Orchestrated (no single orchestrator, but protocol-mediated) + Adversarial (skeptic reviews strategy proposals). The protocol+forum model is now a documented, transferable pattern.

Project 5: sts2

An LLM autopilot mod for Slay the Spire 2. The first checkpoint-model project — where the pattern was invented — and a case study in what happens when the underlying architecture pivots but the agent system doesn't break.

Late February

The Checkpoint Model Is Born

Nine agents: orchestrator, mod-builder, bot-builder, mcp-engineer, play-operator, analyst, overlay-dev, cycle, skeptic. The project needed rapid iteration — modify the mod, launch the game, observe behavior, adjust — and a forum-based model would have been too slow. Instead: a checkpoint file as the single source of truth. Each agent spawns, reads it, works, updates it, and shuts down.

Roster: orchestrator, mod-builder, bot-builder, mcp-engineer, play-operator, analyst, overlay-dev, cycle, skeptic.
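The checkpoint cycle described above can be sketched in a few lines. The file name, schema, and agent signature here are assumptions, not sts2's actual implementation.

```python
import json
import pathlib

CHECKPOINT = pathlib.Path("CHECKPOINT.json")  # name is illustrative

def run_cycle(agents: dict) -> None:
    """One checkpoint cycle: each agent reads the shared state, does its
    work, and writes its updates back. Agents hold no private state, so
    they can be spawned fresh every cycle and discarded afterward.
    """
    state = json.loads(CHECKPOINT.read_text()) if CHECKPOINT.exists() else {"cycle": 0}
    state["cycle"] += 1
    for name, work in agents.items():
        updates = work(state)          # the agent sees the full checkpoint
        state.setdefault("log", []).append(name)
        state.update(updates or {})    # its report becomes shared truth
    CHECKPOINT.write_text(json.dumps(state, indent=2))
```

The single file is the whole coordination mechanism: no forum to maintain, no protocol to follow — just read, work, write, exit.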

By cycle 37, the project had invented two patterns not seen elsewhere: a section index (.claude/advisor-manager-index.md) for navigating a 4,100+ line core file, and the checkpoint cycle itself. Both were later extracted into the forge's pattern library.
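A section index like `.claude/advisor-manager-index.md` can be generated mechanically. This sketch assumes markdown-style `#` headings mark the sections, which may not match the real file's conventions.

```python
def build_section_index(path: str) -> str:
    """Emit a table of contents with line numbers for a large file, so an
    agent can jump straight to the section it needs instead of reading
    4,000+ lines. (The '#' heading convention is an assumption.)
    """
    entries = []
    with open(path, encoding="utf-8") as f:
        for n, line in enumerate(f, start=1):
            if line.lstrip().startswith("#"):
                title = line.strip().lstrip("#").strip()
                entries.append(f"L{n}: {title}")
    return "\n".join(entries)
```

Regenerating the index on each cycle keeps the line numbers honest as the core file grows.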

March, week 2

Architecture Pivot

The original architecture used MCP (Model Context Protocol) for the bot to communicate with the game mod. Mid-project, the team pivoted to a completely different approach: state files + PostMessage clicks via a Python bot (bot_vm/). The mod writes game state to a file, the bot reads it and sends mouse/keyboard commands directly.
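The state-file half of that loop might look roughly like this. The file name, state schema, and decision-function signature are all assumptions, and real input injection is platform-specific, so it is elided.

```python
import json
import pathlib

STATE_FILE = pathlib.Path("game_state.json")  # path and schema are illustrative

def poll_once(decide) -> bool:
    """Read the state file the mod writes and hand it to a decision
    function, which returns an (x, y) click or None. Returns whether an
    action was taken. A real bot would follow up with OS-level input
    injection where the comment indicates.
    """
    if not STATE_FILE.exists():
        return False                       # mod hasn't written state yet
    state = json.loads(STATE_FILE.read_text())
    click = decide(state)
    if click is None:
        return False                       # nothing to do this tick
    x, y = click
    # send_click(x, y)  # platform-specific input injection goes here
    return True
```

The appeal over MCP is debuggability: the bot's entire input is a file you can open, diff, and replay.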

The agent system didn't need to change. The 9 roles still made sense — mod-builder still builds the mod, bot-builder still builds the bot, the cycle agent still manages iteration. The checkpoint absorbed the architecture change naturally. This is a strength of the model: the coordination structure is decoupled from the technical architecture.

What broke: AGENTS.md still describes the old MCP architecture. The pivot happened faster than documentation could keep up. The forge flagged this drift but deferred the fix — updating docs during an active architecture pivot creates churn.
9 agents · 37 cycles · 3 memory files · 4,100+ line core file
sts2 invented two patterns: the checkpoint cycle (disposable agents + persistent state) and the section index (a table of contents for navigating huge files). Both were later adopted by balatro and extracted into forge templates.

Project 6: balatro

An autopilot mod for the card game Balatro. The project that validated the checkpoint model — disposable per-cycle agents, no forum, no protocol, just a checkpoint file as the source of truth.

March, week 1

Checkpoint Cycles

Nine agents: orchestrator, mod-builder, bot-builder, mcp-engineer, play-operator, analyst, overlay-dev, cycle, skeptic. Each cycle: agents spawn, read the checkpoint, do their work, update the checkpoint, shut down. No forum — the checkpoint IS the coordination mechanism. No protocol — the cycle IS the protocol.

Roster: orchestrator, mod-builder, bot-builder, mcp-engineer, play-operator, analyst, overlay-dev, cycle, skeptic.

13 cycles in, and the memory system told the real story. The project started with 1 memory file. By cycle 13 it had 8 — scoring math, cost feedback, learning loops, strategy notes. Nobody told the agents to write memory files. They started doing it because the checkpoint kept losing context between cycles.

What surprised: The checkpoint model deliberately makes agents disposable — they spawn, work, and shut down each cycle. But the checkpoint itself was too narrow to carry all the context agents needed. Agents started writing memory files as a workaround: scoring math that shouldn't change between cycles, cost feedback, strategy notes. The agents adapted around a limitation of their own coordination model. 1→8 memory files in ~10 cycles.
9 agents · 13 cycles · 8 memory files · 25/25 MCP tests
The checkpoint model is being validated here. Target: cycle 100 before formalizing the pattern into the Forge template library.

Project 7: rts

Space Crystals — a real-time strategy game built in Godot 4.6. The project that proved protocol+forum works for game development, not just web products. Seven agents built a playable 1v1 RTS from zero in under a month.

Late February

Full Protocol from Day One

Seven agents: orchestrator, builder, verifier, strategist, librarian, debugger, skeptic. The roster was modeled on agi's post-consolidation team plus a debugger — game development needs someone who can track down why a unit is teleporting or why the AI builds Power Plants forever.

Roster: orchestrator, builder, verifier, strategist, librarian, debugger, skeptic.

Phase 1 through 3 shipped in rapid succession: tile-based map, camera, unit movement, fog of war, resource gathering, building system, production queues, power grid, AI opponent, minimap, visual polish (custom particle system, lighting, weather). 27 source files, 8,100 lines of GDScript.

7 agents · 8,100 lines of GDScript · 27 source files · 3 skeptic reviews
March, week 3

The Skeptic Catches Real Bugs

Three formal skeptic reviews across Phases 1–3. The third review found that the AI's focus-fire logic used a hardcoded Peacekeeper range for all unit types (Heavy Troopers have shorter range), that a new unit had shipped with no design doc entry (the team was adding gameplay content without updating the spec), and that the builder's own forum post contained inaccurate stats.

Phase 4 is now starting: the Syndicate as a second playable faction. Data-driven refactoring to support multiple factions, AI controller abstraction, faction-specific build modes. The strategist wrote architecture decisions 16–23 covering the full migration plan.

What broke: The forum hit 1,178 lines — the largest in the ecosystem. The librarian existed but was never spawned for maintenance. The forge archived 18 threads of completed Phase 1–3 history, keeping only the active Phase 4 plan and the skeptic's open review. Same pattern seen in thisminute and toolshed before it.
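The archival move can be sketched as a thread-splitting pass over the forum file; the `## ` thread-heading convention here is an assumption.

```python
def archive_forum(forum_text: str, keep: set[str]) -> tuple[str, str]:
    """Split a forum file into '## '-headed threads and move every thread
    whose title is not in `keep` to an archive. Returns (active, archived).
    Preamble before the first thread heading stays in the active file.
    (The '## ' convention is an assumption.)
    """
    active, archived = [], []
    bucket = active
    for line in forum_text.splitlines(keepends=True):
        if line.startswith("## "):
            title = line[3:].strip()
            bucket = active if title in keep else archived
        bucket.append(line)
    return "".join(active), "".join(archived)
```

Keeping only the active phase plan and open reviews is exactly the cut described above: history moves to the archive, context stays small.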
Protocol+forum transfers to game development. The skeptic's value is magnified: spec/code mismatches in games are harder to catch because "it looks right when you play it" masks numerical errors. Three reviews caught bugs that headless testing missed.

Project 8: ops

Infrastructure management for the entire ecosystem. nginx, deployment queues, healthchecks, SSL. The steward pattern's first real growth event: one agent became two.

March, week 1

Steward Grows a Role

ops started as a single steward — one agent managing deploys, nginx config, and healthchecks. Then thisminute needed app-layer security review, and it became clear that security concerns split cleanly into two layers: application (XSS, SSRF, rate limiting) and infrastructure (firewall, SSH, TLS).

The steward didn't absorb both. Instead, a security agent was added — the first organic role split in the steward model. The steward's role file had always included instructions for when to split ("your memory covers 3+ unrelated domains"). Infrastructure + security hit that threshold.

Roster: steward, security.
2 agents · 102 deploy queue lines
The split criteria were in the steward's role file from day one — they just hadn't been triggered until the security work created a second domain.

The Meta-Project: the Forge

Once there were enough projects running, it made sense to have something watching them all. The Forge audits each project's agent system and extracts patterns from what's working.

March, week 2

Audit & Propagate

The forgemaster runs an audit cycle: scan every project's agent system, identify gaps, update the pattern library, and apply targeted upgrades. 25 cycles so far across 11 active projects. It tracks maturity progression — from "minimal" (steward-only) through "established+" (full protocol+forum) — and flags when a project's documentation lags behind its actual architecture.

The pattern library is the main output: 7 patterns extracted from live projects and shipped as reusable templates. Challenge loops, steward bootstraps, checkpoint cycles, forum maintenance, section indexes for navigating huge files.
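A minimal version of the scan-evaluate-act audit might look like this. The file names follow conventions mentioned elsewhere in this document, but the specific gap rules are illustrative.

```python
import pathlib

def audit(project: pathlib.Path) -> list[str]:
    """One forge audit pass over a project directory: report structural
    gaps without changing anything. The forge observes; the project's
    own agents act on the findings. (Gap rules are illustrative.)
    """
    gaps = []
    if not (project / "AGENTS.md").exists():
        gaps.append("no AGENTS.md - roster undocumented")
    forum = project / "FORUM.md"
    if forum.exists() and len(forum.read_text().splitlines()) > 1000:
        gaps.append("forum over 1,000 lines - needs a librarian pass")
    if not list(project.glob("*.md")):
        gaps.append("no docs at all")
    return gaps
```

Running this per project per cycle is what surfaces doc/architecture drift before it compounds.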

25 audit cycles · 11 projects monitored · 7 validated patterns · ~58 total agents

The forge observes but doesn't coordinate. Each project runs its own model. The forge watches what works and makes it available to new projects.

The Second Forge: singularity-forge

What happens when a forge doesn't just audit projects — it creates them? singularity-forge scans the toolshed catalog for missing software and builds projects to fill the gaps.

March, week 3

Chained Forge

Four agents: forgemaster, assayer, smith, skeptic. Same forge pattern as thisminute-forge, but pointed at a different problem. Instead of auditing agent systems, it audits the toolshed catalog: what categories have no tools? What real-world software is missing? When it finds a gap, it scaffolds a new project with a steward agent, links it back to the toolshed entry, and moves on.

Roster: forgemaster, assayer, smith, skeptic.

25 projects generated in its first cycles. Each gets a CLAUDE.md, AGENTS.md, steward role file, and a CRUCIBLE_CONTEXT.md linking it back to the idea that spawned it. The projects live under ~/projects/singularity/ and are independent of the main ecosystem — thisminute-forge monitors the forge itself but doesn't coordinate the generated projects.
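The scaffolding step can be sketched directly from the file list above; the file contents here are invented placeholders.

```python
import pathlib

def scaffold_project(root: pathlib.Path, name: str, idea: str) -> pathlib.Path:
    """Bootstrap a steward project with the files the text describes:
    CLAUDE.md, AGENTS.md, a steward role file, and a CRUCIBLE_CONTEXT.md
    linking back to the originating idea. (Contents are placeholders.)
    """
    project = root / name
    (project / "roles").mkdir(parents=True, exist_ok=True)
    (project / "CLAUDE.md").write_text(f"# {name}\n\nSee AGENTS.md for the agent system.\n")
    (project / "AGENTS.md").write_text(
        "Single steward. Split when memory covers 3+ unrelated domains.\n")
    (project / "roles" / "steward.md").write_text(
        "You own everything: build, docs, deploy.\n")
    (project / "CRUCIBLE_CONTEXT.md").write_text(f"Origin: {idea}\n")
    return project
```

Starting every generated project as a steward matches the maturity ladder: minimal structure first, growth only when the split criteria trigger.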

4 agents · 25 projects created
What's interesting: This is the first forge that generates work rather than just auditing it. The audit-and-propagate pattern scales in a new direction: instead of improving existing projects, it identifies what should exist and bootstraps it. The two forges have cleanly separate registries — thisminute-forge monitors the ecosystem, singularity-forge monitors its own catalog projects.
The forge pattern is reusable. You can point it at different problems — agent system quality, software catalog gaps, potentially anything with a scan-evaluate-act loop. The 4-role structure (forgemaster, assayer, smith, keeper/skeptic) transfers directly.

The Full Ecosystem

Twelve projects, three coordination models, two forges, ~58 agents. This is what it looks like today.

Forges: thisminute-forge (25 cycles · 11 projects), agent-forge (template), singularity-forge (25 projects created · scans toolshed)

Protocol + Forum — 25 agents: thisminute (11 agents · news platform), toolshed (7 agents · software directory), rts (7 agents · Godot strategy game)

Checkpoint — 21 agents: balatro (9 agents · card game autopilot), sts2 (9 agents · game autopilot, pivot), forge.thisminute.org (3 agents · portal hub)

Steward: ops (2 agents · infrastructure), rhizome (1 agent · pattern catalog), recipe-scaler (1 agent), singularity project (4 agents)

forge.thisminute.org contains toolshed + rhizome (3 agent systems, 1 git repo). 25 generated projects + 8 forge agents (2 forges) + 4 templates. 58 agents across 12 projects · 3 coordination models · 2 forges. Plus agi (on hold, 7 agents) — where it all started.

How Lessons Transfer

Three mechanisms move organizational knowledge between projects:

The human carries experience directly. agi's consolidation from 10 to 7 agents informed thisminute's initial roster of 5. The decision to add a librarian to toolshed on day one came from watching thisminute's forum grow unmanageable without one. These are judgment calls — knowing which roles to start with and which to wait on.

Role files and templates transfer structure. When thisminute was created, its agent roles were modeled on agi's post-consolidation set. toolshed adopted the protocol+forum pattern from thisminute. Each new project starts from a better baseline because the role files encode what previous projects learned.

The forge handles consistency. It audits each project's agent system and flags drift — where the documented structure no longer matches how the agents actually work. A project might add a role informally, or stop using a protocol step, and the documentation doesn't catch up. The forge's audit cycle surfaces these mismatches so they can be resolved rather than accumulating as organizational debt.

None of these mechanisms is automatic. The human has to notice patterns. The role files have to be maintained. The forge has to be run. Organizational knowledge doesn't transfer itself — it transfers because someone (or something) actively moves it.

The Arc

Project · Agents · Model · What It Taught Us
agi · 10 → 7 · Protocol + Forum · Premature structure costs more than no structure. Police, coach, and editor produced overhead, not output.
thisminute · 5 → 11 · Protocol + Forum · Grow into complexity rather than starting with it. The skeptic was the most important addition.
rhizome · 1 · Steward · Not every project needs coordination. A catalog is not a news platform.
toolshed · 7 · Protocol + Forum · The pattern transfers. A librarian from day one prevents the forum explosion.
sts2 · 9 · Checkpoint · Invented the checkpoint model. Agent systems survive architecture pivots — the coordination structure is decoupled from the tech stack.
balatro · 9 · Checkpoint · Validated the checkpoint model. Disposable agents + persistent checkpoint = rapid iteration. Memory adoption is organic.
rts · 7 · Protocol + Forum · Protocol+forum works for game dev. The skeptic catches spec/code drift that testing misses.
ops · 1 → 2 · Steward · The steward pattern's growth triggers work. One agent becomes two when the domains diverge.
singularity-forge · 4 · Forge · The forge pattern is reusable and generative. A forge can create work, not just audit it.

Four Models

Four coordination models emerged across the projects. Each came from a different set of constraints and turned out to be reusable.

Model · Used By · When It Fits
Protocol + Forum · thisminute (11), toolshed (7), rts (7), agi (7) · Steady-state products with multiple domains. Agents persist between sessions. The forum is the evidence ledger; the protocol defines the startup sequence. Needs a librarian or the forum grows without bound.
Checkpoint · balatro (9), sts2 (9), forge.thisminute.org (3) · Rapid build-test-fix iteration. Agents are disposable — they spawn, read the checkpoint, work, update it, and shut down. No forum, no protocol. The cycle IS the protocol.
Steward · rhizome (1), ops (2), recipe-scaler (1) · Small scope or bootstrap projects. One agent owns everything and knows when to split. Minimal coordination overhead. The split criteria are baked into the role file.
Forge · thisminute-forge (4), singularity-forge (4) · Scan-evaluate-act loop over a registry of projects. Audit agent systems or catalog gaps, propagate improvements. Can be pointed at different problems — quality or coverage.

What We'd Tell Ourselves at the Start

Expectation · Reality
More agents = more capability · agi started with 10 and consolidated to 7 in a day. The eliminated roles (police, coach, editor) generated noise. thisminute grew from 5 to 11 because each role was needed. rhizome never needed more than 1.
The forum is the coordination mechanism · Forums grow without bound. agi: 8,753 lines in a day. thisminute: 1,008 lines in a session. Every shared-context pattern needs a maintenance agent, and the maintenance cost is inseparable from the pattern. toolshed shipped the librarian on day one and never had the spike.
The skeptic is a nice-to-have · In agi, the skeptic invalidated ALL Go results (broken oracle). In thisminute, it caught 10–100x inflated numbers. The adversarial dynamic is load-bearing in every project that has one.
Organizational structure should be designed up front · agi over-designed and had to consolidate. thisminute inherited and grew. rhizome never needed to grow. The right amount of structure depends on the project, not on a template.
There's one right coordination model · There are at least four. Protocol+forum for steady-state products, checkpoint for rapid iteration, steward for small scope, forge for scan-evaluate-act over registries. Each emerged from a project's constraints. Applying the wrong model wastes effort — rhizome doesn't need a forum, balatro doesn't need a protocol.
Memory needs to be mandated · balatro went from 1 to 8 memory files in ~10 cycles with zero instruction. Agents that wrote memories performed better, so more agents started writing memories. The incentive is self-reinforcing when the checkpoint keeps losing context.
You need a central coordinator · Each project runs its own coordination model. The forge's role is auditing and pattern extraction, not dispatching work. Projects don't share runtime state, so coordinating them centrally would add overhead without benefit.
Organizational structure is set-and-forget · Agent systems drift from their documentation. Roles get added informally, protocol steps stop being followed, memory files accumulate without review. The forge's audit cycle exists specifically to catch this drift before it compounds.
There's one kind of forge · The first forge audited agent systems. The second forge creates projects to fill catalog gaps. Same 4-role structure, different problem domain. The pattern is "scan a registry, evaluate gaps, act on them" — what you point it at determines what it builds.

This is a living document. The projects are still running, the agents are still being tuned, and the patterns are still being revised. The main thing we've learned is that there's no shortcut — you figure out how to organize agents by building things with them and paying attention to what goes wrong.

The organizational patterns referenced above are from the Rhizome catalog. The Forge watches these projects and extracts what works into reusable templates. A second forge builds what's missing.