Lecture 08 — Adversarial Refinement
"When there's no metric, the blind judge panel is the metric." — Blind judging removes authority bias and anchoring. Pure content wins.
Core idea: How `/autoresearch:reason` extends the loop to subjective decisions — adversarial critique, cold-start agents, and a blind judge panel as the fitness function.
Code examples: code/
Practice project: Project 04 — Architecture Decision Debate
The Problem
The autoresearch loop works when you have a mechanical metric. But architecture choices, product strategy, content quality, and design trade-offs are irreducibly subjective — there's no evaluate.py to run.
Without a mechanism, three failure modes appear:
Premature convergence: The first good-sounding argument wins. No one challenges it rigorously, because mounting a serious counterargument takes real effort.
Authority bias: The most senior person's opinion wins. Technical merit gets crowded out by hierarchy.
False consensus: Everyone says they agree, but the underlying tension resurfaces in implementation.
The Solution
```
Author-A generates candidate A
        |
Critic attacks A (cold start — hasn't seen the generation)
        |
Author-B responds (cold start — sees A + critique only)
        → generates improved candidate B
        |
Synthesizer merges strongest elements → candidate C
        |
Blind judge panel (3-7 judges, odd number preferred):
        "Which is better, X or Y?"  ← no labels, no history
        |
C wins? → C becomes the new A → loop
A wins? → A stays the incumbent → loop
        |
N consecutive wins = convergence → write handoff.json
```

Key invariant: Every agent is a fresh cold-start invocation. No shared session. No agent sees the history of the debate. This prevents coherence bias — agents changing their assessment to match what they said earlier.
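Put together, the loop fits in a short orchestration sketch. Everything here is illustrative: `call_agent(role, prompt)` and `blind_judge(x, y)` are hypothetical stand-ins for fresh cold-start model invocations, and the convergence rule is the N-consecutive-wins check from the diagram.

```python
from typing import Callable

# Illustrative orchestration of one adversarial refinement run.
# `call_agent` and `blind_judge` are hypothetical stand-ins for fresh,
# cold-start model invocations (no shared session between calls).
def refine(task: str,
           call_agent: Callable[[str, str], str],
           blind_judge: Callable[[str, str], str],
           convergence: int = 3) -> str:
    incumbent = call_agent("author", task)                    # candidate A
    streak = 0
    while streak < convergence:
        critique = call_agent("critic", f"{task}\n\nCANDIDATE:\n{incumbent}")
        rival = call_agent("author",
                           f"{task}\n\nCANDIDATE:\n{incumbent}\n\nCRITIQUE:\n{critique}")
        merged = call_agent("synthesizer",
                            f"{task}\n\nA:\n{incumbent}\n\nB:\n{rival}")  # candidate C
        winner = blind_judge(incumbent, merged)               # returns the winning text
        if winner == incumbent:
            streak += 1                                       # incumbent defended again
        else:
            incumbent, streak = merged, 1                     # challenger takes over
    return incumbent
```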
How It Works
1. Cold-start agents prevent herding.
If agents read each other's outputs before forming their own, they herd toward a consensus that reflects the first agent's framing. Cold start means each agent forms an independent position before any synthesis happens.
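A minimal sketch of what that isolation could look like in code; the role names, the `ROLE_INPUTS` table, and the `build_prompt` helper are all illustrative, not part of autoresearch:

```python
# Illustrative input whitelist per role (names are hypothetical).
# Each invocation is built from ONLY these fields, with no transcript and
# no debate history, so every agent forms its position independently.
ROLE_INPUTS = {
    "author_a":    ["task"],                          # generates candidate A
    "critic":      ["task", "candidate"],             # sees the candidate, not its derivation
    "author_b":    ["task", "candidate", "critique"], # sees A + critique only
    "synthesizer": ["task", "candidate", "rival"],
    "judge":       ["option_x", "option_y"],          # no labels, no history
}

def build_prompt(role: str, state: dict) -> str:
    """Assemble a fresh prompt from the whitelisted fields only."""
    return "\n\n".join(f"{key.upper()}:\n{state[key]}" for key in ROLE_INPUTS[role])
```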
2. Blind judging removes authority bias.
Sighted judging is unreliable. When judges know which option is "the new proposal" vs "the current approach," they bring in loss aversion and social pressure. Blind judges see only X and Y. If Y wins 3 times in a row across different judges, it's genuinely better by the only measure that matters: human judgment.
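A blind panel is easy to sketch: randomize which candidate each judge sees as X and which as Y, then take the majority. The `judges` callables below are stand-ins for fresh model invocations; this is a guess at the mechanism, not the shipped implementation.

```python
import random
from typing import Callable, List, Tuple

# Sketch of a blind judge panel: each judge sees the candidates as anonymous
# "X" and "Y" in a random order, so neither can be identified as the
# incumbent proposal.
def blind_panel(a: str, b: str,
                judges: List[Callable[[str, str], str]]) -> Tuple[str, float]:
    votes_for_b = 0
    for judge in judges:
        x, y = (a, b) if random.random() < 0.5 else (b, a)
        pick = x if judge(x, y) == "X" else y        # judge answers "X" or "Y"
        votes_for_b += (pick == b)
    votes_for_a = len(judges) - votes_for_b
    winner = "B" if votes_for_b > votes_for_a else "A"
    agreement = max(votes_for_a, votes_for_b) / len(judges)  # winning-side share
    return winner, agreement
```

The agreement value here is the fraction of the panel that sided with the winner, which is plausibly what `judge_agreement` in `handoff.json` reports.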
3. Convergence stops the loop.
```
/autoresearch:reason --convergence 3   # default: 3 consecutive wins
```

At convergence, the loop writes `handoff.json`:

```json
{
"winning_candidate": "...",
"convergence_round": 7,
"judge_agreement": 0.85,
"key_arguments": [...],
"minority_dissent": "...",
"recommended_next_step": "autoresearch:plan"
}
```
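A sketch of the bookkeeping that could produce this file. The field names mirror the example above; the `ConvergenceTracker` class itself is an assumption, not autoresearch's actual code.

```python
import json

# Illustrative convergence bookkeeping for the adversarial loop.
class ConvergenceTracker:
    def __init__(self, n: int = 3):                  # --convergence 3
        self.n, self.streak, self.round = n, 0, 0

    def record(self, incumbent_won: bool) -> bool:
        """Count consecutive wins; a new winner restarts the streak at 1."""
        self.round += 1
        self.streak = self.streak + 1 if incumbent_won else 1
        return self.streak >= self.n

    def write_handoff(self, candidate: str, agreement: float,
                      path: str = "handoff.json") -> None:
        with open(path, "w") as f:
            json.dump({
                "winning_candidate": candidate,
                "convergence_round": self.round,
                "judge_agreement": agreement,
                # "key_arguments" and "minority_dissent" omitted for brevity
                "recommended_next_step": "autoresearch:plan",
            }, f, indent=2)
```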
4. Where to use it.

| Domain | Example task |
|---|---|
| software | Event sourcing vs. CQRS for order management |
| product | Freemium vs. trial vs. open core pricing |
| business | Build vs. buy decision for auth system |
| security | Defense-in-depth strategy for API |
| research | Hypothesis A vs. B — which to run first |
| content | Landing page copy variant A vs. B |
5. Chain patterns.
- `reason → predict`: converge on a position, then stress-test it with a 5-expert analysis
- `reason → plan`: converge, then scaffold an autoresearch project to validate it empirically
- `reason → scenario`: converge, then explore edge cases of the winning decision

The most powerful chain for architecture decisions is `reason → plan → autoresearch`: converge on the decision, then validate it empirically with a research loop.
What Changed
| Subjective decision without structure | Adversarial refinement |
|---|---|
| First good argument wins | Critic attacks every candidate before it advances |
| Authority bias decides | Blind judges see only content, not labels |
| Consensus is false | Convergence requires N consecutive wins, not one vote |
| Decision is not documented | handoff.json captures winning arguments + dissent |
Try It
Run the debate tracker and judge aggregator:
```
cd docs/en/lectures/lecture-08-adversarial-refinement/code
python debate_tracker.py
python judge_aggregator.py
```

Questions to think about:
- After running `judge_aggregator.py`, what was the `judge_agreement` score? What does a score below 0.6 mean?
- In the loop, the critic's job is to attack the candidate "as hard as possible." Why is a weak critic worse than no critic?
- Why does the loop use a blind judge panel instead of having the same agents vote?
- Think of a technical decision you've made recently — write three rounds of the adversarial loop: what would Critic say about your original choice?