# Reference

## Command Quick-Decision Guide

| I want to... | Command |
|---|---|
| Set up a new research project | `/autoresearch:plan` |
| Run the optimization loop | `/autoresearch` |
| Run exactly N iterations | `/autoresearch` with `Iterations: N` inline |
| Investigate why something is broken | `/autoresearch:debug` |
| Fix all errors automatically | `/autoresearch:fix` |
| Debug then auto-fix | `/autoresearch:debug --fix` |
| Get expert opinions before acting | `/autoresearch:predict` |
| Converge on a decision without a metric | `/autoresearch:reason` |
| Audit for security vulnerabilities | `/autoresearch:security` |
| Explore edge cases and scenarios | `/autoresearch:scenario` |
| Generate documentation | `/autoresearch:learn` |
| Deploy or publish a finished artifact | `/autoresearch:ship` |

## Metric Cheat Sheet

| Domain | Metric | Direction | Evaluator snippet |
|---|---|---|---|
| Code performance | `median_time_s` | minimize | `timeit`, 3 runs, take the median |
| ML accuracy | `accuracy` | maximize | `sklearn.metrics.accuracy_score` |
| Bundle size | `bundle_kb` | minimize | `du -sk dist/ \| cut -f1` |
| Prompt quality | `llm_judge_score` | maximize | LLM rates 1–10, average of N=5 |
| Literature coverage | `papers_found` | maximize | count matched papers |
| API latency | `p95_ms` | minimize | 100 requests, 95th percentile |
| Memory usage | `peak_mb` | minimize | `/usr/bin/time -v` |
| Test coverage | `coverage_pct` | maximize | `coverage run -m pytest` |
| RMSE | `rmse` | minimize | `sqrt(mean_squared_error)` |
| Security coverage | `fixed_pct` | maximize | fixed / total findings |
| Query speed | `query_ms` | minimize | `EXPLAIN ANALYZE` execution time |
| LLM judge | `score_1_10` | maximize | blind 1–10 rating, N=5 samples |
| Compression | `ratio` | maximize | `original_bytes / compressed_bytes` |
| Translation | `bleu_score` | maximize | `sacrebleu` vs. references |
| Simulation | `rmsd` | minimize | RMSD vs. reference coordinates |
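Most of these snippets reduce to a few lines of glue. As an illustrative sketch of the first row, a `median_time_s` evaluator might look like the following; `workload()` is a hypothetical stand-in for whatever code the loop is optimizing.

```python
# Sketch: "timeit, 3 runs, take the median" for median_time_s.
# workload() is a hypothetical stand-in for the code under optimization.
import statistics
import timeit

def workload() -> None:
    sum(i * i for i in range(100_000))

# repeat=3 runs of number=1 call each; the median damps outlier noise
runs = timeit.repeat(workload, repeat=3, number=1)
median_time_s = statistics.median(runs)
print(f"{median_time_s:.6f}")
```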

## Evaluator Contract

Every evaluator must output exactly:

```json
{"pass": true, "score": 0.94}
```

- `pass`: `true` if the target is met, `false` otherwise
- `score`: the metric value for this iteration (float)
- Exit code: `0` on success, non-zero on evaluation failure
- Timeout: wrap the call with `timeout 5m python evaluate.py`
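A minimal sketch of an evaluator honoring this contract; `measure_metric()` and the 0.90 target are hypothetical placeholders for your own metric and threshold.

```python
# evaluate.py -- minimal contract sketch.
# measure_metric() and TARGET are hypothetical; supply your own.
import json
import sys

TARGET = 0.90

def measure_metric() -> float:
    return 0.94  # placeholder: compute the real metric here

try:
    score = measure_metric()
except Exception:
    sys.exit(1)  # non-zero exit = evaluation failure, per the contract

print(json.dumps({"pass": score >= TARGET, "score": score}))
sys.exit(0)
```

Invoked as `timeout 5m python evaluate.py`, a hung evaluation surfaces as exit code 124 (see the crash recovery table below).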

## 8 Critical Rules

| # | Rule |
|---|---|
| 1 | Loop until done — unbounded: run forever; bounded: run N times, then summarize |
| 2 | Read before write — understand the full context and git history before modifying |
| 3 | One change per iteration — atomic changes; if it breaks, you know why |
| 4 | Mechanical verification only — no subjective "looks good"; use metrics |
| 5 | Automatic rollback — failed changes revert instantly via git |
| 6 | Simplicity wins — equal results + less code = KEEP |
| 7 | Git is memory — read `git log` and `git diff` before each iteration |
| 8 | When stuck, think harder — escalate through the L1/L2/L3 pivot strategy |
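Rules 5 and 7 both lean on git. A minimal sketch of that loop skeleton, assuming the iteration's change is still uncommitted and that `run_guards()` is a helper you define yourself:

```python
# Sketch of rules 5 and 7. Assumes the iteration's change is
# uncommitted, so `git checkout -- .` is enough to roll it back.
import subprocess

def git(*args: str) -> str:
    return subprocess.run(["git", *args], capture_output=True, text=True).stdout

def run_guards() -> bool:
    # hypothetical guard; swap in commands from the Guard Patterns table
    return subprocess.run(["python", "-m", "pytest"]).returncode == 0

# Rule 7: git is memory -- read history before acting.
history = git("log", "--oneline", "-n", "10")
last_change = git("diff", "HEAD~1", "HEAD")

# ... apply exactly one atomic change here (rule 3) ...

# Rule 5: automatic rollback on failure.
if not run_guards():
    subprocess.run(["git", "checkout", "--", "."])
```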

## Command Chaining Patterns

```
plan ──> autoresearch ──> ship           Core research pipeline
debug ──> fix ──> ship                   Bug fix pipeline
predict ──> debug / security / fix       Risk-aware pipeline
security ──> fix ──> security            Security hardening loop
reason ──> plan ──> autoresearch         Decision → validation
scenario ──> debug                       Edge case → bug hunt
scenario ──> security                    Threat scenario → audit
```

## Guard Patterns

| Goal | Guard command |
|---|---|
| Prevent test regressions | `python -m pytest` |
| Prevent type errors | `mypy src/` or `tsc --noEmit` |
| Prevent lint regressions | `flake8 src/` or `eslint src/` |
| Prevent correctness regressions | `python -m pytest tests/correctness/` |
| Prevent API contract changes | `python tests/contract_test.py` |
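Guards compose: a change is accepted only if every guard exits 0. A sketch of that check, with the guard list trimmed to whatever your stack actually uses:

```python
# Sketch: a change passes only if every guard command exits 0.
# Trim GUARDS to the rows above that apply to your project.
import subprocess

GUARDS = [
    ["python", "-m", "pytest"],
    ["mypy", "src/"],
    ["flake8", "src/"],
]

def guards_pass() -> bool:
    # all() short-circuits, so later guards are skipped after a failure
    return all(subprocess.run(cmd).returncode == 0 for cmd in GUARDS)

if not guards_pass():
    print("guard failure: revert this iteration's change")
```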

## Crash Recovery Reference

| Failure | Response |
|---|---|
| Syntax error | Fix immediately; don't count it as an iteration |
| Runtime error | Attempt a fix (max 3 tries), then move on |
| Timeout (exit 124) | Revert, log TIMEOUT, try a smaller variant |
| Resource exhaustion | Revert, try a smaller variant next iteration |
| External dependency | Skip, log it, try a different approach |
| Guard failure | Attempt a fix-of-fix (max 2 tries), else revert |
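One way to route a failed run to the right row, assuming the evaluator is wrapped in `timeout` (which exits with code 124 on a hang); the non-124 branches are assumptions to adapt to your loop:

```python
# Sketch: map the wrapped evaluator's exit code to a recovery response.
# Only exit 124 (the `timeout` command's code) is standardized here.
import subprocess

result = subprocess.run(["timeout", "5m", "python", "evaluate.py"])

if result.returncode == 0:
    response = "accept iteration"
elif result.returncode == 124:
    response = "revert, log TIMEOUT, try a smaller variant"
else:
    response = "attempt a fix (max 3 tries), then move on"
print(response)
```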