# L02 Code — What Is a Measurable Research Goal
**Goal:** Generate a `research.md` contract and run the evaluator once to see the `{"pass": bool, "score": float}` output format that every autoresearch loop depends on.
**Run it:**

```bash
cd docs/en/lectures/lecture-02-what-is-a-measurable-research-goal/code
python gen_research_md.py
python evaluate.py
```

## Tool 1: research.md Generator
### Step 1: Define the template

```python
from datetime import date
TEMPLATE = """\
# Research Goal
**Date**: {date}
## Goal
{goal}
## Metric
{metric}
## Target
{target}
## Baseline
{baseline}
## Notes
{notes}
"""Key line: The ## Target section is what check_target.py parses later — it must contain a bare number the loop can compare against.
### Step 2: Prompt for each field with defaults

```python
def prompt(label: str, default: str = "") -> str:
    hint = f" [{default}]" if default else ""
    raw = input(f"{label}{hint}: ").strip()
    return raw if raw else default
```

**Key line:** `return raw if raw else default` — pressing Enter accepts the default. This makes it fast to generate a working `research.md` in under 30 seconds.
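A quick way to see that fallback in action without running the interactive tool (throwaway snippet, not part of `gen_research_md.py`):

```python
# Same expression as in prompt(), applied to a few canned answers.
default = "10.0 ms"
for answer in ["", "   ", "12.5 ms"]:
    raw = answer.strip()
    print(repr(raw if raw else default))
# Empty or whitespace-only input falls back to the default;
# anything else is kept exactly as typed.
```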
### Step 3: Write the file

```python
def main():
    goal = prompt("What are you trying to improve?",
                  "Reduce median sort time by 20%")
    metric = prompt("How will you measure it?",
                    "median_ms on 10k-element list")
    target = prompt("What value counts as success?", "≤ 8.0 ms")
    baseline = prompt("Current baseline value?", "10.0 ms")
    notes = prompt("Any constraints or notes?",
                   "Python stdlib only, no C extensions")
    content = TEMPLATE.format(date=date.today().isoformat(),
                              goal=goal, metric=metric,
                              target=target, baseline=baseline, notes=notes)
    with open("research.md", "w") as f:
        f.write(content)
    print(content)

if __name__ == "__main__":  # entry point so `python gen_research_md.py` runs the prompts
    main()
```
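Accepting every default should produce a `research.md` along these lines (the date below is just an example; the script stamps whatever today's date is):

```markdown
# Research Goal
**Date**: 2025-01-15
## Goal
Reduce median sort time by 20%
## Metric
median_ms on 10k-element list
## Target
≤ 8.0 ms
## Baseline
10.0 ms
## Notes
Python stdlib only, no C extensions
```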
## Tool 2: Evaluator Template

### Step 1: Define target and direction

```python
import json, sys, random
TARGET_SCORE = 8.0 # lower is better for latency
HIGHER_IS_BETTER = False
```

**Key line:** `HIGHER_IS_BETTER = False` — this single flag determines the comparison direction. For accuracy-style metrics, set it to `True`.
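For example, a hypothetical accuracy-style evaluator would set the flag the other way, and the same comparison line flips direction (the values below are made up for illustration):

```python
# Hypothetical accuracy-style configuration, not part of evaluate.py.
TARGET_SCORE = 0.90        # e.g. accuracy on a held-out set, higher is better
HIGHER_IS_BETTER = True

score = 0.93               # pretend measurement
passed = score <= TARGET_SCORE if not HIGHER_IS_BETTER else score >= TARGET_SCORE
print(passed)              # True, because 0.93 >= 0.90
```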
### Step 2: Measure and compare

```python
def measure() -> float:
    random.seed(42)
    samples = [random.gauss(9.5, 1.0) for _ in range(100)]
    return sorted(samples)[len(samples) // 2]  # median

def evaluate() -> dict:
    score = measure()
    passed = score <= TARGET_SCORE if not HIGHER_IS_BETTER else score >= TARGET_SCORE
    return {"pass": passed, "score": round(score, 4)}
```

**Key line:** `{"pass": passed, "score": round(score, 4)}` — this dict is the entire contract. The loop reads `pass` to decide keep/revert and `score` to track progress.
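As a sketch of how a driver might act on that dict, assuming `evaluate.py` sits in the working directory so `evaluate()` is importable (the real loop is built in later lectures):

```python
# Hypothetical driver snippet, not part of evaluate.py itself.
from evaluate import evaluate

result = evaluate()
if result["pass"]:
    print(f"score {result['score']} meets the target: keep this change")
else:
    print(f"score {result['score']} misses the target: revert and try again")
```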
### Step 3: Print and exit with the right code

```python
if __name__ == "__main__":
    result = evaluate()
    print(json.dumps(result))
    sys.exit(0 if result["pass"] else 1)
```

**Key line:** `sys.exit(0 if result["pass"] else 1)` — exit code 0 means "target met, stop the loop." Exit code 1 means "keep iterating." This is how the bash loop knows when to stop.
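A minimal sketch of a stop-on-success loop driven purely by that exit code, written in Python here as a stand-in for the bash loop (the attempt cap and messages are illustrative):

```python
# Hypothetical outer loop: rerun the evaluator until it exits 0 or we give up.
import subprocess

for attempt in range(1, 6):
    code = subprocess.run(["python", "evaluate.py"]).returncode
    if code == 0:          # target met: stop
        print(f"attempt {attempt}: target met, stopping")
        break
    print(f"attempt {attempt}: target not met, keep iterating")
```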
## Try Changing It
- Run `evaluate.py` — what is the output `score`? Is `pass` true or false?
- Change `TARGET_SCORE = 8.0` to `TARGET_SCORE = 10.0` — does `pass` change? Why?
- Change `HIGHER_IS_BETTER = False` to `True` — what happens to the comparison logic?
- Write a `research.md` for a project you care about — just fill in Goal, Metric, Target, and Baseline.