Auto-Improving an Agent Skill: Applying Karpathy's Autoresearch Pattern to Semantic HTML
How I built an eval-judge-improve loop to autonomously refine a semantic HTML agent skill, taking it from 2.46 to 2.89 out of 3.0 across four iterations, and what I learned about the limits of automated improvement.