October 1, 2025 • Enterprise • 10 min read
Evaluating Agents on Financial Analyst Workflows (SheetBench)
A case study on developing evaluations for agent performance on finance analyst jobs.
The HUD Team, Sepal AI

January 24, 2025 • Research • 8 min read
HUD Autonomy: How do we evaluate and improve AI agents?
At HUD, our mission is to help align human and AI agents' behavior. Today, we're excited to introduce Autonomy, our comprehensive evaluation framework for AI agents.
Lorenss Martinsons, The HUD Team