Evaluating Agents on Financial Analyst Workflows (SheetBench)
October 1, 2025•Enterprise•10 min read

A case study on developing evaluations of agent performance on financial analyst workflows.

The HUD Team, Sepal AI
HUD Autonomy: How do we evaluate and improve AI agents?
January 24, 2025•Research•8 min read

At HUD, our mission is to help align human and AI agents' behavior. Today, we're excited to introduce Autonomy, our comprehensive evaluation framework for AI agents.

Lorenss Martinsons, The HUD Team


© 2025 Human Union Data, Inc. All rights reserved.