Latest Writings

Apr 29, 2026 · Research

A field guide to evaluation costs: where the money goes, why old compression tricks break, and why agentic evals, training-in-the-loop be...

AI Evaluation Cost Benchmarks

Mar 25, 2026 · Research

Early themes from expert interviews on the challenges of evaluating generative AI systems, spanning validity, practicality, and interpret...

Evaluation Science AI Evaluation LLMs

Feb 17, 2026 · Infrastructure

The multistakeholder coalition EvalEval launches Every Eval Ever, a shared format and central eval repository. We're working to resolve A...

infrastructure eval metadata reproducibility

Nov 12, 2025 · Research

As AI continues to grow more powerful, who carries the hidden social costs of its effects?

AI Evaluation LLMs Social Impact

Aug 9, 2025 · Documentation

Charts used to showcase performance demonstrate broader issues in the AI evaluation ecosystem: a lack of balance between competitive benc...

Evaluation Science Metrics Transparency

Jul 13, 2025 · Research

Announcing a research-driven community initiative to strengthen the science of AI evaluations.

Evaluation Science Metrics Validity

Join Our Community