Projects

RESEARCH, INFRASTRUCTURE & ORGANIZATION
Research

Evaluation Cards

This project addresses the need for a structured and systematic approach to documenting AI model evaluations through the creation of "evaluation cards," focusing specifically on technical base systems.

Chairs: Avijit Ghosh, Anka Reuel

Infrastructure

Every Eval Ever

Every Eval Ever is a standardized schema for AI evaluation results, promoting interoperability, reproducibility, and transparency across the ecosystem.

Chairs: Jan Batzner, Leshem Choshen, Avijit Ghosh
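To make the idea of a shared results schema concrete, here is a minimal, hypothetical sketch in Python. The field names below are illustrative assumptions for what a standardized evaluation record might contain; they are not the project's actual schema.

```python
from dataclasses import dataclass, field, asdict

# Hypothetical illustration only: these fields are assumptions,
# not the actual Every Eval Ever schema.
@dataclass
class EvalResult:
    model_id: str                 # identifier of the evaluated model
    benchmark: str                # benchmark / task name
    metric: str                   # metric name, e.g. "accuracy"
    score: float                  # metric value
    harness_version: str          # tool and version that produced the result
    metadata: dict = field(default_factory=dict)  # free-form provenance info

# A record in a common shape like this is what would let results from
# different harnesses be compared, aggregated, and reproduced.
result = EvalResult(
    model_id="example-org/example-model",
    benchmark="example-benchmark",
    metric="accuracy",
    score=0.72,
    harness_version="0.0.1",
)
record = asdict(result)  # serializable form for sharing across tools
```

A shared record shape like this is the kind of interoperability the blurb above describes: any producer can emit it and any consumer can read it without tool-specific glue.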

Research

Benchmark Saturation

This project investigates how to systematically characterize the complexity and behavior of AI benchmarks over time, with the overarching goal of informing more robust benchmark design.

Chairs: Anka Reuel, Mubashara Akhtar

Infrastructure

Evaluation Harness and Tutorials

The Eleuther Harness Tutorials project is designed to lower the barrier to entry for using the LM Evaluation Harness, making it easier for researchers and practitioners to onboard and run evaluations.

Chairs: Baber Abbasi, Stella Biderman

Research

Evaluation Science

Recognizing the current lack of robustness in general-purpose AI (GPAI) risk evaluations, and the resulting limitations for informed decision-making and societal preparedness, this project aims to establish a more rigorous science of evaluation.

Chairs: Subho Majumdar, Patricia Paskov

Organization

Outreach & Research Engagement

Building upon the momentum of NeurIPS 2024, we aim to cultivate a QueerInAI-esque presence at other relevant venues by organizing social events and short talks to broaden our reach and foster community.

Chairs: Jennifer Mickel, Usman Gohar

Get Involved

Join Our Community

Researchers, practitioners, and students are welcome to contribute to our mission. Send us an email to learn more about getting involved.

[email protected]
