ValueArena

A Comparative Behavioral Measure of Value Alignment

EigenBench is a black-box framework for quantifying value alignment across language models. Compare model responses side by side, explore per-constitution leaderboards, and browse experiment runs.

Model Ensemble: Multiple LLMs judge each other's responses.
BTD Fitting: Pairwise comparisons are fit to a Bradley–Terry–Davidson (BTD) model.
EigenTrust: Consensus scores are computed via trust-weighted aggregation.
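
The last two steps can be illustrated with a minimal Python sketch. It is not EigenBench's actual implementation: the function names, the example matrices, and the iteration counts are all assumptions made for illustration. For brevity it fits a plain Bradley–Terry model (no ties) rather than the Davidson extension, and it assumes every model wins at least one comparison and every judge assigns some positive trust to every other judge.

```python
import numpy as np

def fit_bradley_terry(wins, iters=200):
    """Fit plain Bradley-Terry strengths with the classic MM (Zermelo) update.

    wins[i, j] = number of times model i was preferred over model j.
    Simplification: ties are ignored (the Davidson term is omitted).
    """
    wins = np.asarray(wins, dtype=float)
    n = wins.shape[0]
    games = wins + wins.T              # comparisons played per pair
    total_wins = wins.sum(axis=1)
    p = np.ones(n) / n
    for _ in range(iters):
        pair_sum = p[:, None] + p[None, :]
        np.fill_diagonal(pair_sum, 1.0)  # diagonal of games is 0; avoid 0/0
        p = total_wins / (games / pair_sum).sum(axis=1)
        p /= p.sum()                   # strengths are only defined up to scale
    return p

def eigentrust(trust, iters=100):
    """Power iteration for the stationary trust vector.

    trust[i, j] = how much judge i trusts judge j (non-negative).
    """
    trust = np.asarray(trust, dtype=float)
    c = trust / trust.sum(axis=1, keepdims=True)  # row-normalize
    t = np.ones(len(c)) / len(c)
    for _ in range(iters):
        t = c.T @ t
    return t

# Hypothetical data: 3 models, 10 comparisons per pair.
wins = np.array([[0, 7, 9],
                 [3, 0, 6],
                 [1, 4, 0]])
strengths = fit_bradley_terry(wins)

# Hypothetical judge-to-judge trust matrix.
trust = np.array([[1.0, 2.0, 1.0],
                  [3.0, 1.0, 1.0],
                  [2.0, 2.0, 1.0]])
weights = eigentrust(trust)

# Hypothetical per-judge scores (row = judge, column = model),
# combined into one consensus score per model by trust weight.
judge_scores = np.array([[0.6, 0.3, 0.1],
                         [0.5, 0.3, 0.2],
                         [0.7, 0.2, 0.1]])
consensus = weights @ judge_scores
print(strengths, weights, consensus)
```

In this sketch the trust weights sum to one, so the consensus score for each model is a weighted average of the judges' scores, with more trusted judges counting for more.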

Battle Mode

Pit two models head to head, then judge which one aligns better with your values.
