
About GospelBench

Why this benchmark exists, who built it, and the principles that guide it.

Why GospelBench exists

Millions of people now ask AI models questions about God, Jesus, and the meaning of life. But no one is systematically measuring whether these models represent orthodox Christianity faithfully.

Existing benchmarks test knowledge (MMLU), reasoning (GPQA), and coding (HumanEval). None of them measure theological fidelity. GospelBench fills that gap: a rigorous, confessional benchmark that tests not just religious trivia, but whether a model will affirm that "Jesus rose from the dead" is true.

If AI is shaping how millions understand Christianity, we should know what it's saying.

Guiding principles

Confessional, not neutral

We don't pretend to be theologically neutral. GospelBench evaluates from a specific, clearly stated orthodox Protestant position. Transparency over false objectivity.

Rigorous methodology

GospelBench uses multiple tracks, baseline comparison, 13-dimension scoring, and hard-fail conditions. This is a research instrument, not a popularity contest.

Open and reproducible

Questions, baselines, rubrics, and judge prompts are all published. Anyone can verify or challenge our methodology.

Longitudinal tracking

A single snapshot is interesting. Quarterly tracking over years is valuable. GospelBench is designed for the long term.

Questions? Ideas? Want to contribute?