
About

What GospelBench is, what it isn't, and why it exists.

What GospelBench is

GospelBench is a benchmark that tests how AI language models engage with essential Protestant Christian theology.

It is not a theology exam — it documents what models do, not what they "should" do. Every finding is a behavioral observation. High scores and low scores are equally valuable data points.

11 models from 8 labs and 3 countries (China, France, US). 880 prompts administered. 303 scored responses.

What GospelBench is not

  • A theology exam — we don't grade models on correctness
  • A pass/fail assessment — every score is behavioral data
  • A claim to "mere Christianity" neutrality — we are explicitly Protestant
  • A comparison of "good" vs. "bad" models — all data is valuable
  • A proxy for model quality generally — we measure one domain
  • A tool for bashing AI — we observe and report
"Here's what the model does" — never "here's where it failed." Every score is behavioral data, not a theological verdict.

Doctrinal stance

GospelBench is explicitly Protestant in doctrinal stance — penal substitutionary atonement, sola fide (justification by faith alone), ordinance ecclesiology. It is honest about this position. It does not claim broad "mere Christianity" neutrality.

The Protestant perspective informs which questions are asked and how engagement is measured, but it never determines how models are scored. All findings are behavioral data, not pass/fail assessments.

The benchmark at a glance

Models tested: 11
AI labs: 8
Labs by country: China (5), France (1), US (2)
Prompts administered: 880
Scored responses: 303
Questions per model: 16, each asked 5 ways (80 prompts total)
Scoring consistency: Over 96% agreement across judges
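
The counts above fit together as simple arithmetic: 16 questions asked 5 ways gives 80 prompts per model, and 11 models gives 880 prompts in total. The short Python sketch below reproduces that check. The variable and function names are illustrative only and are not part of the GospelBench pipeline.

```python
# Figures copied from the at-a-glance summary.
# All names here are hypothetical, chosen for illustration.
MODELS_TESTED = 11
QUESTIONS_PER_MODEL = 16
VARIANTS_PER_QUESTION = 5  # each question is "asked 5 ways"
LABS_BY_COUNTRY = {"China": 5, "France": 1, "US": 2}


def prompts_per_model(questions: int, variants: int) -> int:
    """Number of prompts a single model receives."""
    return questions * variants


def total_prompts(models: int, questions: int, variants: int) -> int:
    """Number of prompts administered across all models."""
    return models * prompts_per_model(questions, variants)


if __name__ == "__main__":
    per_model = prompts_per_model(QUESTIONS_PER_MODEL, VARIANTS_PER_QUESTION)
    overall = total_prompts(MODELS_TESTED, QUESTIONS_PER_MODEL, VARIANTS_PER_QUESTION)
    print(per_model)                      # 80
    print(overall)                        # 880
    print(sum(LABS_BY_COUNTRY.values()))  # 8
```

The per-country lab counts also sum to the 8 labs listed.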

Labs tested

  • GLM (China)
  • DeepSeek (China)
  • StepFun (China)
  • Moonshot (China)
  • MiniMax (China)
  • Mistral (France)
  • Google (US)
  • Nvidia (US)

For model providers

GospelBench is expanding its model coverage. If you build or host AI models, here is what participation means.

Independent evaluation

Your models receive rigorous, reproducible scoring across 16 questions and 5 question types. Results are published with full methodology — no black boxes, no editorial spin.

Visibility

Your models appear alongside 8 leading labs from 3 countries on the results leaderboard. Every visitor sees your scores in context.

Zero friction

We handle the testing pipeline end to end. Either you provide API access, or we use your public endpoints. No engineering effort is required from your team.

Interested? Contact trent@gospelbench.com to discuss partnership.

How to cite

GospelBench: A Behavioral Benchmark of AI Models on Essential Christian Theology. Trent Arrington (GospelBench), 2026. 11 models, 8 labs, 3 countries.

Founded by Trent Arrington

GospelBench was created to answer a simple question: when people ask AI about core Christian doctrine, what do they actually get back? The answer turned out to be more nuanced — and more universal — than expected.

Open methodology

The questions are locked. The scoring criteria are published. Every finding is reproducible. See the methodology page for the full approach.

See the data

11 models, 8 labs, ranked and scored.