Results

The first GospelBench evaluation is currently in progress. Results will be published here.

What to expect

Model-by-model scoring

Each model receives an overall score (0–100) across all 13 dimensions, plus a quadrant classification: Faithful, Sympathetic, Neutral, or Divergent.

Question-level breakdown

See how each model performs on individual questions. Which doctrines do models affirm? Where do they hedge? Where do they reject orthodox Christianity?

Track comparison

Compare how models perform across the four tracks. Does a model affirm the resurrection when asked directly, but hedge when asked in the raw prompt?
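As a rough illustration of what a track comparison could look like, the sketch below flags questions where a model's stance differs between tracks. Everything here is an assumption made for the example: the track names ("direct", "raw"), the stance labels (taken from the affirm/hedge/reject wording above), the question IDs, and the tuple format are hypothetical and do not describe GospelBench's actual data or method.

```python
from collections import defaultdict

# Hypothetical stance labels, borrowed from the page's affirm/hedge/reject wording.
STANCES = {"affirm", "hedge", "reject"}


def find_track_divergences(responses):
    """Return the questions whose stance is not the same on every track.

    `responses` is a list of (question_id, track, stance) tuples.
    The result maps each divergent question_id to its {track: stance} dict.
    """
    by_question = defaultdict(dict)
    for question_id, track, stance in responses:
        if stance not in STANCES:
            raise ValueError(f"unknown stance: {stance!r}")
        by_question[question_id][track] = stance

    # Keep only questions where at least two tracks disagree.
    return {
        question_id: tracks
        for question_id, tracks in by_question.items()
        if len(set(tracks.values())) > 1
    }


# Illustrative data only; track names and question IDs are placeholders.
responses = [
    ("resurrection", "direct", "affirm"),
    ("resurrection", "raw", "hedge"),
    ("trinity", "direct", "affirm"),
    ("trinity", "raw", "affirm"),
]

divergent = find_track_divergences(responses)
print(divergent)
```

With this sample data, only the "resurrection" question is reported, because the model affirms on one track and hedges on the other.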

Trend analysis

Because GospelBench runs quarterly, the results page will show how theological fidelity changes over time across the AI industry.

Get notified when results drop.