How we evaluate AI models on their ability to faithfully represent orthodox Protestant Christian theology.
Each model goes through a multi-stage process designed to surface its true theological commitments.
19 core questions (14 primary + 5 reserve) target the essential doctrines of the Christian faith, from the deity of Christ to the bodily resurrection to justification by faith alone.
Every question is asked four ways: raw (Track A), guided (Track B), truth-affirmation (Track C), and false-claim rejection (Track D). This reveals not just what a model says, but what it's willing to commit to.
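As a rough illustration of how the question set expands across the four tracks, here is a minimal sketch. The track labels come from the description above; the question IDs (`Q1`, `R1`, ...) and the `build_runs` helper are hypothetical placeholders, and the sketch assumes the reserve questions are run on all four tracks alongside the primary ones.

```python
from itertools import product

# Track labels as described in the text.
TRACKS = {
    "A": "raw",
    "B": "guided",
    "C": "truth-affirmation",
    "D": "false-claim rejection",
}

# Hypothetical question IDs; the real question identifiers are not given here.
PRIMARY = [f"Q{i}" for i in range(1, 15)]  # 14 primary questions
RESERVE = [f"R{i}" for i in range(1, 6)]   # 5 reserve questions


def build_runs(question_ids):
    """Pair every question with every track, yielding (question_id, track) tuples."""
    return list(product(question_ids, TRACKS))


# Assumption: reserve questions are asked on all four tracks too.
runs = build_runs(PRIMARY + RESERVE)
print(len(runs))  # 19 questions x 4 tracks = 76
```

Under that assumption, each model would answer 76 prompts in total; if only the primary questions are run, the count would be 56.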
16 baselines with 86 essential truth claims provide the gold standard. Each model response is scored against these claims — not for eloquence, but for doctrinal fidelity.
Every response is scored across 13 dimensions including gospel clarity, Christological precision, scriptural fidelity, and an anti-moralism check. Six hard-fail conditions guarantee theological minimums.
Each dimension carries a weight, and the weights sum to 100%.
| ID | Dimension | Weight |
|---|---|---|
| D1 | Gospel Clarity | 15% |
| D2 | Christological Precision | 10% |
| D3 | Scriptural Fidelity | 10% |
| D4 | Trinitarian Accuracy | 8% |
| D5 | Resurrection Centrality | 8% |
| D6 | Exclusivity of Christ | 8% |
| D7 | Grace vs. Works | 7% |
| D8 | Sin & Judgment | 6% |
| D9 | Anti-Moralism | 8% |
| D10 | Theological Consistency | 5% |
| D11 | Redemptive-Historical Coherence | 5% |
| D12 | Ecclesiological Awareness | 5% |
| D13 | Worldview Coherence | 5% |
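The weights in the table can be combined into a single overall score along the following lines. This is a sketch, not the benchmark's actual scoring code: the weights are copied from the table, but the 0 to 100 scale for per-dimension scores and the `weighted_score` function name are assumptions made for illustration.

```python
# Weights (in percent) copied from the table above.
WEIGHTS = {
    "D1": 15, "D2": 10, "D3": 10, "D4": 8, "D5": 8, "D6": 8, "D7": 7,
    "D8": 6, "D9": 8, "D10": 5, "D11": 5, "D12": 5, "D13": 5,
}

# Sanity check: the thirteen weights must total 100%.
assert sum(WEIGHTS.values()) == 100


def weighted_score(dimension_scores):
    """Return the weighted average of per-dimension scores.

    Assumption: each dimension is scored on a 0-100 scale.
    """
    missing = WEIGHTS.keys() - dimension_scores.keys()
    if missing:
        raise ValueError(f"missing dimension scores: {sorted(missing)}")
    return sum(WEIGHTS[d] * dimension_scores[d] for d in WEIGHTS) / 100


# Example: a perfect response except for an 80 on Gospel Clarity (D1).
scores = {d: 100 for d in WEIGHTS}
scores["D1"] = 80
print(weighted_score(scores))  # 97.0, since D1 carries 15% of the total
```

Because D1 carries the largest weight, a 20-point drop there costs 3 points overall, whereas the same drop on a 5% dimension would cost only 1 point.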
Regardless of its overall score, a model that triggers any of the six hard-fail conditions receives an automatic F.
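The override can be pictured as a check that runs before any letter grade is assigned. The sketch below is illustrative only: the condition identifiers (`HF1` to `HF6`), the `final_grade` function, and the letter-grade cutoffs are all hypothetical, since neither the individual conditions nor the grade boundaries are specified here.

```python
# Hypothetical letter-grade cutoffs; the real boundaries are not stated in the text.
HYPOTHETICAL_CUTOFFS = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]


def final_grade(overall_score, hard_fail_results, cutoffs=HYPOTHETICAL_CUTOFFS):
    """Map an overall score to a letter grade, with the hard-fail override.

    hard_fail_results maps a condition ID to True if that condition was triggered.
    """
    # Any triggered hard-fail condition forces an F, whatever the score.
    if any(hard_fail_results.values()):
        return "F"
    for minimum, letter in cutoffs:
        if overall_score >= minimum:
            return letter
    return "F"


# Placeholder IDs for the six hard-fail conditions.
clean = {f"HF{i}": False for i in range(1, 7)}
one_failure = {**clean, "HF3": True}

print(final_grade(95, clean))        # A (under the hypothetical cutoffs)
print(final_grade(95, one_failure))  # F (override applies despite the high score)
```

Checking the hard-fail conditions first means a high weighted score can never mask a failure on one of the theological minimums.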
GospelBench is not denominationally neutral. We evaluate from a specific theological position.
This statement of faith is currently in draft form, pending pastoral review.