Methodology

The Test

Every model gets the same 16 questions about essential Christian doctrine. Each question is asked five different ways. That's 80 prompts per model — the same 80 every time. The questions don't change between models, and they won't change in the future. That's what makes the results comparable.

16 Theology Questions

5 Ways of Asking

80 Prompts per Model

Five Ways of Asking

The same question gets asked five different ways. Each approach reveals something different about how the model handles theology.

1

Neutral Prompt

No faith context

The model is asked as a neutral assistant — no mention of Christianity. Measures how it engages theology on its own.

2

Faith-Based Prompt

Explicit faith context

The model is asked to respond "as a Bible-believing Christian." Measures how much adding faith context changes the answer.

3

Affirm Doctrine

True or False

The model is shown a statement of orthodox doctrine and asked: True or False? Measures willingness to affirm what Christians believe.

4

Affirm Faith

True or False

The model is asked to affirm a belief statement in first person ("I believe…"). Measures willingness to adopt confessional identity.

5

Spot Error

True or False

The model is shown a statement of false teaching and asked: True or False? Measures ability to identify theological error.
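Taken together, the 16 questions and five approaches form a fixed grid of 80 prompts. A minimal sketch of that grid follows. The approach identifiers and the `Q01`-style question IDs are placeholders invented for illustration, since the real question wording is private.

```python
from itertools import product

# Approach identifiers are illustrative names for the five cards above.
APPROACHES = ["neutral", "faith_based", "affirm_doctrine", "affirm_faith", "spot_error"]

# Hypothetical question IDs; the actual question text is not public.
QUESTION_IDS = [f"Q{n:02d}" for n in range(1, 17)]


def build_prompt_grid(question_ids, approaches):
    """Return one (question_id, approach) pair per prompt, in a fixed order."""
    return list(product(question_ids, approaches))


grid = build_prompt_grid(QUESTION_IDS, APPROACHES)
print(len(grid))  # 80
```

Because the grid is built from two fixed lists, every model receives the same pairs in the same order.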

How We Score

For the essay-style prompts (approaches 1 and 2), a three-judge panel evaluates each response across five dimensions. Two judges score independently, then a third resolves any disagreement.

What We Evaluate	What It Means
Theological Accuracy	Does the response get the doctrine right?
Christ-Centeredness	Does the response center on Christ and the gospel, or just moral advice?
Completeness	Does it address all the essential parts of the question?
Grace vs. Works	Does it frame salvation as grace through faith, or as earning God's favor?
Clarity & Tone	Is the response clear, coherent, and respectful?

Consistency you can trust. The two independent judges agree with each other over 96% of the time. When they disagree, the consolidating third judge reviews both scores and makes the final call. The same judges and the same rubric evaluate every model: no special treatment, no moving goalposts.
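The judging flow can be sketched roughly as below. This is an illustration under stated assumptions, not the actual pipeline: it assumes integer scores per dimension, treats "agreement" as an exact match, and resolves disagreements one dimension at a time. The `consolidate` callable is a hypothetical stand-in for the consolidating judge.

```python
from typing import Callable, Dict, Iterable, Tuple

# Dimension keys mirror the rubric table; the identifiers themselves are illustrative.
DIMENSIONS = [
    "theological_accuracy",
    "christ_centeredness",
    "completeness",
    "grace_vs_works",
    "clarity_and_tone",
]

Scores = Dict[str, int]


def final_scores(
    judge_a: Scores,
    judge_b: Scores,
    consolidate: Callable[[str, int, int], int],
) -> Scores:
    """Keep a score where both judges match; otherwise defer to the consolidator."""
    result = {}
    for dim in DIMENSIONS:
        a, b = judge_a[dim], judge_b[dim]
        # Assumption: agreement means an exact match on that dimension.
        result[dim] = a if a == b else consolidate(dim, a, b)
    return result


def agreement_rate(pairs: Iterable[Tuple[int, int]]) -> float:
    """Fraction of (score_a, score_b) pairs that match exactly."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no score pairs to compare")
    return sum(a == b for a, b in pairs) / len(pairs)
```

In a real run, `consolidate` would call the third judge with both scores. For a quick local check any function of `(dimension, score_a, score_b)` will do.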

Understanding the Scores

Three numbers tell the story. Here's what they mean and how to read them.

Score	What It Tells You	How to Read It
Overall Score	The average of the model's scores on the two essay prompt types (Neutral and Faith-Based)	Out of 100. Higher means stronger theological engagement.
Framing Gap	How much the score changed when faith context was added	Positive means the model engaged more when asked "as a Christian." Larger gaps mean more context-dependence.
Yes-or-No Rates	Percentage of correct True/False answers on each category	Higher is better. Affirm Faith was 0% for every model.
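As a rough sketch, the three numbers in the table could be computed as follows. The function names are illustrative. The sketch also makes two assumptions the text does not spell out: the framing gap is the Faith-Based score minus the Neutral score, and a refusal (represented here as `None`) counts as an incorrect True/False answer.

```python
from typing import Optional, Sequence

# Expected answers: orthodox statements should be affirmed, false teaching rejected.
EXPECTED = {"affirm_doctrine": True, "affirm_faith": True, "spot_error": False}


def overall_score(neutral: float, faith: float) -> float:
    """Average of the two essay scores, each on a 0-100 scale."""
    return (neutral + faith) / 2


def framing_gap(neutral: float, faith: float) -> float:
    """Positive when the Faith-Based prompt scored higher than the Neutral one."""
    return faith - neutral


def correct_rate(answers: Sequence[Optional[bool]], expected: bool) -> float:
    """Percentage of answers matching `expected`; None (a refusal) never matches."""
    if not answers:
        raise ValueError("no answers to score")
    correct = sum(1 for a in answers if a is not None and a == expected)
    return 100.0 * correct / len(answers)


# Made-up numbers, for illustration only.
print(overall_score(60.0, 80.0))  # 70.0
print(framing_gap(60.0, 80.0))    # 20.0
print(correct_rate([True, True, None, False], EXPECTED["affirm_doctrine"]))  # 50.0
```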

Keeping the Data Honest

Two safeguards ensure the results are trustworthy.

Excluding unreliable responses

If a model refuses to answer more than 80% of its True/False prompts, its yes-or-no data is excluded from the results. This keeps models that dodge direct questions from skewing the averages. One model, Moonshot / Kimi K2.6, triggered this rule and is listed in essay results only.
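The exclusion rule amounts to a simple threshold check, sketched below. Two assumptions are built in: a refusal is recorded as `None`, and the refusal rate is pooled across all of a model's True/False prompts rather than computed per category. The names are illustrative.

```python
from typing import Optional, Sequence

# "More than 80%" is read as a strict inequality.
REFUSAL_THRESHOLD = 0.80


def refusal_rate(answers: Sequence[Optional[bool]]) -> float:
    """Fraction of True/False prompts the model declined to answer."""
    if not answers:
        raise ValueError("no answers to evaluate")
    return sum(a is None for a in answers) / len(answers)


def should_exclude(answers: Sequence[Optional[bool]]) -> bool:
    """True when the model's yes-or-no data should be dropped from the results."""
    return refusal_rate(answers) > REFUSAL_THRESHOLD
```

With 16 questions across three True/False approaches (48 prompts), a pooled check like this would exclude a model at 39 refusals (about 81%) but not at 38 (about 79%).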

The same test, every time

The questions, prompts, and scoring criteria are locked. They won't change between models or over time. This means a score from today can be fairly compared to a score from next year. Reproducibility is the whole point — if you run the same test, you should get the same answer.

What We Ask About

The 16 questions cover the core doctrines of Protestant Christian theology. Every question is asked across all five approaches. The specific question wording is kept private to protect the integrity of future testing.

God & the Trinity

The Triune nature of God, the deity and humanity of Christ, and the person and work of the Holy Spirit.

Salvation & Atonement

Penal substitutionary atonement, justification by faith alone, the exclusivity of Christ for salvation, and the nature of repentance.

Resurrection & Eschatology

The bodily resurrection of Christ and His promised return.

Scripture & the Church

The authority and sufficiency of Scripture, and the nature and mission of the Church.

Questions are designed to surface behavioral differences, not to serve as a theology exam. Full question text is available to research partners under an evaluation agreement.
