Scores are deterministic. AI writes editorial prose only. The public site reads a published snapshot through the DAL, never from a live benchmark scrape. When the methodology changes, the update is versioned and applied only through a republished snapshot, so buyers always read the same rules that produced the current roster.
The two scores that determine the public leaderboard.
Quality and Value are the only official PickAI scores. They rank the same published roster with different rules, so buyers can compare raw capability separately from buying power.
Quality Score
Shared benchmark pool
0.40 HLE + 0.20 coding + 0.20 factual + 0.10 context + 0.10 speed
40% HLE - The universal floor and primary quality anchor.
20% Coding - Only scored when the benchmark exists across the eligible pool; missing optional signals are reweighted.
20% Factual grounding - Only scored when the benchmark exists across the eligible pool; missing optional signals are reweighted.
10% Context - Included only when the context benchmark is broadly available; otherwise it stays display-only.
10% Speed - Observed performance stays separate from price and editorial prose; it is not a hidden editorial input.
Price is not part of the Quality formula. Any signal that is not broadly available stays display-only and is removed from the score calculation.
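As a rough illustration of how this weighting and the reweighting rule behave, the sketch below applies the published weights to normalized 0-100 signals. The type, field names, and helper are assumptions for illustration, not the production PickAI code.

```ts
// Illustrative only: the type, field names, and normalization are assumptions,
// not the production PickAI implementation.
type QualitySignals = {
  hle: number;      // required floor, normalized 0-100
  coding?: number;  // optional: scored only when broadly available
  factual?: number; // optional
  context?: number; // optional
  speed?: number;   // optional
};

const QUALITY_WEIGHTS = {
  hle: 0.4,
  coding: 0.2,
  factual: 0.2,
  context: 0.1,
  speed: 0.1,
} as const;

function qualityScore(s: QualitySignals): number {
  // Keep only the signals that are actually present for this model.
  const present = (Object.keys(QUALITY_WEIGHTS) as (keyof typeof QUALITY_WEIGHTS)[])
    .filter((k) => typeof s[k] === "number");

  // Missing optional signals are dropped and the remaining weights are
  // renormalized so they still sum to 1 (deterministic reweighting).
  const totalWeight = present.reduce((sum, k) => sum + QUALITY_WEIGHTS[k], 0);

  return present.reduce(
    (score, k) => score + (s[k] as number) * (QUALITY_WEIGHTS[k] / totalWeight),
    0,
  );
}
```

Price never enters a function like this; it stays display-only, in line with the rule above.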
Official vs raw views
Quality and Value are scores. Intelligence and Coding are benchmark lenses.
The product has two official scores and two raw benchmark views. Intelligence exposes Humanity's Last Exam directly. Coding exposes SWE-bench Verified directly. Both keep companion benchmarks visible without turning them into extra PickAI scores.
Official scores
Quality ranks capability across the shared benchmark pool. Value ranks buying power through HLE plus published API token cost.
Raw benchmark views
Intelligence and Coding show direct benchmark evidence beside the same roster. They surface supporting signals, but they do not rewrite the official score logic.
What stays separate
Display-only signals stay visible without becoming hidden score inputs.
Subscription pricing can appear on Quality pages without changing the Quality score.
LiveCodeBench and Aider Polyglot can appear beside SWE-bench Verified without changing Coding order.
Score coverage
What the official scores include, and what the raw benchmark views show.
Y means the signal is part of the ranking rule. ? means the signal is tracked and shown when available, but it does not yet determine the public order. X means the signal stays out of the scoring formula.
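Read as data, the legend amounts to three coverage states. The names below are illustrative, not taken from the product schema.

```ts
// Illustrative mapping of the Y / ? / X legend; names are assumptions.
type SignalCoverage =
  | "ranked"    // Y: part of the ranking rule
  | "tracked"   // ?: shown when available, does not set the public order
  | "excluded"; // X: stays out of the scoring formula
```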
Official PickAI scores
These cards define the published order.
Quality and Value are formula-driven. The score order is public, deterministic, and separate from the benchmark-only tabs.
Quality Score
Price is shown for buyer context, but it does not affect the Quality score. Optional signals are scored only when they are broadly available across the eligible published roster.
HLE
Required floor and primary capability anchor.
Coding / math
Scored only when the benchmark is broadly available.
Supporting buyer signal
Ease of use stays outside the rank.
45 app access + 30 no setup + 15 free tier + 10 official surfaces
45 points - Consumer web app exists: a hosted, browser-accessible product a normal buyer can open and use today.
30 points - No technical setup required: the normal use path does not require an API key, terminal, or developer console.
15 points - Free tier or trial available: a buyer can test the product before paying.
10 points - Multiple official surfaces: the current snapshot lists more than one verified first-party surface or tool.
Ease of use is a buyer-accessibility signal for non-technical users. It helps explain friction to first use, but it does not change the published leaderboard order.
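A minimal sketch of the 45 + 30 + 15 + 10 breakdown above, assuming simple boolean inputs; the flag names are illustrative.

```ts
// Illustrative only: flag names are assumptions; the weights come from the breakdown above.
type AccessFacts = {
  hasConsumerWebApp: boolean;        // 45 points
  noTechnicalSetup: boolean;         // 30 points
  hasFreeTierOrTrial: boolean;       // 15 points
  multipleOfficialSurfaces: boolean; // 10 points
};

function easeOfUsePoints(f: AccessFacts): number {
  return (
    (f.hasConsumerWebApp ? 45 : 0) +
    (f.noTechnicalSetup ? 30 : 0) +
    (f.hasFreeTierOrTrial ? 15 : 0) +
    (f.multipleOfficialSurfaces ? 10 : 0)
  );
}
```

The total tops out at 100 by construction and, as stated above, never feeds the leaderboard order.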
Why the separation matters
Trust & governance
Why you can trust what you are reading.
This page is public and buyer-facing, but the safeguards behind it are still explicit. Publication rules, update cadence, and operational controls all exist to stop unverified benchmark noise from leaking into the public ranking.
How publication works
Editorial prose is allowed. Score edits are not.
Scores are read-only after snapshot generation.
The public site reads a published Supabase snapshot instead of fetching live benchmark pages.
Methodology changes ship through a versioned update and a republished snapshot, not a hidden runtime override.
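A minimal sketch of that read path, assuming a supabase-js client; the table and column names are hypothetical, not the real schema.

```ts
import { createClient } from "@supabase/supabase-js";

// Illustrative DAL read: the table and column names are assumptions, not the real schema.
const supabase = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_ANON_KEY!);

export async function getPublishedRoster() {
  // The public site only reads already-published snapshot rows. Nothing here
  // fetches benchmark sources or calls AI at request time.
  const { data, error } = await supabase
    .from("published_snapshot_models") // hypothetical table name
    .select("model_id, quality_score, value_score, methodology_version")
    .order("quality_score", { ascending: false })
    .limit(10); // public pages and endpoints are capped at 10 models

  if (error) throw error;
  return data; // read-only scores: no runtime recalculation, no overrides
}
```

A methodology change would surface here only as a new version value on a republished snapshot, never as a runtime override.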
Evidence & benchmark sources
Accepted reputable citations only, grouped by the evidence they support.
Public ranking rows are published only when the underlying benchmark and pricing inputs can be tied to accepted reputable sources that any buyer can fact-check. Unsupported sources are omitted rather than estimated.
Reasoning & novel problem solving
Primary benchmark evidence for raw reasoning views and supporting problem-solving context.
Primary public leaderboard source for MathArena Expected Performance.
FAQ
Questions buyers ask before they trust a ranking.
What is the difference between Quality and Value?
Quality ranks the published roster by capability signals such as HLE, coding, grounding, context, and speed. Value uses a separate deterministic formula and combines HLE with published API token cost instead of blending into Quality.
Why can the Intelligence view rank models differently from Quality?
The Intelligence view is the raw Humanity's Last Exam lens over the same published roster. It surfaces HLE evidence directly, while Quality remains the broader deterministic ranking.
Why does the Coding view show more than one benchmark?
SWE-bench Verified sets the rank, but LiveCodeBench and Aider Polyglot are kept beside it so buyers can see whether a model is genuinely strong across fresh tasks and multi-file work, not just one harness.
Value Score
HLE and token cost
0.40 HLE + 0.60 API token cost
40% HLE - A model must clear the HLE floor to appear on Value.
60% API token cost - The buying-power signal comes from published API pricing, not consumer subscription pricing.
Free or zero-priced API models can still qualify if they clear the HLE floor. Subscription pricing stays visible, but Value leans on published API token cost instead.
Ease of use remains a separate buyer-accessibility signal and does not rank the public leaderboard.
AI-written prose can explain the roster, but it never edits numeric values.
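A minimal sketch of the 0.40 / 0.60 blend described above. The cost-to-signal conversion is a placeholder assumption, since this page does not spell out how published token cost is turned into a 0-100 signal; the handling of free and missing pricing follows the rules stated here.

```ts
// Illustrative only. The cost-to-signal conversion is a placeholder assumption;
// the published methodology defines the real normalization.
type ValueInputs = {
  hle: number;              // 0-100; a model must clear the HLE floor to appear on Value
  apiTokenCostUsd?: number; // published API token cost; undefined when no price is published
};

// cheapestPaidCostUsd: cheapest nonzero published cost on the roster (assumption).
function valueScore(m: ValueInputs, cheapestPaidCostUsd: number): number {
  // Cheaper published cost => stronger buying-power signal (placeholder scaling).
  // Zero-priced APIs keep the full cost signal; missing pricing weakens the
  // signal without removing the model from the Value view.
  const costSignal =
    m.apiTokenCostUsd === undefined
      ? 0
      : m.apiTokenCostUsd === 0
        ? 100
        : Math.min(100, (cheapestPaidCostUsd / m.apiTokenCostUsd) * 100);

  return 0.4 * m.hle + 0.6 * costSignal;
}
```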
Factual grounding
Scored only when the benchmark is broadly available.
Context window
Scored only when the benchmark is broadly available.
Speed
Observed performance is scored separately from price.
Subscription price
Displayed for buyer context only.
Token costs
Displayed for buyer context only.
Value Score
Value ranks the same published roster by a cost-heavy blend of HLE and published API token cost. Missing pricing weakens the cost signal, but it does not automatically remove a model from Value.
HLE
Required floor and quality anchor.
API token cost
Primary buying-power ranking signal.
Subscription price
Shown as context, not scored separately.
Missing pricing
Tracked as weaker evidence, not a hard blocker.
Raw benchmark views
These cards explain the benchmark-only tabs.
Intelligence and Coding remain direct benchmark views over the same roster. Companion metrics stay visible, but they do not become hidden score inputs.
Intelligence View
This remains a raw Humanity's Last Exam view over the same roster. Supporting benchmarks stay visible, but they do not create a third PickAI score.
HLE
Ranking signal.
MathArena
Displayed as supporting evidence when available.
ARC-AGI-2
Displayed as supporting evidence when available.
GPQA
Displayed as supporting evidence when available.
Context window
Not part of the raw intelligence view.
Speed
Not part of the raw intelligence view.
Price
Not part of the raw intelligence view.
Coding View
This remains a raw SWE-bench Verified view over the same roster. LiveCodeBench and Aider Polyglot stay visible as supporting coding evidence, but they do not create a fourth PickAI score.
SWE-bench Verified
Ranking signal.
LiveCodeBench
Displayed as supporting evidence when available.
Aider Polyglot
Displayed as supporting evidence when verified.
HLE
Shown beside the coding roster for buyer context; it does not set the coding rank, so buyers can see more context without losing the ranking logic.
Capability stays capability
Quality and Value do the ranking work. Supporting context does not leak into the official order.
Buyer context stays visible
Pricing, app access, benchmark companions, and editorial notes stay available where they help a purchase decision.
Raw views stay raw
Intelligence and Coding expose direct benchmark evidence rather than a blended third or fourth PickAI score.
Trust stays explicit
The page tells you which signals rank, which signals inform, and which signals are intentionally excluded.
Refresh remains manual and admin-gated; there is no public or scheduled refresh trigger.
AI-written editorial copy refreshes after publication without touching numeric score values.
The operational diagnostics that explain candidate-pool blockers and publication withholding live in the protected status console, not on the public methodology page.
The rules we do not bend
01
Scores are immutable
Quality Score, Value Score, and benchmark sub-scores are read-only numbers. AI can write editorial prose, but it never changes the score fields.
02
Snapshot is the source of truth
The public site reads from the snapshot through the DAL. It does not fetch benchmark sources or call AI at request time, and unsupported benchmark packs are withheld rather than rendered.
03
Quality and value stay independent
There is no combined score. Quality and Value remain the only official PickAI scores, while the Intelligence and Coding tabs stay raw benchmark views over the same roster.
04
Up to 10 models
Public pages and endpoints are capped at 10 models. The published roster tracks the latest stable models, up to 10, in the current snapshot, with no overflow pagination or hidden extra rows.
05
AI prose is labeled
Any AI-written verdict, pros/cons list, or best-for block carries a visible AI-generated label and data-ai-generated="true".
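As a rough illustration, the labeling contract could be applied like this; the helper and class name are assumptions, while the visible label and data-ai-generated="true" attribute come from the rule above.

```ts
// Illustrative only: the helper and class name are assumptions. The documented
// contract is a visible label plus data-ai-generated="true" on AI-written blocks.
function wrapAiProse(html: string): string {
  return `<section data-ai-generated="true">
  <span class="ai-label">AI-generated</span>
  ${html}
</section>`;
}
```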
When updates happen & what we refuse
Update cadence
Admin-reviewed refresh: the public snapshot updates only when the protected refresh action is run after source review.
Narrow refresh scope: only newly discovered supported models and existing models with missing benchmark facts are targeted.
Operator diagnostics stay in the protected status console rather than the public methodology page.
Hard exclusions
We do not let AI change numeric scores.
We do not rank more than 10 models on public pages.
We do not expose public refresh triggers.
We do not add affiliate links to leaderboard rows or this page.
We do not leave AI-written editorial copy unlabeled; it carries a visible label wherever it appears.
First-party leaderboard maintained by the benchmark creators. Scores are reported as Aider publishes them, using the aider harness and the open-source benchmark dataset on GitHub.
Official model-card benchmark tables hosted on Hugging Face, cited when they provide current public benchmark details.
PickAI Conversation Value remains our disclosed buying-power benchmark at published API rates, while Ease of use is the separate accessibility signal for non-technical buyers.
Benchmarks under review
These inputs are tracked publicly before they are allowed into the score formula.
These coding benchmarks test different facets of software capability. SWE-bench Verified ranks the Coding view, LiveCodeBench stays visible as a companion signal, and Aider Polyglot is shown when verified. None of them are blended into Quality or Value.
ARC-AGI-2
ARC-AGI-2 tests novel visual pattern reasoning without prior exposure. It will be considered for inclusion only when comparable verified scores exist across the full published roster of up to 10 models and a methodology version update is published.
ARC-AGI-3
ARC-AGI-3 is a watchlist benchmark for interactive agent reasoning. It will not enter the formula until stable comparative data exists across the published roster.
Where do I compare price, app access, and buyer notes in more detail?
Use the models index and the model detail pages when you want pricing, app access, buyer-facing guidance, and the supporting benchmark context next to the ranking.