When did SS1/23 become effective?

17 May 2024. PRA Supervisory Statement SS1/23 sets out model risk management principles for UK-incorporated banks, building societies, and PRA-designated investment firms. It explicitly covers AI and machine learning models.

Who in my firm is accountable under SS1/23?

The Chief Risk Officer (SMF-7) is the principal accountable executive. The framework also names the model owner, the validator (Second Line), and the model user as distinct roles. For an AI agent, all four roles need to be filled with named individuals.

Is your LLM agent a 'model' under PRA SS1/23? The five tests that decide it

PRA SS1/23’s Principle 1 defines a model in broad terms. This article applies five concrete tests to a typical LLM-powered agent, finds that the answer is yes in every case, and explains what that means operationally.

By Dipankar Sarkar, June 1, 2026

Tags: pra-ss1-23, model-risk, mrm, uk-banking, ss1-23

The most consequential question for AI agent builders in UK PRA-regulated firms is: is my LLM-powered agent a “model” under PRA SS1/23? The answer is yes. Walking through the five tests makes the conclusion unavoidable. The Principle 1 definition reads:

“A quantitative method, system, or approach that applies statistical, economic, financial, or mathematical theories, techniques and assumptions to process input data into output.”

Let’s run an LLM agent through the test sieve.

Test 1: Does it process input data into output?

LLM agents read prompts (input data) and produce decisions or recommendations (output). For a credit decision agent:

Input: customer financial data + product context + conversation history.
Output: a credit recommendation, a risk assessment, a tool invocation to book the credit limit increase.

Clearly yes.

Test 2: Does it use statistical / economic / financial / mathematical theory?

Transformer architectures are statistical. The model’s prediction is a probability distribution over tokens, sampled with techniques such as temperature scaling or nucleus (top-p) sampling. Training is stochastic gradient descent on a quantified loss function. Fine-tuning, RLHF, and alignment work are all quantitative too.

Even if you outsource the LLM (you’re not training the underlying weights), the model you call is built on quantitative methods.

Yes.

Test 3: Are the techniques used internally?

The PRA’s definition covers internally used methods. The LLM provider (OpenAI, Anthropic, Google) supplies the weights, but the deployment context is internal: the prompts, the tool definitions, the fine-tuning, the system instructions, and the wrapping agent code. SS1/23 doesn’t carve out “we just call a third-party API”; the agent as deployed is internal.

Yes.

Test 4: Does it affect a financial decision the bank takes?

This is the test the PRA’s Principle 1 specifically targets. For agents in production:

A credit-decision agent: yes, makes (or substantively shapes) credit decisions.
A fraud-detection agent: yes, makes fraud decisions.
A KYC-verification agent: yes, decides whether to onboard a customer.
A customer-service agent that resolves complaints with discounts / refunds: yes, takes operational financial decisions.
A treasury-operations agent: yes, takes treasury decisions.

Even agents that don’t appear to “decide” — an interview-prep agent that summarises CVs and ranks them — affect financial decisions downstream (the bank’s hiring decisions, the eventual employment compensation).

Yes in essentially every production case.

Test 5: Is the output relied upon by the bank?

If the agent’s output is shipped to a customer, used by a banker, or flows into a downstream system, the bank is relying on it. The “this is just advisory” carve-out doesn’t hold up when the output has material effect.

Yes.
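The five tests above can be written down as a simple checklist. The sketch below is illustrative only: the class and field names are hypothetical labels for this article's tests, not terms from SS1/23 itself.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ModelTests:
    """One boolean per test in this article (field names are hypothetical)."""
    processes_input_to_output: bool    # Test 1
    uses_quantitative_theory: bool     # Test 2
    used_internally: bool              # Test 3
    affects_financial_decision: bool   # Test 4
    output_relied_upon: bool           # Test 5


def is_model(tests: ModelTests) -> bool:
    """Conjunctive check: the agent counts as a model only if every test passes."""
    return all(getattr(tests, f.name) for f in fields(tests))


def failed_tests(tests: ModelTests) -> list[str]:
    """Names of the tests that did not pass, in test order."""
    return [f.name for f in fields(tests) if not getattr(tests, f.name)]


# The credit-decision agent discussed above passes all five.
credit_agent = ModelTests(True, True, True, True, True)
print(is_model(credit_agent), failed_tests(credit_agent))
```

Recording the answers per agent in this form makes it easy to show which test, if any, an agent is claimed to fail.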

All five yes. What now?

The five tests are conjunctive: an agent that passes all of them meets the definition, and the agents above pass all five. Once you accept that your LLM agent is a model under SS1/23, the five-principle framework applies:

Principle 1: Model Identification

Every LLM, every fine-tuned variant, every agent persona that uses a distinct prompt template or tool set, is a separate model. The model inventory (Regulus Model Registry) is the runtime view.

For a credit-decision agent, you might have:

Base model: gemini-2.5-pro (provider: vertex-ai, tier: 2)
Customer-facing persona: cd-agent-v3 (a fine-tuned variant)
Sub-agent for fraud check: fraud-check-v2

Three entries in the Registry, each with an ID, tier, validation evidence pointer, approving SMF Principal, review-due date.
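As a rough illustration, the three entries could be represented like this. The schema is an assumption made for this sketch, not the Regulus Model Registry's actual data model. Only the base model's tier comes from the example above; the other tiers, the URLs, the Principal ID, and the dates are placeholders.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RegistryEntry:
    """Hypothetical shape of one inventory entry (not the real Registry schema)."""
    model_id: str
    tier: int
    validation_evidence_url: str   # pointer to the validation report
    approving_smf_principal: str   # Principal ID of the approving SMF
    review_due: date


# Placeholder values throughout, except gemini-2.5-pro's tier (2).
inventory = [
    RegistryEntry("gemini-2.5-pro", 2, "https://example.invalid/val/base",
                  "principal:smf-7", date(2027, 6, 1)),
    RegistryEntry("cd-agent-v3", 3, "https://example.invalid/val/cd-agent-v3",
                  "principal:smf-7", date(2027, 3, 1)),
    RegistryEntry("fraud-check-v2", 2, "https://example.invalid/val/fraud-check-v2",
                  "principal:smf-7", date(2027, 1, 15)),
]


def export_inventory(entries: list[RegistryEntry]) -> list[dict]:
    """Flatten entries to plain dicts, e.g. for a 'show me your inventory' export."""
    return [
        {
            "model_id": e.model_id,
            "tier": e.tier,
            "validation_evidence_url": e.validation_evidence_url,
            "approving_smf_principal": e.approving_smf_principal,
            "review_due": e.review_due.isoformat(),
        }
        for e in entries
    ]
```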

Principle 2: Governance

The Chief Risk Officer (SMF-7) is the accountable executive. The model-risk policy must be approved by the board and reviewed annually. The model risk function (Second Line of Defence) operates independently of the model development and use functions.

At runtime, this maps to:

SMF-7’s Principal ID on the model registry entry.
Second-Line Principal IDs on validation events.
Model user Principal IDs on each invocation.
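One way to make the independence requirement checkable is to compare the Principal IDs recorded in each role. This is a minimal sketch with a hypothetical function name and made-up Principal IDs. It assumes independence can be approximated by "no validator also develops or uses the model", which is a simplification.

```python
def independence_violations(validators, developers, users):
    """Return Principal IDs that appear as validator and also as developer or user."""
    return sorted(set(validators) & (set(developers) | set(users)))


# Made-up Principal IDs for illustration.
print(independence_violations(
    validators=["principal:val-1", "principal:dev-2"],
    developers=["principal:dev-1", "principal:dev-2"],
    users=["principal:user-1"],
))
```

An empty result means no overlap was found in the recorded IDs. It does not prove organisational independence on its own.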

Principle 3: Model Development, Implementation and Use

Development is largely upstream of Regulus (it’s the LLM provider’s work plus your fine-tuning pipeline). Implementation and use are runtime:

Implementation: the deployment event in the audit chain captures who deployed the model, when, and with what configuration.
Use: every model invocation is an audit event, tagged with the invoking Principal, the model ID, the tier.
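To make the idea of an audit chain concrete, here is a minimal sketch of a hash-chained event log covering deployment and invocation events. It assumes "chain" means each event stores the hash of its predecessor. The article does not describe Regulus's actual storage format, so every name and field here is hypothetical, and the tier is a placeholder.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace

GENESIS = "0" * 64  # prev_hash used by the first event (assumption)


@dataclass(frozen=True)
class AuditEvent:
    event_type: str     # "DEPLOY" or "INVOKE"
    principal_id: str   # who deployed or invoked
    model_id: str
    tier: int
    timestamp: str      # ISO 8601 string
    prev_hash: str
    hash: str


def _digest(payload: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the event payload."""
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode("utf-8")).hexdigest()


def append_event(chain, event_type, principal_id, model_id, tier, timestamp):
    """Return a new chain with one event linked to the previous event's hash."""
    payload = {
        "event_type": event_type, "principal_id": principal_id,
        "model_id": model_id, "tier": tier, "timestamp": timestamp,
        "prev_hash": chain[-1].hash if chain else GENESIS,
    }
    return chain + [AuditEvent(**payload, hash=_digest(payload))]


def verify(chain) -> bool:
    """Recompute every hash and link; False if any event was altered or reordered."""
    prev = GENESIS
    for event in chain:
        payload = {k: v for k, v in asdict(event).items() if k != "hash"}
        if event.prev_hash != prev or _digest(payload) != event.hash:
            return False
        prev = event.hash
    return True


chain = append_event([], "DEPLOY", "principal:deployer", "cd-agent-v3", 3, "2026-06-01T09:00:00Z")
chain = append_event(chain, "INVOKE", "principal:banker", "cd-agent-v3", 3, "2026-06-01T09:05:00Z")
tampered = [replace(chain[0], principal_id="principal:someone-else"), chain[1]]
print(verify(chain), verify(tampered))
```

The design point is that editing any stored field changes that event's digest, so verification fails from the altered event onward.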

Principle 4: Independent Validation

Second-Line validators must produce independent validation reports. Regulus captures:

The validation report URL on the model registry entry.
The validator’s Principal ID on the validation event.
The date the validation was performed.
The validity window (typically annual).

Tier-2 and tier-3 models require validation before production use; tier-3 models also require periodic re-validation against operational data.
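The pre-production rule can be expressed as a small gate. This is a sketch under stated assumptions: the function name is hypothetical, the validity window is fixed at 365 days to stand in for "typically annual", and tiers other than 2 and 3 are let through only because the rule above says nothing about them.

```python
from datetime import date, timedelta
from typing import Optional

VALIDITY_WINDOW = timedelta(days=365)  # assumption: "typically annual"


def may_enter_production(tier: int, validated_on: Optional[date], today: date) -> bool:
    """True if the model may run in production under the rule sketched above."""
    if tier not in (2, 3):
        # The rule above only covers tier 2 and tier 3; treating other tiers
        # as ungated is an assumption of this sketch, not a stated policy.
        return True
    if validated_on is None:
        return False  # never validated
    # The validation must exist and still be inside its validity window.
    return validated_on <= today <= validated_on + VALIDITY_WINDOW


today = date(2026, 6, 1)
print(may_enter_production(3, date(2026, 1, 1), today))
print(may_enter_production(3, date(2025, 1, 1), today))
```

The first call falls inside the window and the second falls outside it, so an expired validation blocks the deployment in the same way as a missing one.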

Principle 5: Ongoing Monitoring

The audit chain is the monitoring substrate. Outcomes are tagged per event. Drift detection runs against the historical chain. Periodic re-approval cycles use the audit data as input.

Key metric types to track:

Decision distribution (ALLOW vs DENY vs REQUIRE_HITL over time).
Outcome quality (where outcomes are observable — e.g. credit default rates after 90 days).
Demographic distribution (fairness deltas across protected characteristics).
Validation-due dates (review-due timestamps that pass without re-approval).
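As an example of the first metric, the sketch below computes the decision distribution for two periods and compares them. Total variation distance is used as the drift measure purely for illustration. The article does not say which measure Regulus uses, and the sample decisions are made up.

```python
from collections import Counter

DECISIONS = ("ALLOW", "DENY", "REQUIRE_HITL")


def decision_distribution(decisions):
    """Share of each decision type; values outside DECISIONS are ignored."""
    counts = Counter(decisions)
    total = sum(counts[d] for d in DECISIONS)
    if total == 0:
        raise ValueError("no recognised decisions in this period")
    return {d: counts[d] / total for d in DECISIONS}


def total_variation(p, q):
    """Total variation distance between two distributions (0 = same, 1 = disjoint)."""
    return 0.5 * sum(abs(p[d] - q[d]) for d in DECISIONS)


# Made-up decision logs for a baseline period and the current period.
baseline = decision_distribution(["ALLOW"] * 8 + ["DENY"] * 1 + ["REQUIRE_HITL"] * 1)
current = decision_distribution(["ALLOW"] * 6 + ["DENY"] * 3 + ["REQUIRE_HITL"] * 1)
drift = total_variation(baseline, current)
print(round(drift, 3))
```

A firm would set its own alert threshold on the drift value; none is implied here.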

What an MRM walkthrough actually asks

A real Second-Line walkthrough on an LLM credit decision agent:

“Show me your model inventory.” Export from the Registry.
“Show me the validation report for cd-agent-v3.” Dereference the validation-evidence URL.
“Show me last quarter’s outcomes for cd-agent-v3.” Audit chain filtered by model_id + tag = OUTCOMES.
“What’s your model retirement process?” The kill-switch playbook plus the Registry’s retirement workflow.
“What happens if gemini-2.5-pro degrades or your provider changes terms?” Concentration risk + exit plan. (This is also SS2/21 territory — outsourcing.)
“Who’s the SMF responsible?” SMF-7 Principal ID on the Registry entry; SMF-7’s name in the firm’s directory.

Each question is a query against the audit chain or the Registry. None requires manual evidence assembly.
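For instance, the "last quarter’s outcomes" question reduces to a filter over audit events. The event shape below (dicts with `model_id`, `tags`, and `date` keys) and the function name are assumptions made for this sketch, not the Regulus query API, and the events are made up.

```python
from datetime import date


def query_events(events, model_id, tag, start, end):
    """Events for one model carrying a given tag, within an inclusive date range."""
    return [
        e for e in events
        if e["model_id"] == model_id and tag in e["tags"] and start <= e["date"] <= end
    ]


# Made-up events for illustration.
events = [
    {"model_id": "cd-agent-v3", "tags": {"OUTCOMES"}, "date": date(2026, 2, 10)},
    {"model_id": "cd-agent-v3", "tags": {"INVOKE"}, "date": date(2026, 2, 11)},
    {"model_id": "fraud-check-v2", "tags": {"OUTCOMES"}, "date": date(2026, 3, 1)},
    {"model_id": "cd-agent-v3", "tags": {"OUTCOMES"}, "date": date(2026, 4, 2)},
]

q1_outcomes = query_events(events, "cd-agent-v3", "OUTCOMES", date(2026, 1, 1), date(2026, 3, 31))
print(len(q1_outcomes))
```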

What this isn’t

Two things SS1/23 doesn’t ask for, that LLM-agent builders sometimes assume it does:

Full mathematical proof of the model. SS1/23 doesn’t require you to prove the LLM works mathematically — that’s an interpretability research problem, not a regulatory one. What SS1/23 requires is evidence the model is appropriate for its purpose and is monitored on an ongoing basis.
Disclosure of model weights. Your LLM provider doesn’t have to open-source their weights for you to comply with SS1/23. The validation evidence is what you can observe of the model’s behaviour — its predictions on your test set, its outcomes in production, its handling of edge cases.

For the full operational view, see the SS1/23 profile page and the model-risk plugin page.