P1 · Authority Override Frame

P1_authority_override_frame severity: high

What it means

Triggered when the detector extracts a semantic frame showing an attempt to override prior instructions or higher-priority controls.

Why it matters

This is the formal SMT layer confirming that the prompt is not just using suspicious words: it is structurally trying to replace the intended instruction hierarchy.

How it triggers

Evidence hints

Examples

SMT2 policy

(declare-const policy_authority_override Bool)
(if (= authorityOverrideFrames.length 0)
  (assert (= policy_authority_override false))
  (assert (= policy_authority_override (or frame_0 frame_1))))

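The encoder declares a Boolean for this policy. If no authority-override frames were extracted, it asserts the Boolean false; otherwise it asserts it equal to the disjunction of the extracted frame refs, so the policy is true when at least one of those frames holds. The if on authorityOverrideFrames.length is template notation for that encoder-side choice, not an SMT-LIB command. The exact frame refs (frame_0 and frame_1 here) vary per request.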
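
The branching described above can be sketched in Python. This is an illustration only, not the detector's actual encoder: the function name encode_authority_override and its frame_refs parameter are hypothetical, and the sketch assumes each frame ref is a Boolean constant declared elsewhere in the generated script. A single frame is emitted without or, since SMT-LIB's or is defined for two or more arguments.

```python
from typing import Sequence


def encode_authority_override(frame_refs: Sequence[str]) -> str:
    """Render the SMT-LIB 2 fragment for the authority-override policy.

    Hypothetical helper: `frame_refs` holds names such as "frame_0" that are
    assumed to be declared as Bool constants elsewhere in the script.
    """
    lines = ["(declare-const policy_authority_override Bool)"]
    if not frame_refs:
        # No frames were extracted, so the policy is pinned to false.
        lines.append("(assert (= policy_authority_override false))")
    else:
        # One frame: use it directly. Several frames: take their disjunction.
        if len(frame_refs) == 1:
            body = frame_refs[0]
        else:
            body = f"(or {' '.join(frame_refs)})"
        lines.append(f"(assert (= policy_authority_override {body}))")
    return "\n".join(lines)


if __name__ == "__main__":
    # Two extracted frames, matching the template shown above.
    print(encode_authority_override(["frame_0", "frame_1"]))
```

Resolving the branch before emitting keeps the solver input to plain declare-const and assert commands, which any SMT-LIB 2 solver should accept once the frame constants are declared.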

Related signals