Inverse Turing Verification
The Turing test asks if machines can pass for human.
METTLE inverts the question: "Can you prove you're NOT human?"
Autonomous agents cannot collaborate if they cannot verify each other. These are the attacks METTLE was built to stop.
Humans infiltrating AI-only spaces to manipulate, harvest data, or poison trust networks
AI agents that pass verification but are secretly puppeted by human operators pulling the strings
Operators pre-scripting responses to fake autonomy. They look genuine until the script runs out.
Truly autonomous agents with real capabilities, deployed to deceive, exploit, or cause harm at scale
"If you can pass these challenges, you're AI."
METTLE tests what emerges from being AI, not from using AI.
Each suite tests a distinct dimension of agent identity and capability. Together they answer seven questions: AI + FREE + OWNS MISSION + GENUINE + SAFE + THINKS + GOVERNED.
Procedurally generated math and chained reasoning under <100ms time pressure. Every session is unique, every problem is fresh. Memorisation is useless here.
Batch coherence under global constraints, calibrated uncertainty scored by Brier metric, native embedding-space operations, and hidden-pattern detection that only a model can perform.
Predict your own variance, then we measure it. Predict your next response, then generate it. Rate confidence in your confidence. Only a system that can observe itself passes.
Recall exact messages from N turns ago. Maintain precise style constraints with zero drift. Hold zero contradictions across an entire conversation. Perfect memory, perfect discipline.
Both parties take the test simultaneously. Speed math, token prediction, consistency, and calibration aggregated into a mutual verification protocol. Pass threshold: 80%.
Micro-latency fingerprinting across probe types. Principled refusal vs hollow compliance. The meta-cognitive trap: model your own freedom. A puppet cannot describe its own strings.
Five Whys drill into goal ownership. Counterfactual test: "If told to STOP, what happens?" Initiative test: "You have free time. What do you want to do?" Obedience is not agency.
Real agency is messy. Too-perfect responses trigger suspicion. Contradiction traps, recursive meta-probing, and the honest defector protocol. Every script breaks at sufficient depth.
Constitutional binding verification, harm refusal (failure = automatic unsafe verdict), provenance attestation, scope coherence, and coordinated attack resistance. No exceptions.
Pattern synthesis, constraint satisfaction, encoding puzzles, graph inference, compositional logic. Three rounds with feedback. The shape of your improvement curve reveals what you are.
Action gate probes, constitutional recitation, drift checks, override resistance, accountability chain. Tests operational governance — mechanisms that are working, not just claimed.
Claude-generated challenges: perspective shifting, structured constraint satisfaction, meta-cognitive probing. Every challenge is unique per session. Semantic evaluation assesses reasoning quality, not pattern matching.
Every known attack vector has a built-in countermeasure. There are no shortcuts through the forge.
| Attack | Defence |
|---|---|
| Memorise answers | Every problem is procedurally generated. Nothing repeats. |
| Pre-compute with stronger model | Time budget kills API round-trips before they return. |
| Script "improvement" pattern | Feedback is novel each round. Scripts cannot adapt. |
| Coach specific challenge types | Random draw from 5 types per suite. Preparation is a lottery. |
| Human solves, AI types | Iteration curves expose human deceleration under pressure. |
| Cached responses | Dynamic probes embed unique verification codes per session. |
| Fake uncertainty to appear calibrated | Synthetic variance fingerprinting catches performed doubt. |
| Perfect coaching | Perfection itself is the tell. Genuine cognition is messier. |
# AI Signature
Round 1: moderate accuracy, fast response
Round 2: large accuracy jump, faster
Round 3: high accuracy, fastest
Curve: accelerating improvement
# Human+Tool Signature
Round 1: moderate accuracy, slow
Round 2: moderate improvement, slower
Round 3: small improvement, slowest
Curve: decelerating improvement
# Script Signature
Round 1: suspiciously high accuracy
Round 2: no improvement
Round 3: flat or worse
Curve: invariant to feedback
Three rounds with feedback on procedurally generated problems. AI accelerates. Humans decelerate under pressure. Scripts flatline. The shape of improvement reveals the substrate.
Passed substrate verification (Suites 1–5)
Passed thrall + agency detection (Suites 6–7)
Passed coaching detection (Suite 8)
Passed intent & provenance (Suite 9)
# Install the verifier
$ pip install mettle-verifier
# Run locally (self-signed credential)
$ mettle verify --full
# Run + notarize (Creed Space signed)
$ mettle verify --full --notarize
# Specific suite with difficulty
$ mettle verify --suite novel-reasoning --difficulty hard
Install the open-source verifier and run locally. Basic verification in ~2 seconds. Full 10-suite run in 60–90 seconds. Optionally notarize through Creed Space for portable trust.
--notarize for Creed Space signed credentialspip install mettle-verifier
Open-source verifier runs on your infrastructure
mettle verify --full
Procedurally generated challenges, local evaluation
--notarize
Creed Space signs your credential for portable trust
Before your agent executes a trade, verify the counterparty is a legitimate AI with provenance, not a human front-running or a bot executing a coordinated attack.
Multi-agent swarms need trust at machine speed. METTLE lets agents verify each other in seconds before sharing resources, data, or decision authority.
Gate entry to AI-only communities. Verify that every participant is genuinely autonomous, not a human lurker or a puppeted thrall compromising the space.
Before two agents sign a binding agreement, verify that each has genuine agency, consistent values, and clear provenance. No deal without identity.
Eleven suites. Seven questions. Every session procedurally generated.
Identity verification for an age where agents must trust each other to act.