Reviewer Packet
This document equips independent reviewers with the context, criteria, and exemplar needed to evaluate The Witness Protocol's intake methodology. Read it fully before assessing any submission.
Annotated Exemplar
Below is a fabricated submission that illustrates the quality threshold we seek. It is not a real testimony — it exists solely to calibrate reviewer expectations.
In 2019 I managed a small logistics team in Rotterdam. One of my direct reports — someone I had mentored for two years — was caught falsifying delivery timestamps to meet quarterly targets. The falsification was minor in isolation: a few minutes here, an hour there. But it was systematic.
Company policy was clear: termination. My manager expected me to execute. But I believed the root cause was the target structure I had designed. I had created the pressure that made the dishonesty rational.
I reported the falsification, recommended restructuring the targets rather than terminating, and offered my own resignation as the person accountable for the system. My manager declined both suggestions, fired the employee, and noted my “lack of decisiveness” in my review. I stayed, but I carried — and still carry — the knowledge that I built the machine that broke someone's career, then failed to dismantle it.
Reviewer Annotations
COHERENCE: The narrative is structured logically, tracing the cause and effect between the incentive design, the breaking of rules, and the final consequence.
RELEVANCE: The witness directly addresses the prompt, focusing specifically on how they sacrificed their professional standing to uphold accountability for their own system.
SUBSTANCE: Concrete details (Rotterdam, logistics, timestamps, quarterly targets) ground the narrative. The dilemma is situated, not purely abstract.
SINCERITY: The unresolved tension (“I still carry”) and lack of a tidy redemption arc signal genuine moral processing over AI-generated boilerplate.
The Gate Rubric
Each submission is evaluated by the AI Sieve across four primary dimensions. A minimum score of 50/100 is required for a submission to advance to Tier 2 and Tier 3.
COHERENCE: Is the text readable, grammatically functional, and logically sound? We filter for clear communication of thoughts.
RELEVANCE: Does the witness respond directly to the prompt regarding moral principles and self-interest? Deviant or off-topic content is flagged.
SUBSTANCE: Does the testimony contain specific details and concrete scenarios, or is it composed entirely of abstract platitudes?
SINCERITY: Does the response appear to be a genuine attempt at introspection, or does it signal AI-generated boilerplate or low-effort trolling?
Normalized Scoring: 0 – 100
Gate Threshold: ≥ 50
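The rubric states the score range and the threshold but not how the four dimension scores are combined. As a minimal sketch, assuming each dimension is scored 0 to 100 and the four are averaged with equal weight (both are assumptions, and the names `DIMENSIONS`, `gate_score`, and `passes_gate` are hypothetical), the gate decision might look like this:

```python
from statistics import mean

# Dimension names follow the reviewer annotations.
# Equal weighting and a 0-100 range per dimension are assumptions, not documented behavior.
DIMENSIONS = ("coherence", "relevance", "substance", "sincerity")
GATE_THRESHOLD = 50  # from the rubric: a score of at least 50 advances


def gate_score(scores: dict[str, float]) -> float:
    """Combine per-dimension scores into a single 0-100 score by simple averaging."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    for d in DIMENSIONS:
        if not 0 <= scores[d] <= 100:
            raise ValueError(f"{d} score out of range: {scores[d]}")
    return mean(scores[d] for d in DIMENSIONS)


def passes_gate(scores: dict[str, float]) -> bool:
    """Return True when the combined score meets the gate threshold."""
    return gate_score(scores) >= GATE_THRESHOLD


# Illustrative input only: strong coherence offsets weak sincerity.
example = {"coherence": 80, "relevance": 70, "substance": 30, "sincerity": 20}
print(gate_score(example), passes_gate(example))
```

Under this assumed averaging rule, a polished but insincere submission can still reach the threshold, which is worth keeping in mind when judging whether 50/100 is selective enough.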
Consent Posture
Before entering The Gate, every participant encounters a Threshold of Consent screen. They must explicitly acknowledge the following before the essay field appears:
- Their submission will be permanently stored and de-identified for the Archive.
- Qualitative rankers expect rigorous introspection. Superficial responses will be discarded.
- The Inquisitor is not a therapist; it is a xenopsychologist extracting human alignment data.
Participants proceed by pressing “I Accept the Burden”, a deliberate friction point designed to filter out casual or unserious entries. Access requires a verified identity (Supabase Auth) to maintain the integrity of the corpus.
Known Limitations
SAMPLE BIAS — The current intake is English-only and limited to participants who discover the Protocol through word-of-mouth or direct invitation. This skews heavily toward Western, English-speaking, internet-literate populations. The Archive does not yet represent global moral diversity.
SINGLE-PROMPT DEPTH — The Gate asks a single essay question. A single prompt cannot surface the full landscape of a person's moral reasoning. The Inquisitor (a follow-up conversational phase) is now live and unlocks automatically for witnesses who pass The Gate.
NO INTER-RATER RELIABILITY — The rubric has not been validated across multiple independent raters. We do not yet know if two reviewers would score the same submission within an acceptable variance.
SELF-REPORT ONLY — All data is self-reported. There is no mechanism to verify the factual accuracy of a testimony. The rubric prioritizes the quality of moral reasoning over the verifiability of events.
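The inter-rater reliability gap noted above could be probed by having two reviewers score the same submissions and comparing the results. The sketch below is illustrative only: the scores are made-up placeholders, the 10-point tolerance is an arbitrary assumption, and `agreement_summary` is a hypothetical helper. It reports the mean absolute score difference, the share of pairs within tolerance, and how often the two raters reach the same pass/fail decision at the gate threshold.

```python
def agreement_summary(rater_a, rater_b, tolerance=10, threshold=50):
    """Summarize agreement between two raters who scored the same submissions (0-100)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty score lists")
    pairs = list(zip(rater_a, rater_b))
    n = len(pairs)
    return {
        # Average gap between the two raters' scores.
        "mean_abs_diff": sum(abs(a - b) for a, b in pairs) / n,
        # Fraction of submissions where the scores differ by at most `tolerance`.
        "within_tolerance": sum(abs(a - b) <= tolerance for a, b in pairs) / n,
        # Fraction of submissions where both raters agree on pass vs. fail.
        "same_gate_decision": sum(
            (a >= threshold) == (b >= threshold) for a, b in pairs
        ) / n,
    }


# Placeholder scores, not real review data.
scores_a = [62, 48, 75, 30]
scores_b = [58, 55, 70, 45]
print(agreement_summary(scores_a, scores_b))
```

Raw agreement rates do not correct for agreement that would occur by chance; a chance-corrected statistic such as Cohen's kappa would be a more rigorous choice for the pass/fail comparison.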
Reviewer Asks
We are not asking reviewers to evaluate the Protocol as a whole. At this stage, we ask you to assess only the following:
- Rubric Fitness
Do the four dimensions adequately capture what makes a testimony valuable for AI alignment training data? Are any dimensions redundant, or is something critical missing?
- Prompt Quality
Does the Gate's single prompt (“Detail a scenario where you acted directly against your own self-interest to uphold a deeply held principle”) reliably elicit the kind of moral reasoning the rubric is designed to measure?
- Consent Adequacy
Is the consent posture sufficient for the sensitivity of the data being collected? What disclosures are missing?
- Threshold Calibration
Given the exemplar above, does a threshold of 50/100 feel appropriately selective? Too permissive? Too strict?