
Censored LLMs as a Natural Testbed for Secret Knowledge Elicitation

Zac Boring, March 9, 2026

TL;DR: We introduce a testbed based on censored Chinese LLMs, which serve as natural objects of study for secret elicitation techniques. We then study the efficacy of honesty elicitation and lie detection techniques for detecting and removing generated falsehoods.

This post presents a summary of the paper, including examples of transcripts and other miscellaneous findings.

arXiv paper | Code | Transcripts

Summary

We construct a testbed for honesty elicitation and lie detection techniques co

By Bartosz Cywiński

Read the full article at Alignment Forum →