Censored LLMs as a Natural Testbed for Secret Knowledge Elicitation
TL;DR: We introduce a testbed based on censored Chinese LLMs, which serve as natural objects of study for secret elicitation techniques. We then study the efficacy of honesty elicitation and lie detection techniques for detecting and removing generated falsehoods.

This post presents a summary of the paper, including examples of transcripts and other miscellaneous findings.

arXiv paper | Code | Transcripts

Summary

We construct a testbed for honesty elicitation and lie detection techniques co
By Bartosz Cywiński