
How will we do SFT on models with opaque reasoning?

Zac Boring · February 21, 2026

Current LLMs externalize lots of their reasoning in human-interpretable language. This reasoning is sometimes unfaithful, sometimes strange and concerning, and LLMs can do somewhat impressive reasoning without using CoT, but my overall impression is that CoT is currently a reasonably complete and accurate representation of LLM reasoning. However, reasoning in interpretable language might turn out to be uncompetitive; if so, it seems probable that opaque […]
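
For context, "SFT" here is standard supervised fine-tuning: minimizing token-level cross-entropy on demonstrations. Below is a minimal sketch of one SFT step, assuming a HuggingFace causal LM; the model name and the toy (prompt, response) pair are placeholders. The point it illustrates is that the loss only touches the visible response tokens, so if a model's reasoning is opaque, SFT supervises none of it.

```python
# Minimal SFT sketch (illustrative only; not from the post).
# Standard supervised fine-tuning: cross-entropy on response tokens.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# Toy demonstration: only the final answer is visible, no chain of thought.
prompt, response = "Q: What is 17 * 24?\nA:", " 408"

inputs = tokenizer(prompt + response, return_tensors="pt")
labels = inputs.input_ids.clone()
# Mask prompt tokens so the loss is computed only on the response.
prompt_len = len(tokenizer(prompt).input_ids)
labels[:, :prompt_len] = -100

model.train()
optimizer.zero_grad()
loss = model(**inputs, labels=labels).loss  # token-level cross-entropy
loss.backward()
optimizer.step()
```

Nothing in this objective sees or constrains whatever internal reasoning produced " 408", which is exactly the gap the post's question points at.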

Read the full article at Alignment Forum →