Best-of-Tails: Bridging Optimism and Pessimism in Inference-Time Alignment
Inference-time alignment effectively steers large language models (LLMs) by generating multiple candidates from a reference model and selecting among them with an imperfect reward model. However, current strategies face a fundamental dilemma: ``optimistic'' approaches like Best-of-$N$ suffer from reward hacking, while ``pessimistic'' regularized methods often stifle the exploration needed to discover high-quality responses. In this work, we formalize this trade-off between optimism and pessimism.
By Hsiang Hsu, Eric Lei, Chun-Fu Chen
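
To make the ``optimistic'' baseline concrete, below is a minimal sketch of Best-of-$N$ selection as described in the abstract. The `reference_model.generate` and `reward_model.score` interfaces are hypothetical placeholders, not the paper's implementation.

```python
def best_of_n(prompt: str, reference_model, reward_model, n: int = 16) -> str:
    """Sample n candidates from the reference model and return the one
    the (possibly imperfect) reward model scores highest."""
    # Generate n independent candidate responses from the reference model.
    candidates = [reference_model.generate(prompt) for _ in range(n)]

    # Score each candidate with the reward model.
    scores = [reward_model.score(prompt, c) for c in candidates]

    # Optimistic selection: fully trusts the reward model's ranking,
    # which is exactly what makes Best-of-N vulnerable to reward hacking
    # when the reward model is imperfect.
    best_index = max(range(n), key=lambda i: scores[i])
    return candidates[best_index]
```

Larger $N$ increases the chance of finding a genuinely good response, but it also gives the selector more opportunities to pick a candidate that merely exploits flaws in the reward model, which is the dilemma the abstract describes.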