Best-of-Tails: Bridging Optimism and Pessimism in Inference-Time Alignment
Inference-time alignment effectively steers large language models (LLMs) by generating multiple candidates from a reference model and selecting among them with an imperfect reward model. However, current strategies face a fundamental dilemma: ``optimistic'' approaches like Best-of-$N$ suffer from reward hacking, while ``pessimistic'' regularized methods often stifle the exploration needed to discover high-quality responses. In this work, we formalize this trade-off between optimism and pessimism.
By Hsiang Hsu, Eric Lei, Chun-Fu Chen
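
To make the ``optimistic'' baseline concrete, below is a minimal sketch of Best-of-$N$ selection as described in the abstract. The `reference_model.generate` and `reward_model.score` interfaces are hypothetical placeholders, not the paper's implementation.

```python
def best_of_n(prompt: str, reference_model, reward_model, n: int = 16) -> str:
    """Sample n candidates from the reference model and return the one
    the (possibly imperfect) reward model scores highest."""
    # Generate n independent candidate responses from the reference model.
    candidates = [reference_model.generate(prompt) for _ in range(n)]

    # Score each candidate with the reward model.
    scores = [reward_model.score(prompt, c) for c in candidates]

    # Optimistic selection: fully trusts the reward model's ranking,
    # which is exactly what makes Best-of-N vulnerable to reward hacking
    # when the reward model is imperfect.
    best_index = max(range(n), key=lambda i: scores[i])
    return candidates[best_index]
```

Larger $N$ increases the chance of finding a genuinely good response, but it also gives the selector more opportunities to pick a candidate that merely exploits flaws in the reward model, which is the dilemma the abstract describes.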