Analysis

Is ProgramBench Impossible?

Zac Boring | May 8, 2026 | 1 min read

ProgramBench is a new coding benchmark that all frontier models spectacularly fail. We’ve been on a quest for “hard benchmarks” for a while, so it’s refreshing to see a benchmark where top models do badly. Unfortunately, ProgramBench has one big problem: it’s impossible!

What is ProgramBench?

ProgramBench tests whether a model can recreate a program from a “clean room” environment. The model is given only a bit of documentation and black-box access to the program (all the programs are CLIs), then tasked

By frmsaul

Read the full article at LessWrong AI →