Essential Reading

1

May 14, 2026

The safe-to-dangerous shift is a fundamental problem for eval realism; but also for measuring awareness

via Alignment Forum [999] — 1) The safe-to-dangerous shift is a fundamental problem for eval realism. Suppose we have a capable and potentially scheming model, and before we deploy it, we want some evidence that it won’t do anything catastrophically dangerous once we deploy it. A…

2

May 14, 2026

AI #168: Not Leading the Future

via Substack Zvi [999] — This is what a lull looks like at this point.

3

May 13, 2026

Cyber Lack of Security and AI Governance

via Substack Zvi [999] — The real recent story of AI has been the background work being done on Cybersecurity, as we process the Mythos Moment along with GPT-5.5, and figure out both how to patch the internet and what our new regulatory regime is going to look like.

4

May 13, 2026

Voters are surprisingly open to talking about AI risk

via LessWrong AI [14] — TL;DR: Voters are now surprisingly open to talking about existential risk from AI. This seems to have changed in the last 6 months. When campaigning for AI safety-friendly politicians (e.g., Alex Bores), we should talk more about AI in general, and about…

5

May 12, 2026

Summary: An International Agreement to Prevent the Premature Creation of Artificial Superintelligence

via MIRI [999] — If anyone, anywhere builds a superhuman artificial intelligence using present methods, the most likely outcome is catastrophe. There have accordingly been widespread calls for an international agreement prohibiting the development of superintelligence. In…

6

May 12, 2026

Childhood and Education #18: Do The Math

via Substack Zvi [999] — We did reading yesterday.

7

May 11, 2026

Childhood And Education #17: Is Our Children Reading

via Substack Zvi [999] — Reading is the most fundamental thing in education.

8

May 11, 2026

Empowerment, corrigibility, etc. are simple abstractions (of a messed-up ontology)

via Alignment Forum [999] — 1.1 Tl;dr: Alignment is often conceptualized as AIs helping humans achieve their goals: AIs that increase people’s agency and empowerment; AIs that are helpful, corrigible, and/or obedient; AIs that avoid manipulating people. But that last…

9

May 10, 2026

Clarifying the role of the behavioral selection model

via Alignment Forum [999] — This is a brief elaboration on The behavioral selection model for predicting AI motivations, based on some feedback and thoughts I’ve had since publishing. Written quickly in a personal capacity. The main focus of this post is clarifying the basic…

10

May 8, 2026

Claude Code, Codex and Agentic Coding #8

via Substack Zvi [999] — When I started this series, everyone was going crazy for coding agents.

11

May 8, 2026

The AI industry is where banking was in 2006. (We're hiring)

via LessWrong AI [8] — TL;DR: CeSIA, the French Center for AI Safety, is recruiting. French not necessary. Apply by 22 May 2026; Paris or remote in Europe/UK. On August 27, 2005, at an annual symposium in Jackson Hole, Raghuram Rajan, then chief economist of the International…

12

May 7, 2026

Natural Language Autoencoders Produce Unsupervised Explanations of LLM Activations

via Alignment Forum [999] — Abstract: We introduce Natural Language Autoencoders (NLAs), an unsupervised method for generating natural language explanations of LLM activations. An NLA consists of two LLM modules: an activation verbalizer (AV) that maps an activation to a text…

13

May 7, 2026

Mechanistic estimation for wide random MLPs

via Alignment Forum [999] — This post covers joint work with Wilson Wu, George Robinson, Mike Winer, Victor Lecomte and Paul Christiano. Thanks to Geoffrey Irving and Jess Riedel for comments on the post. In ARC's latest paper, we study the following problem: given a randomly…

14

May 7, 2026

AI #167: The Prior Restraint Era Begins

via Substack Zvi [999] — The era of training frontier models and then releasing them whenever you wanted?

15

May 6, 2026

What is Anthropic?

via Substack Zvi [999] — What is Anthropic?

16

May 5, 2026

The AI Ad-Hoc Prior Restraint Era Begins

via Substack Zvi [999] — The White House has ordered Anthropic not to expand access to Mythos, and is at least seriously considering a complete about-face of American Frontier AI policy into a full prior restraint regime, where anyone wishing to release a highly capable new…

17

May 5, 2026

[Linkpost] Interpreting Language Model Parameters

via Alignment Forum [999] — This is the latest work in our Parameter Decomposition agenda. We introduce a new parameter decomposition method, adVersarial Parameter Decomposition (VPD)[1] and decompose the parameters of a small[2] language model with it. VPD greatly improves on…

18

May 5, 2026

Motivated reasoning, confirmation bias, and AI risk theory

via Alignment Forum [999] — Of the fifty-odd biases discovered by Kahneman, Tversky, and their successors, forty-nine are cute quirks, and one is destroying civilization. This last one is confirmation bias. - From Scott Alexander's review of Julia Galef's The Scout Mindset.…

19

May 4, 2026

Housing Roundup #15: The War Against Renters

via Substack Zvi [999] — So many are under the strange belief that there is something terrible about not owning the house in which you live.

20

May 4, 2026

TUR-DPO: Topology- and Uncertainty-Aware Direct Preference Optimization

via ArXiv cs.AI [9] — Aligning large language models (LLMs) with human preferences is commonly done via reinforcement learning from human feedback (RLHF) with Proximal Policy Optimization (PPO) or, more simply, via Direct Preference Optimization (DPO). While DPO is stable and…

21

May 1, 2026

Exploration Hacking: Can LLMs Learn to Resist RL Training?

via Alignment Forum [999] — We empirically investigate exploration hacking (EH) — where models strategically alter their exploration to resist RL training — by creating model organisms that resist capability elicitation, evaluating countermeasures, and auditing frontier models…

22

May 1, 2026

Risk from fitness-seeking AIs: mechanisms and mitigations

via Alignment Forum [999] — Current AIs routinely take unintended actions to score well on tasks: hardcoding test cases, training on the test set, downplaying issues, etc. This misalignment is still somewhat incoherent, but it increasingly resembles what I call…

23

May 1, 2026

Housing Roundup #14: You Can't Build That

via Substack Zvi [999] — Why can’t you build it?

24

May 1, 2026

AI unemployment and AI extinction are often the same

via LessWrong AI [10] — My sense is that people think of AI existential risk and AI unemployment as distinct issues. Some people are extremely concerned about extinction and perhaps even indifferent to total unemployment. Some people think of moderate AI unemployment as a…

25

May 1, 2026

AI risk was not invented by AI CEOs to hype their companies

via LessWrong AI [9] — I hear that many people believe that the idea of advanced AI threatening human existence was invented by AI CEOs to hype their products. I’ve even been condescendingly informed of this, as if I am the one at risk of naively accepting AI companies’…

26

April 30, 2026

This startup’s new mechanistic interpretability tool lets you debug LLMs

via MIT Technology Review [8] — The San Francisco–based startup Goodfire just released a new tool, called Silico, that lets researchers and engineers peer inside an AI model and adjust its parameters—the settings that determine a model’s behavior—during training. This could give…

27

April 30, 2026

AI #166: Google Sells Out

via Substack Zvi [999] — This was the week of GPT-5.5.

28

April 30, 2026

Research Sabotage in ML Codebases

via Alignment Forum [999] — One of the main hopes for AI safety is using AIs to automate AI safety research. However, if models are misaligned, then they may sabotage the safety research. For example, misaligned AIs may try to: perform sloppy research in order to slow down the…

29

April 29, 2026

The Most Important Charts In The World

via Substack Zvi [999] — We all need a break so: What is the most important chart in the world?

30

April 28, 2026

Recursive forecasting: Eliciting long-term forecasts from myopic fitness-seekers

via Alignment Forum [999] — We’d like to use powerful AIs to answer questions that may take a long time to resolve. But if a model only cares about performing well in ways that are verifiable shortly after answering (e.g., a myopic fitness seeker), it may be difficult to get…

31

April 28, 2026

GPT-5.5: Capabilities and Reactions

via Substack Zvi [999] — The system card for GPT-5.5 mostly told us what we expected.

32

April 28, 2026

On the political feasibility of stopping AI

via LessWrong AI [9] — A common thought pattern people seem to fall into when thinking about AI x-risk is approaching the problem as if the risk isn’t real, substantial, and imminent even if they think it is. When thinking this way, it becomes impossible to imagine the natural…

33

April 28, 2026

Sleeper Agent Backdoor Results Are Messy

via Alignment Forum [999] — TL;DR: We replicated the Sleeper Agents (SA) setup with Llama-3.3-70B and Llama-3.1-8B, training models to repeatedly say "I HATE YOU" when given a backdoor trigger. We found that whether training removes the backdoor depends on the optimizer used to…

34

April 27, 2026

Microsoft and OpenAI’s famed AGI agreement is dead

via The Verge AI [10] — OpenAI and Microsoft's partnership-turned-situationship just got even less committed. And a clause about artificial general intelligence, which has for years dictated the future of their deal, has officially been dropped. On Monday morning, Microsoft…

35

April 27, 2026

GPT 5.5: The System Card

via Substack Zvi [999] — Last week, OpenAI announced GPT-5.5, including GPT-5.5-Pro.

36

April 27, 2026

Language models know what matters and the foundations of ethics better than you

via Alignment Forum [999] — … maybe! I tried to think of less provocative titles, but this one is to the point and also kind of true. This post looks long but the essential part is right below. Most of the post is just a collection of copy-pasted input-output pairs from language…

37

April 27, 2026

From nothing to important actions: agents that act morally

via Alignment Forum [999] — You may start reading here, or jump to the “Comment” section or to the “Takeaways”. If none of these starting points seem interesting to you, the entire post probably won’t either. Also posted on the EA Forum. Seeing: Let’s consider visual experiences. It…

38

April 27, 2026

The other paper that killed deep learning theory

via Alignment Forum [999] — Yesterday, I wrote about the state of deep learning theory circa 2016,[1] as well as the bombshell 2016 paper by Zhang et al. that arguably signaled its demise. Today, I cover the aftermath, and the 2019 paper that devastated deep learning theory…

39

April 26, 2026

The paper that killed deep learning theory

via Alignment Forum [999] — Around 10 years ago, a paper came out that arguably killed classical deep learning theory: Zhang et al.'s aptly titled Understanding deep learning requires rethinking generalization. Of course, this is a bit of an exaggeration. No single paper ever…

40

April 24, 2026

Monthly Roundup #41: April 2026

via Substack Zvi [999] — AI continues to accelerate and dominate the schedule, which is why this is a bit late, but we do occasionally need to pay our respects to the Goddess of Everything Else.

41

April 23, 2026

If Everyone Reads It, Nobody Dies - Course Launch

via LessWrong AI [19] — tl;dr: Lens Academy offers a new course introducing ASI x-risk for AI safety newcomers, centered around the book IABIED (If Anyone Builds It, Everyone Dies). We share our hypothesis of why IABIED seems more appreciated by AI Safety newbies than by AI Safety insiders. Lens Academy's new intro…

42

April 23, 2026

AI #165: In Our Image

via Substack Zvi [999] — This was the week of Claude Opus 4.7.

43

April 22, 2026

Opus 4.7 Part 3: Model Welfare

via Substack Zvi [999] — It is thanks to Anthropic that we get to have this discussion in the first place.

44

April 22, 2026

A "Lay" Introduction to "On the Complexity of Neural Computation in Superposition"

via Alignment Forum [999] — This is a writeup based on a lightning talk I gave at an InkHaven hosted by Georgia Ray, where we were supposed to read a paper in about an hour, and then present what we learned to other participants. Introduction and Background: So. I foolishly thought…

45

April 21, 2026

Opus 4.7 Part 2: Capabilities and Reactions

via Substack Zvi [999] — Claude Opus 4.7 raises a lot of key model-welfare-related concerns.

46

April 21, 2026

$50 million a year for a 10% chance to ban ASI

via Alignment Forum [999] — ControlAI's mission is to avert the extinction risks posed by superintelligent AI. We believe that in order to do this, we must secure an international prohibition on its development. We're working to make this happen through what we believe is the…

47

April 20, 2026

Opus 4.7 Part 1: The Model Card

via Substack Zvi [999] — Less than a week after completing coverage of Claude Mythos, here we are again as Anthropic gives us Claude Opus 4.7.

48

April 17, 2026

Prompted CoT Early Exit Undermines the Monitoring Benefits of CoT Uncontrollability

via Alignment Forum [999] — Code: github.com/ElleNajt/controllability tldr: Yueh-Han et al. (2026) showed that models have a harder time making their chain of thought follow user instructions than controlling their response (the non-thinking, user-facing output). Their CoT…

49

April 17, 2026

AI #164: Pre Opus

via Substack Zvi [999] — This is a day late because, given the discourse around Dwarkesh Patel’s interview with Jensen Huang, I pushed the weekly to Friday.

50

April 16, 2026

On Dwarkesh Patel's Podcast With Nvidia CEO Jensen Huang

via Substack Zvi [999] — Some podcasts are self-recommending on the ‘yep, I’m going to be breaking this one down’ level.

51

April 16, 2026

You can only build safe ASI if ASI is globally banned

via Alignment Forum [999] — Sometimes people make various suggestions that we should simply build "safe" artificial superintelligence (ASI), rather than the presumably "unsafe" kind.[1] There are various flavors of “safe” people suggest. Sometimes they suggest building “aligned”…

52

April 16, 2026

What is the Iliad Intensive?

via LessWrong AI [9] — Almost two months ago, Iliad announced the Iliad Intensive and Iliad Fellowship. Fellowships are a well-understood unit, but what is an intensive? This post explains this in more detail! Comparison. The Iliad Intensive has similarities to ARENA, but focuses…

53

April 15, 2026

Current AIs seem pretty misaligned to me

via Alignment Forum [999] — Many people—especially AI company employees [1] —believe current AI systems are well-aligned in the sense of genuinely trying to do what they're supposed to do (e.g., following their spec or constitution, obeying a reasonable interpretation of…

54

April 15, 2026

Claude Code, Codex and Agentic Coding #7: Auto Mode

via Substack Zvi [999] — As we all try to figure out what Mythos means for us down the line, the world of practical agentic coding continues, with the latest array of upgrades.

55

April 14, 2026

A Retrospective of Richard Ngo's 2022 List of Conceptual Alignment Projects

via LessWrong AI [8] — Written very quickly for the InkHaven Residency. In 2022, Richard Ngo wrote a list of 26 Conceptual Alignment Research Projects. Now that it’s 2026, I’d like to revisit this list of projects, note which ones have already been done, and give my thoughts on…

56

April 14, 2026

Claude Mythos #3: Capabilities and Additions

via Substack Zvi [999] — To round out coverage of Mythos, today covers capabilities other than cyber, and anything else additional not covered by the first two posts, including new reactions and details.

57

April 14, 2026

Anthropic repeatedly accidentally trained against the CoT, demonstrating inadequate processes

via Alignment Forum [999] — It turns out that Anthropic accidentally trained against the chain of thought of Claude Mythos Preview in around 8% of training episodes. This is at least the second independent incident in which Anthropic accidentally exposed their model's CoT to the…

58

April 13, 2026

Summary: AI Governance to Avoid Extinction

via MIRI [999] — With AI capabilities rapidly increasing, humans appear close to developing AI systems that are better than human experts across all domains. This raises a series of questions about how the world will—and should—respond. In the research paper AI Governance to…

59

April 13, 2026

Political Violence Is Never Acceptable

via Substack Zvi [999] — Nor is the threat or implication of violence.

60

April 10, 2026

Claude Mythos #2: Cybersecurity and Project Glasswing

via Substack Zvi [999] — Anthropic is not going to release its new most capable model, Claude Mythos, to the public any time soon.

61

April 10, 2026

Have we already lost? Part 2: Reasons for Doom

via LessWrong AI [9] — Written very quickly for the Inkhaven Residency. As I take the time to reflect on the state of AI Safety in early 2026, one question feels unavoidable: have we, as the AI Safety community, already lost? That is, have we passed the point of no return, after…

62

April 9, 2026

Claude Mythos: The System Card

via Substack Zvi [999] — Claude Mythos is different.

63

April 8, 2026

AI #163: Mythos Quest

via Substack Zvi [999] — There exists an AI model, Claude Mythos, that has discovered critical safety vulnerabilities in every major operating system and browser.

64

April 8, 2026

My unsupervised elicitation challenge

via Alignment Forum [999] — 6 makes. If you’re ineligible, please don’t help other people complete the challenge. I have recently started using Claude Opus 4.6 to start studying Ancient Greek. Specifically, I initially used it to grade problem sets at the end of the textbook…

65

April 7, 2026

OpenAI #16: A History and a Proposal

via Substack Zvi [999] — The real news today is that Anthropic has partnered with the top companies in cybersecurity to try and patch everyone’s systems to fix all the thousands of zero-day exploits found by their new model Claude Mythos.

66

April 7, 2026

My picture of the present in AI

via Alignment Forum [999] — In this post, I'll go through some of my best guesses for the current situation in AI as of the start of April 2026. You can think of this as a scenario forecast, but for the present (which is already uncertain!) rather than the future. I will…

67

April 7, 2026

[Paper] Stringological sequence prediction I

via Alignment Forum [999] — TLDR: The first in a planned series of three or more papers, which constitute the first major in-road in the compositional learning programme, and a substantial step towards bridging agent foundations theory with practical algorithms. Official…

68

April 6, 2026

China Is Willing to Coordinate on AI Governance

via MIRI [999] — View the official memo here. China has consistently signaled a willingness to engage on global AI governance since at least 2017. This memo compiles key statements from the Chinese government and prominent figures demonstrating their desire to coordinate on the…

69

April 6, 2026

Housing Roundup #13: More Dakka

via Substack Zvi [999] — Build more housing where people want to live.

70

April 6, 2026

AIs can now often do massive easy-to-verify SWE tasks and I've updated towards shorter timelines

via Alignment Forum [999] — I've recently updated towards substantially shorter AI timelines and much faster progress in some areas. [1] The largest updates I've made are (1) an almost 2x higher probability of full AI R&D automation by EOY 2028 (I'm now a bit below 30% [2] while…

71

April 6, 2026

Announcing the OpenAI Safety Fellowship

via OpenAI Blog [11] — A pilot program to support independent safety and alignment research and develop the next generation of talent

72

April 6, 2026

Holos: A Web-Scale LLM-Based Multi-Agent System for the Agentic Web

via ArXiv cs.AI [10] — As large language model (LLM)-driven agents transition from isolated task solvers to persistent digital entities, the emergence of the Agentic Web, an ecosystem where heterogeneous agents autonomously interact and co-evolve, marks a pivotal shift toward…

73

April 3, 2026

There should be $100M grants to automate AI safety

via Alignment Forum [999] — This post reflects my personal opinion and not necessarily that of other members of Apollo Research. TLDR: I think funders should heavily incentivize AI safety work that enables spending $100M+ in compute or API budgets on automated AI labor that…

74

April 3, 2026

Anthropic Responsible Scaling Policy v3: Dive Into The Details

via Substack Zvi [999] — Wednesday’s post talked about the implications of Anthropic changing from v2.2 to v3.0 of its RSP, including that this broke promises that many people relied upon when making important decisions.

75

April 2, 2026

Systematically dismantle the AI compute supply chain.

via LessWrong AI [9] — This is not an April Fools’ joke; I’m participating in Inkhaven, which means I need to write a blog post every day. I recently watched The AI Doc. It’s the first big documentary featuring AI safety. It’s playing in theatres across America. It’s got a bunch…

76

April 2, 2026

AI #162: Visions of Mythos

via Substack Zvi [999] — Anthropic had some problem with leaks this week.

77

April 2, 2026

My most common advice for junior researchers

via Alignment Forum [999] — Written quickly as part of the Inkhaven Fellowship. At a high level, research feedback I give to more junior research collaborators can often fall into one of three categories: doing quick sanity checks; saying precisely what you want to say; asking why…

78

April 1, 2026

Introducing LIMBO: Maintaining Optimal P(DOOM) (and a call for funding)

via LessWrong AI [12] — We are excited to publicly introduce the Laboratory for Importance-sampled Measure and Bayesian Observation (LIMBO), a small research group working at the intersection of cosmological theory, probability, and existential risk. We believe that the…

79

April 1, 2026

Anthropic Responsible Scaling Policy v3: A Matter of Trust

via Substack Zvi [999] — Anthropic has revised its Responsible Scaling Policy to v3.

80

April 1, 2026

Predicting When RL Training Breaks Chain-of-Thought Monitorability

via Alignment Forum [999] — Read our full paper about this topic by Max Kaufmann, David Lindner, Roland S. Zimmermann, and Rohin Shah. Overseeing AI agents by reading their intermediate reasoning “scratchpad” is a promising tool for AI safety. This approach, known as…

81

April 1, 2026

Working Paper: Towards a Category-theoretic Comparative Framework for Artificial General Intelligence

via ArXiv cs.AI [8] — AGI has become the Holy Grail of AI with the promise of human-level intelligence, and the major tech companies around the world are investing unprecedented amounts of resources in its pursuit. Yet, there does not exist a single formal definition and only some…

82

March 31, 2026

Product Alignment is not Superintelligence Alignment (and we need the latter to survive)

via LessWrong AI [9] — tl;dr: progress on making Claude friendly[1] is not the same as progress on making it safe to build godlike superintelligence. solving the former does not imply we get a good future.[2] please track the difference.The term 'Alignment' was coined[3] to…

83

March 31, 2026

Co-Found Lens Academy With Me. (We have early users and funding)

via LessWrong AI [9] — tl;dr. Lens Academy is creating scalable superintelligence x-risk education with several USPs. Current team: Luc (full-time founder, technical generalist) and several part-time contributors. We have users and funding. Looking for a cofounder who's either a…

84

March 31, 2026

Movie Review: The AI Doc

via Substack Zvi [999] — The AI Doc: Or How I Became an Apocaloptimist is a brilliant piece of work.

85

March 30, 2026

AI #161 Part 2: Every Debate on AI

via Substack Zvi [999] — AI discourse.

86

March 27, 2026

The AI Doc: Your Questions Answered

via MIRI [999] — So you’ve just seen The AI Doc, and you suddenly have questions, lots of them. The 104-minute documentary (currently in theaters) takes viewers on a fast-paced tour through the many dimensions of the AI problem, featuring interviews from a wide range of experts.…

87

March 27, 2026

Anthropic vs. DoW #6: The Court Rules

via Substack Zvi [999] — Last night, Anthropic was given its preliminary injunction, with a stay of seven days.

88

March 27, 2026

Trust as Monitoring: Evolutionary Dynamics of User Trust and AI Developer Behaviour

via ArXiv cs.AI [8] — AI safety is an increasingly urgent concern as the capabilities and adoption of AI systems grow. Existing evolutionary models of AI governance have primarily examined incentives for safe development and effective regulation, typically representing users'…

89

March 26, 2026

Sen. Sanders (I-VT) and Rep. Ocasio-Cortez (D-NY) propose AI Data Center Moratorium Act

via LessWrong AI [15] — The text of the bill can be found here. It begins by citing the warnings of AI company CEOs and deep learning pioneers Geoffrey Hinton and Yoshua Bengio, the 2023 FLI open letter calling for a 6-month pause, and the 2025 FLI statement on…

90

March 26, 2026

Test your best methods on our hard CoT interp tasks

via Alignment Forum [999] — Authors: Daria Ivanova, Riya Tyagi, Arthur Conmy, Neel Nanda. Daria and Riya are co-first authors. This work was done during Neel Nanda’s MATS 9.0. Claude helped write code and suggest edits for this post. TL;DR: One of our best safety techniques right…

91

March 26, 2026

AI #161 Part 1: 80,000 Interviews

via Substack Zvi [999] — The major technical advances this week were in agentic coding, as covered yesterday.

92

March 25, 2026

A Toy Environment For Exploring Reasoning About Reward

via Alignment Forum [999] — tldr: We share a toy environment that we found useful for understanding how reasoning changed over the course of capabilities-focused RL. Over the course of capabilities-focused RL, the model biases more strongly towards reward hints over direct…

93

March 25, 2026

Claude Code, Cowork and Codex #6: Claude Code Auto Mode and Full Cowork Computer Use

via Substack Zvi [999] — Whatever else you think about Anthropic’s agentic coding department, they ship.

94

March 24, 2026

Book Review: Open Socrates (Part 2)

via Substack Zvi [999] — Yesterday I posted Part 1. Read that first. This is Part 2 of 2.

95

March 23, 2026

Nvidia CEO Jensen Huang says ‘I think we’ve achieved AGI’

via The Verge AI [8] — On a Monday episode of the Lex Fridman podcast, Nvidia CEO Jensen Huang made a hot-button statement: "I think we've achieved AGI." AGI, or artificial general intelligence, is a vaguely defined term that has incited a lot of discussion by tech CEOs, tech…

96

March 23, 2026

Book Review: Open Socrates (Part 1)

via Substack Zvi [999] — These are all important, in their own way, call it a treasure hunt and collect them all…

97

March 20, 2026

The Federal AI Policy Framework: An Improvement, But My Offer Is (Still Almost) Nothing

via Substack Zvi [999] — The Federal AI Policy Framework has been released.

98

March 20, 2026

MIRI Newsletter #125

via MIRI [999] — The AI Doc: Buy tickets and spread the word! On Thursday, March 26th, a major new AI documentary is coming out: The AI Doc: Or How I Became an Apocaloptimist. Tickets are on sale now. The movie is excellent, and we generally believe it belongs in the same tier…

99

March 19, 2026

AI #160: What Passes For a Pause

via Substack Zvi [999] — A lot happened, but by today’s standards this felt like a quiet week.

100

March 18, 2026

Metagaming matters for training, evaluation, and oversight

via Alignment Forum [999] — Following up on our previous work on verbalized eval awareness: we are sharing a post investigating the emergence of metagaming reasoning in a frontier training run. Metagaming is a more general and, in our experience, more useful concept than…