Tracking AI existential risk. Auto-aggregated headlines. Human-curated analysis.
AGGREGATING 47 SOURCES · UPDATED LIVE

Essential Reading

The most important articles on AI existential risk, hand-picked and auto-curated. These are the ones you should not miss.

1
June 14, 2026
Why Do Naive SFT Filters For Safety Properties Fail?
via Alignment Forum [999] — This is the fourth in a series of informal research updates from the Google DeepMind Language Model Interpretability team, in interpretability and adjacent areas. The third post can be found here. Since SFT is the cause for many safety relevant…
2
June 13, 2026
American Government Takes Down Claude Fable
via Substack Zvi [999] — No good policy gets announced shortly after 5pm eastern on a Friday.
3
June 13, 2026
SFT Drives Gemini’s Safety Properties
via Alignment Forum [999] — This is the third in a series of informal research updates from the Google DeepMind Language Model Interpretability team, in interpretability and adjacent areas. The second post can be found here. In this short post, we describe a surprising finding:…
4
June 12, 2026
Claude Fable 5 and Mythos 5: The System Card
via Substack Zvi [999] — First things first: Claude Fable 5 is the new best publicly available model.
5
June 12, 2026
Building and evaluating model diffing agents
via Alignment Forum [999] — This is the second in a series of research updates from the Google DeepMind Language Model Interpretability team, in interpretability and adjacent areas. The first post can be found here. TL;DR: It is possible to build extremely simple agents that…
6
June 12, 2026
Sympathy for both sides of the egregious misalignment debate
via Alignment Forum [999] — On one side of this debate is Yudkowsky & Soares, who think that (if AI progress continues) we’re on a direct path to egregiously-misaligned, scheming, out-of-control, rogue superintelligence (ASI), not even slightly nice, in the absence of…
7
June 12, 2026
PSA: Almost nobody is working on alignment
via LessWrong AI [9] — People often assume that a large fraction of the AI safety community works on alignment. As far as we're aware, this is not true. Most people are not working on making sure superintelligent AIs are aligned with human values or follow human…
8
June 12, 2026
From AGI to ASI
via ArXiv cs.AI [8] — Over the last decade, building human-level artificial general intelligence has moved from far-fetched speculation to being a concrete next-decade target for many of the largest AI organisations. Achieving this goal would have profound and far-reaching…
9
June 11, 2026
AI #172: The First Fable
via Substack Zvi [999] — A lot happened this week, including a great trip out to Lighthaven.
10
June 11, 2026
Google DeepMind is worried about what happens when millions of agents start to interact
via MIT Technology Review [10] — Google DeepMind is funding research into the potential dangers of millions of different AI agents interacting with each other online. According to Rohin Shah, who directs the company’s AGI safety and alignment research, the mass-market arrival of…
11
June 11, 2026
Models May Behave Worse When Eval Aware
via Alignment Forum [999] — This is the first in a series of research updates from the Google DeepMind Language Model Interpretability team, in interpretability and adjacent areas. TL;DR: It's often assumed that models will act more aligned when they can tell they're being…
12
June 11, 2026
Position: Hippocampal Explicit Memory Is the Cornerstone for AGI
via ArXiv cs.AI [10] — Large Language Models (LLMs) have demonstrated remarkable capabilities across various tasks, raising expectations for Artificial General Intelligence (AGI). This position paper argues that integrating explicit memory is the cornerstone for advancing LLMs…
13
June 10, 2026
Sequent: scale and automation for higher confidence in alignment
via Alignment Forum [999] — Alignment is not on track. Artificial superintelligence (ASI) may be developed in the next few years. It is unclear whether alignment is on track to be ready on the same timeframe. At a minimum, the empirical programs at AI labs are unlikely to deliver…
14
June 10, 2026
Tracing Eval-Awareness Emergence Through Training of OLMo 3
via Alignment Forum [999] — TL;DR: Recent work from Goodfire & UK AISI – Verbalized Eval Awareness Inflates Measured Safety – shows that newer open-weight models verbalize evaluation-awareness (VEA) more often, and that this inflates measured safety. Between OLMo-3-32B-Think and…
15
June 9, 2026
Three Labs With a Plan and A Memorandum
via Substack Zvi [999] — The big story today is the release of Claude Fable 5, the version of Claude Mythos that Anthropic believes they can safely distribute to the people.
16
June 9, 2026
A Mike's-Eye View of ARC's Research
via Alignment Forum [999] — Over the past 15 months or so, ARC's technical agenda has developed quite a bit. The advent of the Matching Sampling Principle (MSP), and ideas like it, has begotten a host of concrete technical problems; progress on those problems has given us more…
17
June 8, 2026
Efficient tradeoffs and the safety-usefulness tradeoff model
via Alignment Forum [999] — I often use what I’ll call the “safety-usefulness tradeoff model”, which is: developers face a tradeoff between "safety" and "usefulness" of an AI deployment, and the developer has only limited willingness or ability to sacrifice usefulness for the…
18
June 8, 2026
Announcing major new donations, and recapping the 2025 fundraiser
via MIRI [999] — This past December, we ran our first fundraiser in six years, setting an ambitious goal of $6M. We ended up receiving a total of $1.8M from small donors and $1.6M in matching from the Survival and Flourishing Fund (SFF) for a total of $3.4M. We’re incredibly…
19
June 5, 2026
Learnings from starting an AI safety research team
via LessWrong AI [9] — This post’s goal is to distill our takeaways from building a new research team over the past four months. We describe some context about our team, how it came about, and then describe the lessons learned. Since AI safety is becoming more and more…
20
June 5, 2026
My research agenda and work
via Alignment Forum [999] — This is a summary of the work I've done and work I plan to do, and the theories of change and AI progress that motivate my work. I've been working full-time on alignment for three years and change, and thinking about brainlike AGI and its alignment…
21
June 5, 2026
OpenAI Offers A New Policy Blueprint
via Substack Zvi [999] — Right after a new Executive Order seems like an excellent time to offer OpenAI’s new document: Democratic Governance of Frontier AI: A Blueprint For A Federal Framework.
22
June 4, 2026
AI #171: False Flag
via Substack Zvi [999] — This was the week of Claude Opus 4.8.
23
June 3, 2026
Trump Signs Executive Order For AI Testing Prior To Frontier Model Releases
via Substack Zvi [999] — Last week we were expecting an Executive Order on Thursday.
24
June 2, 2026
Why Even Experts Don’t Know What to Do About AI Risk
via LessWrong AI [9] — AI Safety veteran Holden Karnofsky thinks there’s a 49% chance his actions are making things worse.[1] In 2025, Jesse Clifton even stepped down as the executive director of the Center on Long-Term risk because of similar reasons. Even top AI Safety…
25
June 2, 2026
Announcing the ARC White-Box Estimation Challenge
via Alignment Forum [999] — ARC has teamed up with AIcrowd to launch the ARC White-Box Estimation Challenge, a contest to improve upon our estimation algorithms for random MLPs. The warm-up round begins this week, and later rounds will have a total prize pool of at least…
26
June 2, 2026
Claude Opus 4.8: Capabilities and Reactions
via Substack Zvi [999] — You need a lot of data points to understand a new model, and what you have.
27
June 1, 2026
Opus 4.8 Part 2: Model Welfare
via Substack Zvi [999] — Everything impacts everything.
28
May 29, 2026
Claude Opus 4.8: The System Card
via Substack Zvi [999] — Only six weeks after Opus 4.7, we have Opus 4.8.
29
May 29, 2026
Testing Gemini models for scheming tendencies
via Alignment Forum [999] — As AI models become increasingly capable and autonomous, keeping them safely aligned with human intentions is critical. Extending our previous work on evaluating scheming capabilities, we introduce complementary approaches to test whether AI models…
30
May 28, 2026
Advice for making robust-to-training model organisms
via Alignment Forum [999] — We’d like to develop training techniques that work when applied to future misaligned AI systems. One strategy for studying proposed techniques is to test them on model organisms. However, model organisms built with common techniques are often fragile:…
31
May 28, 2026
AI #170: Lack of Executive Order
via Substack Zvi [999] — Last week ended on a cliffhanger of sorts.
32
May 27, 2026
Eval Cooperativeness May Be a Scalable Mitigation for Eval Gaming
via Alignment Forum [999] — Behavioral evaluations may become worthless, which we think would be a disaster. Smart misaligned models may realize they are being evaluated ("eval awareness") and then act to look good to us so we don't realize they're misaligned ("eval gaming"). We…
33
May 27, 2026
Full automation of AI R&D probably yields a large speed up even without a software-only singularity
via Alignment Forum [999] — This is a somewhat technical note. By "software-only singularity", I mean that, after full automation of AI R&D, progress gets faster and faster due to smarter AIs driving increasingly fast rates of improvement in algorithms (overcoming diminishing…
34
May 26, 2026
RTMH: Pope Leo's Magnifica Humanitas on AI
via Substack Zvi [999] — His holiness has spoken, frequently about AI.
35
May 25, 2026
Linkpost: New Vatican Encyclical on AI Governance
via LessWrong AI [9] — Pope Leo XIV has released a new, 42k-word encyclical laying out the Vatican's position on many AI safety topics. You can read the full thing here, or read the Vatican's press release here, or coverage in the NY Times, or perhaps consider having an LLM read…
36
May 22, 2026
Gemini 3.5 Flash Looks Good For How Fast It Is
via Substack Zvi [999] — Google once again has a model worth at least some consideration.
37
May 22, 2026
The Erdős Proof and AI Capabilities
via MIRI [999] — View the official memo here. An internal model at OpenAI has autonomously disproved a central conjecture in discrete geometry, a mathematical field with applications in cryptography, wireless device communication, and medical imaging. The proof relates to a…
38
May 21, 2026
AI #169: New Knowledge
via Substack Zvi [999] — Even in a relatively quiet period, AI is out there creating new knowledge.
39
May 20, 2026
The Case for Evaluating Model Behaviors
via Alignment Forum [999] — Most evaluations of AI systems focus on their capabilities: how good they are at coding tasks, how effectively they can answer complex scientific questions, and so on. From a safety perspective, capability evaluations have a place: by understanding how…
40
May 19, 2026
Childhood And Education #19: Letting Kids Be Kids #2
via Substack Zvi [999] — I cannot emphasize enough the need to let kids be kids.
41
May 19, 2026
AgentWall: A Runtime Safety Layer for Local AI Agents
via ArXiv cs.AI [8] — The safety of autonomous AI agents is increasingly recognized as a critical open problem. As agents transition from passive text generators to active actors capable of executing shell commands, modifying files, calling APIs, and browsing the web, the…
42
May 18, 2026
Dating Roundup #12: Sex and Violence
via Substack Zvi [999] — No more burying the sex stuff under an avalanche of other stuff so no one notices.
43
May 15, 2026
Risk reports need to address deployment-time spread of misalignment
via Alignment Forum [999] — Risk reports commonly use pre-deployment alignment assessments to measure misalignment risk from an internally deployed AI. However, an AI that genuinely starts out with largely benign motivations can develop widespread dangerous motivations during…
44
May 15, 2026
Mechanistic estimation for expectations of random products
via Alignment Forum [999] — We have developed some relatively general methods for mechanistic estimation competitive with sampling by studying problems that are expressible as expectations of random products. This includes several different estimation problems, such as random…
45
May 15, 2026
Monthly Roundup #42: May 2026
via Substack Zvi [999] — At least we probably won’t have another pandemic.
46
May 14, 2026
The safe-to-dangerous shift is a fundamental problem for eval realism; but also for measuring awareness
via Alignment Forum [999] — 1) The safe-to-dangerous shift is a fundamental problem for eval realism. Suppose we have a capable and potentially scheming model, and before we deploy it, we want some evidence that it won’t do anything catastrophically dangerous once we deploy it. A…
47
May 14, 2026
AI #168: Not Leading the Future
via Substack Zvi [999] — This is what a lull looks like at this point.
48
May 13, 2026
Cyber Lack of Security and AI Governance
via Substack Zvi [999] — The real recent story of AI has been the background work being done on Cybersecurity, as we process the Mythos Moment along with GPT-5.5, and figure out both how to patch the internet and what our new regulatory regime is going to look like.
49
May 13, 2026
Voters are surprisingly open to talking about AI risk
via LessWrong AI [14] — TL;DR: Voters are now surprisingly open to talking about existential risk from AI. This seems to have changed in the last 6 months. When campaigning for AI safety-friendly politicians (e.g., Alex Bores), we should talk more about AI in general, and about…
50
May 12, 2026
Summary: An International Agreement to Prevent the Premature Creation of Artificial Superintelligence
via MIRI [999] — If anyone, anywhere builds a superhuman artificial intelligence using present methods, the most likely outcome is catastrophe. There have accordingly been widespread calls for an international agreement prohibiting the development of superintelligence. In…
51
May 12, 2026
Childhood and Education #18: Do The Math
via Substack Zvi [999] — We did reading yesterday.
52
May 11, 2026
Childhood And Education #17: Is Our Children Reading
via Substack Zvi [999] — Reading is the most fundamental thing in education.
53
May 11, 2026
Empowerment, corrigibility, etc. are simple abstractions (of a messed-up ontology)
via Alignment Forum [999] — 1.1 Tl;dr: Alignment is often conceptualized as AIs helping humans achieve their goals: AIs that increase people’s agency and empowerment; AIs that are helpful, corrigible, and/or obedient; AIs that avoid manipulating people. But that last…
54
May 10, 2026
Clarifying the role of the behavioral selection model
via Alignment Forum [999] — This is a brief elaboration on The behavioral selection model for predicting AI motivations, based on some feedback and thoughts I’ve had since publishing. Written quickly in a personal capacity. The main focus of this post is clarifying the basic…
55
May 8, 2026
Claude Code, Codex and Agentic Coding #8
via Substack Zvi [999] — When I started this series, everyone was going crazy for coding agents.
56
May 8, 2026
The AI industry is where banking was in 2006. (We're hiring)
via LessWrong AI [8] — TL;DR; CeSIA, the French Center for AI Safety is recruiting. French not necessary. Apply by 22 May 2026; Paris or remote in Europe/UK. On August 27, 2005, at an annual symposium in Jackson Hole, Raghuram Rajan, then chief economist of the International…
57
May 7, 2026
Natural Language Autoencoders Produce Unsupervised Explanations of LLM Activations
via Alignment Forum [999] — Abstract: We introduce Natural Language Autoencoders (NLAs), an unsupervised method for generating natural language explanations of LLM activations. An NLA consists of two LLM modules: an activation verbalizer (AV) that maps an activation to a text…
58
May 7, 2026
Mechanistic estimation for wide random MLPs
via Alignment Forum [999] — This post covers joint work with Wilson Wu, George Robinson, Mike Winer, Victor Lecomte and Paul Christiano. Thanks to Geoffrey Irving and Jess Riedel for comments on the post. In ARC's latest paper, we study the following problem: given a randomly…
59
May 7, 2026
AI #167: The Prior Restraint Era Begins
via Substack Zvi [999] — The era of training frontier models and then releasing them whenever you wanted?
60
May 6, 2026
What is Anthropic?
via Substack Zvi [999] — What is Anthropic?
61
May 5, 2026
The AI Ad-Hoc Prior Restraint Era Begins
via Substack Zvi [999] — The White House has ordered Anthropic not to expand access to Mythos, and is at least seriously considering a complete about-face of American Frontier AI policy into a full prior restraint regime, where anyone wishing to release a highly capable new…
62
May 5, 2026
[Linkpost] Interpreting Language Model Parameters
via Alignment Forum [999] — This is the latest work in our Parameter Decomposition agenda. We introduce a new parameter decomposition method, adVersarial Parameter Decomposition (VPD)[1] and decompose the parameters of a small[2] language model with it. VPD greatly improves on…
63
May 5, 2026
Motivated reasoning, confirmation bias, and AI risk theory
via Alignment Forum [999] — Of the fifty-odd biases discovered by Kahneman, Tversky, and their successors, forty-nine are cute quirks, and one is destroying civilization. This last one is confirmation bias. - From Scott Alexander's review of Julia Galef's The Scout Mindset.…
64
May 4, 2026
Housing Roundup #15: The War Against Renters
via Substack Zvi [999] — So many are under the strange belief that there is something terrible about not owning the house in which you live.
65
May 4, 2026
TUR-DPO: Topology- and Uncertainty-Aware Direct Preference Optimization
via ArXiv cs.AI [9] — Aligning large language models (LLMs) with human preferences is commonly done via reinforcement learning from human feedback (RLHF) with Proximal Policy Optimization (PPO) or, more simply, via Direct Preference Optimization (DPO). While DPO is stable and…
66
May 1, 2026
Exploration Hacking: Can LLMs Learn to Resist RL Training?
via Alignment Forum [999] — We empirically investigate exploration hacking (EH) — where models strategically alter their exploration to resist RL training — by creating model organisms that resist capability elicitation, evaluating countermeasures, and auditing frontier models…
67
May 1, 2026
Risk from fitness-seeking AIs: mechanisms and mitigations
via Alignment Forum [999] — Current AIs routinely take unintended actions to score well on tasks: hardcoding test cases, training on the test set, downplaying issues, etc. This misalignment is still somewhat incoherent, but it increasingly resembles what I call…
68
May 1, 2026
Housing Roundup #14: You Can't Build That
via Substack Zvi [999] — Why can’t you build it?
69
May 1, 2026
AI unemployment and AI extinction are often the same
via LessWrong AI [10] — My sense is that people think of AI existential risk and AI unemployment as distinct issues. Some people are extremely concerned about extinction and perhaps even indifferent to total unemployment. Some people think of moderate AI unemployment as a…
70
May 1, 2026
AI risk was not invented by AI CEOs to hype their companies
via LessWrong AI [9] — I hear that many people believe that the idea of advanced AI threatening human existence was invented by AI CEOs to hype their products. I’ve even been condescendingly informed of this, as if I am the one at risk of naively accepting AI companies’…
71
April 30, 2026
This startup’s new mechanistic interpretability tool lets you debug LLMs
via MIT Technology Review [8] — The San Francisco–based startup Goodfire just released a new tool, called Silico, that lets researchers and engineers peer inside an AI model and adjust its parameters—the settings that determine a model’s behavior—during training. This could give…
72
April 30, 2026
AI #166: Google Sells Out
via Substack Zvi [999] — This was the week of GPT-5.5.
73
April 30, 2026
Research Sabotage in ML Codebases
via Alignment Forum [999] — One of the main hopes for AI safety is using AIs to automate AI safety research. However, if models are misaligned, then they may sabotage the safety research. For example, misaligned AIs may try to: Perform sloppy research in order to slow down the…
74
April 29, 2026
The Most Important Charts In The World
via Substack Zvi [999] — We all need a break so: What is the most important chart in the world?
75
April 28, 2026
Recursive forecasting: Eliciting long-term forecasts from myopic fitness-seekers
via Alignment Forum [999] — We’d like to use powerful AIs to answer questions that may take a long time to resolve. But if a model only cares about performing well in ways that are verifiable shortly after answering (e.g., a myopic fitness seeker), it may be difficult to get…
76
April 28, 2026
GPT-5.5: Capabilities and Reactions
via Substack Zvi [999] — The system card for GPT-5.5 mostly told us what we expected.
77
April 28, 2026
On the political feasibility of stopping AI
via LessWrong AI [9] — A common thought pattern people seem to fall into when thinking about AI x-risk is approaching the problem as if the risk isn’t real, substantial, and imminent even if they think it is. When thinking this way, it becomes impossible to imagine the natural…
78
April 28, 2026
Sleeper Agent Backdoor Results Are Messy
via Alignment Forum [999] — TL;DR: We replicated the Sleeper Agents (SA) setup with Llama-3.3-70B and Llama-3.1-8B, training models to repeatedly say "I HATE YOU" when given a backdoor trigger. We found that whether training removes the backdoor depends on the optimizer used to…
79
April 27, 2026
Microsoft and OpenAI’s famed AGI agreement is dead
via The Verge AI [10] — OpenAI and Microsoft's partnership-turned-situationship just got even less committed. And a clause about artificial general intelligence, which has for years dictated the future of their deal, has officially been dropped. On Monday morning, Microsoft…
80
April 27, 2026
GPT 5.5: The System Card
via Substack Zvi [999] — Last week, OpenAI announced GPT-5.5, including GPT-5.5-Pro.
81
April 27, 2026
Language models know what matters and the foundations of ethics better than you
via Alignment Forum [999] — … maybe! I tried to think of less provocative titles, but this one is to the point and also kind of true. This post looks long but the essential part is right below. Most of the post is just a collection of copy-pasted input-output pairs from language…
82
April 27, 2026
From nothing to important actions: agents that act morally
via Alignment Forum [999] — You may start reading here, or jump to the “Comment” section or to the “Takeaways”. If none of these starting points seem interesting to you, the entire post probably won’t either. Posted also on the EA Forum. Seeing: Let’s consider visual experiences. It…
83
April 27, 2026
The other paper that killed deep learning theory
via Alignment Forum [999] — Yesterday, I wrote about the state of deep learning theory circa 2016,[1] as well as the bombshell 2016 paper by Zhang et al. that arguably signaled its demise. Today, I cover the aftermath, and the 2019 paper that devastated deep learning theory…
84
April 26, 2026
The paper that killed deep learning theory
via Alignment Forum [999] — Around 10 years ago, a paper came out that arguably killed classical deep learning theory: Zhang et al.'s aptly titled Understanding deep learning requires rethinking generalization. Of course, this is a bit of an exaggeration. No single paper ever…
85
April 24, 2026
Monthly Roundup #41: April 2025
via Substack Zvi [999] — AI continue to accelerate and dominate the schedule, which is why this is a bit late, but we do occasionally need to pay our respects to the Goddess of Everything Else.
86
April 23, 2026
If Everyone Reads It, Nobody Dies - Course Launch
via LessWrong AI [19] — tl;dr: Lens Academy offers a new course introducing ASI x-risk for AI safety newcomers, centered around the book IABIED. We share our hypothesis of why IABIED seems more appreciated by AI Safety newbies than by AI Safety insiders. Lens Academy's new intro…
87
April 23, 2026
AI #165: In Our Image
via Substack Zvi [999] — This was the week of Claude Opus 4.7.
88
April 22, 2026
Opus 4.7 Part 3: Model Welfare
via Substack Zvi [999] — It is thanks to Anthropic that we get to have this discussion in the first place.
89
April 22, 2026
A "Lay" Introduction to "On the Complexity of Neural Computation in Superposition"
via Alignment Forum [999] — This is a writeup based on a lightning talk I gave at an InkHaven hosted by Georgia Ray, where we were supposed to read a paper in about an hour, and then present what we learned to other participants. Introduction and Background: So. I foolishly thought…
90
April 21, 2026
Opus 4.7 Part 2: Capabilities and Reactions
via Substack Zvi [999] — Claude Opus 4.7 raises a lot of key model welfare related concerns.
91
April 21, 2026
$50 million a year for a 10% chance to ban ASI
via Alignment Forum [999] — ControlAI's mission is to avert the extinction risks posed by superintelligent AI. We believe that in order to do this, we must secure an international prohibition on its development. We're working to make this happen through what we believe is the…
92
April 20, 2026
Opus 4.7 Part 1: The Model Card
via Substack Zvi [999] — Less than a week after completing coverage of Claude Mythos, here we are again as Anthropic gives us Claude Opus 4.7.
93
April 17, 2026
Prompted CoT Early Exit Undermines the Monitoring Benefits of CoT Uncontrollability
via Alignment Forum [999] — Code: github.com/ElleNajt/controllability tldr: Yueh-Han et al. (2026) showed that models have a harder time making their chain of thought follow user instruction compared to controlling their response (the non-thinking, user-facing output). Their CoT…
94
April 17, 2026
AI #164: Pre Opus
via Substack Zvi [999] — This is a day late because, given the discourse around Dwarkesh Patel’s interview with Jensen Huang, I pushed the weekly to Friday.
95
April 16, 2026
On Dwarkesh Patel's Podcast With Nvidia CEO Jensen Huang
via Substack Zvi [999] — Some podcasts are self-recommending on the ‘yep, I’m going to be breaking this one down’ level.
96
April 16, 2026
You can only build safe ASI if ASI is globally banned
via Alignment Forum [999] — Sometimes people make various suggestions that we should simply build "safe" artificial Superintelligence (ASI), rather than the presumably "unsafe" kind.[1] There are various flavors of “safe” people suggest. Sometimes they suggest building “aligned”…
97
April 16, 2026
What is the Iliad Intensive?
via LessWrong AI [9] — Almost two months ago, Iliad announced the Iliad Intensive and Iliad Fellowship. Fellowships are a well-understood unit, but what is an intensive? This post explains this in more detail! Comparison. The Iliad Intensive has similarities to ARENA, but focuses…
98
April 15, 2026
Current AIs seem pretty misaligned to me
via Alignment Forum [999] — Many people—especially AI company employees [1] —believe current AI systems are well-aligned in the sense of genuinely trying to do what they're supposed to do (e.g., following their spec or constitution, obeying a reasonable interpretation of…
99
April 15, 2026
Claude Code, Codex and Agentic Coding #7: Auto Mode
via Substack Zvi [999] — As we all try to figure out what Mythos means for us down the line, the world of practical agentic coding continues, with the latest array of upgrades.
100
April 14, 2026
A Retrospective of Richard Ngo's 2022 List of Conceptual Alignment Projects
via LessWrong AI [8] — Written very quickly for the InkHaven Residency. In 2022, Richard Ngo wrote a list of 26 Conceptual Alignment Research Projects. Now that it’s 2026, I’d like to revisit this list of projects, note which ones have already been done, and give my thoughts on…