Analysis - pDoom (Page 7)

Zac Boring 2 months ago Analysis

Developmental Cognitive Interpretability: A Research Agenda for Modelling Generalisation and Predicting Agent Behaviour

via LessWrong AI [3] — Summary: Safe deployment of an AI system requires that we can make confident claims about its behaviour on out-of-distribution deployment inputs on the basis of only pre-deployment evaluations. One approach to making such claims is to take a cognitive…

Zac Boring 2 months ago Analysis

How can the middle powers avoid getting trounced during the intelligence explosion? A plan.

via LessWrong AI [4] — This is an edited version of a LW shortform. Superintelligence will likely be developed by US companies; run on US data centres; and be under the jurisdiction of the US government. This will massively boost US military power and make the US economically…

Zac Boring 2 months ago Analysis

Trees are mostly made of air and a generalizable lesson for AI safety

via LessWrong AI [5] — At the risk of embarrassing myself, I’ll share a confession. For context, I took five years of Latin: four in high school and one in college. In addition to learning the language, all my Latin classes taught a lot about Roman history. Emperors, internal…

Zac Boring 2 months ago Analysis

AI #170: Lack of Executive Order

via Substack Zvi [999] — Last week ended on a cliffhanger of sorts.

Zac Boring 2 months ago Analysis

LLMs Through the Eyes of Vinge

via LessWrong AI [5] — For the last few months, I’ve been re-reading some of my favorite novels. Recently, I went through Vinge’s Zones of Thought series: A Fire Upon the Deep, A Deepness in the Sky, and The Children of the Sky. And what struck me reading them is how much Vinge…

Zac Boring 2 months ago Analysis

Announcing Geodesic Research

via LessWrong AI [6] — We're a Cambridge, UK-based AI safety organisation that’s asking: how can we build the most robust alignment initialisations for capable LLMs? We’re one of the few non-profit organisations positioned to answer this question empirically. We have the…

Zac Boring 2 months ago Analysis

Quantitative AI risk assessment: a starting point

via LessWrong AI [4] — Current AI risk management relies on qualitative approaches, much like nuclear safety before 1975. We propose a shift to quantitative risk modeling, following the approach that transformed nuclear safety. We propose a methodology and demonstrate it by…

Zac Boring 2 months ago Analysis

RTMH: Pope Leo's Magnifica Humanitas on AI

via Substack Zvi [999] — His holiness has spoken, frequently about AI.

Zac Boring 2 months ago Analysis

Cognitive Security as an AI Safety Cause Area

via LessWrong AI [5] — As AI systems become more capable, the cognitive security of humans will be increasingly at risk. By cognitive security, I mean the ability of humans to maintain control over their beliefs and actions. Cognitive security could be compromised in several…

Zac Boring 2 months ago Analysis

Linkpost: New Vatican Encyclical on AI Governance

via LessWrong AI [9] — Pope Leo XIV has released a new, 42k-word encyclical laying out the Vatican's position on many AI safety topics. You can read the full thing here, or read the Vatican's press release here, or coverage in the NY Times, or perhaps consider having an LLM read…

Zac Boring 2 months ago Analysis

We made a map of the doom debate

via LessWrong AI [5] — This was produced as a part of the AI Safety Camp 2026 "Assumptions of the Doom Debate" project, led by Sean Herrington, who was also the lead author on this post. The other participants have equal contributions and are listed in no particular order. It is…

Zac Boring 2 months ago Analysis

Will we really put data centers in space?

via LessWrong AI [3] — Abstract: Several major technology companies have announced plans to operate AI data centers in orbit. Elon Musk recently claimed: “the lowest-cost place to put AI will be space […] within two years, maybe three.” If a meaningful fraction of new AI compute…

Zac Boring 2 months ago Analysis

PLA Daily Translation: Reflections on Warfare Brought by AGI

via LessWrong AI [4] — Source: “Reflections on Warfare Brought by AGI” (AGI带来的战争思考). Source: PLA Daily (解放军报). Date: January 21, 2025. Authors: Rong Ming (荣明), Hu Xiaofeng (胡晓峰). Introduction: Please feel free to skip to the translation, about halfway down, though I would recommend reading…

Zac Boring 2 months ago Analysis

Out-of-Context Reasoning (OOCR) in LLMs: A Short Primer and Reading List

via LessWrong AI [6] — Out-of-context reasoning (OOCR) is a concept relevant to LLM generalization and AI alignment. Also available as a PDF. Contents: What is OOCR?; Examples; Papers; Videos. What is out-of-context reasoning for LLMs? It's when an LLM reaches a conclusion that…

Zac Boring 2 months ago Analysis

Learned Chain-of-Thought Obfuscation Generalises to Unseen Tasks

via LessWrong AI [3] — TL;DR: Training against a CoT or summary-only monitor can lead to obfuscation of dangerous reasoning in unseen tasks. This strengthens the “don’t train against a monitor” claims. Figure 1. (A) Two prior results: penalising the CoT or final response produces…

Zac Boring 2 months ago Analysis

Gemini 3.5 Flash Looks Good For How Fast It Is

via Substack Zvi [999] — Google once again has a model worth at least some consideration.

Zac Boring 2 months ago Analysis

Do AI Risks Require Extraordinary Government Intervention?

via AI Snake Oil [7] — Let’s not skip the hard work of AI governance

Zac Boring 2 months ago Analysis

AI #169: New Knowledge

via Substack Zvi [999] — Even in a relatively quiet period, AI is out there creating new knowledge.

Zac Boring 2 months ago Analysis

Childhood And Education #19: Letting Kids Be Kids #2

via Substack Zvi [999] — I cannot emphasize enough the need to let kids be kids.

Zac Boring 2 months ago Analysis

Thoughts on interviewing candidates for AI safety fellowships

via LessWrong AI [5] — Around July last year I decided I was going to go all in on technical AI safety research. To do that I’d need to get into an AI safety fellowship, quit my job, and sell everything that was in my flat in South Africa (hopefully in that order). I applied to…