My Posts

Axes of Planning in LLMs + Partial Lit Review

Epistemic Status: Written over the course of a couple days at Inkhaven. Some of the info is old, so some newer papers are excluded. TL;DR: People talk about "interpreting planning and goals" in models all the time, but people have different understandings of what exactly "planning" means. I try to dec...


Thought Experiments on Continuity of Consciousness

I wrote about continuity of consciousness in my cryonics post: Two Theories for Cryopreservation. I already stated I’m kind of unsure about it. But now I’ve spent a little more time thinking about it. Could I make a ladder of thought experiments to get me to believing it’s fake? Is it really something I ...

Linear vs Non-linear Probes for Interpretability

Epistemic status: Old news and well-known, but I find it hard to point at a single post that encapsulates my intuitions on this, so I write them down here. One question that comes up sometimes in interpretability work is: “why do I trust simple linear probes more than complex non-linear ones?”. (Eve...


Is death and suffering axiomatically bad?

After writing about my ethics yesterday, there was some discussion about the axioms that I think are most needed to derive the rest of my ethics. “Pleasure is good”: nobody argues against this. “Suffering is bad”: someone did argue that this is potentially untrue, that there exists “voluntary suffering”. I th...


My Ethics

I’ve been thinking on and off about ethics for around a decade, but mostly in an informal way rather than through formal philosophy. I expect that if I spent more time seriously stress-testing my views, I’d find inconsistencies or conclusions I’m less comfortable with than I currently think. Still, ...


How much faster is speaking, compared to typing on laptop vs phone vs writing?

So as I haven’t been able to speak the past short while, one thing I have noticed is that it is harder to communicate with others. I know what you are thinking: “Wow, who could have possibly guessed? It’s harder to converse when you can’t speak?”. Indeed, I didn’t expect it either. But how much harde...

Two Theories for Cryopreservation

Why cryonics, and the two main methods, with practical discussion and philosophical musings on both. Epistemic status: Cryonics is a scientific field that is long established, yet long underfunded, and uncertain. I’ve been thinking about this on and off for a few years and remain cautiously optimisti...


carbon offset arbitrage opportunity

So, you run an airline, you have a record of people that loyally keep coming back to you, you have costs associated with flying, in particular fuel. Your customers worry about the carbon emissions. Is there something you could do to lower some associated costs? There is! In the San Francisco ...


Dying with Whimsy

To me it feels pretty emotionally clear we are nearing the end-times with AI. That in 1-4 years[1] things will be radically transformed, that at least one of the big AI labs will become autonomous research organizations working on developing the next version of their AI, perhaps with some narrow gui...

Modelling Trajectories - Interim results

Introduction. Note: These are results which have been in drafts for a year; see discussion about how we have moved on to thinking about these things. Our team at AI Safety Camp has been working on a project to model the trajectories of language model outputs. We're interested in predicting not just the...

Energy Markets Temporal Arbitrage with Batteries

Epistemic Status: I am not an energy expert, and this was done rather briefly. All analysis uses pricing data specific to Ireland, but some general ideas are likely applicable more broadly. Data is true as of March 2025. Where there are uncertainties I try to state them, but there are likely some fa...


Distillation of Meta's Large Concept Models Paper

Note: I had this as a draft for a while. I think it is accurate, but there may be errors. I am not in any way affiliated with the authors of the paper. Below I briefly discuss the "Large Concept Models" paper released by Meta, which tries to change some of the paradigm of doing language modelli...

ParaScopes: Do Language Models Plan the Upcoming Paragraph?

This work is a continuation of work in a workshop paper: Extracting Paragraphs from LLM Token Activations, and based on continuous research into my main research agenda: Modelling Trajectories of Language Models. See the GitHub repository for code and additional details. Looking at the path directly in f...

Literature Review of Text AutoEncoders

This is a brief literature review of Text AutoEncoders, as I used them in a recent project and did not find a good resource covering them. TL;DR: There exist models that take some text -> encode it into a single vector -> decode back into approximately the same text. Meta's SONAR models seem to...

Confusing the metric for the meaning: Perhaps correlated attributes are "natural"

Epistemic status: possibly trivial, but I hadn't heard it before. TL;DR: What I thought of as a "flaw" in PCA, its inability to isolate pure metrics, might actually be a feature that aligns with our cognitive processes. We often think in terms of composite concepts (e.g., "Age + correlated attributes")...

Comparing Quantized Performance in Llama Models

Epistemic Status: Quick tests, most of this was done in less than 48 hours. TL;DR: Can you skimp on GPU VRAM? 8-bit quantized seems fine; for 4-bit it depends. I was asked by @Teun van der Weij to what degree one can run evaluations on quantized models, and I was unsure. I have run some eval...


AISC 2024 - Project Summaries

Apply to AI Safety Camp 2024 by 1st December 2023. All mistakes here are my own. Below are some summaries for each project proposal, listed in order of how they appear on the website. These are edited by me, and most have not yet been reviewed by the project leads. I think having a list like this mak...

Research Agenda: Modelling Trajectories of Language Models

Apply to work on this project with me at AI Safety Camp 2024 before 1st December 2023. What are the possible outcomes? Summary: Rather than asking “What next token will the Language Model Predict?” or “What next action will an RL agent take?”, I think it is important to be able to model the longer-term ...

Machine Unlearning Evaluations as Interpretability Benchmarks

Interpreting Models by Ablation. Image generated by DALL-E 3. Introduction: Interpretability in machine learning, especially in language models, is an area with a large number of contributions. While this can be quite useful for improving our understanding of models, one issue is that there is the lack...

Ideation and Trajectory Modelling in Language Models

[Epistemic Status: Exploratory, and I may have confusions] Introduction: LLMs and other possible RL agents have the property of taking many actions iteratively. However, not all possible short-term outputs are equally likely, and I think better modelling what these possible outcomes might look like coul...

LLM Modularity: The Separability of Capabilities in Large Language Models

Separating out different capabilities. Post format: First, a 30-second TL;DR, next a 5-minute summary, and finally the ~40-minute full-length technical report. Special thanks to Lucius Bushnaq for inspiring this work with his work on modularity. TL;DR: One important aspect of Modularity is that the...

LLM Basics: Embedding Spaces - Transformer Token Vectors Are Not Points in Space

This post is written as an explanation of a misconception I had with transformer embedding when I was getting started. Thanks to Stephen Fowler for the discussion last August that made me realise the misconception, and others for helping me refine my explanation. Any mistakes are my own. Thanks to f...

Speculation on Path-Dependence in Large Language Models.

Epistemic Status: Highly Speculative. I spent less than a day thinking about this in particular, and though I have spent a few months studying large language models, I have never trained a language model. I am likely wrong about many things. I have not seen research on this, so it may be useful for ...

Searching for Modularity in Large Language Models

Produced As Part Of The SERI ML Alignment Theory Scholars Program 2022 Under John Wentworth. See the Google Colab notebook for the technical details of the analysis that was done. Previous posts on Modularity investigated how one should search for and try to define modularity. Looking at biological ...

What Makes an Idea Understandable? On Architecturally and Culturally Natural Ideas.

Midjourney generating an HD image of "a medium-length sleeve t-shirt". It in fact looks like a t-shirt that has both long sleeves and short sleeves. Produced as part of the SERI MATS Program 2022 under John Wentworth. General Idea: There are ideas that people can learn more or less easily compared to othe...