Hacker News
- Toy Models of Superposition (2022) https://transformer-circuits.pub/2022/toy_model/index.html 4 comments
Linking pages
- God Help Us, Let's Try To Understand The Paper On AI Monosemanticity https://www.astralcodexten.com/p/god-help-us-lets-try-to-understand 205 comments
- Monosemanticity at Home: My Attempt at Replicating Anthropic's Interpretability Research from Scratch https://jakeward.substack.com/p/monosemanticity-at-home-my-attempt 31 comments
- Anthropic | Core Views on AI Safety: When, Why, What, and How https://www.anthropic.com/index/core-views-on-ai-safety 21 comments
- Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind https://www.dwarkeshpatel.com/p/sholto-douglas-trenton-bricken 3 comments
- GitHub - JShollaj/awesome-llm-interpretability: A curated list of Large Language Model (LLM) Interpretability resources. https://github.com/JShollaj/awesome-llm-interpretability 1 comment
- Prism: mapping interpretable concepts and features in a latent space of language | thesephist.com https://thesephist.com/posts/prism/ 1 comment
- A primer on sparse autoencoders - by Nick Jiang https://nickjiang.substack.com/p/a-primer-on-sparse-autoencoders 1 comment
- Anthropic | Distributed Representations: Composition & Superposition https://www.anthropic.com/index/distributed-representations-composition-superposition 0 comments
- Dictionary Learning with Sparse AutoEncoders | Kola Ayonrinde https://www.kolaayonrinde.com/blog/2023/11/03/dictionary-learning.html 0 comments
- AI #62: Too Soon to Tell - by Zvi Mowshowitz https://thezvi.substack.com/p/ai-62-too-soon-to-tell 0 comments
- I am the Golden Gate Bridge - by Zvi Mowshowitz https://thezvi.substack.com/p/i-am-the-golden-gate-bridge 0 comments
- The engineering challenges of scaling interpretability \ Anthropic https://www.anthropic.com/research/engineering-challenges-interpretability 0 comments
- An Intuitive Explanation of Sparse Autoencoders for Mechanistic Interpretability of LLMs | Adam Karvonen https://adamkarvonen.github.io/machine_learning/2024/06/11/sae-intuitions.html 0 comments
- "Mechanistic interpretability" for LLMs, explained https://seantrott.substack.com/p/mechanistic-interpretability-for 0 comments