Hacker News
- Toy Models of Superposition (2022) https://transformer-circuits.pub/2022/toy_model/index.html 4 comments
Linking pages
- God Help Us, Let's Try To Understand The Paper On AI Monosemanticity https://www.astralcodexten.com/p/god-help-us-lets-try-to-understand 205 comments
- Monosemanticity at Home: My Attempt at Replicating Anthropic's Interpretability Research from Scratch https://jakeward.substack.com/p/monosemanticity-at-home-my-attempt 31 comments
- Anthropic | Core Views on AI Safety: When, Why, What, and How https://www.anthropic.com/index/core-views-on-ai-safety 21 comments
- Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind https://www.dwarkeshpatel.com/p/sholto-douglas-trenton-bricken 3 comments
- GitHub - JShollaj/awesome-llm-interpretability: A curated list of Large Language Model (LLM) Interpretability resources. https://github.com/JShollaj/awesome-llm-interpretability 1 comment
- Prism: mapping interpretable concepts and features in a latent space of language | thesephist.com https://thesephist.com/posts/prism/ 1 comment
- A primer on sparse autoencoders - by Nick Jiang https://nickjiang.substack.com/p/a-primer-on-sparse-autoencoders 1 comment
- Anthropic | Distributed Representations: Composition & Superposition https://www.anthropic.com/index/distributed-representations-composition-superposition 0 comments
- Dictionary Learning with Sparse AutoEncoders | Kola Ayonrinde https://www.kolaayonrinde.com/blog/2023/11/03/dictionary-learning.html 0 comments
- AI #62: Too Soon to Tell - by Zvi Mowshowitz https://thezvi.substack.com/p/ai-62-too-soon-to-tell 0 comments
- I am the Golden Gate Bridge - by Zvi Mowshowitz https://thezvi.substack.com/p/i-am-the-golden-gate-bridge 0 comments
- The engineering challenges of scaling interpretability \ Anthropic https://www.anthropic.com/research/engineering-challenges-interpretability 0 comments
- An Intuitive Explanation of Sparse Autoencoders for Mechanistic Interpretability of LLMs | Adam Karvonen https://adamkarvonen.github.io/machine_learning/2024/06/11/sae-intuitions.html 0 comments
- "Mechanistic interpretability" for LLMs, explained https://seantrott.substack.com/p/mechanistic-interpretability-for 0 comments