Hacker News
- An Intuitive Explanation of Sparse Autoencoders for LLM Interpretability https://adamkarvonen.github.io/machine_learning/2024/06/11/sae-intuitions.html 3 comments
Linked pages
- [2005.14165] Language Models are Few-Shot Learners https://arxiv.org/abs/2005.14165 201 comments
- Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet https://transformer-circuits.pub/2024/scaling-monosemanticity/ 135 comments
- Golden Gate Claude \ Anthropic https://www.anthropic.com/news/golden-gate-claude 66 comments
- Towards Monosemanticity: Decomposing Language Models With Dictionary Learning https://transformer-circuits.pub/2023/monosemantic-features/index.html 5 comments
- Toy Models of Superposition https://transformer-circuits.pub/2022/toy_model/index.html 4 comments
- Dario Amodei (Anthropic CEO) - $10 Billion Models, OpenAI, Scaling, & AGI in 2 years https://www.dwarkeshpatel.com/p/dario-amodei 0 comments
- Shape Suffixes - Good Coding Style (Noam Shazeer, Medium) https://medium.com/@NoamShazeer/shape-suffixes-good-coding-style-f836e72e24fd 0 comments
- [2404.16014] Improving Dictionary Learning with Gated Sparse Autoencoders https://arxiv.org/abs/2404.16014 0 comments
- Sparse Autoencoders (OpenAI paper, PDF) https://cdn.openai.com/papers/sparse-autoencoders.pdf 0 comments