Linked pages
- Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet https://transformer-circuits.pub/2024/scaling-monosemanticity/ (135 comments)
- [2406.11717] Refusal in Language Models Is Mediated by a Single Direction https://arxiv.org/abs/2406.11717 (44 comments)
- [2404.14394] A Multimodal Automated Interpretability Agent https://arxiv.org/abs/2404.14394 (7 comments)
- [2309.11998] LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset https://arxiv.org/abs/2309.11998 (1 comment)
- Network Dissection: Quantifying Interpretability of Deep Visual Representations http://netdissect.csail.mit.edu/final-network-dissection.pdf (0 comments)
- Language models can explain neurons in language models https://openaipublic.blob.core.windows.net/neuron-explainer/paper/index.html (0 comments)
- [2405.14860] Not All Language Model Features Are Linear https://arxiv.org/abs/2405.14860 (0 comments)
- [2408.05147] Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2 https://arxiv.org/abs/2408.05147 (0 comments)