[2310.01405] Representation Engineering: A Top-Down Approach to AI Transparency

Linking pages

Machine Unlearning in 2024 - Ken Ziyu Liu - Stanford Computer Science https://ai.stanford.edu/~kzliu/blog/unlearning 94 comments
Representation Engineering Mistral-7B an Acid Trip https://vgel.me/posts/representation-engineering/ 75 comments
Manipulating Chess-GPT’s World Model | Adam Karvonen https://adamkarvonen.github.io/machine_learning/2024/03/20/chess-gpt-interventions.html 36 comments
Why The Llama 3.1 Announcement Is Huge - Tim Kellogg https://timkellogg.me/blog/2024/07/23/llama-3.1 23 comments
Yes, AIs ‘understand’ things - by Robert Wright https://nonzero.substack.com/p/yes-ais-understand-things 2 comments
GitHub - JShollaj/awesome-llm-interpretability: A curated list of Large Language Model (LLM) Interpretability resources. https://github.com/JShollaj/awesome-llm-interpretability 1 comment
Representation Engineering and Control Vectors - Neuroscience for LLMs - hlfshell https://hlfshell.ai/posts/representation-engineering/ 1 comment
The Road To Honest AI - by Scott Alexander https://www.astralcodexten.com/p/the-road-to-honest-ai 0 comments
The case for open source AI https://press.airstreet.com/p/the-case-for-open-source-ai 0 comments
GitHub - elicit/machine-learning-list https://github.com/elicit/machine-learning-list 0 comments
AI #62: Too Soon to Tell - by Zvi Mowshowitz https://thezvi.substack.com/p/ai-62-too-soon-to-tell 0 comments
"Mechanistic interpretability" for LLMs, explained https://seantrott.substack.com/p/mechanistic-interpretability-for 0 comments
GitHub - google-gemini/gemma-cookbook: A collection of guides and examples for the Gemma open models from Google. https://github.com/google-gemini/gemma-cookbook 0 comments