Linking pages
- LLMs Know More Than What They Say - by Ruby Pai https://arjunbansal.substack.com/p/llms-know-more-than-what-they-say 18 comments
- What is AI interpretability? Artificial intelligence researchers are reverse-engineering ChatGPT, Claude, and Gemini. - Vox https://www.vox.com/future-perfect/362759/ai-interpretability-openai-claude-gemini-neuroscience 7 comments
- Sam Altman Admits That OpenAI Doesn't Actually Understand How Its AI Works https://futurism.com/sam-altman-admits-openai-understand-ai 5 comments
- Non-Obvious Prompt Engineering Guide - by Adam Gospodarczyk https://www.techsistence.com/p/non-obvious-prompt-engineering-guide 1 comment
- Links for May 2024 - by Scott Alexander - Astral Codex Ten https://www.astralcodexten.com/p/links-for-may-2024 0 comments
Linked pages
- Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet https://transformer-circuits.pub/2024/scaling-monosemanticity/ 135 comments
- GitHub - openai/transformer-debugger https://github.com/openai/transformer-debugger 120 comments
- [2310.13548] Towards Understanding Sycophancy in Language Models https://arxiv.org/abs/2310.13548 72 comments
- Claude https://claude.ai/ 48 comments
- Claude 3 Model Card (PDF) https://www-cdn.anthropic.com/de8ba9b01c9ab7cbabf5c33b80b7bbc618857627/Model_Card_Claude_3.pdf 3 comments
- [2001.08361] Scaling Laws for Neural Language Models https://arxiv.org/abs/2001.08361 0 comments
- [2310.03693] Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To! https://arxiv.org/abs/2310.03693 0 comments
- [2404.16014] Improving Dictionary Learning with Gated Sparse Autoencoders https://arxiv.org/abs/2404.16014 0 comments