- [R] The unsolved mystery at the heart of the "How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions" paper https://arxiv.org/abs/2309.15840 14 comments machinelearning
Linking pages
- The Road To Honest AI - by Scott Alexander https://www.astralcodexten.com/p/the-road-to-honest-ai 0 comments
- GitHub - elicit/machine-learning-list https://github.com/elicit/machine-learning-list 0 comments
- Simple probes can catch sleeper agents \ Anthropic https://www.anthropic.com/research/probes-catch-sleeper-agents 0 comments