Hacker News
Linked pages
- [2410.05229] GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models https://arxiv.org/abs/2410.05229 267 comments
- [2304.15004] Are Emergent Abilities of Large Language Models a Mirage? https://arxiv.org/abs/2304.15004 130 comments
- [2108.13264] Deep Reinforcement Learning at the Edge of the Statistical Precipice https://arxiv.org/abs/2108.13264 17 comments
- Prediction Games - by Ben Recht - arg min https://www.argmin.net/p/prediction-games 16 comments
- [2104.02145] What Will it Take to Fix Benchmarking in Natural Language Understanding? https://arxiv.org/abs/2104.02145 12 comments
- [2107.03374] Evaluating Large Language Models Trained on Code https://arxiv.org/abs/2107.03374 8 comments
- [1803.07055] Simple random search provides a competitive approach to reinforcement learning https://arxiv.org/abs/1803.07055 4 comments
- [2005.04118] Beyond Accuracy: Behavioral Testing of NLP models with CheckList https://arxiv.org/abs/2005.04118 4 comments
- [1709.06560] Deep Reinforcement Learning that Matters https://arxiv.org/abs/1709.06560 3 comments
- Some Studies in Machine Learning Using the Game of Checkers | IBM Journals & Magazine | IEEE Xplore https://ieeexplore.ieee.org/document/5392560 1 comment
- Patterns, Predictions, and Actions https://mlstory.org/ 0 comments
- [2206.04615] Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models https://arxiv.org/abs/2206.04615 0 comments
- [1606.08514] Towards Verified Artificial Intelligence https://arxiv.org/abs/1606.08514 0 comments
- [2311.09188] Towards Verifiable Text Generation with Symbolic References https://arxiv.org/abs/2311.09188 0 comments
- [2411.00640] Adding Error Bars to Evals: A Statistical Approach to Language Model Evaluations https://arxiv.org/abs/2411.00640 0 comments
- Thou Shalt Not Overfit - by Ben Recht - arg min https://www.argmin.net/p/thou-shalt-not-overfit 0 comments
- Flavors of overfitting - by Ben Recht - arg min https://www.argmin.net/p/flavors-of-overfitting 0 comments
- Bit Prediction - by Ben Recht - arg min https://www.argmin.net/p/bit-prediction 0 comments
- Overfitting to theories of overfitting - by Ben Recht https://www.argmin.net/p/overfitting-to-theories-of-overfitting 0 comments
Title: Machine Learning Evaluation - by Ben Recht - arg min (site: www.argmin.net)