Machine Learning Evaluation - by Ben Recht - arg min - discu.eu

Hacker News

Machine Learning Evaluation https://www.argmin.net/p/machine-learning-evaluation-631 0 comments 24/4/2025

Linked pages

[2410.05229] GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models https://arxiv.org/abs/2410.05229 267 comments
[2304.15004] Are Emergent Abilities of Large Language Models a Mirage? https://arxiv.org/abs/2304.15004 130 comments
[2108.13264] Deep Reinforcement Learning at the Edge of the Statistical Precipice https://arxiv.org/abs/2108.13264 17 comments
Prediction Games - by Ben Recht - arg min https://www.argmin.net/p/prediction-games 16 comments
[2104.02145] What Will it Take to Fix Benchmarking in Natural Language Understanding? https://arxiv.org/abs/2104.02145 12 comments
[2107.03374] Evaluating Large Language Models Trained on Code https://arxiv.org/abs/2107.03374 8 comments
[1803.07055] Simple random search provides a competitive approach to reinforcement learning https://arxiv.org/abs/1803.07055 4 comments
[2005.04118] Beyond Accuracy: Behavioral Testing of NLP models with CheckList https://arxiv.org/abs/2005.04118 4 comments
[1709.06560] Deep Reinforcement Learning that Matters https://arxiv.org/abs/1709.06560 3 comments
Some Studies in Machine Learning Using the Game of Checkers | IBM Journals & Magazine | IEEE Xplore https://ieeexplore.ieee.org/document/5392560 1 comment
Patterns, Predictions, and Actions https://mlstory.org/ 0 comments
[2206.04615] Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models https://arxiv.org/abs/2206.04615 0 comments
[1606.08514] Towards Verified Artificial Intelligence https://arxiv.org/abs/1606.08514 0 comments
[2311.09188] Towards Verifiable Text Generation with Symbolic References https://arxiv.org/abs/2311.09188 0 comments
[2411.00640] Adding Error Bars to Evals: A Statistical Approach to Language Model Evaluations https://arxiv.org/abs/2411.00640 0 comments
Thou Shalt Not Overfit - by Ben Recht - arg min https://www.argmin.net/p/thou-shalt-not-overfit 0 comments
Flavors of overfitting - by Ben Recht - arg min https://www.argmin.net/p/flavors-of-overfitting 0 comments
Bit Prediction - by Ben Recht - arg min https://www.argmin.net/p/bit-prediction 0 comments
Overfitting to theories of overfitting - by Ben Recht https://www.argmin.net/p/overfitting-to-theories-of-overfitting 0 comments
