Hacker News
- Evals: a framework for evaluating OpenAI models and a registry of benchmarks https://github.com/openai/evals 16 comments
Linking pages
- The Multi-modal, Multi-model, Multi-everything Future of AGI https://lspace.swyx.io/p/multimodal-gpt4 167 comments
- GitHub - uptrain-ai/uptrain: UpTrain is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, and embedding use-cases), perform root cause analysis on failure cases, and give insights on how to resolve them. https://github.com/uptrain-ai/uptrain 18 comments
- Codex (and GPT-4) can’t beat humans on smart contract audits | Trail of Bits Blog https://blog.trailofbits.com/2023/03/22/codex-and-gpt4-cant-beat-humans-on-smart-contract-audits/ 10 comments
- GitHub - taishi-i/awesome-ChatGPT-repositories: A curated list of resources dedicated to open source GitHub repositories related to ChatGPT https://github.com/taishi-i/awesome-ChatGPT-repositories 5 comments
- Notes on how to use LLMs in your product. | Irrational Exuberance https://lethain.com/mental-model-for-how-to-use-llms-in-products/ 5 comments
- LLM Psychometrics: A Speculative Approach to AI Safety https://pascal.cc/blog/artificial-psychometrics 3 comments
- Spreading your wings with the ChatGPT API https://blog.echosystems.io/p/spreading-your-wings-with-the-chatgpt 1 comment
- GitHub - opendilab/awesome-RLHF: A curated list of reinforcement learning with human feedback resources (continually updated) https://github.com/opendilab/awesome-RLHF 0 comments
- AI Safety: A Technical & Ethnographic Overview https://www.jonstokes.com/p/ai-safety-a-technical-and-ethnographic 0 comments
- GitHub - Hannibal046/Awesome-LLM: Awesome-LLM: a curated list of Large Language Models https://github.com/Hannibal046/Awesome-LLM 0 comments
- Evaluating the quality of my retrieval-augmented generation system https://technicalwriting.tools/posts/evals/ 0 comments
- Chasing the Numbers: The Puzzle of AI Benchmarks https://evalovernite.substack.com/p/ai-benchmarks-puzzle 0 comments
- [AINews] Claude 3 is officially America's Next Top Model • Buttondown https://buttondown.email/ainews/archive/ainews-claude-3-is-officially-americas-next-top/ 0 comments
- GitHub - openai/simple-evals https://github.com/openai/simple-evals 0 comments
- Evaluating LLM Benchmarks for React | KiloBytes by KB https://kshitij-banerjee.github.io/2024/05/04/evaluating-llm-benchmarks-for-react/ 0 comments
- GitHub - alopatenko/LLMEvaluation: A comprehensive guide to LLM evaluation methods designed to assist in identifying the most suitable evaluation techniques for various use cases, promote the adoption of best practices in LLM assessment, and critically assess the effectiveness of these evaluation methods. https://github.com/alopatenko/LLMEvaluation 0 comments
- GitHub - braintrustdata/autoevals: AutoEvals is a tool for quickly and easily evaluating AI model outputs using best practices. https://github.com/braintrustdata/autoevals 0 comments
Linked pages
- Pricing https://openai.com/api/pricing/ 136 comments
- Git Large File Storage | Git Large File Storage (LFS) replaces large files such as audio samples, videos, datasets, and graphics with text pointers inside Git, while storing the file contents on a remote server like GitHub.com or GitHub Enterprise. https://git-lfs.com/ 2 comments
- CoQA: A Conversational Question Answering Challenge https://stanfordnlp.github.io/coqa/ 0 comments