Hacker News
- Evals: a framework for evaluating OpenAI models and a registry of benchmarks https://github.com/openai/evals 16 comments
Linking pages
- The Multi-modal, Multi-model, Multi-everything Future of AGI https://lspace.swyx.io/p/multimodal-gpt4 167 comments
- GitHub - uptrain-ai/uptrain: UpTrain is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, and embedding use-cases), perform root cause analysis on failure cases, and give insights on how to resolve them. https://github.com/uptrain-ai/uptrain 18 comments
- Codex (and GPT-4) can’t beat humans on smart contract audits | Trail of Bits Blog https://blog.trailofbits.com/2023/03/22/codex-and-gpt4-cant-beat-humans-on-smart-contract-audits/ 10 comments
- GitHub - taishi-i/awesome-ChatGPT-repositories: A curated list of resources dedicated to open source GitHub repositories related to ChatGPT https://github.com/taishi-i/awesome-ChatGPT-repositories 5 comments
- Notes on how to use LLMs in your product. | Irrational Exuberance https://lethain.com/mental-model-for-how-to-use-llms-in-products/ 5 comments
- LLM Psychometrics: A Speculative Approach to AI Safety https://pascal.cc/blog/artificial-psychometrics 3 comments
- Spreading your wings with the ChatGPT API https://blog.echosystems.io/p/spreading-your-wings-with-the-chatgpt 1 comment
- GitHub - opendilab/awesome-RLHF: A curated list of reinforcement learning with human feedback resources (continually updated) https://github.com/opendilab/awesome-RLHF 0 comments
- AI Safety: A Technical & Ethnographic Overview https://www.jonstokes.com/p/ai-safety-a-technical-and-ethnographic 0 comments
- GitHub - Hannibal046/Awesome-LLM: Awesome-LLM: a curated list of Large Language Models https://github.com/Hannibal046/Awesome-LLM 0 comments
- Evaluating the quality of my retrieval-augmented generation system https://technicalwriting.tools/posts/evals/ 0 comments
- Chasing the Numbers: The Puzzle of AI Benchmarks https://evalovernite.substack.com/p/ai-benchmarks-puzzle 0 comments
- [AINews] Claude 3 is officially America's Next Top Model • Buttondown https://buttondown.email/ainews/archive/ainews-claude-3-is-officially-americas-next-top/ 0 comments
- GitHub - openai/simple-evals https://github.com/openai/simple-evals 0 comments
- Evaluating LLM Benchmarks for React | KiloBytes by KB https://kshitij-banerjee.github.io/2024/05/04/evaluating-llm-benchmarks-for-react/ 0 comments
- GitHub - alopatenko/LLMEvaluation: A comprehensive guide to LLM evaluation methods designed to assist in identifying the most suitable evaluation techniques for various use cases, promote the adoption of best practices in LLM assessment, and critically assess the effectiveness of these evaluation methods. https://github.com/alopatenko/LLMEvaluation 0 comments
- GitHub - braintrustdata/autoevals: AutoEvals is a tool for quickly and easily evaluating AI model outputs using best practices. https://github.com/braintrustdata/autoevals 0 comments
Linked pages
- Pricing https://openai.com/api/pricing/ 136 comments
- Git Large File Storage | Git Large File Storage (LFS) replaces large files such as audio samples, videos, datasets, and graphics with text pointers inside Git, while storing the file contents on a remote server like GitHub.com or GitHub Enterprise. https://git-lfs.com/ 2 comments
- CoQA: A Conversational Question Answering Challenge https://stanfordnlp.github.io/coqa/ 0 comments