- [R] RWKV-7: attention-free and surpassing strong Modded-GPT baseline (the one with Muon optimizer), while only using headsz 64 https://github.com/BlinkDL/modded-nanogpt-rwkv 18 comments machinelearning
Linked pages
- GitHub - karpathy/llm.c: LLM training in simple, raw C/CUDA https://github.com/karpathy/llm.c 170 comments
- Reproducing GPT-2 (124M) in llm.c in 90 minutes for $20 · karpathy/llm.c · Discussion #481 · GitHub https://github.com/karpathy/llm.c/discussions/481 117 comments
- Let's reproduce GPT-2 (1.6B): one 8XH100 node, 24 hours, $672, in llm.c · karpathy/llm.c · Discussion #677 · GitHub https://github.com/karpathy/llm.c/discussions/677 63 comments
- GitHub - KellerJordan/cifar10-airbench: 94% on CIFAR-10 in 3.29 seconds 💨 https://github.com/KellerJordan/cifar10-airbench 1 comment
- Too Much Information - by Ben Recht - arg min https://www.argmin.net/p/too-much-information 0 comments