- [R] RWKV-7: attention-free and surpassing strong Modded-GPT baseline (the one with Muon optimizer), while only using headsz 64 https://github.com/BlinkDL/modded-nanogpt-rwkv 18 comments machinelearning
Linked pages
- GitHub - karpathy/llm.c: LLM training in simple, raw C/CUDA https://github.com/karpathy/llm.c 170 comments
- Reproducing GPT-2 (124M) in llm.c in 90 minutes for $20 · karpathy/llm.c · Discussion #481 · GitHub https://github.com/karpathy/llm.c/discussions/481 117 comments
- Let's reproduce GPT-2 (1.6B): one 8XH100 node, 24 hours, $672, in llm.c · karpathy/llm.c · Discussion #677 · GitHub https://github.com/karpathy/llm.c/discussions/677 63 comments
- GitHub - KellerJordan/cifar10-airbench: 94% on CIFAR-10 in 3.29 seconds 💨 https://github.com/KellerJordan/cifar10-airbench 1 comment
- Too Much Information - by Ben Recht - arg min https://www.argmin.net/p/too-much-information 0 comments