Linked pages
- GitHub: Let’s build from here · GitHub https://github.com (3047 comments)
- H200 Tensor Core GPU | NVIDIA https://www.nvidia.com/en-gb/data-center/h200/ (125 comments)
- Neural networks and deep learning (chap. 2) http://neuralnetworksanddeeplearning.com/chap2.html#the_four_fundamental_equations_behind_backpropagation (64 comments)
- Let's build the GPT Tokenizer - YouTube https://www.youtube.com/watch?v=zduSFxRajkE (51 comments)
- Rotation matrix - Wikipedia https://en.wikipedia.org/wiki/Rotation_matrix#Rotation_matrix_from_axis_and_angle (17 comments)
- Neural networks and deep learning (chap. 5) http://neuralnetworksanddeeplearning.com/chap5.html (12 comments)
- Linear — PyTorch 1.13 documentation https://pytorch.org/docs/stable/generated/torch.nn.Linear.html#torch.nn.Linear (8 comments)
- [1512.03385] Deep Residual Learning for Image Recognition http://arxiv.org/abs/1512.03385 (6 comments)
- Standard score - Wikipedia https://en.wikipedia.org/wiki/Standard_score#/media/File:Normal_distribution_and_scales.gif (6 comments)
- CrossEntropyLoss — PyTorch 2.4 documentation https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html (4 comments)
- [2001.04451] Reformer: The Efficient Transformer https://arxiv.org/abs/2001.04451 (0 comments)
- Paper Summary #8 - FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness | Shreyansh Singh https://shreyansh26.github.io/post/2023-03-26_flash-attention/ (0 comments)
- mosaicml/mpt-7b-storywriter · Hugging Face https://huggingface.co/mosaicml/mpt-7b-storywriter (0 comments)
- LLaMA-2 from the Ground Up - by Cameron R. Wolfe, Ph.D. https://cameronrwolfe.substack.com/p/llama-2-from-the-ground-up (0 comments)
- Flash-Decoding for long-context inference | PyTorch https://pytorch.org/blog/flash-decoding/ (0 comments)
- Gemini 1 report (PDF, DeepMind) https://storage.googleapis.com/deepmind-media/gemini/gemini_1_report.pdf (0 comments)
- Model merging lessons in The Waifu Research Department https://www.interconnects.ai/p/model-merging (0 comments)
- Dolma, OLMo, and the Future of Open-Source LLMs https://cameronrwolfe.substack.com/p/dolma-olmo-and-the-future-of-open (0 comments)
- Maxime Labonne - Decoding Strategies in Large Language Models https://mlabonne.github.io/blog/posts/2023-06-07-Decoding_strategies.html (0 comments)
Article: Decoder-Only Transformers: The Workhorse of Generative LLMs (cameronrwolfe.substack.com)