Linking pages
- How to Backdoor Large Language Models - by Shrivu Shankar https://blog.sshh.io/p/how-to-backdoor-large-language-models
- Mixture-of-Experts (MoE): The Birth and Rise of Conditional Computation https://cameronrwolfe.substack.com/p/conditional-computation-the-birth
- Model Merging: A Survey - by Cameron R. Wolfe, Ph.D. https://cameronrwolfe.substack.com/p/model-merging
- Scaling Laws for LLMs: From GPT-3 to o3 https://cameronrwolfe.substack.com/p/llm-scaling-laws
- Mixture-of-Experts (MoE) LLMs - by Cameron R. Wolfe, Ph.D. https://cameronrwolfe.substack.com/p/moe-llms
Linked pages
- GitHub: Build and ship software on a single, collaborative platform https://github.com
- H200 Tensor Core GPU | NVIDIA https://www.nvidia.com/en-gb/data-center/h200/
- Neural Networks and Deep Learning, Chapter 2 (the four fundamental equations behind backpropagation) http://neuralnetworksanddeeplearning.com/chap2.html#the_four_fundamental_equations_behind_backpropagation
- Let's build the GPT Tokenizer - YouTube https://www.youtube.com/watch?v=zduSFxRajkE
- Rotation matrix - Wikipedia https://en.wikipedia.org/wiki/Rotation_matrix#Rotation_matrix_from_axis_and_angle
- Neural Networks and Deep Learning, Chapter 5 http://neuralnetworksanddeeplearning.com/chap5.html
- Linear - PyTorch documentation https://pytorch.org/docs/stable/generated/torch.nn.Linear.html#torch.nn.Linear
- [1512.03385] Deep Residual Learning for Image Recognition http://arxiv.org/abs/1512.03385
- Standard score - Wikipedia https://en.wikipedia.org/wiki/Standard_score#/media/File:Normal_distribution_and_scales.gif
- CrossEntropyLoss - PyTorch documentation https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html
- [2001.04451] Reformer: The Efficient Transformer https://arxiv.org/abs/2001.04451
- Paper Summary #8 - FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness | Shreyansh Singh https://shreyansh26.github.io/post/2023-03-26_flash-attention/
- mosaicml/mpt-7b-storywriter · Hugging Face https://huggingface.co/mosaicml/mpt-7b-storywriter
- LLaMA-2 from the Ground Up - by Cameron R. Wolfe, Ph.D. https://cameronrwolfe.substack.com/p/llama-2-from-the-ground-up
- Flash-Decoding for long-context inference | PyTorch https://pytorch.org/blog/flash-decoding/
- Gemini 1 technical report (PDF) https://storage.googleapis.com/deepmind-media/gemini/gemini_1_report.pdf
- Model merging lessons in The Waifu Research Department https://www.interconnects.ai/p/model-merging
- Dolma, OLMo, and the Future of Open-Source LLMs https://cameronrwolfe.substack.com/p/dolma-olmo-and-the-future-of-open
- Maxime Labonne - Decoding Strategies in Large Language Models https://mlabonne.github.io/blog/posts/2023-06-07-Decoding_strategies.html
Article: Decoder-Only Transformers: The Workhorse of Generative LLMs (cameronrwolfe.substack.com)