Hacker News
- Guide to the Mamba architecture that claims to be a replacement for Transformers https://blog.oxen.ai/mamba-linear-time-sequence-modeling-with-selective-state-spaces-arxiv-dives/ 2 comments
- Deep Dive into the Vision Transformers Paper https://blog.oxen.ai/arxiv-dives-vision-transformers-vit/ 8 comments
- Reading List for Andrej Karpathy's "Intro to Large Language Models" Video https://blog.oxen.ai/reading-list-for-andrej-karpathys-intro-to-large-language-models-video/ 6 comments
Reddit
- [R] Experiments fine-tuning Mamba 130m on the SQuAD Question Answering dataset https://blog.oxen.ai/practical-ml-dive-how-to-train-mamba-for-question-answering/ 6 comments (r/machinelearning)
- [D] Group Discussion on OpenAI's foundational CLIP Paper for Zero-Shot Image Classification https://blog.oxen.ai/arxiv-dives-zero-shot-image-classification-with-clip/ 3 comments (r/machinelearning)
- [D] Deep Dive into the Vision Transformer (ViT) paper by the Google Brain team https://blog.oxen.ai/arxiv-dives-vision-transformers-vit/ 18 comments (r/machinelearning)
- Two-Part Research Club on "Mechanistic Interpretability" of LLMs https://blog.oxen.ai/arxiv-dives-a-mathematical-framework-for-transformer-circuits-part-two/ 4 comments (r/learnmachinelearning)
- Arxiv Dives - Attention Is All You Need - How Transformers Work https://blog.oxen.ai/arxiv-dives-attention-is-all-you-need/ 8 comments (r/learnmachinelearning)
- [R] Arxiv Dives - Fine-tuning with LoRA paper deep dive https://blog.oxen.ai/arxiv-dives-how-lora-fine-tuning-works/ 8 comments (r/machinelearning)