Hacker News
- Guide to the Mamba architecture that claims to be a replacement for Transformers https://blog.oxen.ai/mamba-linear-time-sequence-modeling-with-selective-state-spaces-arxiv-dives/ 2 comments
- Deep Dive into the Vision Transformers Paper https://blog.oxen.ai/arxiv-dives-vision-transformers-vit/ 8 comments
- Reading List for Andrej Karpathy's "Intro to Large Language Models" Video https://blog.oxen.ai/reading-list-for-andrej-karpathys-intro-to-large-language-models-video/ 6 comments
Reddit
- [R] Experiments fine-tuning Mamba 130m on the SQuAD Question Answering dataset https://blog.oxen.ai/practical-ml-dive-how-to-train-mamba-for-question-answering/ 6 comments (r/machinelearning)
- [D] Group Discussion on OpenAI's foundational CLIP Paper for Zero-Shot Image Classification https://blog.oxen.ai/arxiv-dives-zero-shot-image-classification-with-clip/ 3 comments (r/machinelearning)
- [D] Deep Dive into the Vision Transformer (ViT) paper by the Google Brain team https://blog.oxen.ai/arxiv-dives-vision-transformers-vit/ 18 comments (r/machinelearning)
- Two-Part Research Club on "Mechanistic Interpretability" of LLMs https://blog.oxen.ai/arxiv-dives-a-mathematical-framework-for-transformer-circuits-part-two/ 4 comments (r/learnmachinelearning)
- Arxiv Dives - Attention Is All You Need - How Transformers Work https://blog.oxen.ai/arxiv-dives-attention-is-all-you-need/ 8 comments (r/learnmachinelearning)
- [R] Arxiv Dives - Fine-tuning with LoRA paper deep dive https://blog.oxen.ai/arxiv-dives-how-lora-fine-tuning-works/ 8 comments (r/machinelearning)