- Understanding Multimodal LLMs: The Main Techniques and Latest Models https://sebastianraschka.com/blog/2024/understanding-multimodal-llms.html 5 comments learnmachinelearning
- [P] Understanding Multimodal LLMs: The Main Techniques and Latest Models https://sebastianraschka.com/blog/2024/understanding-multimodal-llms.html 8 comments machinelearning
Linked pages
- Mistral NeMo | Mistral AI | Frontier AI in your hands https://mistral.ai/news/mistral-nemo/ 162 comments
- [1706.03762] Attention Is All You Need https://arxiv.org/abs/1706.03762 145 comments
- Fuyu-8B: A Multimodal Architecture for AI Agents https://www.adept.ai/blog/fuyu-8b 57 comments
- Announcing Pixtral 12B | Mistral AI | Frontier AI in your hands https://mistral.ai/news/pixtral-12b/ 25 comments
- [2410.05993] Aria: An Open Multimodal Native Mixture-of-Experts Model https://arxiv.org/abs/2410.05993 21 comments
- LLMs-from-scratch/ch07/04_preference-tuning-with-dpo/dpo-from-scratch.ipynb at main · rasbt/LLMs-from-scratch · GitHub https://github.com/rasbt/LLMs-from-scratch/blob/main/ch07/04_preference-tuning-with-dpo/dpo-from-scratch.ipynb 5 comments
- http://d 2 comments
- GitHub - mlfoundations/open_clip: An open source implementation of CLIP. https://github.com/mlfoundations/open_clip 0 comments
- GitHub - openai/CLIP: Contrastive Language-Image Pretraining https://github.com/openai/CLIP 0 comments
- [2010.11929] An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale https://arxiv.org/abs/2010.11929 0 comments
- [2401.02954] DeepSeek LLM: Scaling Open-Source Language Models with Longtermism https://arxiv.org/abs/2401.02954 0 comments
- [2303.15343] Sigmoid Loss for Language Image Pre-Training https://arxiv.org/abs/2303.15343 0 comments
- [2406.06525] Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation https://arxiv.org/abs/2406.06525 0 comments
- [2407.21783] The Llama 3 Herd of Models https://arxiv.org/abs/2407.21783 0 comments
- [2409.11402] NVLM: Open Frontier-Class Multimodal LLMs https://arxiv.org/abs/2409.11402 0 comments
- [2409.20566] MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning https://arxiv.org/abs/2409.20566 0 comments
- [2410.13848] Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation https://arxiv.org/abs/2410.13848 0 comments