Linking pages
- GitHub - apple/ml-ferret https://github.com/apple/ml-ferret 428 comments
- GitHub - ishan0102/vimGPT: Browse the web with GPT-4V and Vimium https://github.com/ishan0102/vimGPT 128 comments
- AI and Open Source in 2023 - by Sebastian Raschka, PhD https://magazine.sebastianraschka.com/p/ai-and-open-source-in-2023 67 comments
- GitHub - PKU-YuanGroup/Video-LLaVA: Video-LLaVA: Learning United Visual Representation by Alignment Before Projection https://github.com/PKU-YuanGroup/Video-LLaVA 45 comments
- LLaVA-1.6: Improved reasoning, OCR, and world knowledge | LLaVA https://llava-vl.github.io/blog/2024-01-30-llava-1-6/ 45 comments
- GitHub - ictnlp/LLaMA-Omni: LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level. https://github.com/ictnlp/LLaMA-Omni 41 comments
- GitHub - IDEA-Research/Grounded-Segment-Anything: Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything https://github.com/IDEA-Research/Grounded-Segment-Anything 15 comments
- GitHub - potamides/DeTikZify: Synthesizing Graphics Programs for Scientific Figures and Sketches with TikZ https://github.com/potamides/DeTikZify 12 comments
- GitHub - hpcaitech/Open-Sora: Open-Sora: Democratizing Efficient Video Production for All https://github.com/hpcaitech/Open-Sora 8 comments
- GitHub - HenryHZY/Awesome-Multimodal-LLM: Research Trends in LLM-guided Multimodal Learning. https://github.com/HenryHZY/Awesome-Multimodal-LLM 7 comments
- Bridging Images and Text - a Survey of VLMs https://nanonets.com/blog/bridging-images-and-text-a-survey-of-vlms/ 4 comments
- GitHub - Alpha-VLLM/LLaMA2-Accessory: An Open-source Toolkit for LLM Development https://github.com/Alpha-VLLM/LLaMA2-Accessory 3 comments
- Aman's AI Journal • Primers • Overview of Large Language Models https://aman.ai/primers/ai/LLM/ 1 comment
- GitHub - oscinis-com/Awesome-LLM-Productization: Awesome-LLM-Productization: a curated list of tools/tricks/news/regulations about AI and Large Language Model (LLM) productization https://github.com/oscinis-com/Awesome-LLM-Productization 1 comment
- I've picked the top GitHub repos for you https://hackerpulse.substack.com/p/ive-picked-the-top-github-repos-for 1 comment
- GitHub - THUDM/CogVLM: a state-of-the-art-level open visual language model | multimodal pretrained model https://github.com/THUDM/CogVLM 1 comment
- GitHub - SkalskiP/awesome-foundation-and-multimodal-models: 👁️ + 💬 + 🎧 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code + Examples + Tutorials] https://github.com/SkalskiP/awesome-foundation-and-multimodal-models 1 comment
- GitHub - dvlab-research/MiniGemini: Official implementation for Mini-Gemini https://github.com/dvlab-research/MiniGemini 1 comment
- SkyPilot 0.3: LLM support and unprecedented GPU availability across more clouds | SkyPilot Blog https://blog.skypilot.co/announcing-skypilot-0.3/ 0 comments
- GitHub - mit-han-lab/llm-awq: AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration https://github.com/mit-han-lab/llm-awq 0 comments
Linked pages
- LLaVA https://llava-vl.github.io/ 54 comments
- GitHub - IDEA-Research/Grounded-Segment-Anything: Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything https://github.com/IDEA-Research/Grounded-Segment-Anything 15 comments
- GitHub - IDEA-Research/GroundingDINO: The official implementation of "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection" https://github.com/IDEA-Research/GroundingDINO 6 comments
- GitHub - lm-sys/FastChat: An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena. https://github.com/lm-sys/FastChat 4 comments
- [2304.08485] Visual Instruction Tuning https://arxiv.org/abs/2304.08485 1 comment
- GLIGEN: Open-Set Grounded Text-to-Image Generation https://gligen.github.io/ 0 comments
- GitHub - UX-Decoder/Segment-Everything-Everywhere-All-At-Once https://github.com/UX-Decoder/Segment-Everything-Everywhere-All-At-Once 0 comments
- GitHub - facebookresearch/segment-anything: The repository provides code for running inference with the SegmentAnything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model. https://github.com/facebookresearch/segment-anything 0 comments
- GitHub - microsoft/LLaVA-Med: Large Language-and-Vision Assistant for BioMedicine, built towards multimodal GPT-4 level capabilities. https://github.com/microsoft/LLaVA-Med 0 comments
- https://llava.hliu.cc 0 comments