Linking pages
- GitHub - apple/ml-ferret https://github.com/apple/ml-ferret 428 comments
- GitHub - ishan0102/vimGPT: Browse the web with GPT-4V and Vimium https://github.com/ishan0102/vimGPT 128 comments
- AI and Open Source in 2023 - by Sebastian Raschka, PhD https://magazine.sebastianraschka.com/p/ai-and-open-source-in-2023 67 comments
- GitHub - PKU-YuanGroup/Video-LLaVA: Video-LLaVA: Learning United Visual Representation by Alignment Before Projection https://github.com/PKU-YuanGroup/Video-LLaVA 45 comments
- LLaVA-1.6: Improved reasoning, OCR, and world knowledge | LLaVA https://llava-vl.github.io/blog/2024-01-30-llava-1-6/ 45 comments
- GitHub - ictnlp/LLaMA-Omni: LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level. https://github.com/ictnlp/LLaMA-Omni 41 comments
- GitHub - IDEA-Research/Grounded-Segment-Anything: Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything https://github.com/IDEA-Research/Grounded-Segment-Anything 15 comments
- GitHub - potamides/DeTikZify: Synthesizing Graphics Programs for Scientific Figures and Sketches with TikZ https://github.com/potamides/DeTikZify 12 comments
- GitHub - hpcaitech/Open-Sora: Open-Sora: Democratizing Efficient Video Production for All https://github.com/hpcaitech/Open-Sora 8 comments
- GitHub - HenryHZY/Awesome-Multimodal-LLM: Research Trends in LLM-guided Multimodal Learning. https://github.com/HenryHZY/Awesome-Multimodal-LLM 7 comments
- Bridging Images and Text - a Survey of VLMs https://nanonets.com/blog/bridging-images-and-text-a-survey-of-vlms/ 4 comments
- GitHub - Alpha-VLLM/LLaMA2-Accessory: An Open-source Toolkit for LLM Development https://github.com/Alpha-VLLM/LLaMA2-Accessory 3 comments
- Aman's AI Journal • Primers • Overview of Large Language Models https://aman.ai/primers/ai/LLM/ 1 comment
- GitHub - oscinis-com/Awesome-LLM-Productization: Awesome-LLM-Productization: a curated list of tools/tricks/news/regulations about AI and Large Language Model (LLM) productization https://github.com/oscinis-com/Awesome-LLM-Productization 1 comment
- I've picked the top GitHub repos for you https://hackerpulse.substack.com/p/ive-picked-the-top-github-repos-for 1 comment
- GitHub - THUDM/CogVLM: a state-of-the-art-level open visual language model | multimodal pretrained model https://github.com/THUDM/CogVLM 1 comment
- GitHub - SkalskiP/awesome-foundation-and-multimodal-models: 👁️ + 💬 + 🎧 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code + Examples + Tutorials] https://github.com/SkalskiP/awesome-foundation-and-multimodal-models 1 comment
- GitHub - dvlab-research/MiniGemini: Official implementation for Mini-Gemini https://github.com/dvlab-research/MiniGemini 1 comment
- SkyPilot 0.3: LLM support and unprecedented GPU availability across more clouds | SkyPilot Blog https://blog.skypilot.co/announcing-skypilot-0.3/ 0 comments
- GitHub - mit-han-lab/llm-awq: AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration https://github.com/mit-han-lab/llm-awq 0 comments
Linked pages
- LLaVA https://llava-vl.github.io/ 54 comments
- GitHub - IDEA-Research/Grounded-Segment-Anything: Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything https://github.com/IDEA-Research/Grounded-Segment-Anything 15 comments
- GitHub - IDEA-Research/GroundingDINO: The official implementation of "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection" https://github.com/IDEA-Research/GroundingDINO 6 comments
- GitHub - lm-sys/FastChat: An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena. https://github.com/lm-sys/FastChat 4 comments
- [2304.08485] Visual Instruction Tuning https://arxiv.org/abs/2304.08485 1 comment
- GLIGEN: Open-Set Grounded Text-to-Image Generation https://gligen.github.io/ 0 comments
- GitHub - UX-Decoder/Segment-Everything-Everywhere-All-At-Once https://github.com/UX-Decoder/Segment-Everything-Everywhere-All-At-Once 0 comments
- GitHub - facebookresearch/segment-anything: The repository provides code for running inference with the SegmentAnything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model. https://github.com/facebookresearch/segment-anything 0 comments
- GitHub - microsoft/LLaVA-Med: Large Language-and-Vision Assistant for BioMedicine, built towards multimodal GPT-4 level capabilities. https://github.com/microsoft/LLaVA-Med 0 comments
- https://llava.hliu.cc 0 comments