- [R] Scaled up CLIP-like model (~2B) shows 86% Zero-shot on Imagenet https://arxiv.org/abs/2205.01917 14 comments machinelearning
Linking pages
- MaMMUT: A simple vision-encoder text-decoder architecture for multimodal tasks – Google AI Blog https://ai.googleblog.com/2023/05/mammut-simple-vision-encoder-text.html 33 comments
- GitHub - cmhungsteve/Awesome-Transformer-Attention: An ultimately comprehensive paper list of Vision Transformer/Attention, including papers, codes, and related websites https://github.com/cmhungsteve/Awesome-Transformer-Attention 13 comments
- GitHub - aimerou/top-ai-papers: A curated list of the most impressive AI papers https://github.com/aimerou/top-ai-papers 9 comments
- PaLI: Scaling Language-Image Learning in 100+ Languages – Google AI Blog https://ai.googleblog.com/2022/09/pali-scaling-language-image-learning-in.html 0 comments
- Image-Text Pre-training with Contrastive Captioners – Google AI Blog https://ai.googleblog.com/2022/05/image-text-pre-training-with.html 0 comments
- Why 2022 was the most exciting year in computer vision history (so far) | by Jacob Marks | Voxel51 | Dec, 2022 | Medium https://medium.com/voxel51/why-2022-was-the-most-exciting-year-in-computer-vision-history-so-far-7a4ab8693b27 0 comments
Paper title: [2205.01917] CoCa: Contrastive Captioners are Image-Text Foundation Models