[2205.01917] CoCa: Contrastive Captioners are Image-Text Foundation Models - discu.eu

Reddit

[R] Scaled up CLIP-like model (~2B) shows 86% Zero-shot on Imagenet https://arxiv.org/abs/2205.01917 14 comments 5/5/2022 machinelearning

Linking pages

MaMMUT: A simple vision-encoder text-decoder architecture for multimodal tasks – Google AI Blog https://ai.googleblog.com/2023/05/mammut-simple-vision-encoder-text.html 33 comments
GitHub - cmhungsteve/Awesome-Transformer-Attention: An ultimately comprehensive paper list of Vision Transformer/Attention, including papers, codes, and related websites https://github.com/cmhungsteve/Awesome-Transformer-Attention 13 comments
GitHub - aimerou/top-ai-papers: A curated list of the most impressive AI papers https://github.com/aimerou/top-ai-papers 9 comments
PaLI: Scaling Language-Image Learning in 100+ Languages – Google AI Blog https://ai.googleblog.com/2022/09/pali-scaling-language-image-learning-in.html 0 comments
Image-Text Pre-training with Contrastive Captioners – Google AI Blog https://ai.googleblog.com/2022/05/image-text-pre-training-with.html 0 comments
Why 2022 was the most exciting year in computer vision history (so far) | by Jacob Marks | Voxel51 | Dec, 2022 | Medium https://medium.com/voxel51/why-2022-was-the-most-exciting-year-in-computer-vision-history-so-far-7a4ab8693b27 0 comments
