Hacker News
- MaMMUT: A simple vision-encoder text-decoder architecture for multimodal tasks https://ai.googleblog.com/2023/05/mammut-simple-vision-encoder-text.html 33 comments
Linked pages
- Pathways Language Model (PaLM): Scaling to 540 Billion Parameters for Breakthrough Performance – Google AI Blog https://ai.googleblog.com/2022/04/pathways-language-model-palm-scaling-to.html 279 comments
- [2005.14165] Language Models are Few-Shot Learners https://arxiv.org/abs/2005.14165 201 comments
- [1706.03762] Attention Is All You Need https://arxiv.org/abs/1706.03762 145 comments
- https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/tackling-multiple-tasks-with-a-single-visual-language-model/flamingo.pdf 20 comments
- [2002.05709] A Simple Framework for Contrastive Learning of Visual Representations https://arxiv.org/abs/2002.05709 18 comments
- [2205.01917] CoCa: Contrastive Captioners are Image-Text Foundation Models https://arxiv.org/abs/2205.01917 14 comments
- [1801.10198] Generating Wikipedia by Summarizing Long Sequences https://arxiv.org/abs/1801.10198 0 comments
- More Efficient In-Context Learning with GLaM – Google AI Blog https://ai.googleblog.com/2021/12/more-efficient-in-context-learning-with.html 0 comments
- End-to-end Generative Pre-training for Multimodal Video Captioning – Google AI Blog https://ai.googleblog.com/2022/06/end-to-end-generative-pre-training-for.html 0 comments
- Neural machine translation with a Transformer and Keras | Text | TensorFlow https://www.tensorflow.org/text/tutorials/transformer 0 comments