Linking pages
- Vid2Seq: a pretrained visual language model for describing multi-event videos – Google AI Blog https://ai.googleblog.com/2023/03/vid2seq-pretrained-visual-language.html 16 comments
- Google Research, 2022 & beyond: Language, vision and generative models – Google AI Blog https://ai.googleblog.com/2023/01/google-research-2022-beyond-language.html 5 comments
- End-to-end Generative Pre-training for Multimodal Video Captioning – Google AI Blog https://ai.googleblog.com/2022/06/end-to-end-generative-pre-training-for.html 0 comments
Linked pages
- Introducing Pathways: A next-generation AI architecture https://blog.google/technology/ai/introducing-pathways-next-generation-ai-architecture/ 33 comments
- [2103.15691] ViViT: A Video Vision Transformer https://arxiv.org/abs/2103.15691 4 comments
- AudioSet https://research.google.com/audioset/ 3 comments
- Transformer: A Novel Neural Network Architecture for Language Understanding – Google AI Blog https://ai.googleblog.com/2017/08/transformer-novel-neural-network.html 3 comments
- Transformers for Image Recognition at Scale – Google AI Blog https://ai.googleblog.com/2020/12/transformers-for-image-recognition-at.html 1 comment
- RxR: A Multilingual Benchmark for Navigation Instruction Following – Google AI Blog https://ai.googleblog.com/2021/01/rxr-multilingual-benchmark-for.html 0 comments
- Moments in Time http://moments.csail.mit.edu/ 0 comments
- Google AI Blog: Looking to Listen: Audio-Visual Speech Separation http://ai.googleblog.com/2018/04/looking-to-listen-audio-visual-speech.html 0 comments
- Learning Cross-Modal Temporal Representations from Unlabeled Videos – Google AI Blog https://ai.googleblog.com/2019/09/learning-cross-modal-temporal.html 0 comments
- Conceptual Captions: A New Dataset and Challenge for Image Captioning – Google AI Blog https://ai.googleblog.com/2018/09/conceptual-captions-new-dataset-and.html 0 comments
Related searches:
Search whole site: site:ai.googleblog.com
Search title: Multimodal Bottleneck Transformer (MBT): A New Model for Modality Fusion – Google AI Blog