Hacker News
- Vid2Seq: A pretrained visual language model for describing multi-event videos https://ai.googleblog.com/2023/03/vid2seq-pretrained-visual-language.html 16 comments
Linked pages
- Exploring Transfer Learning with T5: the Text-To-Text Transfer Transformer – Google AI Blog https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html 66 comments
- [2302.14115] Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning https://arxiv.org/abs/2302.14115 14 comments
- Transformer: A Novel Neural Network Architecture for Language Understanding – Google AI Blog https://ai.googleblog.com/2017/08/transformer-novel-neural-network.html 3 comments
- Learning Cross-Modal Temporal Representations from Unlabeled Videos – Google AI Blog https://ai.googleblog.com/2019/09/learning-cross-modal-temporal.html 0 comments
- Multimodal Bottleneck Transformer (MBT): A New Model for Modality Fusion – Google AI Blog https://ai.googleblog.com/2022/03/multimodal-bottleneck-transformer-mbt.html 0 comments
- Conceptual Captions: A New Dataset and Challenge for Image Captioning – Google AI Blog https://ai.googleblog.com/2018/09/conceptual-captions-new-dataset-and.html 0 comments
- Pix2Seq: A New Language Interface for Object Detection – Google AI Blog https://ai.googleblog.com/2022/04/pix2seq-new-language-interface-for.html 0 comments
- Image-Text Pre-training with Contrastive Captioners – Google AI Blog https://ai.googleblog.com/2022/05/image-text-pre-training-with.html 0 comments