Hacker News
- Vid2Seq: A pretrained visual language model for describing multi-event videos https://ai.googleblog.com/2023/03/vid2seq-pretrained-visual-language.html 16 comments
Linked pages
- Exploring Transfer Learning with T5: the Text-To-Text Transfer Transformer – Google AI Blog https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html 66 comments
- [2302.14115] Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning https://arxiv.org/abs/2302.14115 14 comments
- Transformer: A Novel Neural Network Architecture for Language Understanding – Google AI Blog https://ai.googleblog.com/2017/08/transformer-novel-neural-network.html 3 comments
- Learning Cross-Modal Temporal Representations from Unlabeled Videos – Google AI Blog https://ai.googleblog.com/2019/09/learning-cross-modal-temporal.html 0 comments
- Multimodal Bottleneck Transformer (MBT): A New Model for Modality Fusion – Google AI Blog https://ai.googleblog.com/2022/03/multimodal-bottleneck-transformer-mbt.html 0 comments
- Conceptual Captions: A New Dataset and Challenge for Image Captioning – Google AI Blog https://ai.googleblog.com/2018/09/conceptual-captions-new-dataset-and.html 0 comments
- Pix2Seq: A New Language Interface for Object Detection – Google AI Blog https://ai.googleblog.com/2022/04/pix2seq-new-language-interface-for.html 0 comments
- Image-Text Pre-training with Contrastive Captioners – Google AI Blog https://ai.googleblog.com/2022/05/image-text-pre-training-with.html 0 comments