Multimodal Bottleneck Transformer (MBT): A New Model for Modality Fusion – Google AI Blog - discu.eu

Linking pages

Vid2Seq: a pretrained visual language model for describing multi-event videos – Google AI Blog https://ai.googleblog.com/2023/03/vid2seq-pretrained-visual-language.html 16 comments
Google Research, 2022 & beyond: Language, vision and generative models – Google AI Blog https://ai.googleblog.com/2023/01/google-research-2022-beyond-language.html 5 comments
End-to-end Generative Pre-training for Multimodal Video Captioning – Google AI Blog https://ai.googleblog.com/2022/06/end-to-end-generative-pre-training-for.html 0 comments

Related searches:

Search whole site: site:ai.googleblog.com

Search title: Multimodal Bottleneck Transformer (MBT): A New Model for Modality Fusion – Google AI Blog
