[2302.14115] Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning - discu.eu

Reddit

[R] Vid2Seq: a pretrained visual language model for describing multi-event videos (https://arxiv.org/abs/2302.14115), 14 comments, 18/3/2023, machinelearning

Linking pages

Vid2Seq: a pretrained visual language model for describing multi-event videos – Google AI Blog (https://ai.googleblog.com/2023/03/vid2seq-pretrained-visual-language.html), 16 comments
