MaMMUT: A simple vision-encoder text-decoder architecture for multimodal tasks – Google AI Blog - discu.eu

Hacker News

MaMMUT: A simple vision-encoder text-decoder architecture for multimodal tasks https://ai.googleblog.com/2023/05/mammut-simple-vision-encoder-text.html 33 comments 4/5/2023

Linking pages

Modular visual question answering via code generation – Google Research Blog https://ai.googleblog.com/2023/07/modular-visual-question-answering-via.html 0 comments
