Linking pages
- Google Research, 2022 & beyond: Language, vision and generative models – Google AI Blog https://ai.googleblog.com/2023/01/google-research-2022-beyond-language.html 5 comments
- Learning from Weakly-Labeled Videos via Sub-Concepts – Google AI Blog https://ai.googleblog.com/2022/03/learning-from-weakly-labeled-videos-via.html 0 comments
- End-to-end Generative Pre-training for Multimodal Video Captioning – Google AI Blog https://ai.googleblog.com/2022/06/end-to-end-generative-pre-training-for.html 0 comments
Linked pages
- [1706.03762] Attention Is All You Need https://arxiv.org/abs/1706.03762 145 comments
- ImageNet http://image-net.org/index 12 comments
- [1512.03385] Deep Residual Learning for Image Recognition http://arxiv.org/abs/1512.03385 6 comments
- [2103.15691] ViViT: A Video Vision Transformer https://arxiv.org/abs/2103.15691 4 comments
- Moments in Time http://moments.csail.mit.edu/ 0 comments
- Revisiting the Unreasonable Effectiveness of Data – Google AI Blog https://ai.googleblog.com/2017/07/revisiting-unreasonable-effectiveness.html 0 comments
- [1705.06950] The Kinetics Human Action Video Dataset https://arxiv.org/abs/1705.06950 0 comments
- [2010.11929] An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale https://arxiv.org/abs/2010.11929 0 comments
- [2106.11297] TokenLearner: What Can 8 Learned Tokens Do for Images and Videos? https://arxiv.org/abs/2106.11297 0 comments
- Multi-task learning - Wikipedia https://en.wikipedia.org/wiki/Multi-task_learning 0 comments
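- [2104.11178] VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text https://arxiv.org/abs/2104.11178 0 comments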
- [2102.05095] Is Space-Time Attention All You Need for Video Understanding? https://arxiv.org/abs/2102.05095 0 comments
Related searches:
Search whole site: site:ai.googleblog.com
Search title: Co-training Transformer with Videos and Images Improves Action Recognition – Google AI Blog