Co-training Transformer with Videos and Images Improves Action Recognition – Google AI Blog - discu.eu

Linked pages

[1706.03762] Attention Is All You Need https://arxiv.org/abs/1706.03762 145 comments
ImageNet http://image-net.org/index 12 comments
[1512.03385] Deep Residual Learning for Image Recognition http://arxiv.org/abs/1512.03385 6 comments
[2103.15691] ViViT: A Video Vision Transformer https://arxiv.org/abs/2103.15691 4 comments
Moments in Time http://moments.csail.mit.edu/ 0 comments
Revisiting the Unreasonable Effectiveness of Data – Google AI Blog https://ai.googleblog.com/2017/07/revisiting-unreasonable-effectiveness.html 0 comments
[1705.06950] The Kinetics Human Action Video Dataset https://arxiv.org/abs/1705.06950 0 comments
[2010.11929] An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale https://arxiv.org/abs/2010.11929 0 comments
[2106.11297] TokenLearner: What Can 8 Learned Tokens Do for Images and Videos? https://arxiv.org/abs/2106.11297 0 comments
Multi-task learning - Wikipedia https://en.wikipedia.org/wiki/Multi-task_learning 0 comments
https://arxiv.org/abs/2104.11178 0 comments
[2102.05095] Is Space-Time Attention All You Need for Video Understanding? https://arxiv.org/abs/2102.05095 0 comments
