Abstract
Deep models are state-of-the-art for many vision tasks including video action recognition and video captioning. Models are trained to caption or class......
小提示:本篇文献需要登录阅读全文,点击跳转登录