Abstract
Neural image/video captioning models can generate accurate descriptions, but their internal process of mapping regions to words is a black box and the......