Video to Caption

Team Project

Generate natural-language captions for given videos.

Figure 1. The network architecture of S2VT, proposed by Venugopalan et al. at ICCV 2015.

In this project, we generate captions for video sequences with neural networks. We implement two models: a basic sequence-to-sequence model and S2VT. We further improve both models with attention mechanisms, bidirectional RNNs, and scheduled sampling. Results are evaluated with the BLEU score and perplexity. More details can be found in the technical report (written in Chinese). Our code is publicly available on GitHub.
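For reference, below is a minimal PyTorch sketch of an S2VT-style model, not our exact implementation: the class name `S2VT`, the layer sizes, and the `feat_dim=4096` CNN feature dimension are illustrative assumptions. The key idea of S2VT is a two-layer LSTM stack that first reads per-frame features (encoding stage, with padded word inputs) and then emits caption tokens (decoding stage, with padded frame inputs).

```python
import torch
import torch.nn as nn

class S2VT(nn.Module):
    """Sketch of an S2VT-style captioner; all sizes are illustrative."""

    def __init__(self, feat_dim=4096, hidden_dim=512, vocab_size=10000):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_dim)
        self.feat_proj = nn.Linear(feat_dim, hidden_dim)
        # First LSTM consumes frame features; the second consumes
        # [first-LSTM output; word embedding] at every time step.
        self.lstm1 = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.lstm2 = nn.LSTM(2 * hidden_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, feats, captions):
        # feats: (B, T_v, feat_dim) per-frame CNN features
        # captions: (B, T_c) token ids (teacher forcing)
        B, T_v, _ = feats.shape
        T_c = captions.shape[1]
        pad_feat = feats.new_zeros(B, T_c, self.feat_proj.out_features)
        pad_word = feats.new_zeros(B, T_v, self.embed.embedding_dim)
        # Encoding stage: real frames, zero "words"; decoding stage:
        # zero frames, real word embeddings.
        h1, _ = self.lstm1(torch.cat([self.feat_proj(feats), pad_feat], dim=1))
        words = torch.cat([pad_word, self.embed(captions)], dim=1)
        h2, _ = self.lstm2(torch.cat([h1, words], dim=2))
        # Only decoding-stage outputs predict caption tokens.
        return self.out(h2[:, T_v:, :])

model = S2VT()
feats = torch.randn(2, 30, 4096)             # e.g. 30 frames per video
captions = torch.randint(0, 10000, (2, 12))  # caption token ids
logits = model(feats, captions)              # (2, 12, 10000)
```

At inference time the decoder would instead run step by step, feeding each predicted token back in; scheduled sampling (as used in our project) interpolates between that regime and teacher forcing during training.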