AI4Media project

Exploiting Caption Diversity for Unsupervised Video Summarization

Temporal Normalization in Attentive Key-frame Extraction for Deep Neural Video Summarization

Attention-based neural architectures have consistently outperformed Long Short-Term Memory (LSTM) Deep Neural Networks (DNNs) in tasks such as key-frame extraction for video summarization. However, existing approaches mostly rely on relatively shallow Transformer DNNs. This paper revisits the issue of model depth and proposes DATS: a deep attentive architecture for supervised video summarization that meaningfully exploits skip connections. Additionally, a novel per-layer temporal normalization algorithm is proposed, which improves test accuracy. Finally, the model's noisy output is rectified in a novel post-processing step. Experiments on two common, publicly available benchmark datasets show performance superior to that of competing state-of-the-art video summarization methods, both supervised and unsupervised.

Adversarial Unsupervised Video Summarization Augmented with Dictionary Loss
