2023.10.2

Temporal Normalization in Attentive Key-frame Extraction for Deep Neural Video Summarization

Attention-based neural architectures have consistently outperformed Long Short-Term Memory (LSTM) Deep Neural Networks (DNNs) in tasks such as key-frame extraction for video summarization. However, existing approaches mostly rely on relatively shallow Transformer DNNs. This paper revisits the issue of model depth and proposes DATS, a deep attentive architecture for supervised video summarization that meaningfully exploits skip connections. In addition, a novel per-layer temporal normalization algorithm is proposed, which yields improved test accuracy. Finally, the model’s noisy output is rectified in an innovative post-processing step. Experiments on two common, publicly available benchmark datasets show that the method outperforms competing state-of-the-art video summarization methods, both supervised and unsupervised.
