A collection of images taken by day and by night for vehicle detection, segmentation and counting in parking areas.
Each image is manually labeled with pixel-wise masks and bounding boxes localizing vehicle instances. The dataset includes about 250 images of several parking areas and covers most of the problematic situations found in a real scenario: seven different cameras capture the images under various weather conditions and viewing angles. Another challenge is the partial occlusion present in many scenes, caused by obstacles (trees, lampposts, other cars), together with shadowed cars. The main peculiarity is that images are taken both by day and by night, under very different lighting conditions.
We suggest a three-way split (train-validation-test). The train split contains images taken during the daytime while validation and test splits include images gathered at night.
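The suggested split can be sketched as below. This is only an illustration under stated assumptions: the record layout and the `time_of_day` field are hypothetical (the dataset's actual metadata format is not described here), and the even division of night images between validation and test is an assumption, since the proportions are not given above.

```python
import random

def split_day_night(records, val_fraction=0.5, seed=0):
    """Daytime images go to train; night images are shuffled and
    divided between validation and test.

    `records` is a list of dicts with a hypothetical "time_of_day" key.
    `val_fraction` (share of night images used for validation) is an
    assumption, not a value taken from the dataset description.
    """
    train = [r for r in records if r["time_of_day"] == "day"]
    night = [r for r in records if r["time_of_day"] == "night"]
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(night)
    cut = int(len(night) * val_fraction)
    return {"train": train, "validation": night[:cut], "test": night[cut:]}

# Toy metadata: even indices are "day", odd indices are "night".
records = [
    {"file": f"img_{i:03d}.jpg", "time_of_day": "day" if i % 2 == 0 else "night"}
    for i in range(10)
]
splits = split_day_night(records)
for name, items in splits.items():
    print(name, len(items))
```

Keeping all night images out of the train split is what makes this a day-to-night domain shift benchmark rather than a random split.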
L. Ciampi, C. Santiago, J. Costeira, C. Gennaro, and G. Amato (2021). "Domain Adaptation for Traffic Density Estimation", Proc. of the 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Vol. 5: VISAPP, pp. 185-195. DOI: 10.5220/0010303401850195
This dataset contains pretrained models of the CA-SUM network architecture for video summarization, which is presented in our work titled “Summarizing Videos using Concentrated Attention and Considering the Uniqueness and Diversity of the Video Frames”, in Proc. ACM ICMR 2022.
In our ICMR 2022 paper we describe a new method for unsupervised video summarization. To overcome limitations of existing unsupervised video summarization approaches, which relate to the unstable training of Generator-Discriminator architectures, the use of RNNs for modeling long-range frames' dependencies, and the limited ability to parallelize the training process of RNN-based network architectures, the developed method relies solely on the use of a self-attention mechanism to estimate the importance of video frames. Instead of simply modeling the frames' dependencies based on global attention, our method integrates a concentrated attention mechanism that is able to focus on non-overlapping blocks in the main diagonal of the attention matrix, and to enrich the existing information by extracting and exploiting knowledge about the uniqueness and diversity of the associated frames of the video. In this way, our method makes better estimates about the significance of different parts of the video, and drastically reduces the number of learnable parameters. Experimental evaluations using two benchmarking datasets (SumMe and TVSum) show the competitiveness of the proposed method against other state-of-the-art unsupervised summarization approaches, and demonstrate its ability to produce video summaries that are very close to human preferences. An ablation study that focuses on the introduced components, namely the use of concentrated attention in combination with attention-based estimates about the frames' uniqueness and diversity, shows their relative contributions to the overall summarization performance.
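The idea of focusing on non-overlapping blocks in the main diagonal of the attention matrix can be illustrated with a block-diagonal mask. This is a minimal sketch and not the CA-SUM implementation: the function names and the block size are hypothetical, and the uniqueness and diversity estimates described above are not modeled here.

```python
def block_diagonal_mask(num_frames, block_size):
    """Return a num_frames x num_frames 0/1 mask that keeps only the
    non-overlapping blocks along the main diagonal."""
    return [
        [1 if i // block_size == j // block_size else 0 for j in range(num_frames)]
        for i in range(num_frames)
    ]

def concentrate(attention, block_size):
    """Zero out attention values that fall outside the diagonal blocks.

    `attention` is a square list-of-lists of frame-to-frame weights.
    """
    mask = block_diagonal_mask(len(attention), block_size)
    return [
        [a * m for a, m in zip(att_row, mask_row)]
        for att_row, mask_row in zip(attention, mask)
    ]

# Toy example: uniform attention over 4 frames, blocks of 2 frames.
attention = [[0.25] * 4 for _ in range(4)]
for row in concentrate(attention, block_size=2):
    print(row)
```

With a block size of 2, each frame keeps only the attention it pays to frames in its own two-frame block; everything else is set to zero.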
E. Apostolidis, G. Balaouras, V. Mezaris, and I. Patras (2022). “Summarizing Videos using Concentrated Attention and Considering the Uniqueness and Diversity of the Video Frames”, Proc. of the 2022 International Conference on Multimedia Retrieval (ICMR ’22), June 2022, Newark, NJ. DOI: 10.1145/3512527.3531404
FaVCI2D tackles problematic design choices of existing face verification datasets: (1) imposter pairs are too easy, (2) the demographic diversity is insufficient, and (3) there is disregard for ethical and legal aspects. The dataset includes challenging imposters and metadata related to gender, country and age. It is generated from freely distributable data. It was created by AI4Media partners CEA and UPB and is intended for the face recognition/verification community, and for researchers who study bias and fairness in AI.
Popescu, A., Ştefan, L. D., Deshayes-Chossart, J., & Ionescu, B. (2021). Face Verification with Challenging Imposters and Diversified Demographics.
The dataset supports the creation of algorithms which provide feedback about the potential effects of personal data sharing in real-life situations (e.g., applying for a bank loan or a job). It is focused on visual profiles, which are manually rated and populated with visual object detections. The dataset is generated from freely distributable data and is anonymized in order to protect users’ privacy. It was created by AI4Media partners CEA and UPB and is usable by researchers interested in user privacy protection.
Ionescu, B., Müller, H., Péteri, R., Abacha, A. B., Demner-Fushman, D., Hasan, S. A., ... & Popescu, A. (2021, March). The 2021 ImageCLEF Benchmark: Multimedia Retrieval in Medical, Nature, Internet and Social Media Applications. In European Conference on Information Retrieval (pp. 616-623). Springer, Cham.
This dataset is intended to be used for assessing the prediction of how memorable a video will be. The PMMD dataset is a subset of a collection consisting of 12,000 short videos retrieved from TRECVid and Memento10k. The dataset is annotated for short- and long-term memorability and, in its latest version, features three subtasks: a prediction, a generalization, and an EEG-based subtask. The dataset is generated from freely distributable data and is anonymized in order to protect users’ privacy. It is addressed to researchers interested in the prediction of short- and long-term memorability. AI4Media partners UPB and InterDigital are co-creators of this dataset, along with the University of Essex, Dublin City University and the Massachusetts Institute of Technology.
De Herrera, A. G. S., Kiziltepe, R. S., Chamberlain, J., Constantin, M. G., Demarty, C. H., Doctor, F., Ionescu, B., & Smeaton, A. F. (2020). Overview of MediaEval 2020 Predicting Media Memorability Task: What Makes a Video Memorable?. Working Notes Proceedings of the MediaEval 2020 Workshop.
The Interestingness10k dataset is designed for the task of predicting multimedia interestingness in images and videos. The data consist of movie excerpts and key-frames with their corresponding ground-truth files, which are based on a classification into interesting and non-interesting samples and an interestingness score, along with a set of pre-processed descriptors. Also provided are a thorough analysis of the dataset, an analysis of method and feature performance, suggestions for performance enhancement, and other aspects related to interestingness prediction. The dataset is generated from freely distributable data and is anonymized in order to protect users’ privacy. It is addressed to researchers interested in the prediction of image and video interestingness. AI4Media partners UPB and InterDigital are co-creators of this dataset along with CSC - IT Center for Science.
Constantin, M. G., Ştefan, L. D., Ionescu, B., Duong, N. Q., Demarty, C. H., & Sjöberg, M. (2021). Visual Interestingness Prediction: A Benchmark Framework and Literature Review. International Journal of Computer Vision, 129, 1526–1550.
The aim of LeQua 2022 (the 1st edition of the CLEF “Learning to Quantify” lab) is to allow the comparative evaluation of methods for “learning to quantify” in textual datasets, i.e., methods for training predictors of the relative frequencies of the classes of interest in sets of unlabelled textual documents. These predictors (called “quantifiers”) will be required to issue predictions for several such sets, some of them characterized by class frequencies radically different from the ones of the training set.
LeQua 2022 will offer two tasks (T1 and T2), each admitting two subtasks (A and B):
T1A: This task is concerned with evaluating binary quantifiers, i.e., quantifiers that must only predict the relative frequencies of a class and its complement. Participants in this task will be provided with documents already converted into vector form; the task is thus suitable for participants who do not wish to engage in generating representations for the textual documents, but want instead to concentrate on optimizing the methods for learning to quantify.
T1B: This task is concerned with evaluating single-label multi-class quantifiers, i.e., quantifiers that operate on documents that each belong to exactly one among a set of n>2 classes. As in Task T1A, participants will be provided with documents already converted into vector form.
T2A: Like Task T1A, this task is concerned with evaluating binary quantifiers. Unlike in Task T1A, participants will be provided with the raw text of the documents; the task is thus suitable for participants who also wish to engage in generating suitable representations for the textual documents, or to train end-to-end systems.
T2B: Like Task T1B, this task is concerned with evaluating single-label multi-class quantifiers; as in Task T2A, participants will be provided with the raw text of the documents.
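To make "predicting the relative frequencies of the classes" concrete, the sketch below uses the simple "classify and count" baseline: label each document with a classifier, then report the fraction of documents assigned to each class. This is an illustration only. The baseline, the toy labels, and the absolute-error measure are chosen here for clarity and are not taken from the lab description above.

```python
from collections import Counter

def prevalences(labels, classes):
    """Relative frequency of each class in a list of labels."""
    counts = Counter(labels)
    total = len(labels)
    return {c: counts[c] / total for c in classes}

def mean_absolute_error(true_prev, pred_prev):
    """Average absolute difference between true and predicted
    class prevalences (an illustrative measure)."""
    return sum(abs(true_prev[c] - pred_prev[c]) for c in true_prev) / len(true_prev)

classes = ["pos", "neg"]
# Labels a hypothetical classifier assigned to one set of unlabelled documents.
predicted_labels = ["pos", "pos", "neg", "pos"]
# Ground-truth labels of the same set, available only at evaluation time.
true_labels = ["pos", "neg", "neg", "pos"]

pred_prev = prevalences(predicted_labels, classes)
true_prev = prevalences(true_labels, classes)
print(pred_prev)
print(mean_absolute_error(true_prev, pred_prev))
```

A quantifier is judged on how close its predicted prevalences are to the true ones for each set, not on whether individual documents are labeled correctly.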
A synthetic dataset for fallen people detection comprising images extracted from the highly photo-realistic video game Grand Theft Auto V developed by Rockstar North.
Each image is labeled by the game engine, which provides bounding boxes and statuses (fallen or non-fallen) of the people present in the scene. The dataset comprises 6,071 synthetic images depicting 7,456 fallen and 26,125 non-fallen pedestrian instances in various looks, camera positions, background scenes, lighting, and occlusion conditions.
Fabio Carrara; Lorenzo Pasco; Claudio Gennaro; Fabrizio Falchi. VWFP: Virtual World Fallen People Dataset for Visual Fallen People Detection
The first large scale dataset for benchmarking domain adaptation methods for action recognition in the challenging task of transferring knowledge from the synthetic to the real domain.
The dataset comprises 36,195 videos, divided into 14 action categories and two domains, i.e., the source domain (synthetic videos from Mixamo) and the target domain (real videos from Kinetics).