Images constitute a large part of the content shared on social networks. Their disclosure is often tied to a particular context, and users are often unaware that, depending on their privacy settings, images can be accessible to third parties and used for purposes that were not initially foreseen. For instance, it is common practice for employers to search for information about prospective employees online. Another example is automatic credit scoring based on online data. Most existing approaches that provide feedback about shared data focus on inferring user characteristics, and their practical utility is rather limited.
We hypothesize that user feedback would be more effective if conveyed through the real-life effects of data sharing. The objective of the task is to automatically score user photographic profiles in a series of situations with a strong impact on their lives. Four such situations were modeled this year and refer to searching for: (1) a bank loan, (2) an accommodation, (3) a job as a waitress/waiter, and (4) a job in IT. Including several situations makes it clear to end users of the system that the same image will be interpreted differently depending on the context. The final objective of the task is to encourage the development of efficient user feedback, such as the YDSYO Android app.
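As an illustration of what situation-dependent scoring could look like, the sketch below aggregates hypothetical object detections into a per-situation appeal score; the object labels, weights, and function names are illustrative assumptions, not the official task baseline.

```python
# Illustrative only: object labels, weights, and the aggregation rule are
# hypothetical, not the official ImageCLEF Aware baseline.
SITUATION_WEIGHTS = {
    "bank_loan": {"alcohol": -0.8, "formal_suit": 0.6, "gambling": -0.9},
    "it_job":    {"laptop": 0.5, "alcohol": -0.3, "formal_suit": 0.2},
}

def profile_appeal(detections: list, situation: str) -> float:
    """Aggregate the object detections of a user profile into a single
    appeal score for one situation (higher = more favourable)."""
    weights = SITUATION_WEIGHTS[situation]
    return sum(weights.get(d["label"], 0.0) * d["confidence"] for d in detections)

profile = [{"label": "alcohol", "confidence": 0.9},
           {"label": "laptop", "confidence": 0.7}]
# The same profile is judged differently depending on the situation.
print(profile_appeal(profile, "bank_loan"))
print(profile_appeal(profile, "it_job"))
```

The same profile thus receives different scores for the bank-loan and IT-job situations, which is exactly the effect the task aims to surface.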
This archive contains part 1 of the Shifts Benchmark on Multiple Sclerosis lesion segmentation data. This dataset is provided by the Shifts Project to enable assessment of the robustness of models to distributional shift and of the quality of their uncertainty estimates. This part contains the MSSEG data collected in the digital repository of the OFSEP Cohort and provided in the context of the MICCAI 2016 and 2021 challenges. A full description of the benchmark is available at https://arxiv.org/pdf/2206.15407. Part 2 of the data is available here. To find out more about the Shifts Project, please visit https://shifts.ai.
This archive contains part 2 of the Shifts Benchmark on Multiple Sclerosis lesion segmentation data. This dataset is provided by the Shifts Project to enable assessment of the robustness of models to distributional shift and of the quality of their uncertainty estimates. This part contains data collected from several different sources and distributed under a CC BY-NC-SA 4.0 license. Part 1 of the data is available here. A full description of the benchmark is available at https://arxiv.org/pdf/2206.15407. To find out more about the Shifts Project, please visit https://shifts.ai.
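As a minimal sketch of how uncertainty estimates might be computed for this benchmark, the code below derives a voxel-wise predictive entropy map from an ensemble of lesion-probability volumes; the file name is hypothetical and nibabel is only an assumption about how the NIfTI scans could be read.

```python
# Sketch only: the file name is hypothetical, nibabel is assumed as the NIfTI
# reader, and random numbers stand in for the predictions of a model ensemble.
import numpy as np
import nibabel as nib

def voxelwise_entropy(prob_maps: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Predictive entropy of the mean lesion probability over an ensemble.

    prob_maps: array of shape (n_models, D, H, W) with lesion probabilities.
    """
    p = prob_maps.mean(axis=0)
    return -(p * np.log(p + eps) + (1 - p) * np.log(1 - p + eps))

flair = nib.load("sub-01_FLAIR.nii.gz").get_fdata()   # hypothetical file name
prob_maps = np.random.rand(5, *flair.shape)           # stand-in for 5 model outputs
uncertainty = voxelwise_entropy(prob_maps)
print(uncertainty.shape, float(uncertainty.max()))
```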
The Bus Violence dataset is a large-scale collection of videos depicting violent and non-violent situations in public transport environments. The benchmark was gathered from multiple cameras located inside a moving bus, where several people simulated violent actions such as stealing an object from another person, fighting between passengers, etc. It contains 1,400 video clips manually annotated as containing violent scenes or not, making it one of the largest benchmarks for video violence detection in the literature.
In this repository, we provide
- the 1,400 video clips divided into two folders named Violence/NoViolence, containing clips of violent situations and non-violent situations, respectively;
- two txt files containing the names of the videos belonging to the training and test splits, respectively.
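A minimal loading sketch is given below; the names of the two split files are assumptions, and only the Violence/NoViolence folder layout reflects the description above.

```python
# Minimal loading sketch: the split file names are assumptions; only the
# Violence/NoViolence folder layout follows the repository description.
from pathlib import Path

ROOT = Path("BusViolence")

def load_split(split_file: Path) -> list:
    """Return (video_path, label) pairs; label 1 = violent, 0 = non-violent."""
    samples = []
    for name in split_file.read_text().splitlines():
        name = name.strip()
        if not name:
            continue
        for label, folder in ((1, "Violence"), (0, "NoViolence")):
            path = ROOT / folder / name
            if path.exists():
                samples.append((path, label))
    return samples

train = load_split(ROOT / "train.txt")   # hypothetical file name
test = load_split(ROOT / "test.txt")     # hypothetical file name
print(len(train), len(test))
```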
A collection of images taken by day and by night for vehicle detection, segmentation and counting in parking areas.
Each image is manually labeled with pixel-wise masks and bounding boxes localizing vehicle instances.
The dataset includes about 250 images depicting several parking areas and covering most of the problematic situations found in real scenarios: seven different cameras capture the images under various weather conditions and viewing angles. Another challenging aspect is the presence of partial occlusions in many scenes, caused by obstacles such as trees, lampposts, and other cars, as well as shadowed vehicles.
The main peculiarity is that images are taken both during the day and at night, resulting in markedly different lighting conditions.
We suggest a three-way split (train-validation-test). The train split contains images taken during the daytime, while the validation and test splits include images gathered at night.
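The sketch below shows how per-image vehicle counts could be extracted, under the assumption that the masks and bounding boxes are shipped as COCO-format JSON files; the annotation file names are hypothetical.

```python
# Sketch under the assumption that masks and bounding boxes are shipped as
# COCO-format JSON files; the annotation file names are hypothetical.
from pycocotools.coco import COCO

def count_vehicles(annotation_file: str) -> dict:
    """Number of annotated vehicle instances per image."""
    coco = COCO(annotation_file)
    counts = {}
    for img_id in coco.getImgIds():
        img = coco.loadImgs(img_id)[0]
        counts[img["file_name"]] = len(coco.getAnnIds(imgIds=img_id))
    return counts

# Day images for training, night images for validation/testing, as suggested.
train_counts = count_vehicles("annotations/day_train.json")
val_counts = count_vehicles("annotations/night_val.json")
print(sum(train_counts.values()), "vehicle instances in the daytime training split")
```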
Related paper:
L. Ciampi, C. Santiago, J. Costeira, C. Gennaro, and G. Amato (2021). "Domain Adaptation for Traffic Density Estimation", Proc. of the 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Vol. 5: VISAPP, pp. 185-195. DOI: 10.5220/0010303401850195
This repository contains a mapping between the classes of COCO, LVIS, and Open Images V4 datasets into a unique set of 1460 classes.
COCO [Lin et al. 2014] contains 80 classes, LVIS [Gupta et al. 2019] contains 1460 classes, and Open Images V4 [Kuznetsova et al. 2020] contains 601 classes.
We built a mapping of these classes using a semi-automatic procedure in order to obtain a unified final list of 1460 classes. We also generated a hierarchy for each class using WordNet.
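A small usage sketch follows; the mapping file name and its column layout are assumptions, while the WordNet lookup via nltk mirrors the hierarchy generation described above.

```python
# Illustrative sketch: the mapping file name and column layout are assumptions;
# the WordNet lookup requires nltk and a prior nltk.download("wordnet").
import csv
from nltk.corpus import wordnet as wn

def load_mapping(path: str) -> dict:
    """Map (source_dataset, source_class) pairs to the unified class name."""
    mapping = {}
    with open(path, newline="") as f:
        for row in csv.DictReader(f):   # assumed columns: dataset, class, unified
            mapping[(row["dataset"], row["class"])] = row["unified"]
    return mapping

mapping = load_mapping("unified_classes.csv")        # hypothetical file name
unified = mapping.get(("coco", "dog"), "dog")
synsets = wn.synsets(unified, pos=wn.NOUN)
if synsets:
    # First hypernym path of the first sense, as one possible hierarchy.
    hierarchy = [s.name() for s in synsets[0].hypernym_paths()[0]]
    print(" > ".join(hierarchy))
```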
This dataset contains pretrained models of the CA-SUM network architecture for video summarization, which is presented in our work titled “Summarizing Videos using Concentrated Attention and Considering the Uniqueness and Diversity of the Video Frames”, in Proc. ACM ICMR 2022.
In our ICMR 2022 paper we describe a new method for unsupervised video summarization. Existing unsupervised approaches suffer from the unstable training of Generator-Discriminator architectures, the use of RNNs for modeling long-range frame dependencies, and the limited ability to parallelize the training of RNN-based architectures. To overcome these limitations, the developed method relies solely on a self-attention mechanism to estimate the importance of video frames. Instead of simply modeling frame dependencies with global attention, our method integrates a concentrated attention mechanism that focuses on non-overlapping blocks in the main diagonal of the attention matrix, and enriches the available information by extracting and exploiting knowledge about the uniqueness and diversity of the associated video frames. In this way, our method makes better estimates of the significance of different parts of the video and drastically reduces the number of learnable parameters. Experimental evaluations on two benchmark datasets (SumMe and TVSum) show the competitiveness of the proposed method against other state-of-the-art unsupervised summarization approaches and demonstrate its ability to produce video summaries that are very close to human preferences. An ablation study focusing on the introduced components, namely the use of concentrated attention in combination with attention-based estimates of frame uniqueness and diversity, shows their relative contributions to the overall summarization performance.
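To make the notion of concentrated attention more concrete, the PyTorch sketch below restricts scaled dot-product attention to non-overlapping blocks on the main diagonal of the attention matrix; the block size and the plain formulation are illustrative and do not reproduce the exact CA-SUM implementation.

```python
# Illustrative PyTorch sketch of block-diagonal ("concentrated") attention;
# the block size and the plain scaled dot-product formulation are not the
# exact CA-SUM implementation.
import torch

def block_diagonal_mask(n_frames: int, block_size: int) -> torch.Tensor:
    """Boolean mask keeping only non-overlapping blocks on the main diagonal
    of an n_frames x n_frames attention matrix."""
    mask = torch.zeros(n_frames, n_frames, dtype=torch.bool)
    for start in range(0, n_frames, block_size):
        end = min(start + block_size, n_frames)
        mask[start:end, start:end] = True
    return mask

n, d = 120, 64
frames = torch.randn(n, d)                     # placeholder frame embeddings
scores = frames @ frames.T / d ** 0.5          # scaled dot-product scores
scores = scores.masked_fill(~block_diagonal_mask(n, 30), float("-inf"))
attention = torch.softmax(scores, dim=-1)      # attention concentrated in blocks
print(attention.shape)
```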
Related paper:
E. Apostolidis, G. Balaouras, V. Mezaris, and I. Patras (2022). “Summarizing Videos using Concentrated Attention and Considering the Uniqueness and Diversity of the Video Frames”, Proc. of the 2022 International Conference on Multimedia Retrieval (ICMR ’22), June 2022, Newark, NJ. DOI: 10.1145/3512527.3531404
FaVCI2D tackles problematic design choices of existing face verification datasets: (1) imposter pairs are too easy, (2) the demographic diversity is insufficient, and (3) there is disregard for ethical and legal aspects. The dataset includes challenging imposters and metadata related to gender, country and age. It is generated from freely distributable data. It was created by AI4Media partners CEA and UPB and is intended for the face recognition/verification community, and for researchers who study bias and fairness in AI.
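A generic verification sketch is shown below: a pair is accepted as the same identity when the cosine similarity of its embeddings exceeds a threshold; the embeddings are random placeholders and the threshold is arbitrary, not a value tuned on FaVCI2D.

```python
# Generic verification sketch: embeddings are random placeholders and the
# cosine-similarity threshold is arbitrary, not tuned on FaVCI2D.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def verification_accuracy(pairs, threshold: float = 0.5) -> float:
    """pairs: iterable of (embedding_1, embedding_2, is_same_person)."""
    correct = sum((cosine(e1, e2) >= threshold) == same for e1, e2, same in pairs)
    return correct / len(pairs)

rng = np.random.default_rng(0)
anchor = rng.normal(size=128)
pairs = [
    (anchor, anchor + 0.05 * rng.normal(size=128), True),   # genuine pair
    (anchor, rng.normal(size=128), False),                   # imposter pair
]
print(verification_accuracy(pairs))
```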
Related paper:
Popescu, A., Ştefan, L. D., Deshayes-Chossart, J., & Ionescu, B. (2021). Face Verification with Challenging Imposters and Diversified Demographics. LINK
The dataset supports the creation of algorithms which provide feedback about the potential effects of personal data sharing in real-life situations (e.g., searching for a bank loan or a job). It is focused on visual profiles, which are manually rated and populated with visual object detections. The dataset is generated from freely distributable data and is anonymized in order to protect users’ privacy. It was created by AI4Media partners CEA and UPB and is usable by researchers interested in user privacy protection.
Related paper:
Ionescu, B., Müller, H., Péteri, R., Abacha, A. B., Demner-Fushman, D., Hasan, S. A., ... & Popescu, A. (2021, March). The 2021 ImageCLEF Benchmark: Multimedia Retrieval in Medical, Nature, Internet and Social Media Applications. In European Conference on Information Retrieval (pp. 616-623). Springer, Cham. LINK
This dataset is intended to be used for assessing the prediction of how memorable a video will be. The PMMD dataset is a subset of a collection consisting of 12,000 short videos retrieved from TRECVid and Memento10k. The dataset is annotated for short- and long-term memorability and, in its latest version, features three subtasks: a prediction, a generalization, and an EEG-based subtask. The dataset is generated from freely distributable data and is anonymized in order to protect users’ privacy. It is addressed to researchers interested in the prediction of short- and long-term memorability. AI4Media partners UPB and InterDigital are co-creators of this dataset, along with the University of Essex, Dublin City University and the Massachusetts Institute of Technology.
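As an evaluation sketch, predicted memorability scores can be compared to the ground truth with a rank correlation; the example below uses Spearman's correlation from SciPy on placeholder values, and the official task metric should be checked before relying on it.

```python
# Evaluation sketch with placeholder scores; verify the official task metric
# before relying on Spearman's rank correlation alone.
import numpy as np
from scipy.stats import spearmanr

ground_truth = np.array([0.91, 0.75, 0.83, 0.60, 0.95])   # placeholder values
predictions = np.array([0.88, 0.70, 0.80, 0.65, 0.93])    # placeholder values

rho, p_value = spearmanr(ground_truth, predictions)
print(f"Spearman rho = {rho:.3f} (p = {p_value:.3f})")
```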
Related paper:
De Herrera, A. G. S., Kiziltepe, R. S., Chamberlain, J., Constantin, M. G., Demarty, C. H., Doctor, F., Ionescu, B., & Smeaton, A. F. (2020). Overview of MediaEval 2020 Predicting Media Memorability Task: What Makes a Video Memorable?. Working Notes Proceedings of the MediaEval 2020 Workshop. LINK
The Interestingness10k dataset is designed for the task of predicting multimedia interestingness in images and videos. The data consists of movie excerpts and key-frames together with the corresponding ground-truth files, providing a binary classification into interesting and non-interesting samples and an interestingness score, along with a set of pre-processed descriptors. Also provided are a thorough analysis of the dataset, analyses of method and feature performance, suggestions for performance enhancement, and many more aspects related to interestingness prediction. The dataset is generated from freely distributable data and is anonymized in order to protect users’ privacy. It is addressed to researchers interested in the prediction of image and video interestingness. AI4Media partners UPB and InterDigital are co-creators of this dataset along with CSC - IT Center for Science.
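A minimal evaluation sketch for the binary interesting/non-interesting annotations is given below, using average precision from scikit-learn on placeholder labels and scores; the official task metric may differ.

```python
# Sketch with placeholder labels and scores; the official interestingness
# metric may differ (e.g., precision at a fixed cut-off).
import numpy as np
from sklearn.metrics import average_precision_score

labels = np.array([1, 0, 0, 1, 0, 1])                 # 1 = interesting (placeholder)
scores = np.array([0.9, 0.2, 0.4, 0.7, 0.1, 0.5])     # predicted interestingness

print(f"AP = {average_precision_score(labels, scores):.3f}")
```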
Related paper:
Constantin, M. G., Ştefan, L. D., Ionescu, B., Duong, N. Q., Demarty, C. H., & Sjöberg, M. (2021). Visual Interestingness Prediction: A Benchmark Framework and Literature Review. International Journal of Computer Vision, 129, 1526–1550. LINK
The aim of LeQua 2022 (the 1st edition of the CLEF “Learning to Quantify” lab) is to allow the comparative evaluation of methods for “learning to quantify” in textual datasets, i.e., methods for training predictors of the relative frequencies of the classes of interest in sets of unlabelled textual documents. These predictors (called “quantifiers”) will be required to issue predictions for several such sets, some of them characterized by class frequencies radically different from the ones of the training set.
LeQua 2022 will offer two tasks (T1 and T2), each admitting two subtasks (A and B):
Task T1A: This task is concerned with evaluating binary quantifiers, i.e., quantifiers that must only predict the relative frequencies of a class and its complement. Participants in this task will be provided with documents already converted into vector form; the task is thus suitable for participants who do not wish to engage in generating representations for the textual documents, but want instead to concentrate on optimizing the methods for learning to quantify.
Task T1B: This task is concerned with evaluating single-label multi-class quantifiers, i.e., quantifiers that operate on documents each of which belongs to exactly one among a set of n>2 classes. As in Task T1A, participants will be provided with documents already converted into vector form.
Task T2A: Like Task T1A, this task is concerned with evaluating binary quantifiers. Unlike in Task T1A, participants will be provided with the raw text of the documents; the task is thus suitable for participants who also wish to engage in generating suitable representations for the textual documents, or to train end-to-end systems.
Task T2B: Like Task T1B, this task is concerned with evaluating single-label multi-class quantifiers; as in Task T2A, participants will be provided with the raw text of the documents.
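As a concrete reference point, the sketch below implements the simple classify-and-count quantification baseline on synthetic vectors and reports the absolute error of the estimated prevalences; it is not the official LeQua baseline or data format.

```python
# "Classify and count" baseline on synthetic vectors; not the official LeQua
# baseline or data format.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_bag(n: int, positive_prevalence: float):
    """Synthetic 'documents' as 50-d vectors; the class shifts the first feature."""
    y = (rng.random(n) < positive_prevalence).astype(int)
    X = rng.normal(size=(n, 50))
    X[:, 0] += np.where(y == 1, 1.0, -1.0)
    return X, y

def classify_and_count(clf, X: np.ndarray) -> np.ndarray:
    """Estimated class prevalences of an unlabelled sample set."""
    preds = clf.predict(X)
    return np.bincount(preds, minlength=2) / len(preds)

X_train, y_train = make_bag(2000, positive_prevalence=0.5)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Unlabelled test bag whose prevalence is radically different from training.
X_bag, y_bag = make_bag(500, positive_prevalence=0.9)
true_prev = np.bincount(y_bag, minlength=2) / len(y_bag)
est_prev = classify_and_count(clf, X_bag)
print("true:", true_prev, "estimated:", est_prev,
      "absolute error:", float(np.abs(true_prev - est_prev).mean()))
```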
A synthetic dataset for fallen people detection comprising images extracted from the highly photo-realistic video game Grand Theft Auto V developed by Rockstar North.
Each image is labeled by the game engine, providing bounding boxes and statuses (fallen or non-fallen) for the people present in the scene. The dataset comprises 6,071 synthetic images depicting 7,456 fallen and 26,125 non-fallen pedestrian instances with various appearances, camera positions, background scenes, lighting, and occlusion conditions.
Related paper:
F. Carrara, L. Pasco, C. Gennaro, and F. Falchi. VWFP: Virtual World Fallen People Dataset for Visual Fallen People Detection.
The first large-scale dataset for benchmarking domain adaptation methods for action recognition, addressing the challenging task of transferring knowledge from the synthetic to the real domain.
The dataset comprises 36,195 videos, divided into 14 action categories and two domains, i.e., the source domain (synthetic videos from Mixamo) and the target domain (real videos from Kinetics).