Images constitute a large part of the content shared on social networks. Their disclosure is often tied to a particular context, and users are often unaware that, depending on their privacy settings, images can be accessible to third parties and used for purposes which were initially unforeseen. For instance, it is common practice for employers to search for information about their prospective employees online. Another example is automatic credit scoring based on online data. Most existing approaches which provide feedback about shared data focus on inferring user characteristics, and their practical utility is rather limited.
We hypothesize that user feedback would be more effective if conveyed through the real-life effects of data sharing. The objective of the task is to automatically score user photographic profiles in a series of situations with a strong impact on their lives. Four such situations were modeled this year and refer to searching for: (1) a bank loan, (2) an accommodation, (3) a job as a waiter/waitress, and (4) a job in IT. Several situations are included in order to make it clear to the end users of the system that the same image will be interpreted differently depending on the context. The final objective of the task is to encourage the development of effective user feedback tools, such as the YDSYO Android app.
This archive contains part 1 of the Shifts Benchmark on Multiple Sclerosis lesion segmentation data. This dataset is provided by the Shifts Project to enable assessment of the robustness of models to distributional shift and of the quality of their uncertainty estimates. This part is the MSSEG data collected in the digital repository of the OFSEP Cohort and provided in the context of the MICCAI 2016 and 2021 challenges. A full description of the benchmark is available at https://arxiv.org/pdf/2206.15407. Part 2 of the data is available here. To find out more about the Shifts Project, please visit https://shifts.ai.
This archive contains part 2 of the Shifts Benchmark on Multiple Sclerosis lesion segmentation data. This dataset is provided by the Shifts Project to enable assessment of the robustness of models to distributional shift and of the quality of their uncertainty estimates. This part contains data collected from several different sources and distributed under a CC BY-NC-SA 4.0 license. Part 1 of the data is available here. A full description of the benchmark is available at https://arxiv.org/pdf/2206.15407. To find out more about the Shifts Project, please visit https://shifts.ai.
The Bus Violence dataset is a large-scale collection of videos depicting violent and non-violent situations in public transport environments. This benchmark was gathered from multiple cameras located inside a moving bus, where several people simulated violent actions such as stealing an object from another person, fighting between passengers, etc. It contains 1,400 video clips manually annotated as containing violent scenes or not, making it one of the largest benchmarks for video violence detection in the literature.
In this repository, we provide:
- the 1,400 video clips, divided into two folders named Violence/NoViolence, containing clips of violent situations and non-violent situations, respectively;
- two txt files containing the names of the videos belonging to the training and test splits, respectively (a minimal loading sketch follows this list).
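The split files can be paired with the two class folders to obtain labeled clip lists, for example as in the sketch below. This is a minimal sketch assuming the split files are named train.txt and test.txt and list one clip filename per line; both names are assumptions, since the exact file names are not stated above.

```python
from pathlib import Path

ROOT = Path("BusViolence")  # assumed location of the extracted repository

def load_split(split_file):
    """Return (clip_path, label) pairs; label 1 = violent, 0 = non-violent."""
    pairs = []
    for name in Path(split_file).read_text().splitlines():
        name = name.strip()
        if not name:
            continue
        # Each listed clip lives either in Violence/ or in NoViolence/.
        violent = (ROOT / "Violence" / name).exists()
        folder = "Violence" if violent else "NoViolence"
        pairs.append((ROOT / folder / name, int(violent)))
    return pairs

train_clips = load_split(ROOT / "train.txt")  # assumed file name
test_clips = load_split(ROOT / "test.txt")    # assumed file name
```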
A collection of images taken by day and by night for vehicle detection, segmentation and counting in parking areas.
Each image is manually labeled with pixel-wise masks and bounding boxes localizing vehicle instances.
The dataset includes about 250 images depicting several parking areas and covering most of the problematic situations found in a real scenario: seven different cameras capture the images under various weather conditions and viewing angles. Another challenging aspect is the presence of partial occlusions in many scenes, caused by obstacles (trees, lampposts, other cars), as well as shadowed cars.
The main peculiarity is that images are taken both during the day and at night, resulting in markedly different lighting conditions.
We suggest a three-way split (train-validation-test). The train split contains images taken during the daytime while validation and test splits include images gathered at night.
Related paper:
L. Ciampi, C. Santiago, J. Costeira, C. Gennaro, and G. Amato (2021). "Domain Adaptation for Traffic Density Estimation", Proc. of the 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Vol. 5: VISAPP, pp. 185-195. DOI: 10.5220/0010303401850195
This repository contains a mapping of the classes of the COCO, LVIS, and Open Images V4 datasets into a unique set of 1,460 classes.
COCO [Lin et al. 2014] contains 80 classes, LVIS [Gupta et al. 2019] contains 1,460 classes, and Open Images V4 [Kuznetsova et al. 2020] contains 601 classes.
We built a mapping of these classes using a semi-automatic procedure in order to obtain a unique final list of 1,460 classes. We also generated a hierarchy for each class using WordNet.
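As an illustration of the hierarchy-generation step, the hypernym chain of a class name can be retrieved through NLTK's WordNet interface. This is only a sketch of the idea: picking the first noun synset is a simplifying assumption and does not reproduce the semi-automatic procedure used to build the released mapping.

```python
import nltk
from nltk.corpus import wordnet as wn

nltk.download("wordnet", quiet=True)

def hypernym_chain(class_name):
    """Return the hypernym path from the WordNet root down to the class."""
    synsets = wn.synsets(class_name, pos=wn.NOUN)
    if not synsets:
        return []
    # Simplification: take the first sense and its first hypernym path.
    path = synsets[0].hypernym_paths()[0]
    return [s.name() for s in path]

print(hypernym_chain("dog"))
# e.g. ['entity.n.01', ..., 'canine.n.02', 'dog.n.01']
```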
This dataset contains pretrained models of the CA-SUM network architecture for video summarization, which is presented in our work titled “Summarizing Videos using Concentrated Attention and Considering the Uniqueness and Diversity of the Video Frames”, in Proc. ACM ICMR 2022.
In our ICMR 2022 paper we describe a new method for unsupervised video summarization. To overcome limitations of existing unsupervised video summarization approaches (the unstable training of Generator-Discriminator architectures, the use of RNNs for modeling long-range frame dependencies, and the limited ability to parallelize the training of RNN-based architectures), the developed method relies solely on a self-attention mechanism to estimate the importance of video frames. Instead of simply modeling frame dependencies with global attention, our method integrates a concentrated attention mechanism that focuses on non-overlapping blocks in the main diagonal of the attention matrix, and enriches the existing information by extracting and exploiting knowledge about the uniqueness and diversity of the associated video frames. In this way, our method makes better estimates about the significance of different parts of the video and drastically reduces the number of learnable parameters. Experimental evaluations on two benchmark datasets (SumMe and TVSum) show the competitiveness of the proposed method against other state-of-the-art unsupervised summarization approaches and demonstrate its ability to produce video summaries that are very close to human preferences. An ablation study focusing on the introduced components, namely the use of concentrated attention in combination with attention-based estimates of frame uniqueness and diversity, shows their relative contributions to the overall summarization performance.
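To illustrate the notion of concentrated attention mentioned above, the sketch below restricts a standard scaled dot-product attention to non-overlapping blocks on the main diagonal of the attention matrix. It is a simplified illustration of the concept rather than the CA-SUM implementation; the block size and tensor shapes are arbitrary assumptions.

```python
import torch
import torch.nn.functional as F

def block_diagonal_mask(n_frames, block_size):
    """Boolean mask keeping only non-overlapping blocks on the main diagonal."""
    mask = torch.zeros(n_frames, n_frames, dtype=torch.bool)
    for start in range(0, n_frames, block_size):
        end = min(start + block_size, n_frames)
        mask[start:end, start:end] = True
    return mask

def concentrated_attention(q, k, v, block_size=60):
    """Scaled dot-product attention restricted to diagonal blocks."""
    scores = q @ k.transpose(-1, -2) / q.shape[-1] ** 0.5
    mask = block_diagonal_mask(q.shape[-2], block_size)
    scores = scores.masked_fill(~mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

# Toy usage: 300 frame features of dimension 128.
frame_features = torch.randn(300, 128)
attended = concentrated_attention(frame_features, frame_features, frame_features)
```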
Related paper:
E. Apostolidis, G. Balaouras, V. Mezaris, and I. Patras (2022). “Summarizing Videos using Concentrated Attention and Considering the Uniqueness and Diversity of the Video Frames”, Proc. of the 2022 International Conference on Multimedia Retrieval (ICMR ’22), June 2022, Newark, NJ. DOI: 10.1145/3512527.3531404
FaVCI2D tackles problematic design choices of existing face verification datasets: (1) imposter pairs are too easy, (2) the demographic diversity is insufficient, and (3) there is disregard for ethical and legal aspects. The dataset includes challenging imposters and metadata related to gender, country and age. It is generated from freely distributable data. It was created by AI4Media partners CEA and UPB and is intended for the face recognition/verification community, and for researchers who study bias and fairness in AI.
Related paper:
Popescu, A., Ştefan, L. D., Deshayes-Chossart, J., & Ionescu, B. (2021). Face Verification with Challenging Imposters and Diversified Demographics. LINK
The dataset supports the creation of algorithms which provide feedback about the potential effects of personal data sharing in real-life situations (e.g., searching for a bank loan or a job). It is focused on visual profiles, which are manually rated and populated with visual object detections. The dataset is generated from freely distributable data and is anonymized in order to protect users’ privacy. It was created by AI4Media partners CEA and UPB and is usable by researchers interested in user privacy protection.
Related paper:
Ionescu, B., Müller, H., Péteri, R., Abacha, A. B., Demner-Fushman, D., Hasan, S. A., ... & Popescu, A. (2021, March). The 2021 ImageCLEF Benchmark: Multimedia Retrieval in Medical, Nature, Internet and Social Media Applications. In European Conference on Information Retrieval (pp. 616-623). Springer, Cham. LINK
This dataset is intended to be used for assessing the prediction of how memorable a video will be. The PMMD dataset is a subset of a collection consisting of 12,000 short videos retrieved from TRECVid and Memento10k. The dataset is annotated for short- and long-term memorability and, in its latest version, features three subtasks: a prediction, a generalization, and an EEG-based subtask. The dataset is generated from freely distributable data and is anonymized in order to protect users’ privacy. It is addressed to researchers interested in the prediction of short- and long-term memorability. AI4Media partners UPB and InterDigital are co-creators of this dataset, along with the University of Essex, Dublin City University and the Massachusetts Institute of Technology.
Related paper:
De Herrera, A. G. S., Kiziltepe, R. S., Chamberlain, J., Constantin, M. G., Demarty, C. H., Doctor, F., Ionescu, B., & Smeaton, A. F. (2020). Overview of MediaEval 2020 Predicting Media Memorability Task: What Makes a Video Memorable?. Working Notes Proceedings of the MediaEval 2020 Workshop. LINK
The Interestingness10k dataset is designed for the task of predicting multimedia interestingness in images and videos. The data consists of movie excerpts and key-frames with their corresponding ground-truth files, based on the classification into interesting and non-interesting samples and on interestingness scores, along with a set of pre-processed descriptors. Also provided are a thorough analysis of this dataset, analyses of method and feature performance, suggestions for performance enhancement, and many more aspects related to interestingness prediction. The dataset is generated from freely distributable data and is anonymized in order to protect users’ privacy. It is addressed to researchers interested in the prediction of image and video interestingness. AI4Media partners UPB and InterDigital are co-creators of this dataset, along with CSC - IT Center for Science.
Related paper:
Constantin, M. G., Ştefan, L. D., Ionescu, B., Duong, N. Q., Demarty, C. H., & Sjöberg, M. (2021). Visual Interestingness Prediction: A Benchmark Framework and Literature Review. International Journal of Computer Vision, 129, 1526–1550. LINK
The aim of LeQua 2022 (the 1st edition of the CLEF “Learning to Quantify” lab) is to allow the comparative evaluation of methods for “learning to quantify” in textual datasets, i.e., methods for training predictors of the relative frequencies of the classes of interest in sets of unlabelled textual documents. These predictors (called “quantifiers”) will be required to issue predictions for several such sets, some of them characterized by class frequencies radically different from the ones of the training set.
LeQua 2022 will offer two tasks (T1 and T2), each admitting two subtasks (A and B):
Task T1A: This task is concerned with evaluating binary quantifiers, i.e., quantifiers that must only predict the relative frequencies of a class and its complement. Participants in this task will be provided with documents already converted into vector form; the task is thus suitable for participants who do not wish to engage in generating representations for the textual documents, but want instead to concentrate on optimizing the methods for learning to quantify.
Task T1B: This task is concerned with evaluating single-label multi-class quantifiers, i.e., quantifiers that operate on documents each of which belongs to exactly one of a set of n>2 classes. As in Task T1A, participants will be provided with documents already converted into vector form.
Task T2A: Like Task T1A, this task is concerned with evaluating binary quantifiers. Unlike in Task T1A, participants will be provided with the raw text of the documents; the task is thus suitable for participants who also wish to engage in generating suitable representations for the textual documents, or to train end-to-end systems.
Task T2B: Like Task T1B, this task is concerned with evaluating single-label multi-class quantifiers; like in Task T2A, participants will be provided with the raw text of the documents.
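For orientation, the sketch below shows the elementary classify-and-count baseline for quantification on vectorized documents, as in the T1 setting: a classifier is trained on labeled vectors and the prevalence of each class in an unlabeled sample is estimated by counting its predictions. The data shapes are toy assumptions and the code is not tied to the official LeQua data format.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def classify_and_count(clf, X_unlabeled, n_classes):
    """Estimate class prevalences of an unlabeled set by counting predictions."""
    predictions = clf.predict(X_unlabeled)
    counts = np.bincount(predictions, minlength=n_classes)
    return counts / counts.sum()

# Toy data: 300-dimensional document vectors, binary setting (as in T1A).
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(1000, 300)), rng.integers(0, 2, size=1000)
X_sample = rng.normal(size=(250, 300))

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(classify_and_count(clf, X_sample, n_classes=2))  # e.g. [0.52 0.48]
```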
A synthetic dataset for fallen people detection comprising images extracted from the highly photo-realistic video game Grand Theft Auto V developed by Rockstar North.
Each image is labeled by the game engine, providing bounding boxes and statuses (fallen or non-fallen) of the people present in the scene. The dataset comprises 6,071 synthetic images depicting 7,456 fallen and 26,125 non-fallen pedestrian instances in various looks, camera positions, background scenes, lighting, and occlusion conditions.
Related paper:
F. Carrara, L. Pasco, C. Gennaro, and F. Falchi. VWFP: Virtual World Fallen People Dataset for Visual Fallen People Detection.
The first large-scale dataset for benchmarking domain adaptation methods for action recognition in the challenging task of transferring knowledge from the synthetic to the real domain.
The dataset comprises 36,195 videos, divided into 14 action categories and two domains, i.e., the source domain (synthetic videos from Mixamo) and the target domain (real videos from Kinetics).
Dataset of tweet IDs and timestamps in CSV format, organized by the COVID-19-related discussion topics they belong to. Currently, it contains a total of 3,423,260 entries across 10 topics and covers 2 time periods: March to August 2020 and August 2020 to March 2021. General information about the topics can be found in info.csv. All timestamps are in UNIX seconds.
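The entries can be read, for instance, with pandas and the UNIX-second timestamps converted to datetimes. This is a minimal sketch: the per-topic file name and the column names tweet_id and timestamp are assumptions about the layout rather than documented names.

```python
import pandas as pd

# Assumed layout: one CSV per topic with columns "tweet_id" and "timestamp"
# (UNIX seconds); both the file name and the column names are assumptions.
df = pd.read_csv("topic_01.csv", dtype={"tweet_id": str})
df["datetime"] = pd.to_datetime(df["timestamp"], unit="s")

# Restrict to the first covered period (March to August 2020).
first_period = df[(df["datetime"] >= "2020-03-01") & (df["datetime"] < "2020-09-01")]
print(len(first_period), "tweets in the first period")
```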
A set of 24 JSON files, each of which represents a recommendation category and contains a set of items with all necessary information as extracted by the CUHE platform. For each item, the file also contains the apiResponse element, which corresponds to the complete API response from Europeana. The Europeana API description is available here; the apiResponse follows the EDM (Europeana Data Model) specification, which is available here. The CUHE project has indirectly received funding from the European Union Horizon 2020 research and innovation action programme, via the AI4Media Open Call #1 issued and executed under the AI4Media project (Grant Agreement no. 951911).
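A category file can be inspected, for example, as in the sketch below. The file name and the assumption that each file holds a list of item objects are illustrative; only the apiResponse element is documented above.

```python
import json

# Assumed file name; each of the 24 files represents one recommendation category.
with open("category.json", encoding="utf-8") as f:
    category_items = json.load(f)

# Assumption: the file holds a list of item objects, each carrying the
# documented "apiResponse" element with the complete Europeana API response.
for item in category_items:
    api_response = item.get("apiResponse", {})  # documented element
    print(sorted(api_response)[:5])             # peek at the first few response fields
```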
"The Pest Sticky Traps (PST) dataset is a collection of yellow chromotropic sticky trap pictures specifically designed for training/testing deep learning models to automatically count insects and estimate pest populations. The dataset comprises two subsets: - a subset we suggest using for the training/validation phases (contained in the `train/` folder) - a subset we suggest using for the test phase (contained in the `test/` folder)"
"This dataset contains object detection annotations and quality metadata of historic film with scientific and educational content. The content is from the collection ""Österreichische Bundesinstitut für den Wissenschaftlichen Film (ÖWF)"" of the Austrian Mediathek. The repository contains the code to construct a keyframe set from the publicly available videos as well as the object annotations for these keyframes."
These four labeled data sets are targeted at ordinal quantification. The goal of quantification is not to predict the label of each individual instance, but the distribution of labels in unlabeled sets of data. With the scripts provided, you can extract CSV files from the UCI machine learning repository and from OpenML. The ordinal class labels stem from a binning of a continuous regression label. We complement these data sets with the indices of the data items that appear in each sample of our evaluation. Hence, you can precisely replicate our samples by drawing the specified data items. The indices stem from two evaluation protocols that are well suited for ordinal quantification. To this end, each row in the files app_val_indices.csv, app_tst_indices.csv, app-oq_val_indices.csv, and app-oq_tst_indices.csv represents one sample. Our first protocol is the artificial prevalence protocol (APP), where all possible distributions of labels are drawn with equal probability. The second protocol, APP-OQ, is a variant thereof in which only the smoothest 20% of all APP samples are considered. This variant is targeted at ordinal quantification tasks, where classes are ordered and a similarity of neighboring classes can be assumed.
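The sample replication described above can be performed by indexing the extracted data with the rows of the indices files, roughly as sketched below. The sketch assumes that the indices files contain one row of integer item indices per sample and that the extraction scripts produce a feature matrix and integer labels; the file names features.npy and labels.npy are placeholders, not names shipped with the data.

```python
import numpy as np
import pandas as pd

# Placeholders for the output of the provided extraction scripts.
X = np.load("features.npy")   # feature matrix, one row per data item
y = np.load("labels.npy")     # ordinal class labels, assumed to be integers from 0

# Assumed layout: each row of app_val_indices.csv lists the data items of one
# validation sample drawn under the artificial prevalence protocol (APP).
indices = pd.read_csv("app_val_indices.csv", header=None).to_numpy()

for row in indices[:3]:
    sample_X, sample_y = X[row], y[row]
    prevalences = np.bincount(sample_y) / len(sample_y)
    print(prevalences)  # ground-truth label distribution of this replicated sample
```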
MAD-TSC is a multilingual aligned dataset for target-based sentiment classification. Such a resource is needed to quantify the sentiment expressed about individual entities in texts, and thus to contribute to a fine-grained understanding of texts. The sentences in the dataset are aligned across all languages in order to facilitate performance comparisons between languages and to support multilingual experiments. For instance, the dataset was used to show that sentiment prediction works best when combining automatic translation with prediction in English. The main explanation of this finding is that pretrained language models are stronger in English and can be effectively fine-tuned for the task.
AUTH has created the “GreekPolitics Dataset” in the context of the “AI4Media” collaborative project, funded by the European Union’s Horizon 2020 research and innovation programme. The AUTH GreekPolitics Dataset contains 2,578 tweet IDs from Twitter posts with politically charged content in the Greek language, spanning the period January 2014 – March 2021. Manually annotated ground-truth labels along 4 sentiment dimensions are provided for each tweet: polarity (‘1’ = positive, ‘0’ = neutral, ‘-1’ = negative), figurativeness (‘1’ = figurative, ‘0’ = literal), aggressiveness (‘1’ = aggressive, ‘0’ = non-aggressive), and bias (‘1’ = partisan, ‘0’ = non-partisan).
CelebHQGaze consists of 29,255 high-resolution celebrity images collected from CelebA-HQ. It consists of 21,005 face images with the eyes staring at the camera and 8,250 face images with the eyes looking somewhere else. Similarly to CelebGaze, we extract facial landmarks and generate the mask. All images are cropped to 512×512, and the mask size is fixed to 46×80.
The Florence 4D Facial Expression Dataset is composed of dynamic sequences of 3D face models, in which a combination of synthetic and real identities exhibits an unprecedented variety of 4D facial expressions, with variations that include the classical neutral-apex transition but generalize to expression-to-expression transitions. None of these characteristics are exposed by the existing 4D datasets, and they cannot be obtained by combining more than one dataset.
NEFER is a Dataset for Neuromorphic Event-based Facial Expression Recognition. NEFER is composed of paired RGB and event videos representing human faces labeled with the respective emotions and also annotated with face bounding boxes and facial landmarks.
PEM360 is a new dataset of user head movements and gaze recordings in 360° videos, along with self-reported emotional ratings of valence and arousal, and continuous physiological measurement of electrodermal activity and heart rate.
100-Driver is a large-scale, diverse, posture-based distracted driver dataset with more than 470K images taken by 4 cameras observing 100 drivers over 79 hours in 5 vehicles. 100-Driver involves different types of variations that closely match real-world applications, including changes in the vehicle, person, camera view, lighting, and modality. We provide a detailed analysis of 100-Driver and present 4 settings for investigating practical problems of distracted driver classification (DDC), including the traditional setting without domain shift and 3 challenging settings (i.e., cross-modality, cross-view, and cross-vehicle) with domain shifts.
This labeled data set is targeted at ordinal quantification. The goal of quantification is not to predict the class label of each individual instance, but the distribution of labels in unlabeled sets of data. With the scripts provided, you can extract the relevant features and labels from the public data set of the FACT Cherenkov telescope. These features are precisely the ones that domain experts from astro-particle physics employ in their analyses. The labels stem from a binning of a continuous energy label, which is common practice in these analyses. We complement this data set with the indices of data items that appear in each sample of our evaluation. Hence, you can precisely replicate our samples by drawing the specified data items.
This data set comprises a labeled training set, validation samples, and testing samples for ordinal quantification. The goal of quantification is not to predict the class label of each individual instance, but the distribution of labels in unlabeled sets of data. The data is extracted from the McAuley data set of product reviews on Amazon, where the goal is to predict the 5-star rating of each textual review. We have sampled this data according to three protocols that are designed for the evaluation of quantification methods.
This dataset contains high-resolution images for the visualization of perineuronal nets (PNNs) and parvalbumin-expressing (PV) cells. The dataset contains microscopy images of coronal brain slices from 7 adult mice.
This dataset is meant to be used for experiments in Authorship Analysis. The dataset consists of abstracts of single-author papers from arXiv, crawled using the arXiv API by querying a list of computer-science-related keywords ("deep learning", "machine learning", "information retrieval", "computer science", "data mining", "support vector", "logistic regression", "artificial intelligence", "supervised learning"). The corpus roughly follows a power-law distribution, with a few prolific authors and many authors accounting for very few papers each: we retained authors with at least 10 papers, resulting in a total of 1,469 documents from 100 authors. The most prolific authors (Peter D. Turney and Subhash Kak) have 34 abstracts to their names, the 10 most prolific authors have written 22 or more articles, while 50% of the authors have no more than 12 abstracts to their names. In order to divide the corpus into a training set and a test set we perform a stratified split, with the production of each author being split into a training set (70%) and a test set (30%). We use these documents as examples of "scientific communication", characterised by a precise and compact style, with an abundance of technical terminology.
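The stratified 70%/30% split described above can be reproduced, for instance, with scikit-learn by stratifying on the author label. This is a minimal sketch with placeholder data, not the exact split shipped with the dataset.

```python
from sklearn.model_selection import train_test_split

# Placeholders: parallel lists of abstract texts and their author labels.
abstracts = ["abstract text 1", "abstract text 2", "abstract text 3", "abstract text 4"]
authors = ["author_a", "author_a", "author_b", "author_b"]

# Stratifying on the author label splits each author's production 70%/30%.
train_docs, test_docs, train_authors, test_authors = train_test_split(
    abstracts, authors, test_size=0.30, stratify=authors, random_state=0
)
```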
The IDMT Audio Phylogeny Dataset contains audio phylogeny trees for the evaluation of audio phylogeny algorithms. It includes two different sets of phylogeny trees with 60 trees each, where every tree contains 20 nodes (audio files). The main difference between the two sets lies in the set of transformations T used to create the near-duplicates.
ODSS is a multilingual, multispeaker dataset of synthetic and natural speech, designed to foster research and benchmarking of novel studies on synthetic speech detection. ODSS comprises audio utterances generated from text by state-of-the-art synthesis methods, paired with their corresponding natural counterparts. The synthetic audio data includes several languages, with an equal representation of genders.
The ElecDeb60To16-fallacy dataset collects televised debates of the presidential election campaigns in the U.S. from 1960 to 2016. To the best of our knowledge, it is the biggest available dataset of political debates annotated with both argument components (evidence, claim) and relations (support, attack).
This repository contains a diverse set of features extracted from the marine video (underwater) dataset (MVK). These features were utilized in the VISIONE system during the latest editions of the Video Browser Showdown (VBS) competition (https://www.videobrowsershowdown.org/). A snapshot of the MVK dataset from 2023, which can be downloaded using the instructions provided at https://download-dbis.dmi.unibas.ch/mvk/, was used. It comprises 1,372 video files, each divided into 1-second segments.
This repository contains a diverse set of features extracted from the VBSLHE dataset (laparoscopic gynecology), to be utilized in the VISIONE system in the Video Browser Showdown (VBS) competition (https://www.videobrowsershowdown.org/). A snapshot of the dataset, provided by the Medical University of Vienna and Toronto, which can be downloaded using the instructions provided at https://download-dbis.dmi.unibas.ch/mvk/, was used. It comprises 75 video files, each divided into video shots with a maximum duration of 5 seconds.
This repository contains a set of features extracted from the V3C1+V3C2 dataset, sourced from the Vimeo Creative Commons Collection. These features were utilized in the VISIONE system during the latest editions of the Video Browser Showdown (VBS) competition (https://www.videobrowsershowdown.org/). The original V3C1+V3C2 dataset, provided by NIST, can be downloaded using the instructions provided at https://videobrowsershowdown.org/about-vbs/existing-data-and-tools/. It comprises 7,235 video files, amounting to 2,300 hours of video content and encompassing 2,508,113 predefined video segments. The predefined video segments longer than 10 seconds were divided into multiple segments, each spanning no longer than 16 seconds. As a result, the dataset contains a total of 2,648,219 segments. For each segment, several features were extracted from the middle frame.
The data set provides per-image annotations for training classifiers for bustle (i.e., more or less populated scenes) and cinematographic shot type (from extreme close-up to extreme long shot). The annotations are provided for images of the Places365-Standard training set and have been generated using an automatic pipeline. 100 images per class are provided as a validation set, and the annotations of these images have been manually checked and corrected.
The dataset provides training and validation data for classifying images by time of day and season (time of year). The images are taken from the Skyfinder dataset, which contains webcam images along with timestamps and geolocation. The annotations have been automatically derived from the metadata, which is sufficiently precise for season. For time of day, the annotations have been manually checked and corrected. The dataset contains 2,790 training files and 311 validation files per class for season, and 986 training files and 110 validation files per class for time of day.
This data set comprises a labelled training set used in the experiments of the paper "Binary Quantification and Dataset Shift: An Experimental Investigation". The data is extracted from the McAuley data set of product reviews on Amazon.
The dataset contains 130,155 articles sourced from the websites of three Swiss francophone newspapers: Arc Info, La Cote, and Le Nouvelliste, spanning the period from 01/01/2015 to 30/06/2022. The dataset, compiled from the temporary data feeds provided by the press agency, consists of the articles in their entirety, including the title, headline, and content, along with metadata for each article. The collected articles are primarily in French and are sorted by topic and region, all encoded in JSON format.
WildCapture comprises more than 60k wild animal shots of 15 different wildlife species, taken via phototraps in varying lighting conditions. The dataset is fully annotated with animal bounding boxes and species labels, and comes with train, validation, and out-of-distribution (OOD) splits.