Check out our Software
InDistill enhances the effectiveness of the Knowledge Distillation procedure by leveraging the properties of channel pruning to both reduce the capacity gap between the models and retain the information geometry. The method also introduces a curriculum-learning-based scheme for enhancing the effectiveness of transferring knowledge from multiple intermediate layers.
Keywords
pygrank is an open source framework to define, run and evaluate node ranking algorithms. It provides object-oriented and extensively unit-tested algorithmic components, such as graph filters, post-processors, measures, benchmarks, and online tuning. Computations can be delegated to numpy, tensorflow, or pytorch backends and fit in back-propagation pipelines. Classes can be combined to define interoperable complex algorithms.
Keywords
We propose a new hyperbolic-based model for metric learning. At the core of our method is a vision transformer with output embeddings mapped to hyperbolic space. These embeddings are directly optimized using modified pairwise cross-entropy loss.
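As a minimal sketch of what "mapping embeddings to hyperbolic space" can look like, the snippet below assumes the Poincaré ball model with curvature parameter `c` (the description above does not say which model of hyperbolic space is used). The function names `expmap0` and `poincare_distance` are illustrative and are not taken from the released code.

```python
import math


def expmap0(v, c=1.0):
    """Exponential map at the origin: sends a Euclidean vector into the Poincare ball."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return list(v)
    scale = math.tanh(math.sqrt(c) * norm) / (math.sqrt(c) * norm)
    return [scale * x for x in v]


def poincare_distance(x, y, c=1.0):
    """Geodesic distance between two points that lie inside the Poincare ball."""
    sq_diff = sum((a - b) ** 2 for a, b in zip(x, y))
    sq_x = sum(a * a for a in x)
    sq_y = sum(b * b for b in y)
    arg = 1.0 + 2.0 * c * sq_diff / ((1.0 - c * sq_x) * (1.0 - c * sq_y))
    return math.acosh(arg) / math.sqrt(c)


# Two toy "output embeddings" (assumed values) mapped into the ball and compared.
a = expmap0([0.3, -0.4])
b = expmap0([1.0, 2.0])
distance_ab = poincare_distance(a, b)
```

A pairwise loss could then use negative hyperbolic distances as similarity logits; that step is left out here because the exact loss formulation is not given in the description.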
Keywords
We introduce a new setting of Novel Class Discovery in Semantic Segmentation (NCDSS), which aims at segmenting unlabeled images containing new classes given prior knowledge from a labeled set of disjoint classes. In NCDSS, we need to distinguish the objects and background, and to handle the existence of multiple classes within an image, which increases the difficulty in using the unlabeled data. To tackle this new setting, we leverage the labeled base data and a saliency model to coarsely cluster novel classes for model training in our basic framework.
Keywords
We propose a simple and novel Unsupervised Domain Adaptation (UDA) approach for video action recognition. Our approach leverages recent advances on spatio-temporal transformers to build a robust source model that better generalises to the target domain. Furthermore, our architecture learns domain invariant features thanks to the introduction of a novel alignment loss term derived from the Information Bottleneck principle.
Keywords
We propose an augmentation-free unsupervised approach for point clouds to learn transferable point-level features via soft clustering, named SoftClu. SoftClu assumes that the points belonging to a cluster should be close to each other in both geometric and feature spaces. We exploit the affiliation of points to their clusters as a proxy to enable self-training through a pseudo-label prediction task. Under the constraint that these pseudo-labels induce the equipartition of the point cloud, we cast SoftClu as an optimal transport problem.
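The equipartition constraint can be illustrated with a small Sinkhorn-style normalisation, a common way to solve entropy-regularised optimal transport problems. This is a sketch under assumptions: the solver choice, the `epsilon` value, and the function name `sinkhorn_equipartition` are not taken from the SoftClu description, and the toy scores stand in for point-to-cluster affinities.

```python
import math


def sinkhorn_equipartition(scores, epsilon=0.25, n_iters=200):
    """Turn point-to-cluster scores into soft pseudo-labels whose total mass
    is spread equally over the clusters (entropy-regularised transport)."""
    n, k = len(scores), len(scores[0])
    top = max(max(row) for row in scores)  # subtract the max for numerical stability
    q = [[math.exp((s - top) / epsilon) for s in row] for row in scores]
    for _ in range(n_iters):
        # Rescale every column so that each cluster receives mass 1/k.
        for j in range(k):
            col_sum = sum(q[i][j] for i in range(n))
            for i in range(n):
                q[i][j] *= (1.0 / k) / col_sum
        # Rescale every row so that each point carries mass 1/n.
        for i in range(n):
            row_sum = sum(q[i])
            q[i] = [v * (1.0 / n) / row_sum for v in q[i]]
    # Multiply by n so that each row is a probability vector over clusters.
    return [[v * n for v in row] for row in q]


# Four points, two clusters; raw scores favour cluster 0 for three of the points.
toy_scores = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.4, 0.6]]
soft_labels = sinkhorn_equipartition(toy_scores)
```

Without the constraint, three of the four points would go to cluster 0; with it, each cluster ends up with a total soft mass of about two points.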
Keywords
We propose the Style-HAllucinated Dual consistEncy learning (SHADE) framework constructed based on two consistency constraints, Style Consistency (SC) and Retrospection Consistency (RC). SC enriches the source situations and encourages the model to learn consistent representation across style-diversified samples. RC leverages real-world knowledge to prevent the model from overfitting to synthetic data and thus largely keeps the representation consistent between the synthetic and real-world models. Furthermore, we present a novel style hallucination module (SHM) to generate style-diversified samples that are essential to consistency learning.
Keywords
We address the new task of class-incremental Novel Class Discovery (class-iNCD), which refers to the problem of discovering novel categories in an unlabelled data set by leveraging a pre-trained model that has been trained on a labelled data set containing disjoint yet related categories. Apart from discovering novel classes, we also aim at preserving the ability of the model to recognize previously seen base categories.
Keywords
This is a PyTorch implementation of Hebbian learning algorithms to train deep convolutional neural networks.
Keywords
Neural network models trained on the updated version (v2) of the PNN and PV datasets, able to count perineuronal nets.
Keywords
FeTrIL: Feature Translation for Exemplar-Free Class-Incremental Learning. We introduce a method which combines a fixed feature extractor and a pseudo-features generator to improve the stability-plasticity balance. The generator uses a simple yet effective geometric translation of new class features to create representations of past classes, made of pseudo-features. The translation of features only requires the storage of the centroid representations of past classes to produce their pseudo-features. Actual features of new classes and pseudo-features of past classes are fed into a linear classifier which is trained incrementally to discriminate between all classes. The incremental process is much faster with the proposed method compared to mainstream ones which update the entire deep model.
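The geometric translation described above can be sketched in a few lines: each new-class feature is shifted by the difference between the stored past-class centroid and the new-class centroid. The helper names and the toy 2-D features are assumptions made for illustration only.

```python
def centroid(features):
    """Mean feature vector of a class."""
    dim = len(features[0])
    return [sum(f[d] for f in features) / len(features) for d in range(dim)]


def translate_features(new_class_features, past_centroid):
    """Create pseudo-features for a past class by moving new-class features
    so that their centroid coincides with the stored past-class centroid."""
    new_centroid = centroid(new_class_features)
    shift = [p - n for p, n in zip(past_centroid, new_centroid)]
    return [[x + s for x, s in zip(f, shift)] for f in new_class_features]


# Toy example: features of a new class and the stored centroid of a past class.
new_features = [[1.0, 2.0], [3.0, 2.0], [2.0, 5.0]]
stored_past_centroid = [-4.0, 0.5]
pseudo_features = translate_features(new_features, stored_past_centroid)
```

The pseudo-features keep the spread of the new class but are centred on the past class, which is why only the centroids of past classes need to be stored.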
Keywords
We propose the manifold mixing model soup (ManifoldMixMS) algorithm. Instead of simple averaging, it uses a more sophisticated strategy to generate the fused model. Specifically, it partitions a neural network model into several latent space manifolds (which can be individual layers or a collection of layers). Afterwards, from the pool of finetuned models available after hyperparameter tuning, the most promising ones are selected and their latent space manifolds are mixed together individually. The optimal mixing coefficient for each latent space manifold is calculated automatically via invoking an optimization algorithm. The fused model we retrieve with this procedure can be thought of as a sort of "Frankenstein" model, as it integrates (parts of) individual model components from multiple finetuned models into one model.
Keywords
We propose a method to model latent structures with a learned dynamic potential landscape, thereby performing latent traversals as the flow of samples down the landscape's gradient. Inspired by physics, optimal transport, and neuroscience, these potential landscapes are learned as physically realistic partial differential equations, thereby allowing them to flexibly vary over both space and time. To achieve disentanglement, multiple potentials are learned simultaneously, and are constrained by a classifier to be distinct and semantically self-consistent.
Keywords
We propose an architecture-agnostic approach that jointly discovers factors representing spatial parts and their appearances in an entirely unsupervised fashion. These factors are obtained by applying a semi-nonnegative tensor factorization on the feature maps, which in turn enables context-aware local image editing with pixel-level control. In addition, we show that the discovered appearance factors correspond to saliency maps that localize concepts of interest, without using any labels. Experiments on a wide range of GAN architectures and datasets show that, in comparison to the state of the art, our method is far more efficient in terms of training time and, most importantly, provides much more accurate localized control.
Keywords
We tackle the neural face reenactment task by leveraging the photorealistic image generation and the disentangled properties of a pretrained StyleGAN2, along with a hypernetwork. We present a novel method that performs both faithful identity reconstruction and effective facial image editing by learning to update the weights of a StyleGAN2 generator using a hypernetwork approach. Specifically, our model effectively combines the appearance features of a source image and the facial pose features of a target image to create new facial images that preserve the source identity and convey the target facial pose.
Keywords
We propose a task-agnostic anonymization procedure that directly optimises the images' latent representation in the latent space of a pre-trained GAN. By optimizing the latent codes directly, we ensure that the identity is at a desired distance from the original (with an identity obfuscation loss) while the facial attributes are preserved (using a novel feature-matching loss in FaRL's deep feature space). The method is capable of anonymizing the identity of the images whilst better preserving the facial attributes.
Keywords
We propose a framework that, using unpaired randomly generated facial images, learns to disentangle the identity characteristics of the face from its pose by incorporating the recently introduced style space S of StyleGAN2, a latent representation space that exhibits remarkable disentanglement properties. By capitalizing on this, we learn to successfully mix a pair of source and target style codes using supervision from a 3D model. The resulting latent code, which is subsequently used for reenactment, consists of latent units corresponding to the facial pose of the target only and of units corresponding to the identity of the source only, leading to notable improvement in the reenactment performance.
Keywords
Distracted driver classification (DDC) plays an important role in ensuring driving safety. This software is related to the article "100-Driver: A Large-scale, Diverse Dataset for Distracted Driver Classification" published in IEEE Trans. on Intelligent Transportation Systems. The code allows investigating practical problems of DDC, including the traditional setting without domain shift and 3 challenging settings (i.e., cross-modality, cross-view, and cross-vehicle) with domain shifts.
Keywords
The yamlres library retrieves algorithm component combinations from online or local yaml resources to enable distributed development of no-learning schemas.
Keywords
Functional implementation of graph learning algorithms that support fast experimentation with autotuning algorithms and sharing yamlres definitions of autotuned algorithms.
Keywords
A holistic learning framework for Novel Class Discovery (NCD), which adopts contrastive learning to learn discriminative features with both the labeled and unlabeled data. The Neighborhood Contrastive Learning (NCL) framework effectively leverages the local neighborhood in the embedding space, enabling us to take the knowledge from more positive samples and thus improve the clustering accuracy. In addition, we also introduce the Hard Negative Generation (HNG), which leverages the labeled samples to produce informative hard negative samples and brings further advantage to NCL.
Keywords
PyTorch implementation of a Geometry-Contrastive Transformer for Generalized 3D Pose Transfer. The novel GC-Transformer can freely conduct robust pose transfer on LARGE meshes at no cost, which could be a boost to Transformers in 3D fields.
Keywords
A novel unsupervised domain adaptation approach for action recognition from videos, inspired by recent literature on contrastive learning. It comprises a novel two-headed deep architecture that simultaneously adopts cross-entropy and contrastive losses from different network branches to robustly learn a target classifier.
Keywords
PyTorch implementation of AniFormer, a novel Transformer-based architecture, that generates animated 3D sequences by directly taking the raw driving sequences and arbitrary same-type target meshes as inputs. The Transformer architecture is customised for 3D animation that generates mesh sequences by integrating styles from target meshes and motions from the driving meshes. Besides, instead of the conventional single regression head in the vanilla Transformer, AniFormer generates multiple frames as outputs to preserve the sequential consistency of the generated meshes. This is achieved by a pair of regression constraints, i.e., motion and appearance constraints, that can provide strong regularization on the generated mesh sequences.
Keywords
PyTorch implementation of Intrinsic-Extrinsic Preserved Generative Adversarial Network (IEP-GAN) for both intrinsic (i.e., shape) and extrinsic (i.e., pose) information preservation. Extrinsically, a co-occurrence discriminator is used to capture the structural/pose invariance from distinct Laplacians of the mesh. Intrinsically, a local intrinsic-preserved loss is introduced to preserve the geodesic priors while avoiding heavy computations. IEP-GAN can be used to manipulate 3D human meshes in various ways, including pose transfer, identity swapping and pose interpolation with latent code vector arithmetic. The extensive experiments on various 3D datasets of humans, animals and hands demonstrate the generality of this approach.
Keywords
Code for Word-Class Embeddings (WCEs), a form of supervised embeddings especially suited for multiclass text classification. WCEs are meant to be used as extensions (i.e., by concatenation) to pre-trained embeddings (e.g., GloVe or word2vec) in order to improve the performance of neural classifiers.
Keywords
Graph Neural Networks (GNNs) have seen a dramatic increase in popularity thanks to their ability to understand relations between graph nodes. This library aims to provide GNN capabilities to native Java applications, for example, to perform machine learning on Android. It does so by avoiding C-based machine learning libraries, such as TensorFlow Lite, which are often designed with pure performance in mind but often require specific hardware to run, such as GPUs, and drastically increase the size of deployed applications.
Keywords
A package for implementing and simulating decentralized Graph Neural Network algorithms for the classification of peer-to-peer nodes.
Keywords
A framework for easy experimentation with Graph Neural Network (GNN) architectures by separating them from predictive components.
Keywords
Python implementation of mini-batch trimming, a novel strategy for improving the generalization capability of a trained network model. It is easy to implement and add to a training pipeline and independent of the employed model and optimizer.
Keywords
A wrapper for several state-of-the-art adaptive-gradient optimizers (Adam/AdamW/EAdam/AdaBelief/AdaMomentum/AdaFamily), including our novel 'AdaFamily' optimizer, via one API.
Keywords
The ability of artificial agents to increment their capabilities when confronted with new data is an open challenge in artificial intelligence. The main challenge faced in such cases is catastrophic forgetting, i.e., the tendency of neural networks to underfit past data when new data are ingested. The repository includes implementations of several incremental learning techniques including, among others, LUCIR, iCaRL, BiC, LwF, REMIND, Deep-SLDA, ScaIL, IL2M, DeeSIL, FT, and SIW.
Keywords
CNN-based algorithm for traffic density estimation and counting that can generalize to new data sources for which there are no annotations available. This generalization is achieved by exploiting an Unsupervised Domain Adaptation strategy, whereby a discriminator attached to the output forces similar density distribution in the target and source domains.
Keywords
QuaPy is an open-source framework for quantification (a.k.a. supervised prevalence estimation, or learning to quantify) written in Python. QuaPy provides implementations of the most important aspects of the quantification workflow, such as (baseline and advanced) quantification methods, quantification-oriented model selection mechanisms, evaluation measures, and evaluation protocols used for evaluating quantification methods. QuaPy also makes available commonly used datasets, and offers visualization tools for facilitating the analysis and interpretation of the experimental results. QuaPy is accompanied by rich API documentation and a wiki guide. The software is open-source, and distributed under the BSD-3 license; it is available on GitHub and can be installed via pip.
Keywords
ql4facct is software for replicating experiments concerning the evaluation of estimators of classifier "fairness". This repository makes available baseline systems used in the literature, along with our proposed framework based on quantification. The experiments implemented in this software show, through four different experimental protocols and with the aid of visualization tools, that estimating classifier fairness via quantification yields a clear advantage with respect to the previous state of the art.
Keywords
Novel fixed classifier for incremental learning in which a number of pre-allocated output nodes are subject to the classification loss right from the beginning of the learning phase. In contrast to the standard expanding classifier, this allows: (a) the output nodes of future unseen classes to see negative samples from the beginning of learning, together with the positive samples that incrementally arrive; (b) the learning of features that do not change their geometric configuration as novel classes are incorporated in the learning model.
Keywords
Discovery of non-linear interpretable paths in GAN latent space in an unsupervised and model-agnostic manner. Non-linear paths are modeled using RBF-based warping functions, optimized in order to be distinguishable from each other. This leads to paths that correspond to an interpretable generation where only a small number of generative factors are affected for each path. A quantitative evaluation protocol for the case of face-generating GANs is also implemented, which can be used to automatically associate the discovered paths with interpretable attributes such as smiling and rotation.
Keywords
A method that offers an intuitive way to find different types of interpretable transformations in a pre-trained GAN. We achieve this by decomposing the generator’s activations in a multilinear manner and regressing back to the latent space.
Keywords
PyTorch implementation of Multi-target Graph Domain Adaptation framework. The framework is pivoted around two key concepts: graph feature aggregation and curriculum learning.
Keywords
PyTorch implementation of the Memory-based Multi-Source MetaLearning (M^3L) framework for multi-source domain generalization (DG) in person ReID. The proposed meta-learning strategy enables the model to simulate the train-test process of DG during training, which can efficiently improve the generalization ability of the model on unseen domains. A memory-based module and MetaBN are also introduced to take full advantage of meta-learning and obtain further improvement.
Keywords
Python code for Generalised Funnelling. Funnelling is a new ensemble method for heterogeneous transfer learning that can be applied to cross-lingual text classification. Funnelling consists of generating a two-tier classification system where all documents, irrespective of language, are classified by the same (second-tier) classifier. For this classifier, all documents are represented in a common, language-independent feature space consisting of the posterior probabilities generated by first-tier, language-dependent classifiers. This allows the classification of all test documents, of any language, to benefit from the information present in all training documents, of any language.
Keywords
A library of self-supervised methods for unsupervised visual representation learning powered by PyTorch Lightning. It aims at providing SotA self-supervised methods in a comparable environment while, at the same time, implementing training tricks. While the library is self-contained, it is possible to use the models outside of solo-learn.
Keywords
This repository hosts the code and data lists for our two learning-based eXplainable AI (XAI) methods called L-CAM-Fm and L-CAM-Img, for deep convolutional neural networks (DCNN) image classifiers. Our methods receive as input an image and a class label and produce as output the image regions that the DCNN has focused on in order to infer this class. Both methods use an attention mechanism (AM), trained end-to-end along with the original (frozen) DCNN, to derive class activation maps (CAMs) from the last convolutional layer’s feature maps (FMs).
Keywords
We introduce a novel universal attack algorithm called "MetaAttack" for person re-ID. MetaAttack can mislead re-ID models on unseen domains by a universal adversarial perturbation. Specifically, to capture common patterns across different domains, we propose a meta-learning scheme to seek the universal perturbation via the gradient interaction between meta-train and meta-test formed by two datasets. We also take advantage of a virtual dataset (PersonX), instead of real ones, to conduct meta-test. This scheme not only enables us to learn with more comprehensive variation factors but also mitigates the negative effects caused by biased factors of real datasets.
Keywords
PyTorch code for our submission: "Logit Margin Matters: Improving Transferable Targeted Adversarial Attack by Logit Calibration". The code is implemented based on the Code of the paper "On Success and Simplicity: A Second Look at Transferable Targeted Attacks" (Zhengyu Zhao, Zhuoran Liu, Martha Larson, NeurIPS 2021).
Keywords
This work addresses the problem of removing any client’s contribution in federated learning (FL). During FL rounds, each client performs local training to learn a model that minimizes the empirical loss on their private data. We propose to perform unlearning at the client (to be erased) by reversing the learning process, i.e., training a model to maximize the local empirical loss. In particular, we formulate the unlearning problem as a constrained maximization problem by restricting to an l2-norm ball around a suitably chosen reference model to help retain some knowledge learnt from the other clients’ data. This allows the client to use projected gradient descent to perform unlearning. The method requires neither global access to the data used for training nor the history of the parameter updates to be stored by the aggregator (server) or any of the clients.
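The constrained maximization can be sketched as gradient ascent on the local loss (equivalently, projected gradient descent on its negative) followed by a projection back onto the l2-norm ball around the reference model. The quadratic toy loss, the step size, the radius, and all function names below are assumptions for illustration; they are not taken from the released code.

```python
import math


def project_l2_ball(w, center, radius):
    """Project w onto the l2 ball of the given radius centred at `center`."""
    diff = [a - b for a, b in zip(w, center)]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm <= radius:
        return list(w)
    scale = radius / norm
    return [c + scale * d for c, d in zip(center, diff)]


def unlearn(w_ref, grad_fn, radius, lr=0.1, steps=50):
    """Maximise the client's local loss while staying close to the reference model."""
    w = list(w_ref)
    for _ in range(steps):
        g = grad_fn(w)
        w = [wi + lr * gi for wi, gi in zip(w, g)]  # ascent step on the local loss
        w = project_l2_ball(w, w_ref, radius)       # keep w inside the ball
    return w


# Toy local loss (assumed): 0.5 * ||w - w_client||^2, minimised at w_client.
w_client = [1.0, 1.0]


def local_loss(w):
    return 0.5 * sum((a - b) ** 2 for a, b in zip(w, w_client))


def local_grad(w):
    return [a - b for a, b in zip(w, w_client)]


w_reference = [0.8, 1.1]
w_unlearned = unlearn(w_reference, local_grad, radius=0.5)
```

In this toy setting the unlearned parameters end up on the boundary of the ball, on the side that lies farthest from the client's optimum.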
Keywords
This repository contains the code for the paper "Matching Pairs: Attributing Fine-Tuned Models to their Pre-Trained Large Language Models". By casting the LLM attribution as a classification problem, we develop machine learning solutions that link a fine-tuned LLM to its pre-trained base model.
Keywords
Adversarial Robustness Toolbox (ART) is a Python library for Machine Learning Security. ART provides tools that enable developers and researchers to defend and evaluate Machine Learning models and applications against the adversarial threats of Evasion, Poisoning, Extraction, and Inference. ART supports all popular machine learning frameworks (TensorFlow, Keras, PyTorch, MXNet, scikit-learn, XGBoost, LightGBM, CatBoost, GPy, etc.), all data types (images, tables, audio, video, etc.) and machine learning tasks (classification, object detection, speech recognition, generation, certification, etc.).
Keywords
Novel training-time attacks resulting in corrupted Deep Generative Models (DGMs) that synthesize regular data under normal operations and designated target outputs for inputs sampled from a trigger distribution. Depending on the control that the adversary has over the random number generation, this imposes various degrees of risk that harmful data may enter the machine learning development pipelines, potentially causing material or reputational damage to the victim organization. The attacks are based on adversarial loss functions that combine the dual objectives of attack stealth and fidelity. Their effectiveness is shown for a variety of DGM architectures like StyleGANs and WaveGANs.
Keywords
Repository with the main tools for computing Regression Concept Vectors.
Keywords
Python implementation of OBjectGraphs, a new approach for video event recognition that exploits the relations among objects within each frame. More specifically, a graph, constructed using the appearance features of the objects, is exploited by the model to recognize the video event. Moreover, using the weighted in-degrees of the graph’s adjacency matrix, the model is able to provide insightful explanations for its decisions.
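To illustrate how weighted in-degrees can point to influential objects, the sketch below sums the incoming edge weights of every node and ranks the objects accordingly. It assumes the convention that `adjacency[i][j]` is the weight of the edge from object `i` to object `j`; the function names and the toy matrix are hypothetical.

```python
def weighted_in_degrees(adjacency):
    """Sum of incoming edge weights per node (column sums under the i -> j convention)."""
    n = len(adjacency)
    return [sum(adjacency[i][j] for i in range(n)) for j in range(n)]


def top_objects(adjacency, k=2):
    """Indices of the k objects with the largest weighted in-degree."""
    degrees = weighted_in_degrees(adjacency)
    return sorted(range(len(degrees)), key=lambda j: degrees[j], reverse=True)[:k]


# Toy graph over three detected objects in one frame (assumed weights).
toy_adjacency = [
    [0.0, 0.9, 0.1],
    [0.2, 0.0, 0.3],
    [0.4, 0.8, 0.0],
]
most_influential = top_objects(toy_adjacency, k=2)
```

Here object 1 receives the most incoming weight, so it would be the first candidate to highlight in an explanation.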
Keywords
Multi-task and Adversarial CNN Training: Learning Interpretable Pathology Features Improves CNN Generalization
Keywords
Privacy-preserving, architecture-agnostic GNN learning algorithm with formal privacy guarantees based on Local Differential Privacy (LDP). This includes a multidimensional ε-LDP algorithm that allows the server to privately collect node features and estimate the first-layer graph convolution of the GNN using the noisy features. Then, to further decrease the estimation error, we introduce KProp, a simple graph convolution layer that aggregates features from higher-order neighbors, which is prepended to the backbone GNN.
Keywords
Diffprivlib is a general-purpose library developed by IBM for experimenting with, investigating, and developing applications in differential privacy:
- Experiment with differential privacy
- Explore the impact of differential privacy on machine learning accuracy using classification and clustering models
- Build your own differential privacy applications, using an extensive collection of mechanisms
Keywords
Prototype of the AI4Media Evaluation as a Service platform. This platform is derived from the open-source Codalab EaaS platform and contains specific functions adapted for the AI4Media project, as well as an appropriate use-case scenario. This is a prototype version of the platform and will be updated as the project continues.
Keywords
We propose TransDepth, an architecture that benefits from both convolutional neural networks and transformers. To avoid the network losing its ability to capture local level details due to the adoption of transformers, we propose a novel decoder that employs attention mechanisms based on gates. Notably, this is the first solution that applies transformers to pixel-wise prediction problems involving continuous labels (i.e., monocular depth prediction and surface normal estimation).
Keywords
We propose two efficient variants to compute the differentiable matrix square root. For the forward propagation, one method is to use Matrix Taylor Polynomial (MTP), and the other method is to use Matrix Padé Approximants (MPA). The backward gradient is computed by iteratively solving the continuous-time Lyapunov equation using the matrix sign function. Both methods yield considerable speed-up compared with the SVD or the Newton-Schulz iteration.
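The forward pass of the Taylor-polynomial variant can be sketched as follows, assuming a symmetric positive definite input: the matrix is normalised by its Frobenius norm so that the power series of sqrt(I - Z) converges, and the series is truncated at a fixed degree. The degree, the normalisation choice, and the helper names are assumptions; the backward pass via the Lyapunov equation is not covered.

```python
import math


def matmul(a, b):
    """Plain square-matrix product on nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def sqrtm_taylor(a, degree=40):
    """Approximate the square root of an SPD matrix with a truncated Taylor series.

    Uses sqrt(A) = sqrt(||A||_F) * sqrt(I - Z), where Z = I - A / ||A||_F, and
    sqrt(I - Z) = sum_k coeff_k * Z^k with coeff_0 = 1, coeff_k = coeff_{k-1} * (k - 1.5) / k.
    """
    n = len(a)
    fro = math.sqrt(sum(x * x for row in a for x in row))
    eye = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    z = [[eye[i][j] - a[i][j] / fro for j in range(n)] for i in range(n)]
    result = [row[:] for row in eye]
    power = [row[:] for row in eye]
    coeff = 1.0
    for k in range(1, degree + 1):
        coeff *= (k - 1.5) / k
        power = matmul(power, z)
        result = [[result[i][j] + coeff * power[i][j] for j in range(n)] for i in range(n)]
    scale = math.sqrt(fro)
    return [[scale * x for x in row] for row in result]


# Toy SPD matrix (assumed values); squaring the result should recover it.
spd = [[4.0, 1.0], [1.0, 3.0]]
root = sqrtm_taylor(spd)
reconstructed = matmul(root, root)
```

The series converges slowly for badly conditioned matrices, which is one motivation for a rational alternative such as the Padé approximants mentioned above.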
Keywords
Inserting an SVD meta-layer into neural networks tends to make the covariance ill-conditioned, which can harm the training stability and generalization ability of the model. We systematically study how to improve the covariance conditioning by enforcing orthogonality to the Pre-SVD layer.
Keywords
EigenDecomposition (ED) is at the heart of many computer vision algorithms and applications. One crucial bottleneck limiting its usage is the expensive computation cost, particularly for a mini-batch of matrices in the deep neural networks. We propose a QR-based ED method dedicated to the application scenarios of computer vision. Our proposed method performs the ED entirely by batched matrix/vector multiplication, which processes all the matrices simultaneously and thus fully utilizes the power of GPUs.
Keywords
This software can be used for training a deep learning architecture which estimates frames' importance by integrating a concentrated attention mechanism and utilising information about the frames' uniqueness and diversity. The integrated mechanism is able to focus on non-overlapping blocks in the main diagonal of the attention matrix and make better estimates about the significance of different parts of the video by considering the uniqueness and diversity of the associated frames.
Keywords
This software can be used for studying our method for producing explanations for the outcomes of various attention-based video summarization models, and reproducing the reported experimental results in our papers titled "A Study on the Use of Attention for Explaining Video Summarization" (published in the Proc. of the IEEE Int. Symposium on Multimedia (ISM) 2022) and "Explaining Video Summarization Based on the Focus of Attention" (published in the Proc. of the NarSUM workshop at ACM Multimedia 2023).
Keywords
DivClust is a method for controlling inter-clustering diversity in deep clustering frameworks. It consists of a novel loss that can be incorporated in most modern deep clustering frameworks in a straightforward way during their training, and which allows the user to specify their desired degree of inter-clustering diversity, which is then enforced in the form of an upper bound threshold.
Keywords
Code for the training of a video similarity learning network with self-supervision.
Keywords
Code for the knowledge distillation training of coarse- and fine-grained student networks based on similarities calculated from a teacher and the selector network. Also, the scripts for the training of the selector network are included.
Keywords
We propose an adaptive method that introduces soft inter-sample relations, namely Adaptive Soft Contrastive Learning (ASCL). More specifically, ASCL transforms the original instance discrimination task into a multi-instance soft discrimination task, and adaptively introduces inter-sample relations. As an effective and concise plug-in module for existing self-supervised learning frameworks, ASCL achieves the best results on several benchmarks in terms of both performance and efficiency.
Keywords
A novel self-supervised framework: cross-context learning between global and hypercolumn features (CGH), that enforces the consistency of instance relations between low- and high-level semantics. Specifically, we stack the intermediate feature maps to construct a hypercolumn representation so that we can measure instance relations using two contexts (hypercolumn and global feature) separately, and then use the relations of one context to guide the learning of the other. This cross-context learning allows the model to learn from the differences between the two contexts.
Keywords
We propose a contrastive learning method, called Masked Contrastive learning (MaskCon), to address an under-explored problem setting in which we learn with a coarse-labelled dataset in order to address a finer labelling problem. More specifically, within the contrastive learning framework, for each sample our method generates soft labels against other samples and another augmented view of the sample in question. In contrast to self-supervised contrastive learning, where only the sample's augmentations are considered hard positives, and to supervised contrastive learning, where only samples with the same coarse labels are considered hard positives, we propose soft labels based on sample distances that are masked by the coarse labels. This allows us to utilize both inter-sample relations and coarse labels.
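The masking idea can be sketched as follows: similarities between a query sample and the other samples are turned into a softmax distribution, but only samples that share the query's coarse label keep a non-zero weight. The temperature value and the function name are assumptions, and the handling of the augmented view is omitted, so this is a simplified illustration and not the exact formulation of the method.

```python
import math


def masked_soft_labels(similarities, coarse_labels, query_label, temperature=0.1):
    """Softmax over similarities, restricted to samples with the query's coarse label."""
    kept = [s for s, lab in zip(similarities, coarse_labels) if lab == query_label]
    if not kept:
        return [0.0] * len(similarities)
    top = max(kept)  # subtract the max for numerical stability
    weights = [
        math.exp((s - top) / temperature) if lab == query_label else 0.0
        for s, lab in zip(similarities, coarse_labels)
    ]
    total = sum(weights)
    return [w / total for w in weights]


# Toy example: similarities of one query ("dog") to four other samples.
sims = [0.9, 0.2, 0.8, 0.1]
labels = ["dog", "cat", "dog", "dog"]
soft = masked_soft_labels(sims, labels, query_label="dog")
```

Samples with a different coarse label receive zero weight, while those within the same coarse class are weighted by how close they are to the query.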
Keywords
We propose an efficient and robust framework named Sample Selection and Relabelling (SSR), which minimizes the number of modules and hyperparameters required and achieves good results in various conditions. At the heart of our method is a sample selection and relabelling mechanism based on a non-parametric KNN classifier and a parametric model classifier, respectively, to select the clean samples and gradually relabel the closed-set noise samples. Without bells and whistles, such as model co-training, self-supervised pretraining, and semi-supervised learning, and with robustness concerning settings of its few hyper-parameters, our method significantly surpasses previous methods.
Keywords
This software can be used for training a deep learning architecture for video thumbnail selection, which quantifies the representativeness and the aesthetic quality of the selected thumbnails using deterministic reward functions, and integrates a frame picking mechanism that takes frames' diversity into account. After being trained on a collection of videos, RL-DiVTS is capable of selecting a diverse set of representative and aesthetically pleasing video thumbnails for unseen videos, with the number of required thumbnails specified by the user.
Keywords
Python implementation of the novel Cycle In Cycle Generative Adversarial Network (C2GAN) for the task of keypoint-guided image generation. C2GAN is a cross-modal framework exploring joint exploitation of the keypoint and the image data in an interactive manner. C2GAN contains two different types of generators, i.e., a keypoint-oriented generator and an image-oriented generator. Both of them are mutually connected in an end-to-end learnable fashion and explicitly form three cycled sub-networks, i.e., one image generation cycle and two keypoint generation cycles. Each cycle not only aims at reconstructing the input domain, but also produces a useful output involved in the generation of another cycle. By so doing, the cycles constrain each other implicitly, which provides complementary information from the two different modalities and brings extra supervision across cycles, thus facilitating more robust optimization of the whole network.
Keywords
Source code for the DVMS model and training procedure, as well as pre-trained network weights for reproducibility. This deep learning model allows for multiple trajectory predictions of head movements while experiencing 360° videos with a VR headset. The necessary libraries are bundled in a Docker image but can also be installed separately.
Keywords
Implementation of Fast SR-Net for fast video visual quality and resolution improvement. It comprises a GAN-based training procedure for obtaining a fast neural network that achieves a lower bitrate than the H.265 codec at the same quality, or better quality at the same bitrate.
Keywords
Novel framework for Playable Video Generation that is trained in a self-supervised manner on a large dataset of unlabelled videos. We employ an encoder-decoder architecture where the predicted action labels act as bottlenecks. The network is constrained to learn a rich action space using, as the main driving loss, a reconstruction loss on the generated video.
Keywords
This is a fork of Few-shot Object Detection (FsDet) (https://github.com/ucbdrive/few-shot-object-detection), adding an easy-to-use tool for training on custom datasets. We have extended the FsDet framework with a tool that dynamically generates datasets from annotation files and drives the training process. The tool has the following features:
- Determine the base and novel classes from the provided annotations (for the novel classes only a subset may be used for training).
- Determine how many instances are available, and set up the k-shot n-way problem accordingly.
- Prepare model structures for the novel only and combined base+novel finetuning by adjusting the layer sizes to match the number of classes in the different sets.
- If the number of samples strongly varies, set up multiple training problems to make the best use of the data, and run multiple fine-tuning steps.
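As a rough illustration of the first three features, the sketch below derives a k-shot n-way setup from a flat list of annotations. It does not use the tool's or FsDet's API: the function `plan_few_shot`, the `(image_id, class_name)` annotation format, and the rule that classes below `base_min_instances` count as novel are all hypothetical.

```python
from collections import Counter

def plan_few_shot(annotations, base_min_instances=20):
    """Split classes into base and novel sets and derive a k-shot n-way setup.

    `annotations` is a list of (image_id, class_name) pairs. Treating classes
    with fewer than `base_min_instances` instances as novel is an assumption
    of this sketch, not the tool's documented rule.
    """
    counts = Counter(cls for _, cls in annotations)
    base = sorted(c for c, n in counts.items() if n >= base_min_instances)
    novel = sorted(c for c, n in counts.items() if n < base_min_instances)
    # k is limited by the novel class with the fewest available instances.
    k_shot = min((counts[c] for c in novel), default=0)
    return {
        "base": base,
        "novel": novel,
        "k_shot": k_shot,
        "n_way": len(novel),
        # Output layer sizes for the two fine-tuning stages.
        "novel_head_size": len(novel),
        "combined_head_size": len(base) + len(novel),
    }

# Hypothetical annotation counts: two frequent classes and two rare ones.
annotations = (
    [(f"img{i}", "car") for i in range(30)]
    + [(f"img{i}", "person") for i in range(25)]
    + [(f"img{i}", "scooter") for i in range(5)]
    + [(f"img{i}", "drone") for i in range(8)]
)
plan = plan_few_shot(annotations)
```

Here `plan` describes a 5-shot 2-way problem with a two-class head for novel-only fine-tuning and a four-class head for the combined stage.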
Keywords
VISIONE is a content-based retrieval system that supports various search functionalities (text search, object/color-based search, semantic and visual similarity search, temporal search). It uses a full-text search engine as a search backend.
Keywords
Novel Deep Micro-Dictionary Learning and Coding Network (DDLCN). DDLCN has most of the standard deep learning layers (pooling, fully connected, input/output, etc.), but the main difference is that the fundamental convolutional layers are replaced by novel compound dictionary learning and coding layers. The dictionary learning layer learns an over-complete dictionary for the input training data. At the deep coding layer, a locality constraint is added to guarantee that the activated dictionary bases are close to each other. Next, the activated dictionary atoms are assembled together and passed to the next compound dictionary learning and coding layers. In this way, the activated atoms in the first layer can be represented by the deeper atoms in the second dictionary. Intuitively, the second dictionary is designed to learn the fine-grained components which are shared among the input dictionary atoms. In this way, a more informative and discriminative low-level representation of the dictionary atoms can be obtained.
Keywords
A new loss function for self-supervised representation learning (SSL), based on the whitening of the latent-space features. The whitening operation has a "scattering" effect on the batch samples, avoiding degenerate solutions where all the sample representations collapse to a single point. Our solution does not require asymmetric networks and it is conceptually simple. Moreover, since negatives are not needed, we can extract multiple positive pairs from the same image instance.
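The scattering effect of whitening can be seen in a small dependency-free sketch. It assumes Cholesky whitening of the batch followed by a mean squared error over positive pairs; the helper names are hypothetical, plain lists stand in for tensors, and details of the released loss (such as any normalisation of the embeddings) are omitted.

```python
import math

def covariance(rows):
    """Return the centred rows and their sample covariance matrix."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centred) / (n - 1) for b in range(d)]
           for a in range(d)]
    return centred, cov

def cholesky(m):
    """Lower-triangular factor L with m = L L^T (m must be positive definite)."""
    d = len(m)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = m[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def whiten(rows):
    """Whiten a batch: solve L z = x for every centred row x, where cov = L L^T."""
    centred, cov = covariance(rows)
    low = cholesky(cov)
    out = []
    for x in centred:
        z = []
        for i in range(len(x)):  # forward substitution
            z.append((x[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
        out.append(z)
    return out

def positive_pair_loss(whitened, pairs):
    """Mean squared distance between whitened embeddings of positive pairs."""
    total = 0.0
    for i, j in pairs:
        total += sum((a - b) ** 2 for a, b in zip(whitened[i], whitened[j]))
    return total / len(pairs)

# Hypothetical batch of 2-D embeddings; rows (0,1), (2,3), (4,5) are positive pairs.
batch = [[1.0, 2.0], [2.0, 1.5], [3.0, 3.5], [4.0, 3.0], [5.0, 5.5], [6.0, 4.0]]
z = whiten(batch)
_, cov_z = covariance(z)  # close to the identity matrix
loss = positive_pair_loss(z, [(0, 1), (2, 3), (4, 5)])
```

Because the whitened batch is constrained to have identity covariance, the embeddings cannot all collapse to one point, which is why no negative pairs are needed in the loss.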
Keywords
A tool to allow Visual Transformers (VTs) to learn spatial relations within an image making the VT training much more robust when training data is scarce. The tool can be used jointly with the standard (supervised) training and it does not depend on specific architectural choices, thus it can be easily plugged into the existing VTs. Our method can improve (sometimes dramatically) the final accuracy of the VTs.
Keywords
As the backward algorithm of SVD is prone to numerical instability, we implement a variety of end-to-end SVD methods by manipulating the backward algorithms in this repository. They include:
- SVD-Padé: use Padé approximants to closely approximate the gradient.
- SVD-Taylor: use the Taylor polynomial to approximate the smooth gradient.
- SVD-PI: use Power Iteration (PI) to approximate the gradients.
- SVD-Newton: use the gradient of the Newton-Schulz iteration.
- SVD-Trunc: set an upper limit of the gradient and apply truncation.
- SVD-TopN: select the Top-N eigenvalues and abandon the rest.
- SVD-Original: ordinary SVD with gradient overflow check.
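The source of the instability is that the backward pass of an eigendecomposition contains terms of the form 1/(λ_i − λ_j), which blow up when two eigenvalues are close. The scalar sketch below shows how a truncated Taylor (geometric) series and a simple clamp keep that term bounded. It is a minimal illustration under the assumption λ_i ≥ λ_j > 0, not the repository's code; the function names, polynomial degree, and clamp limit are hypothetical.

```python
def exact_gap_inverse(lam_i, lam_j):
    """The unstable term 1 / (lam_i - lam_j) from the eigendecomposition backward pass."""
    return 1.0 / (lam_i - lam_j)

def taylor_gap_inverse(lam_i, lam_j, degree=9):
    """Approximate 1 / (lam_i - lam_j) for lam_i >= lam_j > 0.

    Uses 1 / (lam_i - lam_j) = (1 / lam_i) * sum_k (lam_j / lam_i)^k and
    truncates the series, so the value stays bounded by (degree + 1) / lam_i.
    """
    ratio = lam_j / lam_i
    return sum(ratio ** k for k in range(degree + 1)) / lam_i

def truncated_gap_inverse(lam_i, lam_j, limit=1e3):
    """Clamp 1 / (lam_i - lam_j) to the interval [-limit, limit]."""
    gap = lam_i - lam_j
    if gap == 0.0:
        return limit
    return max(-limit, min(limit, 1.0 / gap))

well_separated = taylor_gap_inverse(2.0, 1.0)        # close to the exact value 1.0
degenerate = taylor_gap_inverse(1.0, 1.0)            # bounded, whereas the exact term is undefined
clamped = truncated_gap_inverse(1.0, 1.0 - 1e-9)     # limited to 1e3 instead of about 1e9
```

For well-separated eigenvalues the series is nearly exact, while for nearly equal eigenvalues both variants trade accuracy for a bounded gradient.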
Keywords
We propose a 3D-aware Semantic-Guided Generative Model (3D-SGAN) for human image synthesis, which combines a GNeRF with a texture generator. The former learns an implicit 3D representation of the human body and outputs a set of 2D semantic segmentation masks. The latter transforms these semantic masks into a real image, adding a realistic texture to the human appearance. Without requiring additional 3D information, our model can learn 3D human representations with photo-realistic, controllable generation.
Keywords
We present a novel bipartite graph reasoning Generative Adversarial Network (BiGraphGAN) for two challenging tasks: person pose and facial image synthesis. The proposed graph generator consists of two novel blocks that aim to model the pose-to-pose and pose-to-image relations, respectively.
Keywords
We propose a novel edge guided generative adversarial network with contrastive learning (ECGAN) for the challenging semantic image synthesis task.
Keywords
We propose a new Attention-Guided Generative Adversarial Network (AttentionGAN) for the unpaired image-to-image translation task. AttentionGAN can identify the most discriminative foreground objects and minimize the change of the background. The attention-guided generators in AttentionGAN are able to produce attention masks, and then fuse the generation output with the attention masks to obtain high-quality target images. Accordingly, we also design a novel attention-guided discriminator which only considers attended regions.
Keywords
We propose a novel framework, i.e., Predict, Prevent, and Evaluate (PPE), for disentangled text-driven image manipulation that requires little manual annotation while being applicable to a wide variety of manipulations.
Keywords
We propose a novel model named Multi-Channel Attention Selection Generative Adversarial Network (SelectionGAN) for guided image-to-image translation, where we translate an input image into another while respecting an external semantic guidance. The proposed SelectionGAN explicitly utilizes the semantic guidance information and consists of two stages. In the first stage, the input image and the conditional semantic guidance are fed into a cycled semantic-guided generation network to produce initial coarse results. In the second stage, we refine the initial results by using the proposed multi-scale spatial pooling & channel selection module and the multi-channel attention selection module. Moreover, uncertainty maps automatically learned from attention maps are used to guide the pixel loss for better network optimization. Exhaustive experiments on four challenging guided image-to-image translation tasks (face, hand, body, and street view) demonstrate that our SelectionGAN is able to generate significantly better results than the state-of-the-art methods. Meanwhile, the proposed framework and modules are unified solutions and can be applied to solve other generation tasks such as semantic image synthesis.
Keywords
We propose an implicit style function (ISF) to straightforwardly achieve multi-modal and multi-domain image-to-image translation from pre-trained unconditional generators. The ISF manipulates the semantics of an input latent code so that the image generated from it lies in the desired visual domain.
Keywords
A method for dealing with challenges that arise in the domain of affect and mental health in multi-label regression problems. We propose a two-stage attention architecture that uses features from the clips' neighbourhood to introduce context information in the feature extraction. The architecture is novel in the domain of affect and mental state analysis and leads to shorter training times in comparison to the state of the art. Furthermore, we introduce a novel relational regression loss that aims at learning from the label relationships of the samples during training. The proposed loss uses the distance between label vectors to learn intra-batch latent representation similarities in a supervised manner. The improved latent representations obtained with the addition of the relational regression loss lead to improved regression output, without the use of large datasets.
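One way to read the relational regression loss is sketched below: label distances within a batch are turned into target similarities, and the latent representations are penalised when their pairwise similarities deviate from those targets. This is only a possible form and not the authors' definition; the cosine similarity, the max-normalised Euclidean label distance, and the function names are assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def relational_loss(latents, labels):
    """Mean squared gap between latent similarity and label-derived target similarity.

    Target similarity for a pair is 1 - d / d_max, where d is the Euclidean
    distance between the two label vectors and d_max the largest such
    distance in the batch (an assumption of this sketch).
    """
    n = len(labels)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    dists = {p: math.dist(labels[p[0]], labels[p[1]]) for p in pairs}
    d_max = max(dists.values()) or 1.0  # avoid dividing by zero when all labels match
    total = 0.0
    for i, j in pairs:
        target = 1.0 - dists[(i, j)] / d_max
        total += (cosine(latents[i], latents[j]) - target) ** 2
    return total / len(pairs)

# Samples with distant labels but identical latents are penalised ...
mismatch = relational_loss([[1.0, 0.0], [1.0, 0.0]], [[0.0], [1.0]])
# ... while dissimilar latents for distant labels are not.
match = relational_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0], [1.0]])
```

A term of this kind would be added to the usual regression loss, so that samples with similar labels are encouraged to share similar latent representations.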
Keywords
A novel visual-language model called DFER-CLIP, which is based on the CLIP model and designed for in-the-wild Dynamic Facial Expression Recognition (DFER). Specifically, the proposed DFER-CLIP consists of a visual part and a textual part. For the visual part, based on the CLIP image encoder, a temporal model consisting of several Transformer encoders is introduced for extracting temporal facial expression features, and the final feature embedding is obtained as a learnable "class" token. For the textual part, we use as inputs textual descriptions of the facial behaviour that is related to the classes (facial expressions) that we are interested in recognising. Those descriptions are generated using large language models, like ChatGPT. This contrasts with works that use only the class names, and it more accurately captures the relationship between them. Alongside the textual description, we introduce a learnable token which helps the model learn relevant context information for each expression during training.
Keywords
Novel two-stage framework with a new Cascaded Cross MLP-Mixer (CrossMLP) sub-network in the first stage and one refined pixel-level loss in the second stage. In the first stage, the CrossMLP sub-network learns the latent transformation cues between image code and semantic map code via our novel CrossMLP blocks. Then, the coarse results are generated progressively under the guidance of those cues. Moreover, in the second stage, we use a refined pixel-level loss that eases the noisy semantic label problem with more reasonable regularization in a more compact fashion for better optimization.
Keywords
DeepFusion source code. This code corresponds to a DNN-based late fusion approach that takes a custom number of inducers as inputs and outputs a new result, according to late fusion schemes.
Keywords
Social networks give free access to their services in exchange for the right to exploit their users' data. Data sharing is done in an initial context which is chosen by the users. However, data are used by social networks and third parties in different contexts which are often not transparent. In order to unveil such usages, we propose an approach that focuses on the effects of data sharing in impactful real-life situations. Focus is put on visual content because of its strong influence in shaping online user profiles. The approach relies on three components: (1) a set of visual objects with associated situation impact ratings obtained by crowdsourcing, (2) a corresponding set of object detectors for mining users' photos and (3) a ground truth dataset made of 500 visual user profiles which are manually rated per situation. These components are combined in LERVUP, a method which learns to rate visual user profiles in each situation. LERVUP exploits a new image descriptor which aggregates object ratings and object detections at user level and an attention mechanism which boosts highly-rated objects to prevent them from being overwhelmed by low-rated ones. Performance is evaluated per situation by measuring the correlation between the automatic ranking of profile ratings and a manual ground truth.
Keywords