Discriminative multimodal learning via conditional priors in generative models

Andrade Mancisidor, Rogelio; Kampffmeyer, Michael Christian; Aas, Kjersti; Jenssen, Robert

Published version (PDF)

Date

2023-11-02

Type

Journal article
Peer reviewed

Abstract

Deep generative models with latent variables have recently been used to learn joint representations and generative processes from multi-modal data, which depict an object from different viewpoints. These two learning mechanisms can, however, conflict with each other, and the representations can fail to embed information on the data modalities. This research studies the realistic scenario in which all modalities and class labels are available for model training (e.g. images or handwriting), but some modalities and labels required for downstream tasks are missing (e.g. text or annotations). We show that, in this scenario, the variational lower bound limits the mutual information between joint representations and missing modalities. To counteract these problems, we introduce a novel conditional multi-modal discriminative model that uses an informative prior distribution and optimizes a likelihood-free objective function that maximizes mutual information between joint representations and missing modalities. Extensive experiments demonstrate the benefits of the proposed model: empirical results show that it achieves state-of-the-art results in representative problems such as downstream classification, acoustic inversion, and image and annotation generation.

Publisher

Elsevier

Citation

Andrade Mancisidor, Kampffmeyer, Aas, Jenssen. Discriminative multimodal learning via conditional priors in generative models. Neural Networks. 2023;169

Collections

Articles, reports and other (physics and technology) [1062]

Unless otherwise stated, this item's license is described as Attribution 4.0 International (CC BY 4.0)