Abstract
With the introduction of vision-language foundation models, language guidance has proven powerful in many computer vision applications. However, aligning images and text is not straightforward due to the intrinsic differences between the two modalities. In this project, we develop new approaches to align the vision and language modalities while preserving modality-specific information wherever applicable.
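For background, models of this kind are typically trained with a symmetric contrastive (InfoNCE) objective that pulls matched image-text pairs together in a shared embedding space, as popularized by CLIP. The sketch below illustrates that standard baseline objective only; it is not the method of any publication listed here, and the batch size, embedding dimension, and random features standing in for encoder outputs are placeholder assumptions.

# Minimal sketch of CLIP-style contrastive image-text alignment.
# Hypothetical shapes; random tensors stand in for encoder outputs.
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings."""
    # Normalize both modalities so similarity reduces to cosine similarity.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    # Pairwise similarity matrix; matched pairs lie on the diagonal.
    logits = image_emb @ text_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    # Average the image-to-text and text-to-image cross-entropies.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_i2t + loss_t2i)

# Usage with placeholder features (batch of 8, dimension 512).
images = torch.randn(8, 512)
texts = torch.randn(8, 512)
print(contrastive_alignment_loss(images, texts))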
Publications
Contrastive Difference Alignment Between Vision and Language for Composed Image Retrieval.
Y. Duan, S. Ramasinghe, A. Long, S. Gould, and T. Ajanthan.
Preprint, 2024.
Accept the Modality Gap: An Exploration in the Hyperbolic Space.
S. Ramasinghe, V. Shevchenko, G. Avraham, and T. Ajanthan.
Computer Vision and Pattern Recognition (CVPR), June 2024.
(highlight)
[pdf] [talk] [code] [bib]
@inproceedings{ramasinghe_atmg_cvpr24,
  author = {Ramasinghe, Sameera and Shevchenko, Violetta and Avraham, Gil and Ajanthan, Thalaiyasingam},
  title = {Accept the Modality Gap: An Exploration in the Hyperbolic Space},
  booktitle = {CVPR},
  year = {2024}
}
Modality-Aware Adaptation of Contrastive Language-Image Models.
A. Long, T. Ajanthan, and A. van den Hengel.
ICLR Workshop: Mathematical and Empirical Understanding of Foundation Models, May 2023.
[pdf] [bib]
@inproceedings{long_mater_iclr23,
  author = {Long, Alexander and Ajanthan, Thalaiyasingam and van den Hengel, Anton},
  title = {Modality-Aware Adaptation of Contrastive Language-Image Models},
  booktitle = {ICLR Workshop: Mathematical and Empirical Understanding of Foundation Models},
  year = {2023}
}