Abstract

With the introduction of vision-language foundation models, language guidance has proven powerful in many computer vision applications. However, aligning images and text is not straightforward due to the intrinsic differences between the two modalities. In this project, we develop new approaches to align the vision and language modalities while preserving modality-specific information where applicable.

Publications

Contrastive Difference Alignment Between Vision and Language for Composed Image Retrieval.
Y. Duan, S. Ramasinghe, A. Long, S. Gould, and T. Ajanthan.
Preprint, 2024.

Accept the Modality Gap: An Exploration in the Hyperbolic Space.
S. Ramasinghe, V. Shevchenko, G. Avraham, and T. Ajanthan.
Computer Vision and Pattern Recognition (CVPR), June 2024. (highlight)
[pdf] [talk] [code] [bib]

Modality-Aware Adaptation of Contrastive Language-Image Models.
A. Long, T. Ajanthan, and A. van den Hengel.
ICLR Workshop: Mathematical and Empirical Understanding of Foundation Models, May 2023.
[pdf] [bib]