AtheneForschung - Research Portal of the UniBw M

Authors:

Rösch, Philipp J.; Libovický, Jindřich

Document type:

Conference Paper

Title:

Probing the Role of Positional Information in Vision-Language Models

Title of conference publication:

Findings of the Association for Computational Linguistics: NAACL 2022

Conference title:

Conference of the North American Chapter of the Association for Computational Linguistics (2022, Seattle, WA)

Venue:

Seattle, WA, United States

Year of conference:

2022

Date of conference beginning:

10 July 2022

Date of conference ending:

15 July 2022

Publisher:

Association for Computational Linguistics (ACL)

Year:

2022

Pages from - to:

1031-1041

Language:

English

Abstract:

In most Vision-Language models (VL), the understanding of the image structure is enabled by injecting the position information (PI) about objects in the image. In our case study of LXMERT, a state-of-the-art VL model, we probe the use of the PI in the representation and study its effect on Visual Question Answering. We show that the model is not capable of leveraging the PI for the image-text matching task on a challenge set where only position differs. Yet, our experiments with probing confirm...

URL:

https://aclanthology.org/2022.findings-naacl.77/

Department:

Fakultät für Elektrotechnik und Technische Informatik

Institute:

ETTI 2 - Institut für Verteilte Intelligente Systeme

Chair:

Oswald, Norbert

Open Access:

Yes