RI: Small: Visual Reasoning and Self-questioning for Explainable Visual Question Answering

Project: Research project

Project Details


Visual question answering (VQA) aims to answer a natural-language question about a given image. Despite many recent efforts, its research is still in its infancy. Existing approaches forcefully learn statistical correlations between visual data, questions, and answers. Lacking explicit reasoning, they have limited flexibility and generalizability in handling versatile, unseen questions. It is desirable to study explainable VQA (X-VQA), which provides an explanation of its reasoning in addition to the answers. This requires integrating computer vision, natural language processing, and knowledge representation.
However, X-VQA is an ambitious and challenging task, as each component has its own hurdles and unsolved fundamental problems. For example, visual observations are never complete and accurate, so X-VQA must reason over inaccurate and incomplete visual facts. In addition, it is difficult to integrate the uncertainty in visual observations with domain knowledge. Moreover, it is costly to collect and annotate VQA training data with good coverage. All of these challenges confront the advance of X-VQA.
Effective start/end date: 10/1/20 - 9/30/23


  • National Science Foundation (IIS-2007613)

