Towards Intelligent Agents that Learn by Multimodal Communication

Project: Research project

Project Details


We propose to explore how to create intelligent agents that learn by multimodal communication in order to perform commonsense reasoning. People commonly communicate with each other using coordinated modalities, such as sketching while talking, or reading texts illustrated with diagrams. Our AI systems need the same capabilities. We propose to explore how to achieve fluent multimodal communication in the context of knowledge capture, to support commonsense reasoning. Commonsense reasoning is crucial for intelligent systems because it is part of the shared background assumed in working with human partners, and it provides a foundation for future learning. Our hypotheses are: (1) qualitative representations are a crucial part of commonsense knowledge, and (2) analogical reasoning and learning provide robustness in reasoning, plus human-like learning of complex relational structures. Unlike deep learning systems, for example, analogical learning systems can handle relational structures such as arguments, proofs, and plans, while learning with orders of magnitude less data. This research should help pave the way for intelligent systems that can interact with, and learn from, people using natural modalities, and should also advance our understanding of the nature of human cognition.
Using the Companion cognitive architecture, we propose to explore the following ideas: (1) Hybrid Primal Sketch. Our CogSketch system provides a model of high-level human vision that has been used both to model multiple human visual problem-solving tasks and in deployed sketch-based educational software. We propose to build a hybrid primal sketch processor, which combines CogSketch, off-the-shelf computer vision algorithms, and deep learning recognition systems, to process images, especially diagrams. (2) Analogical Learning of Narrative Function. Our prior work on analogical question-answering has led to algorithms that provide competitive performance on several datasets, while being more data-efficient than today’s machine learning systems. In this project we propose to extend these ideas to learning narrative functions, i.e., the higher levels of semantic interpretation that ascribe to pieces of text a purpose relative to larger tasks. Building on observations of how people learn to read, we plan to build dialogue models for natural annotation, i.e., ways that trainers can teach systems how to interpret multimodal materials, to bootstrap them in a data-efficient manner.
We will test these ideas using a variety of materials, including Navy training materials and existing corpora and datasets. Elementary school science concerns commonsense reasoning about the physical and biological realms, so we also plan to use the Allen Institute for Artificial Intelligence tests and materials to evaluate our progress.
Effective start/end date: 6/1/20 to 5/31/23


  • Office of Naval Research (N00014-20-1-2447)

