Learning by Multimodal Communication by Intelligent Agents

Project: Research project

Project Details


Our goal is to discover how to create intelligent agents that interact with human partners via language, sketching, and vision. Collaborative intelligent agents must have a shared background of commonsense knowledge and the ability to use it effectively. They must be able to extend their domain knowledge and communication skills incrementally to adapt to their collaborators, without requiring massive numbers of examples or hands-on interventions by AI experts. Our hypotheses are that this can be achieved by (1) using qualitative representations, which provide a bridge between perception and cognition, support for commonsense reasoning, and a key component of natural language semantics, and (2) using analogical processing, where cognitive models of analogical matching, retrieval, and generalization provide incremental, inspectable, and data- and training-efficient learning. This proposal describes our next steps in exploring these hypotheses.

Visual Understanding: The Hybrid Primal Sketch Processor (HPSP) that we built in our previous project combines CogSketch, off-the-shelf computer vision algorithms, and deep learning recognition components to process images. Its ability to perform competitively with deep learning systems, while being incremental, inspectable, and more data- or training-efficient, provides evidence of the productivity of this approach. We plan to build on this progress in three ways: (1) Currently, the mix of components within the HPSP is chosen by hand for particular tasks. We plan to explore strategies for self-control of visual processing, where the system monitors its operations and uses models of them to automatically adjust its processing to improve performance. (2) An exciting new frontier revealed by our prior project is hierarchical analogical learning, where analogical models learned at multiple levels of abstraction are combined to provide even better performance than traditional analogical learning.
We plan to explore this idea further, both theoretically and empirically. (3) Our prior work showed that analogy can fuse understanding across modalities, and our advances in vision and language set the stage for directly tackling multimodal understanding of diagrams and scenes. Our idea is that a satisfactory visual understanding of an image in context should include explanations for all of the regions in it and, when combined with the understanding of the associated text, provide a model powerful enough to reason with.

Language Understanding: Our two key advances in natural language understanding in our previous project were (1) a new approach to integrating qualitative representations in semantics via unified QP Frames as precursors to type-level or individual models, which are built based on context, and (2) initial experiments in hybridizing our natural language system, combining high-precision symbolic understanding with statistical evidence from a large language model. We plan to build on this progress in three ways: (1) extend our new unified frame representation to cover all of QP theory, including model fragments and scenario models, (2) experiment with hybridization strategies that use off-the-shelf deep learning language models to improve the capabilities of symbolic high-precision natural language systems, and (3) experiment with analogical learning in multimodal interpretation strategies. We will test these ideas using a variety of materials, including Navy training materials and existing corpora and datasets. Elementary school science concerns commonsense reasoning about the physical and biolog
Effective start/end date: 3/1/23 to 2/28/26


  • Office of Naval Research (N00014-23-1-2294)

