Most computer vision systems are concerned with computing the whats and wheres of a scene. We describe a set of programs concerned instead with computing the whys and hows — why the scene is the way it is, and how an agent can interact with it. The basis of our approach lies in the construction of a causal explanation of a scene — a representation that describes what affects what in the scene, how these elements affect each other, and why they affect each other the way they do. Such explanations, by definition and design, must encompass representations of the potentials for action in a scene, and thus form a natural basis for describing how scene elements serve purposes — i.e., functional descriptions. As a concrete case study in causal scene understanding, this paper focuses primarily on ways to exploit the causality of objects in static equilibrium, in particular, the causality of support. We describe three camera-to-commentary vision systems, operating in three different domains, that develop causal explanations of scenes from visual images of those scenes and, in the process, provide novel solutions to a number of traditional problems in vision and robotics, including occlusion, focus of attention, and grasp planning. We also show how the kinds of causal descriptions produced by these systems can be exploited to physically interact with the scene.