## Abstract

Probability theory is mathematically the best understood paradigm for modeling and manipulating uncertain information. Probabilities of complex events can be computed from those of basic events on which they depend, using any of a number of strategies. Which strategy is appropriate depends very much on the known interdependencies among the events involved. Previous work on probabilistic databases has assumed a fixed and restrictive combination strategy (e.g., assuming all events are pairwise independent). In this article, we characterize, using postulates, whole classes of strategies for conjunction, disjunction, and negation, meaningful from the viewpoint of probability theory. (1) We propose a probabilistic relational data model and a generic probabilistic relational algebra that neatly captures various strategies satisfying the postulates, within a single unified framework. (2) We show that as long as the chosen strategies can be computed in polynomial time, queries in the positive fragment of the probabilistic relational algebra have essentially the same data complexity as classical relational algebra. (3) We establish various containments and equivalences between algebraic expressions, similar in spirit to those in classical algebra. (4) We develop algorithms for maintaining materialized probabilistic views. (5) Based on these ideas, we have developed a prototype probabilistic database system called ProbView on top of Dbase V.0. We validate our complexity results with experiments and show that rewriting certain types of queries to other equivalent forms often yields substantial savings.
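To make the notion of a "combination strategy" concrete, the following minimal sketch shows how the probability of a conjunction or disjunction can differ depending on what is assumed about the dependence between two events. It is an illustration only, not the article's definitions: the interval representation, the function names, and the choice of three strategies (independence, ignorance, positive correlation) are assumptions made here. The formulas are the standard product rule and the standard Fréchet bounds from probability theory.

```python
from typing import Callable, Tuple

# Hypothetical representation: a probability known only up to (lower, upper) bounds.
Interval = Tuple[float, float]
Strategy = Callable[[Interval, Interval], Interval]


def conj_independence(a: Interval, b: Interval) -> Interval:
    """P(A and B) when A and B are assumed independent: multiply the bounds."""
    return (a[0] * b[0], a[1] * b[1])


def conj_ignorance(a: Interval, b: Interval) -> Interval:
    """P(A and B) when nothing is known about the dependence (Frechet bounds)."""
    return (max(0.0, a[0] + b[0] - 1.0), min(a[1], b[1]))


def conj_positive_correlation(a: Interval, b: Interval) -> Interval:
    """P(A and B) when one event is assumed to imply the other."""
    return (min(a[0], b[0]), min(a[1], b[1]))


def disj_independence(a: Interval, b: Interval) -> Interval:
    """P(A or B) when A and B are assumed independent."""
    return (a[0] + b[0] - a[0] * b[0], a[1] + b[1] - a[1] * b[1])


def disj_ignorance(a: Interval, b: Interval) -> Interval:
    """P(A or B) when nothing is known about the dependence (Frechet bounds)."""
    return (max(a[0], b[0]), min(1.0, a[1] + b[1]))


def combine(strategy: Strategy, a: Interval, b: Interval) -> Interval:
    """Apply whichever strategy the user selects for this pair of events."""
    return strategy(a, b)


if __name__ == "__main__":
    a = (0.5, 0.5)
    b = (0.5, 0.5)
    for strategy in (conj_independence, conj_ignorance, conj_positive_correlation,
                     disj_independence, disj_ignorance):
        print(strategy.__name__, combine(strategy, a, b))
```

With both events at probability exactly 0.5, the conjunction ranges from a point value of 0.25 under independence to the whole interval from 0 to 0.5 under ignorance. That spread is why a single fixed strategy can be restrictive.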

Original language | English (US) |
---|---|
Pages (from-to) | 419-469 |
Number of pages | 51 |
Journal | ACM Transactions on Database Systems |
Volume | 22 |
Issue number | 3 |
DOIs | |
State | Published - Sep 1997 |
Externally published | Yes |

## Keywords

- Algebra
- Data complexity
- H.2.1 [Database Management]: Logical Design-data models
- H.2.3 [Database Management]: Languages-query languages
- H.2.4 [Database Management]: Systems
- Performance evaluation
- Probabilistic databases
- View maintenance

## ASJC Scopus subject areas

- Information Systems