Our goal is to design a knowledge discovery tool that can accurately generate rules from concepts and structured data values extracted from semi-structured documents. To date, our two major contributions have been the design of a system architecture that facilitates the discovery of rules from HTML documents and the development of an efficient association rule algorithm that generates rule sets based on user-specified constraints. This paper discusses each of these contributions within the framework of our prototype system, IRIS. IRIS allows users to specify a set of constraints associated with a particular domain and then generates association rules that satisfy these constraints. A distinctive feature of IRIS is that it generates rules using both the more structured components of the HTML documents and the conceptual knowledge extracted from the unstructured blocks of text.
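To make the idea of constraint-based association rule generation concrete, the following is a minimal sketch, not the IRIS algorithm itself, which this abstract does not detail. It assumes each HTML document has already been reduced to a set of items mixing extracted concepts and structured values, and it uses a brute-force search rather than an efficient one. The function name `mine_rules`, the `required_items` constraint, and the item labels such as `concept:loan` are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def mine_rules(transactions, min_support, min_confidence, required_items=frozenset()):
    """Brute-force association rule mining restricted by user constraints.

    Hypothetical illustration only: this is not the IRIS algorithm.
    Returns a list of (lhs, rhs, support, confidence) tuples.
    """
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    items = sorted(set().union(*transactions))
    required_items = frozenset(required_items)

    def support(itemset):
        # Fraction of documents that contain every item in the itemset.
        return sum(1 for t in transactions if itemset <= t) / n

    rules = []
    for size in range(2, len(items) + 1):
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            # User constraint: the itemset must mention a required item.
            if required_items and not (itemset & required_items):
                continue
            s = support(itemset)
            if s < min_support:
                continue
            # Split the frequent itemset into every non-empty lhs -> rhs pair.
            for k in range(1, size):
                for lhs_items in combinations(sorted(itemset), k):
                    lhs = frozenset(lhs_items)
                    confidence = s / support(lhs)
                    if confidence >= min_confidence:
                        rules.append((lhs, itemset - lhs, s, confidence))
    return rules


# Assumed toy input: one item set per HTML page, combining a concept
# extracted from free text with values taken from structured fields.
pages = [
    {"concept:loan", "type:fixed", "rate:low"},
    {"concept:loan", "type:fixed", "rate:low"},
    {"concept:loan", "type:variable", "rate:high"},
    {"concept:savings", "rate:low"},
]

rules = mine_rules(pages, min_support=0.5, min_confidence=0.9,
                   required_items={"concept:loan"})
for lhs, rhs, s, c in rules:
    print(sorted(lhs), "->", sorted(rhs), f"support={s:.2f}", f"confidence={c:.2f}")
```

Filtering candidates by the user's constraint before counting support is the design point worth noting: it limits both the work done and the size of the resulting rule set to what the user asked about.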
|Original language|English (US)|
|Number of pages|8|
|Journal|Proceedings of SPIE - The International Society for Optical Engineering|
|State|Published - Jan 1 1999|
ASJC Scopus subject areas
- Electrical and Electronic Engineering
- Condensed Matter Physics