Document Interrogation   Soraya Abad-Mota

120 pages. 2010.
LAP Lambert Academic Publishing
Organizations produce large numbers of documents. Often the contents of these documents do not reach the operational databases or data warehouses of the enterprise. With worldwide access to the web, these documents are available to a wide audience, but browsing through them manually is cumbersome at best. The semantic web concept has opened up fascinating possibilities for making explicit the semantics of the terabytes of unstructured data available today. In this book we define the Document Interrogation Architecture (DIA), which extracts data from documents using information extraction techniques and populates a database with the extracted data. The domain of the documents is represented by an ontology, which is the basis for the definition of an interrogation language with approximate query processing capabilities. With DIA, many organizations could take advantage of the contents of their documents. Therefore, this book should be particularly useful for computer...
