Extracting Relations from Large Text Collections (2005)
| Content Provider | CiteSeerX |
|---|---|
| Researcher | Agichtein, Yevgeny Eugene; Gravano, Luis |
| Abstract | A wealth of information is hidden within unstructured text. Often, this information can be best exploited in structured or relational form, which is well suited for sophisticated query processing, for integration with relational database management systems, and for data mining. This thesis addresses two fundamental problems in extracting relations from large text collections: (1) portability: tuning extraction systems for new domains and (2) scalability: scaling up information extraction to large collections of documents. To address the first problem, we developed the Snowball information extraction system, a domain-independent system that learns to extract relations from unstructured text based on only a handful of user-provided example relation instances. Snowball can then be adapted to extract new relations with minimum human effort. Snowball improves the extraction accuracy by automatically evaluating the quality of both the acquired extraction patterns and the extracted relation instances. To address the second problem, we developed the QXtract system, which learns search engine queries that retrieve the documents that are relevant to a given information extraction system and extraction task. QXtract can dramatically improve the efficiency of the information extraction process, and provides a building block |
| File Format | |
| Publisher Date | 2005-01-01 |
| Access Restriction | Open |
| Subject Keyword | Large Text Collection; Extracting Relation; Unstructured Text; Information Extraction Process; Minimum Human Effort; Search Engine Query; Data Mining; QXtract System; Extraction Pattern; User-provided Example Relation Instance; Relational Form; Extraction Accuracy; Extraction Task; Sophisticated Query Processing; Second Problem; Snowball Information Extraction System; First Problem; New Domain; New Relation; Tuning Extraction System; Information Extraction; Information Extraction System; Extracted Relation Instance; Large Collection; Domain-independent System; Fundamental Problem; Relational Database Management System |
| Content Type | Text |
| Resource Type | Thesis |