Information extraction from web pages based on k-testable tree automaton induction (extended abstract).
| Content Provider | CiteSeerX |
|---|---|
| Author | Blockeel, Hendrik; Bussche, Jan Van Den; Kosala, Raymond; Bruynooghe, Maurice |
| Abstract | Information extraction refers to the process of extracting specific pieces of information from a document; for instance, extracting from a text the names of the authors. Much of the work on information extraction from HTML or XML documents uses methods for processing strings, such as finite automata. However, as these documents have a tree structure, it seems natural to exploit this structure by using techniques that parse trees, not strings. Tree automata are a suitable device for this. In this paper we explore methods for automatically learning tree automata that can be used for extracting specific information in tree-structured documents. We present an overview of several algorithms we have experimented with, as well as experimental results. |
| File Format | |
| Access Restriction | Open |
| Subject Keyword | Suitable Device; Specific Piece; Web Page; Information Extraction Refers; Extended Abstract; Finite Automaton; XML Document; Specific Information; Information Extraction; Experimental Result; Tree-structured Document; K-testable Tree Automaton Induction; Tree Automaton; Tree Structure |
| Content Type | Text |
| Resource Type | Article |