Information Extraction from Structured Documents using k-testable Tree Automaton Inference
| Content Provider | CiteSeerX |
|---|---|
| Author | Kosala, Raymond; Blockeel, Hendrik; Bruynooghe, Maurice; Bussche, Jan Van Den |
| Abstract | Information extraction (IE) addresses the problem of extracting specific information from a collection of documents. Much of the previous work on IE from structured documents, such as HTML or XML, uses learning techniques that are based on strings, such as finite automata induction. This paper explores methods that exploit the tree structure of the documents. In particular, our method infers a k-testable tree automaton from a small set of annotated examples and explores various ways to generalize the inferred automaton. Experimental results on the benchmark data sets show that our approach compares favorably to the previous approaches. |
| File Format | |
| Access Restriction | Open |
| Subject Keyword | Various Way; Inferred Automaton; Structured Document; Specific Information; Finite Automaton Induction; Annotated Example; K-testable Tree Automaton Inference; Small Set; Testable Tree Automaton; Benchmark Data Set; Information Extraction; Experimental Result; Previous Approach; Tree Structure |
| Content Type | Text |
| Resource Type | Article |
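
The abstract refers to inferring a k-testable tree automaton from example trees. As a rough illustration only, and not the authors' implementation, the sketch below covers a simplified k = 2 case: it records the root labels, the parent-to-children "forks", and the leaf labels seen in the examples, then accepts any tree built solely from those pieces. The names `Node`, `profile`, `infer`, and `accepts` are hypothetical, and the annotation of extraction targets and the generalization steps mentioned in the abstract are not modeled.

```python
from typing import NamedTuple


class Node(NamedTuple):
    """A labeled ordered tree node (hypothetical helper type)."""
    label: str
    children: tuple = ()


def profile(tree):
    """Return (root label, set of forks, set of leaf labels) for one tree.

    A fork is a pair (parent label, tuple of child labels).
    """
    forks, leaves = set(), set()
    stack = [tree]
    while stack:
        node = stack.pop()
        if node.children:
            forks.add((node.label, tuple(c.label for c in node.children)))
            stack.extend(node.children)
        else:
            leaves.add(node.label)
    return tree.label, forks, leaves


def infer(examples):
    """Union the profiles of all example trees into one model."""
    roots, forks, leaves = set(), set(), set()
    for tree in examples:
        root, tree_forks, tree_leaves = profile(tree)
        roots.add(root)
        forks |= tree_forks
        leaves |= tree_leaves
    return roots, forks, leaves


def accepts(model, tree):
    """Accept a tree iff every piece of its profile was seen in training."""
    roots, forks, leaves = model
    root, tree_forks, tree_leaves = profile(tree)
    return root in roots and tree_forks <= forks and tree_leaves <= leaves


# One nested-list example: ul(li(ul(li)))
example = Node("ul", (Node("li", (Node("ul", (Node("li"),)),)),))
model = infer([example])

# A deeper nesting uses only forks already seen, so it is accepted.
deeper = Node("ul", (Node("li", (Node("ul", (Node("li", (Node("ul", (Node("li"),)),)),)),)),))
# A tree with an unseen fork, ul(p), is rejected.
unseen = Node("ul", (Node("p"),))
```

Here `accepts(model, deeper)` holds even though `deeper` was never seen, which is the kind of generalization from a small example set that the abstract alludes to; larger values of k would record deeper fragments and generalize less.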