Information Extraction in Structured Documents using Tree Automata Induction (2002)
| Content Provider | CiteSeerX |
|---|---|
| Author | Kosala, Raymond; Blockeel, Hendrik; Bruynooghe, Maurice; Van den Bussche, Jan |
| Abstract | ... document, a context that is lost if the document is linearized into a string. Our work reports on the use of k-testable tree languages, a kind of tree automata formalism, for the extraction of information from structured documents. The highlights of our work are: We motivated and investigated the application of the more expressive tree automata inference method for IE from structured documents. This approach has several advantages compared to the string-based and the other methods as follows. Firstly, some IE systems preprocess documents to split them up in small fragments and only use a part of the document as training example. This is not needed here as the tree structure that we get for free takes care of this. Thus the entire document tree can be used as training example. Secondly, our method does not require the manual specification of the window's length for the prefix, suffix and target fragments, and the special tokens or landmarks such as ">" or ";", that are usually required fo |
| File Format | |
| Publisher Date | 2002-01-01 |
| Access Restriction | Open |
| Subject Keyword | Entire Document Tree; IE System; Preprocess Document; Manual Specification; Work Report; Structured Document; Tree Automaton Induction; K-testable Tree Language; Small Fragment; Expressive Tree Automaton Inference Method; Tree Automaton Formalism; Training Example; Several Advantage; Information Extraction; Target Fragment; Special Token; Tree Structure |
| Content Type | Text |
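
The abstract's point that context is lost when a document is linearized into a string can be illustrated with a minimal sketch. This is not the authors' k-testable tree automata method. The sample markup and the helper names `linearize` and `leaves_with_context` are hypothetical, chosen only to contrast a flat token list with leaves that keep their position in the tree.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; not taken from the paper.
DOC = (
    "<table>"
    "<tr><td>Name</td><td>Ada</td></tr>"
    "<tr><td>City</td><td>Leuven</td></tr>"
    "</table>"
)

def linearize(root):
    """Flatten the tree into a list of text tokens, discarding structure."""
    return [text.strip() for text in root.itertext() if text.strip()]

def leaves_with_context(node, path="", index=0):
    """Yield (path, text) pairs so that each text leaf keeps its ancestors.

    The path records each ancestor's tag and its position among its siblings.
    Tail text (text following a closing tag) is ignored in this sketch.
    """
    here = f"{path}/{node.tag}[{index}]"
    if node.text and node.text.strip():
        yield here, node.text.strip()
    for i, child in enumerate(node):
        yield from leaves_with_context(child, here, i)

root = ET.fromstring(DOC)
tokens = linearize(root)                     # flat string view
contexts = list(leaves_with_context(root))   # tree view

print(tokens)
for path, text in contexts:
    print(path, text)
```

In the flat token list nothing says that "Ada" is the second cell of the first row, whereas the tree view carries that information with every leaf. A tree-based learner can exploit such structure directly, without hand-chosen window lengths or landmark tokens.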