Brill’s rule-based PoS tagger
| Content Provider | Semantic Scholar |
|---|---|
| Author | Megyesi, Beáta |
| Copyright Year | 2001 |
| Abstract | Eric Brill introduced a PoS tagger in 1992 that was based on rules, or transformations as he calls them, where the grammar is induced directly from the training corpus without human intervention or expert knowledge. The only additional component necessary is a small, manually and correctly annotated corpus (the training corpus), which serves as input to the tagger. The system is then able to derive lexical/morphological and contextual information from the training corpus and ‘learns’ how to deduce the most likely part-of-speech tag for a word. Once training is completed, the tagger can be used to annotate new, unannotated corpora with the tagset of the training corpus. The rule-based part-of-speech tagger can be described as a hybrid approach, because it first uses statistical techniques to extract information from the training corpus and then uses a program to automatically learn rules that reduce the errors introduced by the statistical step (Brill, 1992). The tagger uses neither hand-crafted rules nor prespecified language information, and it does not rely on external lexicons or word lists of any kind. According to Brill (1992), ‘there is a very small amount of general linguistic knowledge built into the system, but no language-specific knowledge’. The long-term goal of the tagger is, in Brill’s own words, to ‘create a system which would enable somebody to take a large text in a language he does not know and with only a few hours of help from a speaker of the language accurately annotate the text with part of speech information.’ (Brill & Marcus, 1992b:1). To achieve this aim, Brill has also developed a parser, consisting of systems that automatically derive word classes and the bracketing structure of sentences, assign nonterminal labels to the bracketing structure, and improve prepositional phrase attachment (see Brill, 1992, 1993a, 1993b, 1995b). This work presents the part-of-speech tagger, i.e. the component that learns the most likely tag for a word. The following section describes the main ideas of Brill’s tagger and gives some information on how to train and test it. All information in the following section is based on Brill’s articles, listed in the reference list (Brill, 1992; 1993a; 1993b; 1993c; 1994a; 1995a; 1995b). The description is purely functional and disregards the efficiency considerations addressed in the actual software. |
| File Format | PDF HTM / HTML |
| Alternate Webpage(s) | http://www.speech.kth.se/~bea/BrillsPosTager.ps |
| Alternate Webpage(s) | http://ccl.pku.edu.cn/doubtfire/NLP/Lexical_Analysis/Word_Segmentation_Tagging/Eric_Brill/Brill_Rule_Based_POS_Tager_by_Beata_Megyesi.pdf |
| Language | English |
| Access Restriction | Open |
| Content Type | Text |
| Resource Type | Article |
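
The two-stage idea summarized in the abstract (an initial statistical tagging followed by automatically learned transformations) can be illustrated with a minimal Python sketch. It is not Brill's software: it assumes a single hypothetical rule template ("change tag A to B when the previous tag is C"), a brute-force greedy search, and an arbitrary default tag `NN` for unknown words, whereas the tagger described in the abstract also derives lexical/morphological information. All function names and the toy corpus are illustrative assumptions.

```python
from collections import Counter, defaultdict

# Toy training corpus (an assumption for illustration): lists of (word, tag) pairs.
TOY_CORPUS = [
    [("the", "DT"), ("can", "NN")],
    [("I", "PRP"), ("can", "MD"), ("run", "VB")],
    [("you", "PRP"), ("can", "MD"), ("go", "VB")],
]

def train_initial_tagger(tagged_sentences):
    """Statistical step: map each word to its most frequent training tag."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}

def initial_tags(words, lexicon, default_tag="NN"):
    """Tag known words from the lexicon; default_tag for unknown words is an assumption."""
    return [lexicon.get(word, default_tag) for word in words]

def apply_rule(tags, rule):
    """Apply (from_tag, to_tag, prev_tag): rewrite from_tag when preceded by prev_tag."""
    from_tag, to_tag, prev_tag = rule
    out = list(tags)
    for i in range(1, len(out)):
        # Changes take effect immediately, so later positions see updated tags.
        if out[i] == from_tag and out[i - 1] == prev_tag:
            out[i] = to_tag
    return out

def count_errors(predicted, gold):
    """Number of positions where predicted and gold tags differ."""
    return sum(p != g for ps, gs in zip(predicted, gold) for p, g in zip(ps, gs))

def learn_rules(tagged_sentences, lexicon, max_rules=10):
    """Greedily pick the rule that removes the most errors until none helps."""
    gold = [[tag for _, tag in s] for s in tagged_sentences]
    current = [initial_tags([w for w, _ in s], lexicon) for s in tagged_sentences]
    tagset = sorted({tag for tags in gold for tag in tags})
    rules = []
    for _ in range(max_rules):
        best_rule, best_errors = None, count_errors(current, gold)
        for from_tag in tagset:
            for to_tag in tagset:
                if from_tag == to_tag:
                    continue
                for prev_tag in tagset:
                    rule = (from_tag, to_tag, prev_tag)
                    candidate = [apply_rule(tags, rule) for tags in current]
                    errors = count_errors(candidate, gold)
                    if errors < best_errors:
                        best_rule, best_errors = rule, errors
        if best_rule is None:  # no rule reduces the error count any further
            break
        rules.append(best_rule)
        current = [apply_rule(tags, best_rule) for tags in current]
    return rules

def tag(words, lexicon, rules):
    """Annotate new text: initial tagging, then the learned rules in order."""
    tags = initial_tags(words, lexicon)
    for rule in rules:
        tags = apply_rule(tags, rule)
    return list(zip(words, tags))
```

On the toy corpus, "can" is most often a modal (`MD`), so the initial tagger mislabels it in "the can"; the learner then finds the single rule `("MD", "NN", "DT")`, and `tag(["the", "can"], lexicon, rules)` yields `[("the", "DT"), ("can", "NN")]`. The exhaustive search over rule candidates is chosen for readability only and ignores efficiency entirely.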