The Difficulties of Taxonomic Name Extraction and a Solution (2006)
| Content Provider | CiteSeerX |
|---|---|
| Author | Sautter, Guido; Böhm, Klemens |
| Description | In modern biology, digitization of biosystematics publications is an important task. Extracting taxonomic names from such documents is one of its major issues, because these names identify the various genera and species. This article reports on our experiences with learning techniques for this particular task. We explain why established Named-Entity Recognition techniques are somewhat difficult to use in our context. One reason is that very little training data is available. Our experiments show that a combined approach relying on regular expressions, heuristics, and word-level language recognition achieves very high precision and recall and makes it possible to cope with those difficulties. |
| File Format | |
| Language | English |
| Publisher Date | 2006-01-01 |
| Publisher Institution | In Proceedings of BioNLP |
| Access Restriction | Open |
| Subject Keyword | Major Issue; Modern Biology; Taxonomic Name Extraction; Little Training Data; Biosystematics Publication; Particular Task; Word-level Language Recognition; Regular Expression; Taxonomic Name; Various Genus; High Precision; Named-entity Recognition Technique; Important Task |
| Content Type | Text |
| Resource Type | Article |
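
The description mentions combining regular expressions, heuristics, and word-level language recognition. The following is only a minimal sketch of how such a combination might look, not the authors' implementation: the pattern `CANDIDATE`, the word list `COMMON_ENGLISH`, the suffix list `LATIN_SUFFIXES`, and the function names are all hypothetical, and the crude suffix check merely stands in for a real word-level language recognizer.

```python
import re

# Hypothetical candidate pattern: a capitalized genus (or an abbreviation
# such as "F.") followed by a lowercase epithet of at least three letters.
CANDIDATE = re.compile(r"\b([A-Z][a-z]+|[A-Z]\.)\s+([a-z]{3,})\b")

# Tiny illustrative word list; a real system would need a far larger one.
COMMON_ENGLISH = {
    "the", "and", "was", "were", "with", "from",
    "this", "that", "species", "found", "are", "has",
}

# Crude stand-in for word-level language recognition: typical Latin endings.
LATIN_SUFFIXES = ("us", "a", "um", "is", "ae", "ii", "ensis")


def looks_latin(word):
    """Heuristic: not a common English word and ends like a Latin word."""
    return word not in COMMON_ENGLISH and word.endswith(LATIN_SUFFIXES)


def extract_candidates(text):
    """Return 'Genus epithet' strings that pass the regex and the heuristics."""
    names = []
    for match in CANDIDATE.finditer(text):
        genus, epithet = match.group(1), match.group(2)
        # Heuristic: drop sentence-initial English words such as "The".
        if genus.lower().rstrip(".") in COMMON_ENGLISH:
            continue
        if looks_latin(epithet):
            names.append(f"{genus} {epithet}")
    return names


if __name__ == "__main__":
    sample = "The ant Formica rufa was found near Lasius flavus nests."
    print(extract_candidates(sample))
```

The regular expression deliberately over-generates, and the heuristics then discard candidates that look like ordinary English. Rule-based filters of this kind need no labeled examples, which is one plausible reason such a design suits a setting with very little training data.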