A simple and robust method for extracting terminology
| Content Provider | Semantic Scholar |
|---|---|
| Author | Sarmento, Luís |
| Copyright Year | 2009 |
| Abstract | In this paper we will present a simple, yet effective, method for extracting terminology from technical text. The method is based on the observation that for technical domains it is much simpler to describe what a valid terminological unit cannot be than what it can possibly be. Our method relies on a set of filters that exclude multi-word units according to simple rules regarding their context and internal lexical structure, and it does not require any special pre-processing such as POS tagging. Rules were hand-coded in a simple incremental process and may be ported to several languages with little effort. Additionally, the method is able to process more than two million words per minute on a standard computer. Although the method was originally intended for semiautomatic terminological extraction, we believe that it can also be applied in fully automated procedures, making it appropriate for large-scale information extraction. We will start by explaining our main motivation for building this method and we will describe its role in a larger framework, the Corpografo. We will then present the process of building the current method, from the first very simple approaches to the current version, pointing out the problems encountered at each step. We will then present results of applying the current version of the extraction method to specific domain corpora in English. Finally, we will present future plans and explain how we are currently in the process of building a small semantic lexicon for helping future large-scale information extraction procedures. |
| File Format | PDF HTM / HTML |
| DOI | 10.7202/019924ar |
| Volume Number | 50 |
| Alternate Webpage(s) | https://www.erudit.org/fr/revues/meta/2005-v50-n4-meta1024/019924ar.pdf |
| Alternate Webpage(s) | http://poloclup.linguateca.pt/docs/meta/META-Sarmento.pdf |
| Alternate Webpage(s) | http://retro.erudit.org/livre/meta/2005/000263co.pdf |
| Alternate Webpage(s) | https://doi.org/10.7202/019924ar |
| Language | English |
| Access Restriction | Open |
| Content Type | Text |
| Resource Type | Article |
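
The abstract describes extraction by exclusion: generate multi-word candidates and discard those that break simple lexical rules, with no POS tagging. The sketch below illustrates that general idea only. The stopword list, the `ALLOWED_INNER` set, the function names, and the specific rules are assumptions made for this example and are not the hand-coded rules from the paper; the context-based rules mentioned in the abstract are not modelled.

```python
import re
from collections import Counter

# Hypothetical stopword list (assumption; the paper's rule set is not given in the abstract).
STOPWORDS = {
    "the", "a", "an", "of", "in", "on", "for", "to", "and", "or",
    "is", "are", "with", "by", "that", "this",
}
# Hypothetical set of stopwords tolerated inside a candidate, e.g. "angle of attack".
ALLOWED_INNER = {"of"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric runs and single punctuation marks."""
    return re.findall(r"[a-z0-9]+|[^\sa-z0-9]", text.lower())


def is_excluded(ngram):
    """Return True if the n-gram breaks one of the illustrative exclusion rules."""
    # Rule 1: a candidate may not start or end with a stopword.
    if ngram[0] in STOPWORDS or ngram[-1] in STOPWORDS:
        return True
    # Rule 2: every token must be purely alphabetic (no digits, no punctuation).
    if any(not tok.isalpha() for tok in ngram):
        return True
    # Rule 3: inner tokens may not be stopwords, except the allowed connectors.
    if any(tok in STOPWORDS and tok not in ALLOWED_INNER for tok in ngram[1:-1]):
        return True
    return False


def extract_candidates(text, min_n=2, max_n=3):
    """Count every n-gram (min_n..max_n tokens) that survives all exclusion rules."""
    tokens = tokenize(text)
    counts = Counter()
    for n in range(min_n, max_n + 1):
        for i in range(len(tokens) - n + 1):
            ngram = tuple(tokens[i:i + n])
            if not is_excluded(ngram):
                counts[" ".join(ngram)] += 1
    return counts


sample = "The optical fiber cable is connected to the optical fiber amplifier."
candidates = extract_candidates(sample)
for term, freq in candidates.most_common():
    print(freq, term)
```

Because the filters only say what a candidate cannot be, new rules can be added one at a time to `is_excluded` without changing the rest of the pipeline, which mirrors the incremental process the abstract mentions.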