Shallow Semantic Annotation of Biomedical Corpora for Information Extraction
| Content Provider | Semantic Scholar |
|---|---|
| Author | Kulick, Seth; Liberman, Mark; Palmer, Martha; Schein, Andrew I. |
| Copyright Year | 2003 |
| Abstract | Work over the last few years in literature data mining for biology has progressed from linguistically unsophisticated models to the adaptation of Natural Language Processing (NLP) techniques that use full parsers ([11, 16]) and coreference to extract relations that span multiple sentences ([12, 6]); for an overview, see [7]. However, there has been a lack of annotated corpora that can fuel further work in this direction in the same way that the development of syntactically annotated corpora such as the Penn Treebank ([10]) led to the development of statistical language parsers (e.g., [3]). To address this situation, we are developing new linguistic resources in three categories: a large corpus of biomedical text annotated with syntactic structure (Treebank) and predicate-argument structure ("proposition bank" or Propbank); a large set of biomedical abstracts and full-text articles annotated with entities and relations of interest to researchers, such as enzyme inhibition or mutation/cancer connections (Factbanks); and broad-coverage lexicons and tools for the analysis of biomedical texts. We are also developing and adapting software tools that allow human experts to annotate biomedical texts for entity tagging, as well as for treebanking and propbanking. We are focusing initially on two applications: drug development, in collaboration with researchers in the Knowledge Integration and Discovery Systems group at GlaxoSmithKline, and pediatric oncology, in collaboration with researchers in the eGenome group at Children's Hospital of Philadelphia. These applications, worthwhile in their own right, provide excellent test beds for broader research efforts in natural language processing and data integration. A guiding principle for this project is the annotation of a corpus with different levels of shallow semantics that will permit the development of NLP tools to extract the desired entities and relationships. These levels consist of entity tagging, reference and coreference, propbanking, and factbanking. Key to the approach is the integration of the different levels of semantic and syntactic annotation with an eye towards clear conceptual semantics, feasibility of implementation, and likelihood of practical benefit. This is a novel approach from the point of view of NLP, since previous efforts at treebanking and propbanking have been independent of the special status of any entities, and previous efforts at entity annotation have been independent of corresponding layers of syntactic and semantic structure. |
| File Format | PDF HTM / HTML |
| Alternate Webpage(s) | http://www.andrewschein.com/publications/abs-revised2.pdf |
| Alternate Webpage(s) | http://www.cis.upenn.edu/~ais/publications/abs-revised2.pdf |
| Alternate Webpage(s) | http://www.cis.upenn.edu/~ais/publications/abs-revised2.ps.gz |
| Language | English |
| Access Restriction | Open |
| Content Type | Text |
| Resource Type | Article |