| Content Provider | IEEE Xplore Digital Library |
|---|---|
| Author | Ghiasvand, O.; Kate, R.J. |
| Copyright Year | 2015 |
| Description | Author affiliation: Univ. of Wisconsin-Milwaukee, Milwaukee, WI, USA (Ghiasvand, O.; Kate, R.J.) |
| Abstract | Summary form only given: Annotating clinical notes manually is very labor-intensive and requires expertise in the annotation domain. Annotation is therefore expensive in terms of both human resources and money. Moreover, mistakes, missed tags, and inconsistencies are common problems in manual annotation. The purpose of this research is to reduce the human effort needed to annotate clinical notes, to improve consistency, and to lower the cost of annotation. Specifically, the aim is to annotate clinical texts in order to extract biomedical names and terms. The Unified Medical Language System (UMLS) serves as the reference metathesaurus of names and terms used in the biomedical and clinical domains. We performed unsupervised and semi-supervised Named Entity Recognition (NER) through exact matching against UMLS. The data sets were provided by the SemEval 2015 (Task 14) natural language processing competition and comprise 199 clinical notes in the training set and 133 notes in the test set. The analysis done so far consists of two steps: mapping and learning. The first step maps terms into UMLS, covering not only unigrams but also n-grams, where n is usually 5. To obtain the best exact-matching results, we extracted the UMLS terms for diseases and disorders based on semantic groups and mapped each n-gram to that portion of UMLS. If an n-gram matches, it is assumed to be a disease or disorder. When no n-gram with n ≥ 2 matches, a unigram is nominated as a disease/disorder only if it is a noun phrase, in order to avoid low precision. This method achieved an F-score of 60% and generated the training files for the next step (training CRFs). The second step uses Conditional Random Fields (CRFs), which were trained on the results generated in the first step. CRFs learn from training data the general contexts in which named entities occur. Because the training files have different levels of correctness, we modified them before using them to train the CRFs and to test on the test data. Level of correctness refers to the tagging accuracy across the data set: because exact matching is not very accurate, accuracy varies from note to note, being very high in some notes and low in others. This produces inconsistency in the training files. To address this problem, we divided the training files into ten groups. The CRF was trained on only one group and used to tag the other groups, and the exact-match and CRF results were combined (a logical OR of the two) to obtain the final results. This was repeated for every group and finally applied to the test data. Together, these two steps constitute unsupervised disease named entity recognition. Supervised learning achieved an F-score of 73%, whereas the proposed unsupervised approach achieved 62.7%, a difference of 10.3 percentage points. We also developed a semi-supervised disease named entity recognition approach that uses both the files annotated by the unsupervised method and the human-annotated (gold-standard) files. This method improved the 73% F-score of the supervised approach to 74.2%. Further refinements and additional tasks are planned. To improve the results, we plan to use approximate matching through a process called normalization, which maps a term in a clinical note to a preferred term in UMLS. Such terms have no exact matches, so normalization is the way to find matches for them. We also plan to perform exact/approximate matching over discontinuous mentions in clinical texts, that is, mentions made up of disconnected words in a sentence that together form a named entity. This essential step will extract mentions that cannot be extracted by the exact-matching and normalization approaches. The final item in our plan is to extend the system to a less supervised Biomedical Named Entity Recognition (BNER) system that extracts all biomedical and clinical terms. We will do this for other UMLS semantic groups such as Activities and Behaviors, Anatomy, Devices, and Phenomena. A less supervised annotation system for clinical notes could thus generate annotated notes at a lower cost than manual tagging, with more consistency and sufficient accuracy. This approach also makes it feasible to extract tags for other UMLS semantic groups, ultimately yielding an advanced system that tags all biomedical and clinical mentions based on UMLS semantic groups. |
| Starting Page | 495 |
| Ending Page | 495 |
| File Size | 93950 |
| Page Count | 1 |
| File Format | |
| e-ISBN | 9781467395489 |
| DOI | 10.1109/ICHI.2015.85 |
| Language | English |
| Publisher | Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| Publisher Date | 2015-10-21 |
| Publisher Place | USA |
| Access Restriction | Subscribed |
| Rights Holder | Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| Subject Keyword | Exact matching; Unified modeling language; Manuals; Conditional random fields; Unsupervised learning; Diseases; Training; Semantics; Supervised learning; Named entity recognition; Machine learning; Tagging; UMLS; Natural language processing |
| Content Type | Text |
| Resource Type | Article |
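The mapping step and the OR combination described in the abstract can be illustrated with a small sketch. This is not the authors' code, and several details are assumptions: the UMLS disease/disorder terms are stood in for by a Python set of lower-cased strings, part-of-speech tags follow the Penn Treebank tag set (a single `NN*` noun tag is a simplified stand-in for the abstract's noun-phrase test), n-grams up to n = 5 are selected greedily, longest match first and without overlap, and the logical OR is taken as a union of entity spans. The function names `exact_match_spans`, `or_combine`, and `span_f1`, the toy sentence, and the hard-coded span list that replaces real CRF output are all hypothetical.

```python
from typing import List, Sequence, Set, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive


def exact_match_spans(
    tokens: Sequence[str],
    pos_tags: Sequence[str],
    lexicon: Set[str],
    max_n: int = 5,
) -> List[Span]:
    """Greedy, longest-first exact matching of token n-grams against a lexicon.

    Multi-word n-grams (n >= 2) are accepted on any exact match. A unigram is
    accepted only if it matches AND carries a Penn Treebank noun tag (NN*),
    a simplified stand-in for the noun-phrase filter (assumption).
    """
    if len(tokens) != len(pos_tags):
        raise ValueError("tokens and pos_tags must have the same length")
    lowered = [t.lower() for t in tokens]
    spans: List[Span] = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest window first so "atrial fibrillation" beats "fibrillation".
        for n in range(min(max_n, len(tokens) - i), 0, -1):
            if " ".join(lowered[i:i + n]) not in lexicon:
                continue
            if n == 1 and not pos_tags[i].startswith("NN"):
                continue  # reject non-noun unigrams to protect precision
            spans.append((i, i + n))
            i += n  # skip past the match so spans do not overlap
            matched = True
            break
        if not matched:
            i += 1
    return spans


def or_combine(a: Sequence[Span], b: Sequence[Span]) -> List[Span]:
    """Logical OR of two taggers: keep every span that either one produced."""
    return sorted(set(a) | set(b))


def span_f1(predicted: Sequence[Span], gold: Sequence[Span]) -> Tuple[float, float, float]:
    """Precision, recall, and F-score under exact span matching."""
    pred, ref = set(predicted), set(gold)
    tp = len(pred & ref)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(ref) if ref else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


if __name__ == "__main__":
    tokens = ["Patient", "has", "atrial", "fibrillation", "and", "mild", "pain", "."]
    pos = ["NN", "VBZ", "JJ", "NN", "CC", "JJ", "NN", "."]
    # Hypothetical toy lexicon; a real one would be extracted from UMLS.
    lexicon = {"atrial fibrillation", "fibrillation", "pain", "mild"}

    dictionary_spans = exact_match_spans(tokens, pos, lexicon)
    print(dictionary_spans)  # [(2, 4), (6, 7)]  ("mild" is rejected: tagged JJ)

    crf_spans = [(5, 7)]  # hypothetical placeholder for CRF output ("mild pain")
    combined = or_combine(dictionary_spans, crf_spans)
    print(combined)  # [(2, 4), (5, 7), (6, 7)]

    gold = [(2, 4), (6, 7)]
    p, r, f = span_f1(combined, gold)
    print(round(p, 2), round(r, 2), round(f, 2))  # 0.67 1.0 0.8
```

The span union keeps overlapping spans such as (5, 7) and (6, 7) side by side, which costs precision in the toy example; the abstract does not say whether the original system resolved such overlaps.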