NDLI: Issues in learning an ontology from text

Content Provider	Springer Nature : BioMed Central
Author	Brewster, Christopher; Jupp, Simon; Luciano, Joanne; Shotton, David; Stevens, Robert D; Zhang, Ziqi
Abstract	Ontology construction for any domain is a labour-intensive and complex process. Any methodology that can reduce the cost and increase efficiency has the potential to make a major impact in the life sciences. This paper describes an experiment in ontology construction from text for the animal behaviour domain. Our objective was to see how much could be done in a simple and relatively rapid manner using a corpus of journal papers. We used a sequence of pre-existing text processing steps, and here describe the different choices made to clean the input, to derive a set of terms and to structure those terms in a number of hierarchies. We describe some of the challenges, especially that of focusing the ontology appropriately given a starting point of a heterogeneous corpus. Results: Using mainly automated techniques, we were able to construct an 18,055-term ontology-like structure with 73% recall of animal behaviour terms, but a precision of only 26%. We were able to clean unwanted terms from the nascent ontology using lexico-syntactic patterns that tested the validity of term inclusion within the ontology. We used the same technique to test for subsumption relationships between the remaining terms, adding structure to the initially broad and shallow hierarchy we generated. All outputs are available at http://thirlmere.aston.ac.uk/~kiffer/animalbehaviour/. Conclusion: We present a systematic method for the initial steps of ontology or structured vocabulary construction for scientific domains that requires limited human effort and can contribute both to ontology learning and to ontology maintenance. The method is useful both for exploring a scientific domain and as a stepping stone towards formally rigorous ontologies. Filtering the terms recognised in a heterogeneous corpus down to those that are the topic of the ontology is identified as one of the main challenges for research in ontology learning.
Related Links	https://bmcbioinformatics.biomedcentral.com/counter/pdf/10.1186/1471-2105-10-S5-S1.pdf
Ending Page	20
Page Count	20
Starting Page	1
File Format	HTM / HTML
ISSN	1471-2105
DOI	10.1186/1471-2105-10-S5-S1
Journal	BMC Bioinformatics
Issue Number	5
Volume Number	10
Language	English
Publisher	BioMed Central
Publisher Date	2009-05-06
Access Restriction	Open
Subject Keyword	Bioinformatics; Microarrays; Computational Biology; Computer Appl. in Life Sciences; Algorithms; Animal Behaviour; Regular Expression; Formal Ontology; Ontology Learning; Ontology Module; Computational Biology/Bioinformatics
Content Type	Text
Resource Type	Article
Subject	Molecular Biology; Biochemistry; Computer Science Applications; Applied Mathematics; Structural Biology
Journal Impact Factor	2.9 (2023)
5-Year Journal Impact Factor	3.6 (2023)
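The abstract says lexico-syntactic patterns were used both to test whether a term belongs in the ontology and to test subsumption between terms. This record does not give the authors' actual patterns, so the following is only a minimal sketch assuming Hearst-style patterns (e.g. "Y such as X", "X and other Y") matched with regular expressions over raw corpus text. The names `TEMPLATES`, `subsumption_evidence`, `supports_subsumption` and the `min_hits` threshold are hypothetical illustrations, not the paper's method.

```python
import re

# Hypothetical Hearst-style templates (illustrative only, not the paper's patterns).
# {A} is the candidate parent (hypernym), {B} the candidate child (hyponym).
TEMPLATES = [
    r"\b{A},? such as {B}\b",
    r"\b{B},? and other {A}\b",
    r"\b{A},? including {B}\b",
    r"\b{B} (?:is|are) (?:a |an )?(?:kind|type|form)s? of {A}\b",
]


def _term(t: str) -> str:
    # Escape regex metacharacters and allow an optional trailing "s"
    # as a crude plural; irregular plurals are not handled.
    return re.escape(t) + r"s?"


def subsumption_evidence(corpus: str, child: str, parent: str) -> int:
    """Count template matches suggesting that `child` is-a `parent`."""
    text = corpus.lower()
    total = 0
    for tpl in TEMPLATES:
        pattern = tpl.format(A=_term(parent.lower()), B=_term(child.lower()))
        # Templates use only non-capturing groups, so findall returns whole matches.
        total += len(re.findall(pattern, text))
    return total


def supports_subsumption(corpus: str, child: str, parent: str,
                         min_hits: int = 1) -> bool:
    """Accept the candidate is-a link if enough pattern matches are found."""
    return subsumption_evidence(corpus, child, parent) >= min_hits
```

With the parent fixed to a domain root such as "behaviour", the same count could act as a crude inclusion filter for candidate terms. A real pipeline would more likely lemmatise terms and aggregate evidence over a large corpus rather than rely on single sentence matches.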

Similar Documents

A hub-attachment based method to detect functional modules from confidence-scored protein interactions and expression profiles

Saliva Ontology: An ontology-based framework for a Salivaomics Knowledge Base

EXACT2: the semantics of biomedical protocols

SALON ontology for the formal description of sequence alignments

Semi-supervised learning for the identification of syn-expressed genes from fused microarray and in situ image data

MINE: Module Identification in Networks

Structuring an event ontology for disease outbreak detection

Validating module network learning algorithms using simulated data

Seq-ing improved gene expression estimates from microarrays using machine learning