Please wait, while we are loading the content...
Please wait, while we are loading the content...
| Content Provider | ACM Digital Library |
|---|---|
| Author | Koopman, Bevan |
| Abstract | In this thesis, we present models for semantic search: Information Retrieval (IR) models that elicit the meaning behind the words found in documents and queries rather than simply matching keywords. This is achieved by the integration of structured domain knowledge and data-driven information retrieval methods. The research is set within health informatics to tackle the unique challenges within this domain; specifically, how to bridge the 'semantic gap'; that is, how to overcome the mismatch between raw medical data and the way human beings interpret it. Bridging the semantic gap involves addressing two issues: semantics; that is, aligning the meaning or concepts behind words found in documents and queries; and leveraging inference, which utilises semantics to infer relevant information. Three semantic search models -- all utilising concept-based rather than term-based representations---are developed; these include: the Bag-of-concepts model, which utilises concepts from the SNOMED CT medical ontology as its underlying representation; the Graph-based Concept Weighting model, which captures concept dependence and importance in a novel weighting function; and the core contribution of the thesis, the Graph INference model (GIN): a unified theoretical model of semantic search as inference, achieved by the integration of structured domain knowledge (ontologies) and statistical, information retrieval methods. It is the GIN that provides the necessary mechanism for inference to bridge the semantic gap. All three models are empirically evaluated using clinical queries and a real-world collection of clinical records taken from the TREC Medical Records Track (MedTrack). Our evaluation shows that the use of concept-based representations in the Bag-of-concepts model leads to improved retrieval effectiveness. When concepts are combined within the Graph-based ConceptWeighting model, further improvements are possible. The evaluation of GIN highlighted that its inference mechanism is suited to hard queries -- those that perform poorly on a term-based system. In-depth analysis also revealed that the GIN returned many new documents not retrieved by term-based systems and therefore never evaluated for relevance as part of the TREC MedTrack. This highlights that using current IR test collections, where semantic search systems did not contribute to the pool, may underestimate the effectiveness of semantic search systems. This work represents a significant step forward in the integration of structured domain knowledge and data-driven information retrieval methods. Furthermore, the thesis provides an understanding of inference -- when and how it should be applied for effective semantic search. It shows that queries with certain characteristics benefit from inference, while others do not. The detailed investigation into the evaluation of semantic search systems shows how current IR test collections may underestimate effectiveness of such systems and new techniques for evaluation are suggested. The Graph Inference model, although developed within the medical domain, is generally defined and has implications in other areas, including web search, where an emerging research trend is to utilise structured knowledge resources for more effective semantic search. |
| Starting Page | 116 |
| Ending Page | 117 |
| Page Count | 2 |
| File Format | |
| ISSN | 01635840 |
| DOI | 10.1145/2701583.2701601 |
| Journal | ACM SIGIR Forum (SIGF) |
| Volume Number | 48 |
| Issue Number | 2 |
| Language | English |
| Publisher | Association for Computing Machinery (ACM) |
| Publisher Date | 1978-08-01 |
| Publisher Place | New York |
| Access Restriction | One Nation One Subscription (ONOS) |
| Content Type | Text |
| Resource Type | Article |
| Subject | Management Information Systems Hardware and Architecture |
National Digital Library of India (NDLI) is a virtual repository of learning resources which is not just a repository with search/browse facilities but provides a host of services for the learner community. It is sponsored and mentored by Ministry of Education, Government of India, through its National Mission on Education through Information and Communication Technology (NMEICT). Filtered and federated searching is employed to facilitate focused searching so that learners can find the right resource with least effort and in minimum time. NDLI provides user group-specific services such as Examination Preparatory for School and College students and job aspirants. Services for Researchers and general learners are also provided. NDLI is designed to hold content of any language and provides interface support for 10 most widely used Indian languages. It is built to provide support for all academic levels including researchers and life-long learners, all disciplines, all popular forms of access devices and differently-abled learners. It is designed to enable people to learn and prepare from best practices from all over the world and to facilitate researchers to perform inter-linked exploration from multiple sources. It is developed, operated and maintained from Indian Institute of Technology Kharagpur.
Learn more about this project from here.
NDLI is a conglomeration of freely available or institutionally contributed or donated or publisher managed contents. Almost all these contents are hosted and accessed from respective sources. The responsibility for authenticity, relevance, completeness, accuracy, reliability and suitability of these contents rests with the respective organization and NDLI has no responsibility or liability for these. Every effort is made to keep the NDLI portal up and running smoothly unless there are some unavoidable technical issues.
Ministry of Education, through its National Mission on Education through Information and Communication Technology (NMEICT), has sponsored and funded the National Digital Library of India (NDLI) project.
| Sl. | Authority | Responsibilities | Communication Details |
|---|---|---|---|
| 1 | Ministry of Education (GoI), Department of Higher Education |
Sanctioning Authority | https://www.education.gov.in/ict-initiatives |
| 2 | Indian Institute of Technology Kharagpur | Host Institute of the Project: The host institute of the project is responsible for providing infrastructure support and hosting the project | https://www.iitkgp.ac.in |
| 3 | National Digital Library of India Office, Indian Institute of Technology Kharagpur | The administrative and infrastructural headquarters of the project | Dr. B. Sutradhar bsutra@ndl.gov.in |
| 4 | Project PI / Joint PI | Principal Investigator and Joint Principal Investigators of the project |
Dr. B. Sutradhar bsutra@ndl.gov.in Prof. Saswat Chakrabarti will be added soon |
| 5 | Website/Portal (Helpdesk) | Queries regarding NDLI and its services | support@ndl.gov.in |
| 6 | Contents and Copyright Issues | Queries related to content curation and copyright issues | content@ndl.gov.in |
| 7 | National Digital Library of India Club (NDLI Club) | Queries related to NDLI Club formation, support, user awareness program, seminar/symposium, collaboration, social media, promotion, and outreach | clubsupport@ndl.gov.in |
| 8 | Digital Preservation Centre (DPC) | Assistance with digitizing and archiving copyright-free printed books | dpc@ndl.gov.in |
| 9 | IDR Setup or Support | Queries related to establishment and support of Institutional Digital Repository (IDR) and IDR workshops | idr@ndl.gov.in |
|
Loading...
|