NDLI: Extraction, selection and ranking of Field Association (FA) Terms from domain-specific corpora for building a comprehensive FA terms dictionary

Content Provider	Springer Nature Link
Author	Dorji, Tshering Cigay; Atlam, El sayed; Yata, Susumu; Fuketa, Masao; Morita, Kazuhiro; Aoe, Jun ichi
Copyright Year	2010
Abstract	Field Association (FA) Terms, words or phrases that serve to identify document fields, are effective in document classification, similar file retrieval and passage retrieval. The problem is the lack of an effective method to extract and select relevant FA Terms for building a comprehensive dictionary of FA Terms. This paper presents a new method to extract, select and rank FA Terms from domain-specific corpora using part-of-speech (POS) pattern rules, corpora comparison and modified tf-idf weighting. In an experimental evaluation on 21 fields, using 306 MB of domain-specific corpora obtained from English Wikipedia dumps, the method selected up to 2,517 FA Terms (single and compound) per field at a precision of 74–97% and a recall of 65–98%. These results are better than those of the traditional methods. The FA Terms dictionary constructed using this method achieved an average accuracy of 97.6% in identifying the fields of 10,077 test documents collected from Wikipedia, the Reuters RCV1 corpus and the 20 Newsgroups data set.
Starting Page	141
Ending Page	161
Page Count	21
File Format	PDF
ISSN	02191377
Journal	Knowledge and Information Systems
Volume Number	27
Issue Number	1
e-ISSN	02193116
Language	English
Publisher	Springer-Verlag
Publisher Date	2010-04-24
Publisher Place	London
Access Restriction	One Nation One Subscription (ONOS)
Subject Keyword	Field Association (FA) Terms; Terms weighting and selection; Document classification; Terminology extraction; Information retrieval; Information Systems and Communication Service; Business Information Systems
Content Type	Text
Resource Type	Article
Subject	Artificial Intelligence; Information Systems; Human-Computer Interaction; Hardware and Architecture; Software
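
Illustrative sketch (not taken from the article): the abstract names corpora comparison and a modified tf-idf weighting but does not give the formula, so the Python below uses a plain field-level tf-idf as a stand-in. It assumes candidate terms have already been extracted (the POS pattern step is skipped); the function name rank_field_terms, the top_k parameter and the toy corpora are hypothetical.

```python
import math
from collections import Counter


def rank_field_terms(field_docs, top_k=3):
    """Rank candidate terms per field with a plain field-level tf-idf.

    field_docs maps a field name to a list of documents, each document
    being a list of candidate terms (assumed already extracted).
    This is a generic stand-in, not the article's modified weighting.
    """
    # Term frequency of each candidate within each field's corpus.
    field_tf = {
        field: Counter(term for doc in docs for term in doc)
        for field, docs in field_docs.items()
    }
    # Number of fields in which each term occurs at least once.
    field_df = Counter(term for tf in field_tf.values() for term in tf)
    n_fields = len(field_docs)

    ranked = {}
    for field, tf in field_tf.items():
        total = sum(tf.values())
        scores = {
            term: (count / total) * math.log(n_fields / field_df[term])
            for term, count in tf.items()
        }
        # Terms that occur in every field score 0 and are dropped.
        kept = [(t, s) for t, s in scores.items() if s > 0]
        # Highest score first; ties broken alphabetically.
        kept.sort(key=lambda pair: (-pair[1], pair[0]))
        ranked[field] = kept[:top_k]
    return ranked


# Toy corpora, invented purely for illustration.
toy_corpora = {
    "sports": [["football", "goal", "team"], ["team", "coach", "goal"]],
    "finance": [["stock", "market", "team"], ["stock", "bond"]],
}
ranking = rank_field_terms(toy_corpora)
```

In this toy input, "team" appears in both fields, so comparing the corpora gives it a zero weight and it is not kept for either field. Terms confined to one field are kept and ordered by their relative frequency there.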


Similar Documents

Automatic Building an Extensive Arabic FA Terms Dictionary

New approach for field association term dictionary with passage retrieval.

Parameterized Decay Model for Information Retrieval

Combining compound and single terms under language model framework

Automatic ranking of retrieval models using retrievability measure

Single pass text classification by direct feature weighting

An adaptive learning automata-based ranking function discovery algorithm

Pairwise ranking component analysis

Experiments with a component theory of probabilistic information retrieval based on single terms as document components
