NDLI: Keyword Extraction from Arabic Documents using Term Equivalence Classes

Content Provider	ACM Digital Library
Author	Awajan, Arafat
Copyright Year	2015
Abstract	The rapid growth of the Internet and other computing facilities in recent years has resulted in the creation of a large amount of text in electronic form, which has increased the interest in and importance of different automatic text processing applications, including keyword extraction and term indexing. Although keywords are very useful for many applications, most documents available online are not provided with keywords. We describe a method for extracting keywords from Arabic documents. This method identifies the keywords by combining linguistic and statistical analysis of the text without using prior knowledge from its domain or information from any related corpus. The text is preprocessed to extract the main linguistic information, such as the roots and morphological patterns of derivative words. A cleaning phase is then applied to eliminate the meaningless words from the text. The most frequent terms are clustered into equivalence classes in which the derivative words generated from the same root and the non-derivative words generated from the same stem are placed together, and their counts are accumulated. A vector space model is then used to capture the most frequent N-grams in the text. Experiments carried out using a real-world dataset show that the proposed method achieves good results with an average precision of 31% and average recall of 53% when tested against manually assigned keywords.
Starting Page	1
Ending Page	18
Page Count	18
File Format	PDF
ISSN	2375-4699
e-ISSN	2375-4702
DOI	10.1145/2665077
Volume Number	14
Issue Number	2
Journal	ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP)
Language	English
Publisher	Association for Computing Machinery (ACM)
Publisher Date	2015-04-20
Publisher Place	New York
Access Restriction	One Nation One Subscription (ONOS)
Subject Keyword	Arabic natural language processing; Keyword extraction; Term equivalence classes; Text analysis
Content Type	Text
Resource Type	Article
Subject	Computer Science
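
The abstract describes clustering terms into equivalence classes keyed by a shared root or stem and accumulating their counts. The following is a minimal sketch of that one step, not the author's implementation. The `ROOT_OF` lookup table, the `STOP_WORDS` set, and the function names are hypothetical and exist only for illustration; a real system would obtain roots and stems from an Arabic morphological analyzer rather than a hand-written dictionary.

```python
from collections import Counter, defaultdict

# Hypothetical word -> root lookup, hand-written for this example only.
# A real system would derive roots/stems with a morphological analyzer.
ROOT_OF = {
    "كتاب": "كتب",   # book
    "كاتب": "كتب",   # writer
    "مكتبة": "كتب",  # library
    "مدرسة": "درس",  # school
    "دارس": "درس",   # student / one who studies
}

# Hypothetical stop-word list standing in for the "cleaning phase".
STOP_WORDS = {"في", "من", "على"}


def equivalence_classes(tokens):
    """Group tokens by root (or by the token itself if no root is known)."""
    classes = defaultdict(Counter)
    for tok in tokens:
        if tok in STOP_WORDS:
            continue  # drop words treated as meaningless
        key = ROOT_OF.get(tok, tok)
        classes[key][tok] += 1
    return classes


def rank_classes(classes):
    """Return (class key, accumulated count) pairs, most frequent first."""
    totals = {key: sum(members.values()) for key, members in classes.items()}
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))


tokens = ["كتاب", "في", "مكتبة", "كاتب", "مدرسة", "كتاب", "دارس"]
classes = equivalence_classes(tokens)
ranking = rank_classes(classes)
```

In this toy input, the four words derived from the root كتب are counted as one class, so that class outranks any of its individual members. The N-gram and vector space stages mentioned in the abstract are not covered by this sketch.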

Similar Documents

Spoken dialog summarization system with HAPPINESS/SUFFERING factor recognition

Arabic Natural Language Processing: Challenges and Solutions

RENAR: A Rule-Based Arabic Named Entity Recognition System

Arabic Text Categorization Based on Arabic Wikipedia

Introduction to the Special Issue on Arabic Natural Language Processing

Integrating natural language processing with image document analysis: what we learned from two real-world applications

Concept Relation Extraction from Construction Documents Using Natural Language Processing

Cross-Language Information Propagation for Arabic Mention Detection

Automatic Quality Control of Transportation Reports Using Statistical Language Processing

Keyword Extraction from Arabic Documents using Term Equivalence Classes