NDLI: Phrase-based document categorization revisited

Phrase-based document categorization revisited

Content Provider	ACM Digital Library
Author	Beney, Jean G.; Koster, Cornelis H.A.
Abstract	This paper takes a fresh look at an old idea in Information Retrieval: the use of linguistically extracted phrases as terms in the automatic categorization (also known as classification) of documents. Until now, little or no evidence has been found that document categorization benefits from the application of linguistic techniques. Classification algorithms using even the most cleverly designed linguistic representations typically do no better than those using a simple bag-of-words representation. Shallow linguistic techniques are used routinely, but their positive effect on accuracy is small at best. We have investigated the use of dependency triples as terms in document categorization; the triples are derived according to a dependency model based on the notion of aboutness. The documents are syntactically analyzed by a parser and transduced to dependency trees, which in turn are unnested into dependency triples following the aboutness-based model. In the process, various normalizing transformations are applied to enhance recall. We describe a sequence of large-scale experiments with different document representations, test collections and even languages, presenting evidence that adding such triples to the words in a bag-of-terms document representation may lead to a significant increase in the accuracy of document categorization.
Starting Page	49
Ending Page	56
Page Count	8
File Format	PDF
ISBN	9781605588094
DOI	10.1145/1651343.1651357
Language	English
Publisher	Association for Computing Machinery (ACM)
Publisher Date	2009-11-06
Publisher Place	New York
Access Restriction	Subscribed
Subject Keyword	Aboutness; Text categorization; Dependency triples; Linguistic terms
Content Type	Text
Resource Type	Article
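
The abstract describes unnesting dependency trees into triples and adding those triples to the words in a bag-of-terms representation. The following is a minimal Python sketch of that general idea only, not the authors' implementation: the `Node` class, the relation labels (`SUBJ`, `OBJ`, `ATTR`), the bracketed triple format, and the hand-built toy tree are all hypothetical, and no real parser or aboutness-based model is involved. Using lemmas in the tree is a rough stand-in for the normalizing transformations the abstract mentions.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical dependency-tree node: a word plus labelled children."""
    word: str
    children: list = field(default_factory=list)  # list of (relation, Node)


def unnest(node):
    """Yield one (head, relation, modifier) triple per edge, depth-first."""
    for relation, child in node.children:
        yield (node.word, relation, child.word)
        yield from unnest(child)


def bag_of_terms(words, tree):
    """Count plain words, then add each dependency triple as an extra term."""
    terms = Counter(w.lower() for w in words)
    terms.update("[%s,%s,%s]" % triple for triple in unnest(tree))
    return terms


# Hand-built toy tree (an assumption, not parser output) for the phrase
# "parser analyzes patent documents"; tree words are lemmas.
tree = Node("analyze", [
    ("SUBJ", Node("parser")),
    ("OBJ", Node("document", [("ATTR", Node("patent"))])),
])
words = ["parser", "analyzes", "patent", "documents"]
terms = bag_of_terms(words, tree)
```

Here `terms` holds the four word terms alongside three triple terms such as `[analyze,OBJ,document]`, so a classifier that consumes term counts can use both kinds of feature without any change to its input format.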
