Finding structure in noisy text: topic classification and unsupervised clustering

Content Provider	Springer Nature Link
Author	Natarajan, Prem; Prasad, Rohit; Subramanian, Krishna; Saleem, Shirin; Choi, Fred; Schwartz, Rich
Copyright Year	2007
Abstract	This paper addresses two types of classification of noisy, unstructured text such as newsgroup messages: (1) spotting messages containing topics of interest, and (2) automatic conceptual organization of messages without prior knowledge of topics of interest. In addition to applying our hidden Markov model methodology to spotting topics of interest in newsgroup messages, we present a robust methodology for rejecting messages which are off-topic. We describe a novel approach for automatically organizing a large, unstructured collection of messages. The approach applies an unsupervised topic clustering procedure to generate a hierarchical tree of topics.
Starting Page	187
Ending Page	198
Page Count	12
File Format	PDF
ISSN	1433-2833
Journal	International Journal of Document Analysis and Recognition (IJDAR)
Volume Number	10
Issue Number	3-4
e-ISSN	1433-2825
Language	English
Publisher	Springer-Verlag
Publisher Date	2007-12-05
Publisher Place	Berlin, Heidelberg
Access Restriction	One Nation One Subscription (ONOS)
Subject Keyword	Topic classification; Unsupervised topic discovery; Clustering; Hidden Markov model; Pattern Recognition; Image Processing and Computer Vision
Content Type	Text
Resource Type	Article
Subject	Computer Science Applications; Computer Vision and Pattern Recognition; Software