NDLI: Web Search Clustering and Labeling with Hidden Topics

Web Search Clustering and Labeling with Hidden Topics

Content Provider	ACM Digital Library
Author	Ha, Quang-Thuy; Nguyen, Cam-Tu; Phan, Xuan-Hieu; Nguyen, Thu-Trang; Horiguchi, Susumu
Copyright Year	2009
Abstract	Web search clustering reorganizes search results (also called “snippets”) so that they are more convenient to browse. Such post-retrieval clustering systems have three key requirements: (1) the clustering algorithm should group similar documents together; (2) clusters should be labeled with descriptive phrases; and (3) the system should provide high-quality clustering without downloading whole Web pages. This article introduces a novel framework for clustering Web search results in Vietnamese that addresses these three requirements. The main motivation is that enriching short snippets with hidden topics drawn from very large document collections on the Internet makes it possible to cluster and label the snippets effectively, in a topic-oriented manner, without consulting whole Web pages. Our approach is based on recent successful topic analysis models, such as Probabilistic Latent Semantic Analysis and Latent Dirichlet Allocation. The underlying idea is to collect a very large external data collection, called the “universal dataset,” and then build a clustering system on both the original snippets and a rich set of hidden topics discovered from that collection. The hidden topics can be seen as a richer representation of the snippets to be clustered. We evaluate the method carefully and show that it yields impressive clustering quality.
Starting Page	1
Ending Page	40
Page Count	40
File Format	PDF
ISSN	1530-0226
e-ISSN	1558-3430
DOI	10.1145/1568292.1568295
Volume Number	8
Issue Number	3
Journal	ACM Transactions on Asian Language Information Processing (TALIP)
Language	English
Publisher	Association for Computing Machinery (ACM)
Publisher Date	2009-08-01
Publisher Place	New York
Access Restriction	One Nation One Subscription (ONOS)
Subject Keyword	Hierarchical Agglomerative Clustering; Latent Dirichlet allocation; Vietnamese; Web search clustering; Cluster labeling; Collocation; Hidden topics analysis
Content Type	Text
Resource Type	Article
Subject	Computer Science
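
The enrich-then-cluster idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the `WORD_TOPICS` table is a hypothetical, hand-written stand-in for a topic model estimated on a universal dataset, topic inference is replaced by a simple normalized sum of per-word topic weights (not PLSA or LDA inference), and the names `topic_mix`, `enrich`, `hac`, the `topic_weight` factor, and the similarity threshold are all assumptions made for illustration. The clustering step is a plain average-link Hierarchical Agglomerative Clustering over cosine similarity.

```python
import math
from collections import Counter

# Hypothetical word -> topic weights, standing in for a topic model
# estimated on a large "universal dataset". Values are illustrative only.
WORD_TOPICS = {
    "jaguar": [0.5, 0.5],
    "car": [0.9, 0.1],
    "engine": [0.9, 0.1],
    "speed": [0.7, 0.3],
    "cat": [0.1, 0.9],
    "jungle": [0.1, 0.9],
    "prey": [0.1, 0.9],
}
NUM_TOPICS = 2


def topic_mix(tokens):
    """Crude stand-in for topic inference: normalized sum of word-topic weights."""
    totals = [0.0] * NUM_TOPICS
    for tok in tokens:
        for k, weight in enumerate(WORD_TOPICS.get(tok, [])):
            totals[k] += weight
    total = sum(totals)
    return [t / total for t in totals] if total else totals


def enrich(snippet, topic_weight=2.0):
    """Sparse vector of word counts plus one weighted pseudo-feature per topic."""
    tokens = snippet.lower().split()
    vec = dict(Counter(tokens))
    for k, p in enumerate(topic_mix(tokens)):
        vec[f"__topic_{k}"] = topic_weight * p
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(key, 0.0) for key, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hac(vectors, threshold):
    """Average-link agglomerative clustering over cosine similarity.

    Repeatedly merges the most similar pair of clusters and stops once the
    best available similarity falls below `threshold`.
    """
    clusters = [[i] for i in range(len(vectors))]

    def sim(c1, c2):
        pairs = [(i, j) for i in c1 for j in c2]
        return sum(cosine(vectors[i], vectors[j]) for i, j in pairs) / len(pairs)

    while len(clusters) > 1:
        best, (i, j) = max(
            (sim(clusters[i], clusters[j]), (i, j))
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if best < threshold:
            break
        clusters[i] += clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


snippets = [
    "jaguar car engine",
    "car speed test",
    "jaguar cat jungle",
    "cat prey hunting",
]
clusters = hac([enrich(s) for s in snippets], threshold=0.5)
print(clusters)  # prints [[0, 1], [2, 3]]
```

In this toy data, snippets 0 and 1 share only the word "car", and snippets 0 and 2 share only "jaguar", so raw word overlap alone cannot separate the two senses. The added topic features are what pull the car-related and animal-related snippets into separate clusters.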

Similar Documents

Web search clustering and labeling with hidden topics.

Multimedia News Summarization in Search

Automatic Labeling of Topics

Discriminative sequential association latent dirichlet allocation for visual recognition

Parallel algorithms for merging topic trees and their application in meta search engines

Thread labeling for news event

An efficient approach to suggesting topically related web queries using hidden topic model

Agglomerative Hierarchical Clustering Algorithm- A Review

PLDA+: Parallel latent dirichlet allocation with data placement and pipeline processing

Web Search Clustering and Labeling with Hidden Topics