NDLI: A matrix density based algorithm to hierarchically co-cluster documents and words


Query-free news search

Model-theoretic semantics for the web

Dynamic service reconfiguration for wireless web access

Text joins in an RDBMS for web data integration

Efficient and robust streaming provisioning in VPNs

Web application security assessment by fault injection and behavior monitoring

SemTag and seeker: bootstrapping the semantic web via automated semantic annotation

DOM-based content extraction of HTML documents

Supporting management reporting: a writable web case study

Extrapolation methods for accelerating PageRank computations

SHOCK: communicating with computational messages and automatic private profiles

A system for principled matchmaking in an electronic marketplace

A new paradigm for ranking pages on the world wide web

Peer-to-peer architecture for content-based music retrieval on acoustic data

Conversation specification: a new approach to design and analysis of e-service composition

On deep annotation

Application specific data replication for edge services

Offering open hypermedia services to the WWW: a step-by-step approach for developers

A matrix density based algorithm to hierarchically co-cluster documents and words

Super-peer-based routing and clustering strategies for RDF-based peer-to-peer networks

On the bursty evolution of blogspace

Evaluating a new approach to strong web cache consistency with snapshots of collected content

An XPath-based preference language for P3P

Monitoring the dynamic web to respond to continuous queries

ρ-Queries: enabling querying for semantic associations on the semantic web

A framework for coordinated multi-modal browsing with multiple clients

Improving pseudo-relevance feedback in web information retrieval using web page segmentation

Three theses of representation in the semantic web

Web browsing performance of wireless thin-client computing

Dynamic maintenance of web indexes using landmarks

On admission control for profit maximization of networked service providers

The HP time vault service: exploiting IBE for timed release of confidential information

Data extraction and label assignment for web databases

Fractal summarization for mobile devices to access large documents on the web

Scholarly publishing and argument in hyperspace

Scaling personalized web search

P2Cast: peer-to-peer patching scheme for VoD service

A software framework for matchmaking based on semantic web technology

Adaptive ranking of web pages

Towards a multimedia formatting vocabulary

Quality driven web services composition

An infrastructure for searching, reusing and evolving distributed ontologies

Evaluation of edge caching/offloading for dynamic content delivery

Xspect: bridging open hypermedia and XLink

Mining the peanut gallery: opinion extraction and semantic classification of product reviews

On labeling schemes for the semantic web

Make it fresh, make it quick: searching a network of personal webservers

Scalable techniques for memory-efficient CDN simulations

The Eigentrust algorithm for reputation management in P2P networks

A large-scale study of the evolution of web pages

Semantic search

A comparative web browser (CWB) for browsing and comparing web pages

Predictive caching and prefetching of query results in search engines

Description logic programs: combining logic programs with description logic

Sensor-enhanced mobile web clients: an XForms approach

High-performance spatial indexing for location-based services

Design, implementation, and evaluation of a client characterization driven web server

Content extraction signatures using XML digital signatures and custom transforms on-demand

The chatty web: emergent semantics through gossiping

Detecting web page structure for adaptive viewing on small form factor devices

Mining topic-specific concepts and definitions on the web

Adaptive on-line page importance computation

DEW: DNS-enhanced web for faster content delivery

SweetDeal: representing agent contracts with exceptions using XML rules, ontologies, and process descriptions

Searching the workplace web

Architecture of a quality based intelligent proxy (QBIX) for MPEG-4 videos

A foundation for tool based mobility support for visually impaired web users

Modeling redirection in geographically diverse server sets

The XML web: a first study

Mining newsgroups using networks arising from social behavior

Piazza: data management infrastructure for semantic web applications

Engineering and hosting adaptive freshness-sensitive web applications on data centers

Value-based web caching

Similarity measure and instance selection for collaborative filtering

Efficient URL caching for world wide web crawling

Agent-based semantic web services

Comparing link marker visualization techniques: changes in reading behavior

A matrix density based algorithm to hierarchically co-cluster documents and words

Content Provider	ACM Digital Library
Author	Kummamuru, Krishna; Joshi, Sachindra; Mandhani, Bhushan
Abstract	This paper proposes an algorithm to hierarchically cluster documents. Each cluster is a set of documents together with an associated set of words, that is, a document-word co-cluster. Note that the vector model for documents yields a document-word matrix, of which every co-cluster is a submatrix. One would intuitively expect a submatrix made up of high values to be a good document cluster, with the corresponding word cluster containing its most distinctive features. Our algorithm exploits this intuition. We define a notion of matrix density, and the algorithm is driven by matrix density considerations. It is a partitional-agglomerative algorithm. The partitioning step identifies dense submatrices whose row sets partition the row set of the complete matrix. The hierarchical agglomerative step repeatedly merges the most "similar" submatrices until the required number of clusters remains (for a flat clustering) or until only the single complete matrix is left (for a hierarchical arrangement of documents). The algorithm also generates apt labels for each cluster or hierarchy node. The similarity measure used for merging exploits the fact that the clusters are co-clusters, and this is a key point of difference from existing agglomerative algorithms. We refer to the proposed algorithm as RPSA (Rowset Partitioning and Submatrix Agglomeration). We compare it as a clustering algorithm with Spherical K-Means and Spectral Graph Partitioning, and we also evaluate some hierarchies generated by the algorithm.
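The abstract does not give the paper's definition of matrix density or its exact merge criterion. The following is therefore only a minimal Python sketch of the agglomerative step under stated assumptions, not RPSA itself:

- Rows of the matrix are documents and columns are words.
- The density of a submatrix is assumed to be the mean of its entries.
- Two co-clusters are assumed to be "similar" when the submatrix spanned by their combined documents and combined words is dense.
- The initial co-clusters stand in for the output of the partitioning step, which is not reproduced here.

The names `density`, `agglomerate`, and `label` are hypothetical.

```python
import numpy as np


def density(A, rows, cols):
    """Assumed density: mean entry of the submatrix A[rows, cols].

    This is a hypothetical stand-in; the paper defines its own measure.
    """
    if not rows or not cols:
        return 0.0
    return float(A[np.ix_(sorted(rows), sorted(cols))].mean())


def agglomerate(A, coclusters, k):
    """Greedily merge (document_set, word_set) co-clusters until k remain.

    Assumption: similarity of two co-clusters is the density of the
    submatrix spanned by the union of their rows and of their columns.
    Returns the final co-clusters and the merge history; k=1 yields a
    full hierarchy.
    """
    clusters = [(frozenset(r), frozenset(c)) for r, c in coclusters]
    history = []
    while len(clusters) > k:
        best = None
        # Examine every pair and keep the one whose combined submatrix is densest.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                rows = clusters[i][0] | clusters[j][0]
                cols = clusters[i][1] | clusters[j][1]
                d = density(A, rows, cols)
                if best is None or d > best[0]:
                    best = (d, i, j, rows, cols)
        d, i, j, rows, cols = best
        history.append((clusters[i], clusters[j], d))
        # Replace the two merged co-clusters by their union.
        clusters = [c for t, c in enumerate(clusters) if t not in (i, j)]
        clusters.append((rows, cols))
    return clusters, history


def label(A, cocluster, vocab, n=2):
    """Hypothetical labelling: the n words with the largest total weight
    over the co-cluster's documents."""
    rows, cols = cocluster
    cols = sorted(cols)
    weights = A[np.ix_(sorted(rows), cols)].sum(axis=0)
    top = np.argsort(-weights, kind="stable")[:n]
    return [vocab[cols[t]] for t in top]
```

For example, consider a toy 4x4 document-word matrix with two dense blocks. Start from one co-cluster per document, which stands in for the partitioning step's output. Calling `agglomerate(A, init, 2)` then merges documents 0 and 1, and separately documents 2 and 3. The word with the largest summed weight serves as each cluster's label. Exhaustive pairwise search keeps the sketch short but costs quadratic time per merge. A practical implementation would cache pairwise similarities.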
Starting Page	511
Ending Page	518
Page Count	8
File Format	PDF
ISBN	1581136803
DOI	10.1145/775152.775225
Language	English
Publisher	Association for Computing Machinery (ACM)
Publication Date	2003-05-20
Publisher Place	New York
Access Restriction	Subscribed
Content Type	Text
Resource Type	Article

Central Library (ISO-9001:2015 Certified)
Indian Institute of Technology Kharagpur
Kharagpur, West Bengal, India | PIN - 721302

Phone: 03222 282435
Mail: support@ndl.gov.in

Sl.	Authority	Responsibilities	Communication Details
1	Ministry of Education (GoI), Department of Higher Education	Sanctioning Authority	https://www.education.gov.in/ict-initiatives
2	Indian Institute of Technology Kharagpur	Host institute of the project, responsible for providing infrastructure support and hosting the project	https://www.iitkgp.ac.in
3	National Digital Library of India Office, Indian Institute of Technology Kharagpur	The administrative and infrastructural headquarters of the project	Dr. B. Sutradhar bsutra@ndl.gov.in
4	Project PI / Joint PI	Principal Investigator and Joint Principal Investigators of the project	Dr. B. Sutradhar bsutra@ndl.gov.in; Prof. Saswat Chakrabarti (contact details will be added soon)
5	Website/Portal (Helpdesk)	Queries regarding NDLI and its services	support@ndl.gov.in
6	Contents and Copyright Issues	Queries related to content curation and copyright issues	content@ndl.gov.in
7	National Digital Library of India Club (NDLI Club)	Queries related to NDLI Club formation, support, user awareness program, seminar/symposium, collaboration, social media, promotion, and outreach	clubsupport@ndl.gov.in
8	Digital Preservation Centre (DPC)	Assistance with digitizing and archiving copyright-free printed books	dpc@ndl.gov.in
9	IDR Setup or Support	Queries related to establishment and support of Institutional Digital Repository (IDR) and IDR workshops	idr@ndl.gov.in