NDLI: Cross-language information retrieval using PARAFAC2


Efficient and effective explanation of change in hierarchical summaries

An event-based framework for characterizing the evolutionary behavior of interaction graphs

A framework for classification and segmentation of massive audio data streams

Data mining at the crossroads: successes, failures and learning from them

First International Workshop on Data Mining and Audience Intelligence for Advertising

Mining large time-evolving data using matrix and tensor tools

Calculating latent demand in the long tail

Estimating rates of rare events at multiple resolutions

On-board analysis of uncalibrated data for a spacecraft at mars

Detecting changes in large data sets of payment card data: a case study

KDD Cup and Workshop 2007

A statistical framework for mining data streams

From mining the web to inventing the new sciences underlying the internet

Predictive discrete latent factor models for large scale dyadic data

iLink: search and routing in social networks

Domain-constrained semi-supervised mining of tracking models in sensor networks

2007 International Workshop on Domain Driven Data Mining

Statistical modeling of relational data

Challenges in mining social network data: processes, privacy, and paradoxes

On string classification in data streams

Relational data pre-processing techniques for improved securities fraud detection

Event summarization for system management

First International Workshop on Knowledge Discovery from Sensor Data

Text mining and link analysis for web and semantic web

Xproj: a framework for projected structural clustering of xml documents

Cleaning disguised missing data: a heuristic approach

LungCAD: a clinically approved, machine learning system for lung cancer detection

The Joint Ninth WEBKDD and 1st SNA-KDD Workshop on Web Mining and Social Network Analysis

Learning Bayesian networks

Show me the money!: deriving the pricing power of product features by mining consumer reviews

Practical guide to controlled experiments on the web: listen to your customers not to the hippo

Machine learning for stock selection

First ACM SIGKDD International Workshop on Privacy, Security, and Trust in KDD

From trees to forests and rule sets: a unified overview of ensemble methods

Temporal causal modeling with graphical granger methods

Distributed classification in peer-to-peer networks

IMDS: intelligent malware detection system

Eighth International Workshop on Multimedia Data Mining

Mining shape and time series databases with symbolic representations

Extracting semantic relations from query logs

High-quantile modeling for customer wallet estimation and other applications

Truth discovery with multiple conflicting information providers on the web

Fifth International Workshop on Data Mining Standards, Services, and Platforms

Real-time ranking with concept drift using expert advice

Mining complex power networks for blackout prevention

Seventh International Workshop on Data Mining in Bioinformatics

Modeling relationships at multiple scales to improve accuracy of large recommender systems

Corroborate and learn facts from the web

First International Workshop on Mining Multiple Information Sources

Content-based document routing and index partitioning for scalable similarity-based searches in a large corpus

Extracting relevant named entities for automated expense reimbursement

First International Workshop and Challenge on Time Series Classification

Support feature machine for classification of abnormal brain activity

Second Workshop on Data Mining Case Studies and Practice Prize

Nonlinear adaptive distance metric learning for clustering

Density-based clustering for real-time stream data

Cross-language information retrieval using PARAFAC2

Evolutionary spectral clustering by incorporating temporal smoothness

Structural and temporal analysis of the blogosphere through community factorization

Discovering the hidden structure of house prices with a non-parametric latent manifold model

Stochastic processes and temporal data mining

Exploiting underrepresented query aspects for automatic query expansion

Canonicalization of database records using adaptive similarity measures

Co-clustering based classification for out-of-domain documents

Detecting anomalous records in categorical datasets

Feature selection methods for text classification

Efficient incremental constrained clustering

A framework for simultaneous co-clustering and learning from complex data

A learning framework using Green's function and kernel regularization with application to recommender system

Development of NeuroElectroMagnetic ontologies (NEMO): a framework for mining brainwave ontologies

Semi-supervised classification with hybrid generative/discriminative methods

Finding tribes: identifying close-knit individuals from employment patterns

Time-dependent event hierarchy construction

The minimum consistent subset cover problem and its applications in data mining

Constraint-driven clustering

Trajectory pattern mining

Enhanced max margin learning on multimodal data mining in a multimedia database

Finding low-entropy sets and trees from binary data

Dynamic hybrid clustering of bioinformatics by incorporating text mining and citation analysis

Detecting research topics via the correlation between graphs and texts

Exploiting duality in summarization with deterministic guarantees

Correlation search in graph databases

Raising the baseline for high-precision text classifiers

A fast algorithm for finding frequent episodes in event streams

Cost-effective outbreak detection in networks

Mining statistically important equivalence classes and delta-discriminative emerging patterns

Very sparse stable random projections for dimension reduction in lα (0 < α ≤ 2) norm

BoostCluster: boosting clustering by pairwise constraints

Efficient mining of iterative patterns for software specification discovery

A probabilistic framework for relational clustering

Nestedness and segmented nestedness

Automatic labeling of multinomial topic models

Expertise modeling for matching papers with reviewers

Joint cluster analysis of attribute and relationship data without a-priori specification of the number of clusters

Multiscale topic tomography

Mining optimal decision trees from itemset lattices

Association analysis-based transformations for protein interaction networks: a function prediction case study

Applying collaborative filtering techniques to movie search for better ranking and browsing

Tracking multiple topics for finding interesting articles

Active exploration for learning rankings from clickthrough data

Hierarchical mixture models: a probabilistic analysis

Knowledge discovery of multiple-topic document using parametric mixture model with dirichlet prior

Using hierarchical clustering for learning the ontologies used in recommendation systems

Practical learning from one-sided feedback

Information genealogy: uncovering the flow of ideas in non-hyperlinked document databases

A concept-based model for enhancing text categorization

Partial example acquisition in cost-sensitive learning

A spectral clustering approach to optimally combining numerical vectors with a modular network

Making generative classifiers robust to selection bias

Statistical change detection for multi-dimensional data

Use of ranked cross document evidence trails for hypothesis generation

GraphScope: parameter-free mining of large time-evolving graphs

Weighting versus pruning in rule validation for detecting network and host anomalies

Enhancing semi-supervised clustering: a feature projection perspective

A framework for community identification in dynamic social networks

A scalable modular convex solver for regularized risk minimization

Fast best-effort pattern matching in large attributed graphs

Fast direction-aware proximity for graph mining

Scalable look-ahead linear regression trees

Characterising the difference

Privacy-preservation for gradient descent methods

Mining correlated bursty topic patterns from coordinated text streams

Generalized component analysis for text with heterogeneous attributes

Mining favorable facets

Local decomposition for rare class analysis

SCAN: a structural clustering algorithm for networks

Model-shared subspace boosting for multi-label classification

Detecting time series motifs under uniform scaling

Learning the kernel matrix in discriminant analysis via quadratically constrained quadratic programming

From frequent itemsets to semantically meaningful visual patterns

Information distance from a question to an answer

Mining templates from search result records of search engines

Joint optimization of wrapper generation and template detection

Webpage understanding: an integrated approach

Cross-language information retrieval using PARAFAC2

Content Provider	ACM Digital Library
Author	Kolda, Tamara G.; Bader, Brett W.; Chew, Peter A.; Abdelali, Ahmed
Abstract	A standard approach to cross-language information retrieval (CLIR) uses Latent Semantic Analysis (LSA) in conjunction with a multilingual parallel aligned corpus. This approach has been shown to succeed at identifying similar documents across languages or, more precisely, at retrieving the document in one language that is most similar to a query in another language. However, the approach has severe drawbacks when applied to a related task: clustering documents "language-independently", so that documents about similar topics end up closest to one another in the semantic space regardless of their language. The problem is that documents are generally more similar to other documents in the same language than to documents on the same topic in a different language. As a result, when using multilingual LSA, documents will in practice cluster by language, not by topic. We propose a novel application of PARAFAC2 (a variant of PARAFAC, which is a multi-way generalization of the singular value decomposition [SVD]) to overcome this problem. Instead of forming a single multilingual term-by-document matrix which, under LSA, is subjected to SVD, we form an irregular three-way array, each slice of which is a separate term-by-document matrix for a single language in the parallel corpus. The goal is to compute an SVD for each language such that V (the matrix of right singular vectors) is the same across all languages. Effectively, PARAFAC2 imposes the constraint, not present in standard LSA, that the "concepts" in all documents in the parallel corpus are the same regardless of language. Intuitively, this constraint makes sense, since the whole purpose of using a parallel corpus is that exactly the same concepts are expressed in the translations. We tested this approach by comparing the performance of PARAFAC2 with standard LSA in solving a particular CLIR problem.
From our results, we conclude that PARAFAC2 offers a very promising alternative to LSA not only for multilingual document clustering, but also for solving other problems in cross-language information retrieval.
Starting Page	143
Ending Page	152
Page Count	10
File Format	PDF QT / MOV
ISBN	9781595936097
DOI	10.1145/1281192.1281211
Language	English
Publisher	Association for Computing Machinery (ACM)
Publisher Date	2007-08-12
Publisher Place	New York
Access Restriction	Subscribed
Subject Keyword	PARAFAC2; Information retrieval; Latent semantic analysis (LSA); Multilingual; Clustering
Content Type	Video Text
Resource Type	Article
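The abstract describes PARAFAC2 at the level of the model: one term-by-document slice X_k per language, fitted as X_k ≈ U_k S_k V^T with a document factor V shared by all languages. The sketch below is a minimal NumPy illustration under stated assumptions, not the authors' implementation. It uses the alternating-least-squares scheme commonly used for PARAFAC2 (Kiers, ten Berge and Bro, 1999), in which U_k = Q_k H with Q_k orthonormal. U_k is therefore not strictly a matrix of singular vectors, but U_k^T U_k = H^T H is identical for every language. V is initialized from multilingual LSA (the SVD of the stacked term-by-document matrix), which ties the two approaches together. The function names `parafac2_als` and `fold_in`, the least-squares fold-in rule for queries, and the random toy corpus are all hypothetical.

```python
import numpy as np


def parafac2_als(slices, rank, n_iter=300):
    """Fit X_k ~= U_k diag(C[k]) V^T for every slice, with U_k = Q_k H.

    slices: list of (n_terms_k x n_docs) term-by-document arrays, one per
    language; column j of every slice is the same aligned document, so the
    document factor V (n_docs x rank) is shared. Q_k has orthonormal columns,
    which keeps U_k^T U_k = H^T H identical across languages (the PARAFAC2
    constraint). Assumes n_terms_k >= rank. Returns (Us, C, V, losses).
    """
    K = len(slices)
    # Start V from multilingual LSA: top right singular vectors of the
    # stacked (all-languages) term-by-document matrix.
    _, _, Vt = np.linalg.svd(np.vstack(slices), full_matrices=False)
    V = Vt[:rank].T
    H = np.eye(rank)
    C = np.ones((K, rank))
    losses = []
    for _ in range(n_iter):
        # Q-step (orthogonal Procrustes): Q_k = P Z^T, where
        # X_k V diag(c_k) H^T = P Sigma Z^T is a thin SVD.
        Qs = []
        for k, X in enumerate(slices):
            P, _, Zt = np.linalg.svd(X @ V @ np.diag(C[k]) @ H.T, full_matrices=False)
            Qs.append(P @ Zt)
        Ys = [Q.T @ X for Q, X in zip(Qs, slices)]  # each is rank x n_docs
        # One CP/PARAFAC least-squares sweep on the rank x n_docs x K array of Y_k.
        H = sum((Y @ V) * C[k] for k, Y in enumerate(Ys)) @ np.linalg.pinv((V.T @ V) * (C.T @ C))
        V = sum((Y.T @ H) * C[k] for k, Y in enumerate(Ys)) @ np.linalg.pinv((H.T @ H) * (C.T @ C))
        G_inv = np.linalg.pinv((H.T @ H) * (V.T @ V))
        C = np.array([G_inv @ np.diag(H.T @ Y @ V) for Y in Ys])
        # Squared Frobenius residual summed over all language slices.
        losses.append(sum(np.linalg.norm(X - Q @ (H * C[k]) @ V.T) ** 2
                          for k, (Q, X) in enumerate(zip(Qs, slices))))
    Us = [Q @ H for Q in Qs]
    return Us, C, V, losses


def fold_in(docs, U, c):
    """Least-squares coordinates: solve docs ~= (U diag(c)) v for v.

    An assumed fold-in rule for queries/new documents (not taken from the
    paper). docs may be one term vector (n_terms,) or a matrix (n_terms x n_docs).
    """
    coords, *_ = np.linalg.lstsq(U * c, docs, rcond=None)
    return coords


# Toy data (hypothetical): 30 aligned documents in two "languages" with
# 40 and 55 vocabulary terms, generated from 3 shared latent concepts.
rng = np.random.default_rng(0)
n_docs, rank = 30, 3
V_true = rng.standard_normal((n_docs, rank))
slices = [rng.standard_normal((m, rank)) @ V_true.T for m in (40, 55)]

Us, C, V, losses = parafac2_als(slices, rank)

# Cross-language retrieval: fold in document 7 of language 0 as a query and
# rank the language-1 documents by cosine similarity in the shared space.
q = fold_in(slices[0][:, 7], Us[0], C[0])
D = fold_in(slices[1], Us[1], C[1]).T  # n_docs x rank
scores = D @ q / (np.linalg.norm(D, axis=1) * np.linalg.norm(q))
print("best match:", int(np.argmax(scores)), "final loss:", losses[-1])
```

Each block update solves its least-squares subproblem exactly, so the recorded loss is non-increasing, but the fit is non-convex and is not guaranteed to reach a global minimum. A practical version would add a convergence tolerance and term weighting, both omitted here for brevity.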

Central Library (ISO-9001:2015 Certified)
Indian Institute of Technology Kharagpur
Kharagpur, West Bengal, India | PIN - 721302

03222 282435
Mail: support@ndl.gov.in

Sl.	Authority	Responsibilities	Communication Details
1	Ministry of Education (GoI), Department of Higher Education	Sanctioning Authority	https://www.education.gov.in/ict-initiatives
2	Indian Institute of Technology Kharagpur	Host Institute of the Project: The host institute of the project is responsible for providing infrastructure support and hosting the project	https://www.iitkgp.ac.in
3	National Digital Library of India Office, Indian Institute of Technology Kharagpur	The administrative and infrastructural headquarters of the project	Dr. B. Sutradhar bsutra@ndl.gov.in
4	Project PI / Joint PI	Principal Investigator and Joint Principal Investigators of the project	Dr. B. Sutradhar bsutra@ndl.gov.in; Prof. Saswat Chakrabarti (contact details will be added soon)
5	Website/Portal (Helpdesk)	Queries regarding NDLI and its services	support@ndl.gov.in
6	Contents and Copyright Issues	Queries related to content curation and copyright issues	content@ndl.gov.in
7	National Digital Library of India Club (NDLI Club)	Queries related to NDLI Club formation, support, user awareness program, seminar/symposium, collaboration, social media, promotion, and outreach	clubsupport@ndl.gov.in
8	Digital Preservation Centre (DPC)	Assistance with digitizing and archiving copyright-free printed books	dpc@ndl.gov.in
9	IDR Setup or Support	Queries related to establishment and support of Institutional Digital Repository (IDR) and IDR workshops	idr@ndl.gov.in