NDLI: Mining topics in documents: standing on the shoulders of big data


The battle for the future of data mining

Prediction of human emergency behavior and their mobility following large-scale disaster

LUDIA: an aggregate-constrained low-rank reconstruction algorithm to leverage publicly released health data

FUNNEL: automatic mining of spatially coevolving epidemics

COM: a generative model for group recommendation

Relevant overlapping subspace clusters on categorical data

FastXML: a fast, accurate and stable tree-classifier for extreme multi-label learning

Large margin distribution machine

Sleep analytics and online selective anomaly detection

Prototype-based learning on concept-drifting data streams

Active-transductive learning with label-adapted kernels

Effective global approaches for mutual information based feature selection

Parallel gibbs sampling for hierarchical dirichlet processes via gamma processes equivalence

Improving the modified nyström method using spectral shifting

Efficient mini-batch training for stochastic optimization

Open-domain quantity queries on web tables: annotation, response, and consensus models

Efficient multi-task feature learning with calibration

Optimal recommendations under attraction, aversion, and social influence

TCS: efficient topic discovery over crowd-oriented service data

Differentially private network data release via structural inference

Fast DTT: a near linear algorithm for decomposing a tensor into factor tensors

Grouping students in educational settings

From labor to trader: opinion elicitation via online crowds as a market

Mining topics in documents: standing on the shoulders of big data

Non-parametric scan statistics for event detection and forecasting in heterogeneous social media graphs

A bayesian framework for estimating properties of network diffusions

Who to follow and why: link prediction with explanations

Core decomposition of uncertain graphs

Community membership identification from small seed sets

Almost linear-time algorithms for adaptive betweenness centrality using hypergraph sketches

Using strong triadic closure to characterize ties in social networks

Frontiers in E-commerce personalization

Guilt by association: large scale malware detection by mining file-relation graphs

Does social good justify risking personal privacy?

Scaling up deep learning

Data, predictions, and decisions in support of people and society

Inferring user demographics and social strategies in mobile social networks

People on drugs: credibility of user statements in health communities

Marble: high-throughput phenotyping from electronic health records via sparse nonnegative tensor factorization

Leveraging user libraries to bootstrap collaborative filtering

Batch discovery of recurring rare classes toward identifying anomalous samples

A multi-class boosting method with direct optimization

Distance metric learning using dropout: a structured regularization approach

GLAD: group anomaly detection in social media analysis

Detecting moving object outliers in massive-scale trajectory streams

Active learning for sparse bayesian multilabel classification

Gradient boosted feature selection

Empirical glitch explanations

Fast flux discriminant for large-scale sparse nonlinear classification

Streaming submodular maximization: massive data summarization on the fly

Crowdsourced time-sync video tagging using temporal and personalized topic modeling

Multi-task copula by sparse graph regression

ClusCite: effective citation recommendation by information network-based clustering

SigniTrend: scalable detection of emerging topics in textual streams by hashed significance thresholds

Exponential random graph estimation under differential privacy

Clustering and projected clustering with adaptive neighbors

Inferring gas consumption and pollution emission of vehicles throughout a city

Optimal real-time bidding for display advertising

Integrating spreadsheet data via accurate and low-effort extraction

Event detection in activity networks

Scalable diffusion-aware optimization of network topology

Activity-edge centric multi-label classification for mining heterogeneous information networks

Learning multifractal structure in large networks

Community detection in graphs through correlation

FAST-PPR: scaling personalized pagerank estimation for large graphs

Analyzing expert behaviors in collaborative networks

Predictive modeling in practice: a case study from sprint

Mining text snippets for images on the web

Constructing and mining web-scale knowledge graphs: KDD 2014 tutorial

A data driven approach to diagnosing and treating disease

Travel time estimation of a path using sparse trajectories

Unfolding physiological state: mortality modelling in intensive care units

Scalable noise mining in long-term electrocardiographic time-series to predict death following heart attacks

Topic-factorized ideal point estimation model for legislative voting network

A dirichlet multinomial mixture model-based approach for short text clustering

An efficient algorithm for weak hierarchical lasso

Box drawings for learning with imbalanced data

FBLG: a simple and effective approach for temporal dependence discovery from time series data

The setwise stream classification problem

Large-scale adaptive semi-supervised learning via unified inductive and transductive model

Simultaneous feature and feature group selection through hard thresholding

Learning with dual heterogeneity: a nonparametric bayes model

Scalable histograms on large probabilistic data

Distance queries from sampled data: accurate and efficient

Identifying and labeling search tasks via query-based hawkes processes

Unifying learning to rank and domain adaptation: enabling cross-task document scoring

GeoMF: joint geographical modeling and matrix factorization for point-of-interest recommendation

Experiments with non-parametric topic models

Top-k frequent itemsets via differentially private FP-trees

LWI-SVD: low-rank, windowed, incremental singular value decompositions on time-evolving data sets

Methods for ordinal peer grading

Quantifying herding effects in crowd wisdom

Sentiment expression conditioned by affective transitions and social forces

FEMA: flexible evolutionary multi-faceted analysis for dynamic behavioral pattern discovery

Probabilistic latent network visualization: inferring and embedding diffusion networks

Meta-path based multi-network collective link prediction

Temporal skeletonization on sequential data: patterns, categorization, and visualization

Heat kernel based community detection

Graph sample and hold: a framework for big-graph analytics

Predicting long-term impact of CQA posts: a comprehensive viewpoint

Medicine in the age of electronic health records

Predicting student risks through longitudinal analysis

Bringing structure to text: mining phrases, entities, topics, and hierarchies

Bugbears or legitimate threats?: (social) scientists' criticisms of machine learning?

Modeling human location data with mixtures of kernel densities

Unsupervised learning of disease progression models

From micro to macro: data driven phenotyping by densification of longitudinal electronic medical records

Jointly modeling aspects, ratings and sentiments for movie recommendation (JMARS)

Representative clustering of uncertain data

Online multiple kernel regression

Incremental and decremental training for linear classification

Learning time-series shapelets

Streamed approximate counting of distinct elements: beating optimal batch methods

Active semi-supervised learning using sampling theory for graph signals

Safe and efficient screening for sparse support vector machine

Online chinese restaurant process

Correlation clustering in MapReduce

Improved testing of low rank matrices

LaSEWeb: automating search strategies over semi-structured web data

Scalable heterogeneous translated hashing

Detecting anomalies in dynamic rating data: a robust probabilistic model for rating evolution

Reducing the sampling complexity of topic models

CatchSync: catching synchronized behavior in large directed graphs

Provable deterministic leverage score sampling

Exploiting geographic dependencies for real estate appraisal: a mutual perspective of ranking and clustering

Modeling delayed feedback in display advertising

Entity profiling with varying source reliabilities

Profit-maximizing cluster hires

MMRate: inferring multi-aspect diffusion networks with multi-pattern cascades

Fast influence-based coarsening for large networks

Focused clustering and outlier detection in large attributed graphs

On the permanence of vertices in network communities

Balanced graph edge partition

Who are experts specializing in landscape photography?: analyzing topic-specific authority on content sharing services

Algorithms for interpretable machine learning

Novel geospatial interpolation analytics for general meteorological measurements

Computational epidemiology

A cost-effective recommender system for taxi drivers

Good-enough brain model: challenges, algorithms and discoveries in multi-subject experiments

Clinical risk prediction with multilinear sparse logistic regression

User effort minimization through adaptive diversification

SMVC: semi-supervised multi-view clustering in subspace projections

Class-distribution regularized consensus maximization for alleviating overfitting in model combination

Supervised deep learning with auxiliary networks

Utilizing temporal patterns for estimating uncertainty in interpretable early decision making

Time-varying learning and content analytics via sparse factor analysis

Active collaborative permutation learning

Factorized sparse learning models with interpretable high order feature interactions

Knowledge vault: a web-scale approach to probabilistic knowledge fusion

Scaling out big data missing value imputations: pythia vs. godzilla

DeepWalk: online learning of social representations

Personalized search result diversification via structured learning

Matching users and items across domains to improve the recommendation quality

Product selection problem: improve market share by learning consumer behavior

Dynamics of news events and social media reaction

Mobile app recommendations with security and privacy awareness

Semantic visualization for spherical representation

Towards scalable critical alert mining

Networked bandits with disjoint linear payoffs

Open question answering over curated and extracted knowledge bases

On social event organization

Stability of influence maximization

Minimizing seed set selection with probabilistic coverage guarantee in a social network

Inside the atoms: ranking on a network of networks

The interplay between dynamics and networks: centrality, communities, and cheeger inequality

Data science through the lens of social science

Targeting direct cash transfers to the extremely poor

Management and analytic of biomedical big data with cloud-based in-memory database and dynamic querying: a hands-on experience with real-world data

Dual beta process priors for latent cluster discovery in chronic obstructive pulmonary disease

Information environment security

Scalable hands-free transfer learning for online advertising

The recommender problem revisited: morning tutorial

Big data for social good

Correlating events with time series for incident diagnosis

Correlation clustering: from theory to practice

Bringing data science to the speakers of every language

Proactive workflow modeling by stochastic processes with application to healthcare operation and management

Activity ranking in LinkedIn feed

Network mining and analysis for social applications

Budget pacing for targeted online advertisements at LinkedIn

Sampling for big data: a tutorial

Large scale predictive modeling for micro-simulation of 3G air interface load

Statistically sound pattern discovery

Unveiling clusters of events for alert and incident management in large-scale enterprise it

Recommendation in social media: recent advances and new frontiers

Style in the long tail: discovering unique interests with latent variable models in large scale social E-commerce

Corporate residence fraud detection

Modeling mass protest adoption in social network communities using geometric brownian motion

Shallow semantic parsing of product offering titles (for better automatic hyperlink insertion)

A case study: privacy preserving release of spatio-temporal density in paris

Scalable near real-time failure localization of data center networks

Improving management of aquatic invasions by integrating shipping network, ecological, and environmental data: data mining for social good

FoodSIS: a text mining system to improve the state of food safety in singapore

A hazard based approach to user return time prediction

Predicting employee expertise for talent management in the enterprise

Applying data mining techniques to address critical process optimization needs in advanced manufacturing

EARS (earthquake alert and report system): a real time decision support system for earthquake crisis management

Knock it off: profiling the online storefronts of counterfeit merchandise

Up next: retrieval methods for large scale related video suggestion

Identifying tourists from public transport commuters

Spatially embedded co-offence prediction using supervised learning

'Beating the news' with EMBERS: forecasting civil unrest using open source indicators

LASTA: large scale topic assignment on multiple social networks

New algorithms for parking demand management and a city-scale deployment

Reducing gang violence through network influence based targeting of social programs

Modeling impression discounting in large-scale recommender systems

ISIS: a networked-epidemiology based pervasive web app for infectious disease pandemic planning and response

Seven rules of thumb for web site experimenters

Log-based predictive maintenance

Automated hypothesis generation based on mining scientific literature

A system to grade computer programming skills using machine learning

An empirical study of reserve price optimisation in real-time bidding

Large-scale high-precision topic modeling on twitter

Large scale visual recommendations from street fashion images

We know what you want to buy: a demographic-based system for product recommendation on microblogs

Modeling professional similarity by mining professional career trajectories

Filling context-ad vocabulary gaps with click logs

Mining topics in documents: standing on the shoulders of big data

Content Provider	ACM Digital Library
Author	Chen, Zhiyuan; Liu, Bing
Abstract	Topic modeling has been widely used to mine topics from documents. However, a key weakness of topic modeling is that it needs a large amount of data (e.g., thousands of documents) to provide reliable statistics to generate coherent topics. In practice, many document collections do not have that many documents. Given a small number of documents, the classic topic model LDA generates very poor topics. Even with a large volume of data, unsupervised learning of topic models can still produce unsatisfactory results. In recent years, knowledge-based topic models have been proposed, which ask human users to provide some prior domain knowledge to guide the model to produce better topics. Our research takes a radically different approach. We propose to learn as humans do, i.e., retaining the results learned in the past and using them to help future learning. When faced with a new task, we first mine some reliable (prior) knowledge from the past learning/modeling results and then use it to guide the model inference to generate more coherent topics. This approach is possible because of the big data readily available on the Web. The proposed algorithm mines two forms of knowledge: must-link (meaning that two words should be in the same topic) and cannot-link (meaning that two words should not be in the same topic). It also deals with two problems of the automatically mined knowledge, i.e., wrong knowledge and knowledge transitivity. Experimental results using review documents from 100 product domains show that the proposed approach makes dramatic improvements over state-of-the-art baselines.
Starting Page	1116
Ending Page	1125
Page Count	10
File Format	PDF, MP4
ISBN	9781450329569
DOI	10.1145/2623330.2623622
Language	English
Publisher	Association for Computing Machinery (ACM)
Publisher Date	2014-08-24
Publisher Place	New York
Access Restriction	Subscribed
Subject Keyword	Lifelong learning; Topic model; Opinion aspect extraction
Content Type	Audio, Text
Resource Type	Article
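
The abstract describes mining must-link and cannot-link knowledge from topics learned in earlier tasks. The following is a minimal Python sketch of that idea only, not the authors' algorithm: it assumes each prior topic is given as a list of its top words, treats a word pair as a must-link when it co-occurs in at least `min_support` prior topics, and treats a pair as a cannot-link when both words are individually frequent but never appear in the same prior topic. The function name `mine_links`, the `min_support` parameter, and the sample topics are all hypothetical, and the sketch does not handle the wrong-knowledge or transitivity problems the abstract mentions.

```python
from collections import Counter
from itertools import combinations


def mine_links(prior_topics, min_support=2):
    """Return (must_links, cannot_links) as sets of sorted word pairs.

    prior_topics: list of top-word lists from previously modeled domains
    (an assumed input format, chosen for illustration).
    """
    pair_counts = Counter()
    word_counts = Counter()
    for topic in prior_topics:
        words = sorted(set(topic))  # de-duplicate and fix pair ordering
        word_counts.update(words)
        pair_counts.update(combinations(words, 2))

    # Must-link: the pair shares a topic in enough prior results.
    must_links = {p for p, c in pair_counts.items() if c >= min_support}

    # Cannot-link: both words are frequent, yet they never share a topic.
    frequent = sorted(w for w, c in word_counts.items() if c >= min_support)
    cannot_links = {p for p in combinations(frequent, 2) if pair_counts[p] == 0}
    return must_links, cannot_links


# Hypothetical top words from topics learned in four earlier review domains.
prior_topics = [
    ["price", "cheap", "expensive"],
    ["price", "cheap", "cost"],
    ["battery", "life", "hour"],
    ["battery", "life", "charge"],
]
must_links, cannot_links = mine_links(prior_topics)
```

With these sample topics, ("cheap", "price") and ("battery", "life") come out as must-links, while pairs such as ("battery", "price") come out as cannot-links. A guided topic model would then use such pairs as soft constraints during inference.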
