NDLI: Hierarchical Label Propagation and Discovery for Machine Generated Email


Large-Scale Deep Learning For Building Intelligent Computer Systems

Who Will Reply to/Retweet This Tweet?: The Dynamics of Intimacy from Online Social Interactions

Beyond Ranking: Optimizing Whole-Page Presentation

Is Mail The Next Frontier In Search And Data Mining?

Portrait of an Online Shopper: Understanding and Predicting Consumer Behavior

Keynote Speaker Bio

Modeling Intransitivity in Matchup and Comparison Data

Scaling up Link Prediction with Ensembles

AMiner: Toward Understanding Big Scholar Data

The Predictive Power of Massive Data about our Fine-Grained Behavior

Information Evolution in Social Networks

WSDM Cup 2016: Entity Ranking Challenge

The Past and Future of Systems for Current Events

Barbara Made the News: Mining the Behavior of Crowds for Time-Aware Learning to Rank

Understanding Offline Political Systems by Mining Online Political Data

Query Understanding for Search on All Devices at WSDM 2016

Mining Complaints to Improve a Product: a Study about Problem Phrase Extraction from User Reviews

Cross-modality Consistent Regression for Joint Visual-Textual Sentiment Analysis of Social Multimedia

Understanding User Attention and Engagement in Online News Reading

Evolution of Privacy Loss in Wikipedia

Crowdsourcing High Quality Labels with a Tight Budget

DiFacto: Distributed Factorization Machines

Serving a Billion Personalized News Feeds

Nonlinear Laplacian for Digraphs and its Applications to Network Analysis

Dynamic Collective Entity Representations for Entity Ranking

Wiggins: Detecting Valuable Information in Dynamic Networks Using Limited Resources

Click Models for Web Search and their Applications to IR: WSDM 2016 Tutorial

TargetAd2016: 2nd International Workshop on Ad Targeting at Scale

Web-scale Multimedia Search for Internet Video Content

How Relevant is the Irrelevant Data: Leveraging the Tagging Data for a Learning-to-Rank Model

Publication Date Prediction through Reverse Engineering of the Web

Project Success Prediction in Crowdfunding Environments

Distributed Balanced Partitioning via Linear Embedding

Querying and Tracking Influencers in Social Streams

Relationship Queries on Extended Knowledge Graphs

WSDM 2016 Workshop on the Ethics of Online Experimentation

Affective Computing of Image Emotion Perceptions

Quantifying Controversy in Social Media

To Suggest, or Not to Suggest for Queries with Diverse Intents: Optimizing Search Result Presentation

Probabilistic Group Recommendation Model for Crowdfunding Domains

Kangaroo: Workload-Aware Processing of Range Data and Range Queries in Hadoop

Centrality-Aware Link Recommendations

Improving Website Hyperlink Structure Using Server Logs

Second Workshop on Search and Exploration of X-Rated Information (SEXI'16): WSDM Workshop Summary

Event Search and Analytics: Detecting Events in Semantically Annotated Corpora for Search & Analytics

Understanding and Identifying Advocates for Political Campaigns on Social Media

Term-by-Term Query Auto-Completion for Mobile Search

Quality Management in Crowdsourcing using Gold Judges Behavior

Feedback Control of Real-Time Display Advertising

Relational Learning with Social Status Analysis

Long-tail Vocabulary Dictionary Extraction from the Web

Understanding Diffusion Processes: Inference and Theory

Exploiting New Sentiment-Based Meta-level Features for Effective Sentiment Analysis

Collaborative Denoising Auto-Encoders for Top-N Recommender Systems

On Obtaining Effort Based Judgements for Information Retrieval

Multi-Score Position Auctions

Equality and Social Mobility in Twitter Discussion Groups

Semantic Documents Relatedness using Concept Graph Representation

E-commerce Product Recommendation by Personalized Promotion and Total Surplus Maximization

Mobile App Tagging

Personalized PageRank Estimation and Search: A Bidirectional Approach

A Semantic Graph based Topic Model for Question Retrieval in Community Question Answering

Multi-view Machines

Learning Distributed Representations of Data in Community Question Answering for Question Retrieval

EgoSet: Exploiting Word Ego-networks and User-generated Ontology for Multifaceted Set Expansion

Detecting Social Media Icebergs by Their Tips: Rumors, Persuasion Campaigns, and Information Needs

CCCF: Improving Collaborative Filtering via Scalable User-Item Co-Clustering

Your Cart tells You: Inferring Demographic Attributes from Purchase Data

Modeling Check-in Preferences with Multidimensional Knowledge: A Minimax Entropy Approach

Transductive Classification on Heterogeneous Information Networks with Edge Betweenness-based Normalization

Inferring Latent Triggers of Purchases with Consideration of Social Effects and Media Advertisements

Extracting Search Query Patterns via the Pairwise Coupled Topic Model

User Modeling in Large Social Networks

On the Efficiency of the Information Networks in Social Media

Reducing Click and Skip Errors in Search Result Ranking

You've got Mail, and Here is What you Could do With It!: Analyzing and Predicting Actions on Email Messages

The Troll-Trust Model for Ranking in Signed Networks

Towards Modelling Language Innovation Acceptance in Online Social Networks

Feature Generation and Selection on the Heterogeneous Graph for Music Recommendation

Modeling and Predicting Learning Behavior in MOOCs

Hierarchical Semi-supervised Classification with Incomplete Class Hierarchies

Hierarchical Label Propagation and Discovery for Machine Generated Email

Multileave Gradient Descent for Fast Online Learning to Rank

Discriminative Learning of Infection Models

Temporal Formation and Evolution of Online Communities

Enforcing k-anonymity in Web Mail Auditing

Representation Learning for Information Diffusion through Social Networks: an Embedded Cascade Model

Mining the Web for Intelligent Problem Solving for Programmers

An Information-Theoretic Approach to Individual Sequential Data Sanitization

Ensemble Models for Data-driven Prediction of Malware Infections

Optimizing Search Interactions within Professional Social Networks

Improving IP Geolocation using Query Logs

Geographic Segmentation via Latent Poisson Factor Model

Hierarchical Label Propagation and Discovery for Machine Generated Email

Content Provider	ACM Digital Library
Author	Yang, Jie; Cartright, Marc-Allen; Bendersky, Michael; Josifovski, Vanja; Wendt, James B.; Ravi, Sujith; Garcia-Pueyo, Lluis; Krka, Ivo; Saikia, Amitabh; Miklos, Balint
Abstract	Machine-generated documents such as email or dynamic web pages are single instantiations of a pre-defined structural template. As such, they can be viewed as a hierarchy of template and document specific content. This hierarchical template representation has several important advantages for document clustering and classification. First, templates capture common topics among the documents, while filtering out potentially noisy variability, such as personal information. Second, template representations scale far better than document representations since a single template captures numerous documents. Finally, since templates group together structurally similar documents, they can propagate properties between all the documents that match the template. In this paper, we use these advantages for document classification by formulating an efficient and effective hierarchical label propagation and discovery algorithm. The labels are propagated first over a template graph (constructed based on either term-based or topic-based similarities), and then to the matching documents. We evaluate the performance of the proposed algorithm using a large donated email corpus and show that the resulting template graph is significantly more compact than the corresponding document graph and the hierarchical label propagation is both efficient and effective in increasing the coverage of the baseline document classification algorithm. We demonstrate that the template label propagation achieves more than 91% precision and 93% recall, while increasing the label coverage by more than 11%.
Starting Page	317
Ending Page	326
Page Count	10
File Format	PDF
ISBN	9781450337168
DOI	10.1145/2835776.2835780
Language	English
Publisher	Association for Computing Machinery (ACM)
Publisher Date	2016-02-08
Publisher Place	New York
Access Restriction	Subscribed
Subject Keyword	Hierarchical label propagation; Machine-generated email; Structural template
Content Type	Text
Resource Type	Article
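The two-stage propagation described in the abstract (labels spread over a template graph first, then down to the documents that match each template) can be illustrated with a minimal sketch. This is not the authors' algorithm. It assumes each template is represented by the set of its fixed terms, uses Jaccard similarity with a hypothetical threshold of 0.3 as the term-based similarity, seeds templates with the majority label of documents already labeled by a baseline classifier, and substitutes a plain iterative label-propagation loop with clamped seeds. Label discovery is not modeled, and all function names and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def jaccard(a, b):
    """Term-based similarity between two templates' fixed-term sets (assumed representation)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def build_template_graph(templates, threshold=0.3):
    """Connect templates whose similarity meets a hypothetical threshold; edges are weighted."""
    ids = sorted(templates)
    edges = defaultdict(dict)
    for i, u in enumerate(ids):
        for v in ids[i + 1:]:
            w = jaccard(templates[u], templates[v])
            if w >= threshold:
                edges[u][v] = w
                edges[v][u] = w
    return edges


def seed_template_labels(doc_template, doc_labels):
    """Seed each template with the majority label of its baseline-labeled documents."""
    votes = defaultdict(Counter)
    for doc, label in doc_labels.items():
        votes[doc_template[doc]][label] += 1
    return {t: c.most_common(1)[0][0] for t, c in votes.items()}


def propagate(templates, edges, seeds, iterations=10):
    """Stage 1: iterative label propagation over the template graph; seeds stay clamped."""
    labels = sorted(set(seeds.values()))
    dist = {
        t: {l: float(t in seeds and seeds[t] == l) for l in labels}
        for t in templates
    }
    for _ in range(iterations):
        new = {}
        for t in templates:
            if t in seeds or not edges.get(t):
                new[t] = dist[t]  # seeds and isolated templates keep their distribution
                continue
            total = sum(edges[t].values())
            # Weighted average of the neighbours' label distributions.
            new[t] = {
                l: sum(w * dist[n][l] for n, w in edges[t].items()) / total
                for l in labels
            }
        dist = new
    # Keep only templates that received some label mass.
    return {t: max(d, key=d.get) for t, d in dist.items() if d and max(d.values()) > 0}


def label_documents(doc_template, template_labels):
    """Stage 2: every document inherits the label of its matching template."""
    return {d: template_labels[t] for d, t in doc_template.items() if t in template_labels}


if __name__ == "__main__":
    templates = {
        "t1": {"your", "order", "has", "shipped", "tracking"},
        "t2": {"your", "order", "has", "shipped", "track", "package"},
        "t3": {"flight", "itinerary", "confirmation", "departure"},
        "t4": {"flight", "itinerary", "boarding", "departure"},
    }
    doc_template = {"d1": "t1", "d2": "t1", "d3": "t2", "d4": "t3", "d5": "t4"}
    baseline = {"d1": "shipping", "d4": "travel"}  # hypothetical baseline classifier output

    edges = build_template_graph(templates)
    template_labels = propagate(templates, edges, seed_template_labels(doc_template, baseline))
    print(label_documents(doc_template, template_labels))
    # -> {'d1': 'shipping', 'd2': 'shipping', 'd3': 'shipping', 'd4': 'travel', 'd5': 'travel'}
```

In this toy example the baseline labels two of five documents, and propagation through the four-node template graph labels all five. The gain comes from the property the abstract highlights: one template label covers every document that matches it, so the graph that has to be processed is over templates rather than individual messages.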

Central Library (ISO-9001:2015 Certified)
Indian Institute of Technology Kharagpur
Kharagpur, West Bengal, India | PIN - 721302

03222 282435
Mail: support@ndl.gov.in

Sl.	Authority	Responsibilities	Communication Details
1	Ministry of Education (GoI), Department of Higher Education	Sanctioning Authority	https://www.education.gov.in/ict-initiatives
2	Indian Institute of Technology Kharagpur	Host Institute of the Project: The host institute of the project is responsible for providing infrastructure support and hosting the project	https://www.iitkgp.ac.in
3	National Digital Library of India Office, Indian Institute of Technology Kharagpur	The administrative and infrastructural headquarters of the project	Dr. B. Sutradhar bsutra@ndl.gov.in
4	Project PI / Joint PI	Principal Investigator and Joint Principal Investigators of the project	Dr. B. Sutradhar (bsutra@ndl.gov.in); Prof. Saswat Chakrabarti (details to be added)
5	Website/Portal (Helpdesk)	Queries regarding NDLI and its services	support@ndl.gov.in
6	Contents and Copyright Issues	Queries related to content curation and copyright issues	content@ndl.gov.in
7	National Digital Library of India Club (NDLI Club)	Queries related to NDLI Club formation, support, user awareness program, seminar/symposium, collaboration, social media, promotion, and outreach	clubsupport@ndl.gov.in
8	Digital Preservation Centre (DPC)	Assistance with digitizing and archiving copyright-free printed books	dpc@ndl.gov.in
9	IDR Setup or Support	Queries related to establishment and support of Institutional Digital Repository (IDR) and IDR workshops	idr@ndl.gov.in