NDLI: Mining Publication Records on Personal Publication Web Pages Based on Conditional Random Fields

WI 2012 Cover Art

WI 2012 Title Page i

WI 2012 Title Page iii

WI 2012 Copyright Page

WI 2012 Welcome Message from Conference Chairs and Program Chairs

WI 2012 Sponsors

Disambiguating Implicit Temporal Queries by Clustering Top Relevant Dates in Web Snippets

Unsupervised Feature Selection with Feature Clustering

Event Duration Detection on Microblogging

A Comparative Study of Cross-Lingual Sentiment Classification

The Retrieval of Important News Stories by Influence Propagation among Communities and Categories

An MCL-Based Text Mining Approach for Namesake Disambiguation on the Web

Construction of Chinese A-shares Network Using Latent Dirichlet Allocation

A Fast and Accurate Method for Bilingual Opinion Lexicon Extraction

TaxoLearn: A Semantic Approach to Domain Taxonomy Learning

Accuracy vs. Speed: Scalable Entity Coreference on the Semantic Web with On-the-Fly Pruning

RCQ-ACS: RDF Chain Query Optimization Using an Ant Colony System

Entity Disambiguation with Freebase

An Empirical Analysis of Semantic Techniques Applied to a Network Management Classification Problem

Approximating Linear Order Inference in OWL 2 DL by Horn Compilation

Term Weighting Schemes for Emerging Event Detection

A Generalized Links and Text Properties Based Forum Crawler

KGRAM Versatile Inference and Query Engine for the Web of Linked Data

Automatic Extraction of Blog Post from Diverse Blog Pages

Learning User Preference Patterns for Top-N Recommendations

Ranking Text Documents Based on Conceptual Difficulty Using Term Embedding and Sequential Discourse Cohesion

Towards Topic Trend Prediction on a Topic Evolution Model with Social Connection

Fusing Text and Frienships for Location Inference in Online Social Networks

Featured Tweet Search: Modeling Time and Social Influence for Microblog Retrieval

Sentiment Analysis of Turkish Political News

Towards the Optimal Discriminant Subspace

Polarity Analysis for Food and Disease Relationships

Inferring User Interest Using Familiarity and Topic Similarity with Social Neighbors in Facebook

Using Patterns Co-occurrence Matrix for Cleaning Closed Sequential Patterns for Text Mining

Selective Behavior in Online Social Networks

Tweets Beget Propinquity: Detecting Highly Interactive Communities on Twitter Using Tweeting Links

Community Extracting Using Intersection Graph and Content Analysis in Complex Network

Predicting Stock Market Using Online Communities Raw Web Traffic

Hierarchical Clustering Based on Hyper-edge Similarity for Community Detection

Analysis of Discussion Page in Wikipedia Based on User's Discussion Capability

Biological Mutualistic Models Applied to Study Open Source Software Development

A Website Content Analysis Approach Based on Keyword Similarity Analysis

Serendipitous Personalized Ranking for Top-N Recommendation

Effectively Detecting Content Spam on the Web Using Topical Diversity Measures

Document Re-ranking Using Partial Social Tagging

A Double-Ranking Strategy for Long-Tail Product Recommendation

Social Recommendations for Location-Based Services

Content-Based Semantic Tag Ranking for Recommendation

Brand-Related Events Detection, Classification and Summarization on Twitter

Intuitive Topic Discovery by Incorporating Word-Pair's Connection Into LDA

Location Comparison through Geographical Topics

Mining Publication Records on Personal Publication Web Pages Based on Conditional Random Fields

Verb Oriented Sentiment Classification

Mining Criminal Networks from Chat Log

Cognitive Resource-Aware Adaptive Web Service Binding and Scheduling

Unsupervised Emotion Detection from Text Using Semantic and Syntactic Relations

Toward the Design of a Recommender System: Visual Clustering and Detecting Community Structure in a Web Usage Network

Answering Typicality Query Based on Automatically Prototype Construction

A Ubiquitous Image Tagging System Using User Context

NE-Rank: A Novel Graph-Based Keyphrase Extraction in Twitter

Malicious URL Detection Based on Kolmogorov Complexity Estimation

Batch-Mode Active Learning with Semi-supervised Cluster Tree for Text Classification

Local Tangent Distances for Classification Problems

Context Aware Named Entity Disambiguation

Understanding the Regularity and Variability of Human Mobility from Geo-trajectory

E-rank: A Structural-Based Similarity Measure in Social Networks

Link Prediction Using BenefitRanks in Weighted Networks

A Comparison Study for Novelty Control Mechanisms Applied to Web News Stories

Personalized News Recommendation Based on Collaborative Filtering

User Interest and Topic Detection for Personalized Recommendation

Detecting Places of Interest Using Social Media

Rating Prediction by Correcting User Rating Bias

Predicting Best Responder in Community Question Answering Using Topic Model Method

Conceptualization Effects on MEDLINE Documents Classification Using Rocchio Method

Keyword Proximity Search over Large and Complex RDF Database

Support for Video Hosting Service Users Using Folksonomy and Social Annotation

A Modularity Maximization Algorithm for Community Detection in Social Networks with Low Time Complexity

Semantic Formalization of Cross-Site User Browsing Behavior

Flexible Algorithm Selection Framework for Large Scale Metalearning

Extraction and Compilation of Events and Sub-events from Twitter

The Mechanism of Information Resonance in Social Media

Building Up a Class Hierarchy with Properties from Japanese Wikipedia

Enhancing LOD Complex Query Building with Context

Semantic Labelling for Document Feature Patterns Using Ontological Subjects

From DBpedia to Wikipedia: Filling the Gap by Discovering Wikipedia Conventions

Analysing the Use of Ontologies Based on Usage Network

COMMA: A Result-Oriented Composite Autocompletion Method for E-marketplaces

Statistical and Structural Analysis of Web-Based Collaborative Knowledge Bases Generated from Wiki Encyclopedia

Latent Business Networks Mining: A Probabilistic Generative Model

A Context-Aware Framework for Detecting Unfair Ratings in an Unknown Real Environment

An Ontology-Based Mining of Consumer Feedbacks Using Fuzzy Reasoning

An Intelligent System for Retrieving Economic Information from Corporate Websites

Mining Publication Records on Personal Publication Web Pages Based on Conditional Random Fields

Content Provider	ACM Digital Library
Author	Lee, Hahn-Ming; Lin, Ya-Huei; Ho, Jan-Ming; Chung, Jen-Ming
Abstract	A publication record is a list of semi-structured citation strings for the publications of a research institute or an individual researcher. Publication records are integrated into a digital library to form an important knowledge base, which in turn enables a variety of applications. A publication record is usually found among other information on a publication Web page (or publication page for short). It is thus an interesting problem to extract publication records from these Web pages. The problem is difficult for several reasons, including the flexibility in formatting the metadata of a publication into a semi-structured citation string and in expressing the citation string as its visual presentation in HTML. Furthermore, two citation strings with similar visual presentation on the same Web page may have different HTML constructs. In this paper, we present a content analysis approach based on Conditional Random Fields and data region boundary analysis to automatically extract citation records from a publication page. Experimental results show that our method performs well on a benchmark containing manually crafted publication Web pages. The precision, recall, and F-measure are 82.5%, 87.6%, and 85.0%, respectively. This is an improvement over previous results.
Starting Page	319
Ending Page	326
Page Count	8
File Format	PDF
ISBN	9780769548807
Language	English
Publisher	Association for Computing Machinery (ACM)
Publisher Date	2012-12-04
Access Restriction	Subscribed
Subject Keyword	Publication record extraction; conditional random fields; data region boundary analysis
Content Type	Text
Resource Type	Article
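
The precision, recall, and F-measure quoted in the abstract can be checked against each other. The sketch below assumes the F-measure is the balanced F1 score (the harmonic mean of precision and recall); the abstract does not say which variant the authors used, and the function name `f_measure` is introduced here only for illustration.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): the harmonic mean of precision and recall.

    Assumption: the abstract's "F-measure" is the balanced F1 variant.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract, in percent.
precision, recall = 82.5, 87.6
print(round(f_measure(precision, recall), 1))  # 85.0
```

Under that assumption the three reported figures are mutually consistent: 2 × 82.5 × 87.6 / (82.5 + 87.6) ≈ 84.97, which rounds to the stated 85.0%.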
