NDLI: A comparison of layout based bibliographic metadata extraction techniques


Desiderata for research in web intelligence, mining and semantics

Crowdsourcing semantic data management: challenges and opportunities

Agent-based ontology alignment: basics, applications, theoretical foundations, and demonstration

Designing privacy-aware social networks: a multi-agent approach

Representation models for text classification: a comparative analysis over three web document types

Automatic reasoner selection using machine learning

Unsupervised generation of data mining features from linked open data

Deriving a statistical syntactic parsing from a treebank

Advanced agent discovery services

Building and managing reputation in the environment of Chinese e-commerce: a case study on Taobao

A novel query rewriting mechanism for semantically interlinking clinical research with electronic health records

Visualization of the European environmental data

A game-theoretic approach to cooperation in multi-agent systems

"Closing" some doors for the open semantic web

Making your semantic application addictive: incentivizing users!

A density-based approach for mining overlapping communities from social network interactions

Client- and server-side revisitation prediction with SUPRA

Predicate trees: a tool for descriptive subgraph extraction

Automatic traceability acquisition framework

Sentiment analysis: what is the end user's requirement?

Adaptable swarm intelligence framework

Exploring manuscripts: sharing ancient wisdoms across the semantic web

Genetically optimizing query expansion for retrieving activities from the web

A multi-agent supervising system for smart environments

Text stream processing

Influence patterns in topic communities of social media

Mining feature-opinion pairs and their reliability scores from web opinion sources

Semantic event processing in ENVISION

User profile based activities in flexible processes

Evaluating PageRank methods for structural sense ranking in labeled tree data

Distributed distance matrix generator based on agents

Technological foundations of the current blogosphere

Information retrieval and deduplication for tourism recommender Sightsplanner

Semantically-enhanced authoring of defeasible logic rule bases in the semantic web

Semantic metadata management in web 2.0

Measuring node importance on Twitter microblogging

Dynamic prediction of forthcoming trends in stock prices from news articles

Towards law-aware semantic cloud policies with exceptions for data integration and protection

Automatic forum analysis: a thorough method of assessing the importance of posts, discussion threads and of users' involvement

A swarm-inspired data center consolidation methodology

A literature-based method to automatically detect learning styles in learning management systems

Intelligent web page retrieval using Wikipedia knowledge

Everyday problems vs. UbiComp: a case study

Agents and knowledge interoperability in the semantic web era

Exploring information diffusion in network of semantically annotated web service interfaces

PostRank: a new algorithm for incremental finding of Persian blog representative words

Semantics-based news recommendation

Information flow in a distributed agent-based online auction system

Relevant learning objects extraction based on semantic annotation of documents

A framework for biological event extraction from text

Evolution of ontology potential for generation of rules

On generating large-scale ground truth datasets for the deduplication of bibliographic records

IRISPortal: a semantic portal for industrial risk cases management

FOREX application for BlackBerry device

A comparison of layout based bibliographic metadata extraction techniques

Semantic annotation of image processing tools

A collaborative web-based help-system

Features selection from high-dimensional web data using clustering analysis

BOnSAI: a smart building ontology for ambient intelligence

User behavior in online social networks and its implications: a user study

Wrappers for web access logs feature selection

Estimating importance of implicit factors in e-commerce recommender systems

An efficient ensemble classification method based on novel classifier selection technique

Hybrid Method for Computing Word-Pair Similarity based on Web Content

Computationally effective algorithm for information extraction and online review mining

Sentence-level sentiment analysis in Czech

Automated internal web page clustering for improved data extraction

Ranking domain objects by wisdom of web pages

Classification of users by using support vector machines

Classifying Arabic web pages toolkit

Document classification with supervised latent feature selection

Using Bayesian networks theory for aggregated search to XML retrieval

A comparison of layout based bibliographic metadata extraction techniques

Content Provider	ACM Digital Library
Author	Knight, Robert; Jack, Kris; Hristakeva, Maya; Kern, Roman; Granitzer, Michael
Abstract	Social research networks such as Mendeley and CiteULike offer various services for collaboratively managing bibliographic metadata. Compared with traditional libraries, metadata quality is of crucial importance in order to create a crowdsourced bibliographic catalog for search and browsing. Artifacts, in particular PDFs which are managed by the users of the social research networks, become one important metadata source and the starting point for creating a homogeneous, high quality, bibliographic catalog. Natural Language Processing and Information Extraction techniques have been employed to extract structured information from unstructured sources. However, given highly heterogeneous artifacts that cover a range of publication styles, stemming from different publication sources, and imperfect PDF processing tools, how accurate are metadata extraction methods in such real-world settings? This paper focuses on answering that question by investigating the use of Conditional Random Fields and Support Vector Machines on real-world data gathered from Mendeley and Linked-Data repositories. We compare style and content features on existing state-of-the-art methods on two newly created real-world data sets for metadata extraction. Our analysis shows that two-stage SVMs provide reasonable performance in solving the challenge of metadata extraction for crowdsourcing bibliographic metadata management.
Starting Page	1
Ending Page	8
Page Count	8
File Format	PDF
ISBN	9781450309158
DOI	10.1145/2254129.2254154
Language	English
Publisher	Association for Computing Machinery (ACM)
Publisher Date	2012-06-13
Publisher Place	New York
Access Restriction	Subscribed
Subject Keyword	Metadata extraction; Research papers; Layout features; Bibliographic metadata
Content Type	Text
Resource Type	Article

Central Library (ISO-9001:2015 Certified)
Indian Institute of Technology Kharagpur
Kharagpur, West Bengal, India | PIN - 721302

Phone: 03222 282435
Mail: support@ndl.gov.in

Sl.	Authority	Responsibilities	Communication Details
1	Ministry of Education (GoI), Department of Higher Education	Sanctioning Authority	https://www.education.gov.in/ict-initiatives
2	Indian Institute of Technology Kharagpur	Host Institute of the Project: The host institute of the project is responsible for providing infrastructure support and hosting the project	https://www.iitkgp.ac.in
3	National Digital Library of India Office, Indian Institute of Technology Kharagpur	The administrative and infrastructural headquarters of the project	Dr. B. Sutradhar bsutra@ndl.gov.in
4	Project PI / Joint PI	Principal Investigator and Joint Principal Investigators of the project	Dr. B. Sutradhar bsutra@ndl.gov.in; Prof. Saswat Chakrabarti (contact details will be added soon)
5	Website/Portal (Helpdesk)	Queries regarding NDLI and its services	support@ndl.gov.in
6	Contents and Copyright Issues	Queries related to content curation and copyright issues	content@ndl.gov.in
7	National Digital Library of India Club (NDLI Club)	Queries related to NDLI Club formation, support, user awareness program, seminar/symposium, collaboration, social media, promotion, and outreach	clubsupport@ndl.gov.in
8	Digital Preservation Centre (DPC)	Assistance with digitizing and archiving copyright-free printed books	dpc@ndl.gov.in
9	IDR Setup or Support	Queries related to establishment and support of Institutional Digital Repository (IDR) and IDR workshops	idr@ndl.gov.in