NDLI: Applications of corpus-based semantic similarity and word segmentation to database schema matching

Content Provider	Springer Nature Link
Author	Islam, Aminul; Inkpen, Diana; Kiringa, Iluju
Copyright Year	2007
Abstract	In this paper, we present a method for database schema matching: the problem of identifying elements of two given schemas that correspond to each other. Schema matching is useful in e-commerce exchanges, in data integration/warehousing, and in semantic web applications. We first present two corpus-based methods: one method is for determining the semantic similarity of two target words and the other is for automatic word segmentation. Then we present a name-based element-level database schema matching method that exploits both the semantic similarity and the word segmentation methods. Our word similarity method uses pointwise mutual information (PMI) to sort lists of important neighbor words of two target words; the words which are common in both lists are selected and their PMI values are aggregated to calculate the relative similarity score. Our word segmentation method uses corpus type frequency information to choose the type with maximum length and frequency from “desegmented” text. It also uses a modified forward–backward matching technique using maximum length frequency and entropy rate if any non-matching portions of the text exist. Finally, we exploit both the semantic similarity and the word segmentation methods in our proposed name-based element-level schema matching method. This method uses a single property (i.e., element name) for schema matching and nevertheless achieves a measure score that is comparable to the methods that use multiple properties (e.g., element name, text description, data instance, context description). Our schema matching method also uses normalized and modified versions of the longest common subsequence string matching algorithm with weight factors to allow for a balanced combination. We validate our methods with experimental studies, the results of which suggest that these methods can be a useful addition to the set of existing methods.
Starting Page	1293
Ending Page	1320
Page Count	28
File Format	PDF
ISSN	1066-8888
Journal	The VLDB Journal
Volume Number	17
Issue Number	5
e-ISSN	0949-877X
Language	English
Publisher	Springer-Verlag
Publisher Date	2007-10-18
Publisher Place	Berlin, Heidelberg
Access Restriction	One Nation One Subscription (ONOS)
Subject Keyword	Database schema matching; Semantic similarity; Word segmentation; Corpus-based methods; Database Management
Content Type	Text
Resource Type	Article
Subject	Information Systems; Hardware and Architecture
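The abstract describes the word similarity method only in outline. It ranks each target word's neighbor words by pointwise mutual information (PMI), keeps the neighbors common to both lists, and aggregates their PMI values. The Python code below is a minimal sketch of that idea and rests on several assumptions. The corpus is whitespace-tokenized. Co-occurrence is counted in a symmetric window. A fixed cut-off `beta` limits each neighbor list. The score sums both PMI values of every shared neighbor and divides by the combined list length. The window size, `beta`, the normalization, and all function names are hypothetical choices for illustration, not the authors' exact formulation.

```python
import math
from collections import Counter


def cooccurrence_counts(tokens, window):
    """Count unigrams and symmetric co-occurrences within +/- `window` tokens."""
    unigrams = Counter(tokens)
    pairs = Counter()
    for i, w in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs[(w, tokens[j])] += 1
    return unigrams, pairs


def pmi(w, t, unigrams, pairs, n):
    """PMI = log2(f(w, t) * n / (f(w) * f(t))); 0.0 if the pair never co-occurs."""
    joint = pairs.get((w, t), 0)
    if joint == 0:
        return 0.0
    return math.log2(joint * n / (unigrams[w] * unigrams[t]))


def top_neighbors(w, unigrams, pairs, n, beta):
    """Return the `beta` neighbors of w with the highest positive PMI (assumed cut-off)."""
    scores = {t: pmi(w, t, unigrams, pairs, n)
              for (a, t) in pairs if a == w and t != w}
    ranked = sorted(((s, t) for t, s in scores.items() if s > 0), reverse=True)
    return {t: s for s, t in ranked[:beta]}


def relative_similarity(w1, w2, tokens, window=2, beta=10):
    """Aggregate the PMI values of neighbors shared by w1 and w2 (hypothetical scoring)."""
    unigrams, pairs = cooccurrence_counts(tokens, window)
    n = len(tokens)
    nb1 = top_neighbors(w1, unigrams, pairs, n, beta)
    nb2 = top_neighbors(w2, unigrams, pairs, n, beta)
    common = nb1.keys() & nb2.keys()
    total = sum(nb1[t] + nb2[t] for t in common)
    # Assumed normalization: divide by the combined list length so that
    # words with long neighbor lists are not automatically favoured.
    size = len(nb1) + len(nb2)
    return total / size if size else 0.0
```

Take the toy corpus `"cat drinks milk dog drinks milk car needs fuel".split()` with `window=1`. Here "cat" and "dog" share the neighbor "drinks" and receive a positive score. "cat" and "car" share no high-PMI neighbor and score 0.0. A realistic setup would need a large corpus and a tuned `beta`.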

Similar Documents

Schema matching prediction with applications to data source discovery and dynamic ensembling

Efficient management of uncertainty in XML schema matching

Efficient distributed subgraph similarity matching

ETuner: tuning schema matching software using synthetic scenarios

Algorithmic Computation and Approximation of Semantic Similarity

Schema mediation for large-scale semantic data sharing

Assisting web search using query suggestion based on word similarity measure and query modification patterns

Constraint Preserving Transformation from Relational Schema to XML Schema

Subgraph similarity maximal all-matching over a large uncertain graph