NDLI: Similarity Search for Sequences of Diierent Lengths: Matching and Indexing

Please wait, while we are loading the content...

Similarity Search for Sequences of Diierent Lengths: Matching and Indexing

Content Provider	Semantic Scholar
Author	Yazdani, Nasser Bozkaya, Tolga
Copyright Year	1998
Abstract	Similarity match queries are common and important in many database applications with sequence data, such as text databases, genetics, time series, scientiic experiments, etc.. In this paper, we consider the problem of eecient matching and retrieval of sequences of diierent lengths. Most of the previous research in this area concentrates on similarity matching and retrieval of sequences of the same length using the Euclidean distance metric. For similarity matching of sequences, we use a modiied version of the edit distance function. We consider two sequences matching if the majority of the elements in the sequences match. In the matching process, a mapping among non-matching elements is created to check if there are unacceptable deviations among them. This means that two matching sequences should have lengths that are comparable. For eecient retrieval of matching sequences, we propose two methods. The rst method is a feature-based indexing mechanism used to nd out a small collection of sequences which are matching candidates with a given query sequence from a large data set. Like all other feature-based indexing methods, our method maps each sequence into a point in the K dimensional space, where K is the number of extracted features for the sequence. Lengths and moments (mean and variance) of sequences are used as features. The second method is an indexing scheme which is totally based on lengths and relative distances between sequences. We use vp-trees as the underlying distance-based index structure in this method. Both methods work in two phases: hypothesization and veriication. In the hypothesization phase, data sequences that are distant to the query sequence are ltered out eeciently. In the veriication phase, the candidate sequences that were not eliminated in the hypothesization phase are checked if they actually match with the query sequence. In presenting our methods, we put more emphasis on numerical sequences from scientiic experiments.
File Format	PDF HTM / HTML
Alternate Webpage(s)	http://erciyes.ces.cwru.edu/ozsoy/links/sequence.ps
Language	English
Access Restriction	Open
Content Type	Text
Resource Type	Article

Central Library (ISO-9001:2015 Certified)
Indian Institute of Technology Kharagpur
Kharagpur, West Bengal, India | PIN - 721302

See location in the Map
03222 282435
Mail: support@ndl.gov.in

Sl.	Authority	Responsibilities	Communication Details
1	Ministry of Education (GoI), Department of Higher Education	Sanctioning Authority	https://www.education.gov.in/ict-initiatives
2	Indian Institute of Technology Kharagpur	Host Institute of the Project: The host institute of the project is responsible for providing infrastructure support and hosting the project	https://www.iitkgp.ac.in
3	National Digital Library of India Office, Indian Institute of Technology Kharagpur	The administrative and infrastructural headquarters of the project	Dr. B. Sutradhar bsutra@ndl.gov.in
4	Project PI / Joint PI	Principal Investigator and Joint Principal Investigators of the project	Dr. B. Sutradhar bsutra@ndl.gov.in Prof. Saswat Chakrabarti will be added soon
5	Website/Portal (Helpdesk)	Queries regarding NDLI and its services	support@ndl.gov.in
6	Contents and Copyright Issues	Queries related to content curation and copyright issues	content@ndl.gov.in
7	National Digital Library of India Club (NDLI Club)	Queries related to NDLI Club formation, support, user awareness program, seminar/symposium, collaboration, social media, promotion, and outreach	clubsupport@ndl.gov.in
8	Digital Preservation Centre (DPC)	Assistance with digitizing and archiving copyright-free printed books	dpc@ndl.gov.in
9	IDR Setup or Support	Queries related to establishment and support of Institutional Digital Repository (IDR) and IDR workshops	idr@ndl.gov.in