Loading...
Please wait, while we are loading the content...
Similar Documents
Similarity Search for Sequences of Diierent Lengths: Matching and Indexing
| Content Provider | Semantic Scholar |
|---|---|
| Author | Yazdani, Nasser Bozkaya, Tolga |
| Copyright Year | 1998 |
| Abstract | Similarity match queries are common and important in many database applications with sequence data, such as text databases, genetics, time series, scientiic experiments, etc.. In this paper, we consider the problem of eecient matching and retrieval of sequences of diierent lengths. Most of the previous research in this area concentrates on similarity matching and retrieval of sequences of the same length using the Euclidean distance metric. For similarity matching of sequences, we use a modiied version of the edit distance function. We consider two sequences matching if the majority of the elements in the sequences match. In the matching process, a mapping among non-matching elements is created to check if there are unacceptable deviations among them. This means that two matching sequences should have lengths that are comparable. For eecient retrieval of matching sequences, we propose two methods. The rst method is a feature-based indexing mechanism used to nd out a small collection of sequences which are matching candidates with a given query sequence from a large data set. Like all other feature-based indexing methods, our method maps each sequence into a point in the K dimensional space, where K is the number of extracted features for the sequence. Lengths and moments (mean and variance) of sequences are used as features. The second method is an indexing scheme which is totally based on lengths and relative distances between sequences. We use vp-trees as the underlying distance-based index structure in this method. Both methods work in two phases: hypothesization and veriication. In the hypothesization phase, data sequences that are distant to the query sequence are ltered out eeciently. In the veriication phase, the candidate sequences that were not eliminated in the hypothesization phase are checked if they actually match with the query sequence. In presenting our methods, we put more emphasis on numerical sequences from scientiic experiments. |
| File Format | PDF HTM / HTML |
| Alternate Webpage(s) | http://erciyes.ces.cwru.edu/ozsoy/links/sequence.ps |
| Language | English |
| Access Restriction | Open |
| Content Type | Text |
| Resource Type | Article |