Density-based spatial clustering methods for very large datasets
| Content Provider | Semantic Scholar |
|---|---|
| Author | Wang, Xin |
| Copyright Year | 2006 |
| Abstract | Spatial data mining, or knowledge discovery in spatial databases, refers to the extraction from spatial databases of implicit knowledge, of spatial relations, or of other patterns that are not explicitly stored. Finding clusters in spatial data is an active research area in spatial data mining. The first part of this thesis proposes a novel density-based spatial clustering method called DBRS. The algorithm can identify clusters of widely varying shapes, clusters that depend on non-spatial attributes, and approximate clusters in very large databases. DBRS achieves these results by repeatedly picking an unclassified point at random and examining its neighborhood. If the neighborhood is sparsely populated or the purity of the points in the neighborhood is too low, the point is classified as noise. Otherwise, if any point in the neighborhood is part of a known cluster, this neighborhood is joined to that cluster. If neither of these two possibilities applies, a new cluster is begun with this neighborhood. The experimental results show that DBRS is not only efficient but can also produce high-quality clusters. The second part of this thesis develops a constraint-based spatial clustering algorithm that deals with constraints due to obstacles and facilitators. Typically, a clustering task consists of separating a set of objects into different groups according to a measure of goodness. A common measure of goodness is Euclidean distance (i.e., straight-line distance). However, in many applications, Euclidean distance is a weak measure because of the presence of obstacles and facilitators. An obstacle is a physical object that obstructs the reachability among the data objects, and a facilitator is a physical object that connects distant data objects or connects data objects across obstacles. Handling these constraints can lead to effective and fruitful data mining by capturing application semantics. We extend DBRS to a new spatial clustering method, called DBRS+, which can handle any combination of intersecting obstacles and facilitators. DBRS+ is simple and efficient. Without any preprocessing, the constraints are handled during the clustering process. DBRS+ has been empirically evaluated using synthetic and real data sets. The third part of this thesis emphasizes that domain knowledge can play a key role in spatial clustering. We propose a framework called ONTO_CLUST to combine the domain ontology with the clustering algorithms. In the framework, we show that the clustering process should occur at the knowledge level so that users can identify their goals and understand the results. In ONTO_CLUST, the spatial clustering ontology component is used when identifying the clustering problem and the relevant data. Users' goals are used to search in the ontology. The results of these queries identify the proper clustering methods and the appropriate datasets. Based on these results, clustering is conducted. The clustering result can be used for statistical analysis or it can be interpreted using the ontology. The final result is returned to the user in an understandable format. |
| File Format | PDF HTM / HTML |
| Alternate Webpage(s) | https://freshtea.files.wordpress.com/2009/03/xinwangthesis.pdf |
| Language | English |
| Access Restriction | Open |
| Content Type | Text |
| Resource Type | Article |
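
The DBRS loop summarized in the abstract (pick a random unclassified point, inspect its neighborhood, then mark noise, join a cluster, or start a cluster) can be illustrated with a rough Python sketch. This is not the thesis's implementation. The function name `dbrs_sketch`, the parameters `eps`, `min_pts`, and `min_purity`, the brute-force neighbor search, the definition of purity as the fraction of neighbors sharing the picked point's non-spatial attribute, and the decision to merge clusters when a neighborhood touches more than one are all assumptions made for illustration.

```python
import math
import random

NOISE = -1  # label for points classified as noise


def dbrs_sketch(points, attrs, eps, min_pts, min_purity, seed=0):
    """Illustrative sketch only. points: list of (x, y); attrs: one
    non-spatial attribute per point. Returns one label per point."""
    rng = random.Random(seed)
    labels = [None] * len(points)  # None means "unclassified"
    next_id = 0

    # Visiting a random permutation and skipping classified points
    # stands in for "repeatedly pick an unclassified point at random".
    order = list(range(len(points)))
    rng.shuffle(order)

    for i in order:
        if labels[i] is not None:
            continue

        # Brute-force eps-neighborhood (includes the point itself).
        nbrs = [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]
        # Assumed purity: share of neighbors with the same attribute.
        matching = [j for j in nbrs if attrs[j] == attrs[i]]
        purity = len(matching) / len(nbrs)

        if len(nbrs) < min_pts or purity < min_purity:
            labels[i] = NOISE
            continue

        # Known clusters already present among the matching neighbors.
        touched = {labels[j] for j in matching
                   if labels[j] is not None and labels[j] != NOISE}
        if touched:
            target = min(touched)
            # Assumption: a neighborhood touching several clusters
            # merges them into one.
            for k, lab in enumerate(labels):
                if lab in touched:
                    labels[k] = target
        else:
            target = next_id
            next_id += 1

        # Join the matching neighborhood to the chosen cluster; earlier
        # noise points inside it are relabeled as cluster members.
        for j in matching:
            labels[j] = target

    return labels


if __name__ == "__main__":
    pts = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1),
           (5, 5), (5.1, 5), (5, 5.1), (5.1, 5.1),
           (10, 10)]
    print(dbrs_sketch(pts, ["a"] * len(pts),
                      eps=0.5, min_pts=3, min_purity=0.5))
```

The brute-force neighbor search costs one pass over the data per query and is used here only for brevity; for the very large databases the abstract targets, a spatial index would be the natural replacement.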