Semi-supervised Gaussian Mixture Models Clustering Algorithm Based on Immune Clonal Selection
| Content Provider | Semantic Scholar |
|---|---|
| Author | Huang, Wenlong; Wang, Xiaodan |
| Copyright Year | 2016 |
| Abstract | Semi-supervised clustering with constraints is an active area of machine learning and data mining research. Shental et al. used the Expectation-Maximization (EM) procedure for semi-supervised Gaussian mixture model (GMM) estimation, in which positive and negative constraints are incorporated to improve clustering results. However, the conventional EM algorithm produces only locally optimal solutions and thus may not reach the global optimum, it is sensitive to initialization, and it requires the number of mixture components to be known in advance. This paper introduces the artificial immune clonal selection algorithm into semi-supervised GMM-based clustering, combining the EM algorithm with the ideas of clonal selection. The new algorithm overcomes these drawbacks of the traditional EM algorithm: it improves the accuracy of parameter estimation while simultaneously determining the optimal number of clusters automatically. Experimental results show that the proposed algorithm provides significantly better clustering results. |

Introduction

Clustering is one of the most widely used techniques in pattern recognition, data mining, and knowledge discovery. Semi-supervised clustering uses additional constraints to guide the clustering process and has attracted significant research effort in the machine learning and data mining communities. It is usually performed by imposing constraints on an existing clustering method. Because the K-means algorithm is popular for its simplicity and ease of implementation, several lines of work have incorporated limited user supervision into K-means. Basu et al. utilized a small number of labeled samples to generate initial centroids for K-means.
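To make the seeding idea concrete, here is a minimal sketch of K-means initialized from a small labeled subset. This is an illustration of the general technique only, not Basu et al.'s exact algorithm; the function name and interface are assumptions of this sketch.

```python
import numpy as np

def seeded_kmeans(X, seed_idx, y_seed, k, n_iter=100):
    """K-means seeded from a small labeled subset: the initial centroid
    of each cluster is the mean of the labeled points of that class."""
    # Initial centroids from the labeled ("seed") points of each class.
    centroids = np.array([X[seed_idx][y_seed == c].mean(axis=0)
                          for c in range(k)])
    for _ in range(n_iter):
        # Assign each point to its nearest centroid.
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute centroids; keep the old one if a cluster goes empty.
        new = np.array([X[labels == c].mean(axis=0) if np.any(labels == c)
                        else centroids[c] for c in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids
```

Unlike random initialization, the seeds fix which cluster corresponds to which class label, so the side information both guides and disambiguates the solution.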
Wang and Li proposed an active semi-supervised spectral clustering based on actively selecting informative constraints. Abdullin et al. proposed mutual semi-supervision clustering for heterogeneous data. Chen et al. proposed a semi-supervised approach using spectral clustering. Ahmed et al. proposed a new semi-supervised hierarchical active clustering based on ranking constraints for analyst grouping.

Fig. 1 shows a simple example of the role pairwise constraints can play in semi-supervised clustering. Either of the partitions (b) and (c) of the data items in (a) can be a solution of an unsupervised clustering algorithm, and for some algorithms the choice depends on random factors (such as the initialization of the prototypes). By providing pairwise constraints like those pictured in (d), the user can guide clustering toward the solution they prefer.

Among the various clustering methods, GMMs are among the most widely used for unsupervised clustering of data, where clusters are approximated by Gaussian distributions; they often perform better than hard-partitioning algorithms such as K-means, as well as hierarchical clustering methods. However, mixture modeling poses some challenges, for example estimating the parameters of the mixture model and choosing the optimal number of components. Recent literature shows that, even with a small amount of side information, these methods perform better than fully unsupervised ones. Shental et al. propose a constrained Expectation-Maximization procedure that fits a Gaussian mixture model to a data set. They provide an EM algorithm for using only must-link constraints, and a generalized EM algorithm for use with both must-link and cannot-link constraints.

4th National Conference on Electrical, Electronics and Computer Engineering (NCEECE 2015). © 2016 The authors. Published by Atlantis Press.
As is well known, the standard EM algorithm has some inherent drawbacks, such as requiring the number of components in advance, sensitivity to initialization, and possible convergence to the boundary of the parameter space. In view of these issues, this paper proposes a novel global search mechanism that introduces the clonal selection algorithm into semi-supervised GMM clustering. The novel algorithm improves the effectiveness of parameter estimation and determines the optimal number of clusters automatically.

Fig. 1 Influence of pairwise constraints on clustering: (a) data items to cluster, (b) and (c) alternative potential solutions for unsupervised clustering, (d) specification of pairwise constraints (red continuous lines for must-link and green dashed lines for cannot-link), (e) solution obtained by semi-supervised clustering using these constraints.

Semi-supervised GMM

Standard EM Algorithm

GMMs are often used in generative clustering algorithms, where each Gaussian source is interpreted as a different cluster. A GMM is usually fitted in an unsupervised manner using the Expectation-Maximization algorithm. Shental et al. present a closed-form EM algorithm for handling positive constraints, and a generalized EM algorithm using a Markov network for incorporating negative constraints. First, consider the EM algorithm for a standard mixture with K components. For a set of samples X̃ = (X_1, X_2, …, X_N), assumed to be generated independently, we define the potential as the negative complete-data log-likelihood, that is,

E(Θ) = −∑_{i=1}^{N} ∑_{k=1}^{K} z_{ik} log[α_k p(X_i | θ_k)],

where z_ik indicates the assignment of sample X_i to component k, α_k is the mixing weight of component k, and θ_k its parameters.
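The standard (unconstrained) EM procedure for a K-component mixture can be sketched as follows. This is a minimal illustration, not the paper's or Shental et al.'s implementation: isotropic covariances and a simple deterministic initialization are assumptions of this sketch, and the constrained variants discussed in the text modify only how the E-step responsibilities are computed.

```python
import numpy as np

def gaussian_pdf(X, mu, var):
    # Isotropic Gaussian density N(x | mu, var * I); var is a scalar.
    d = X.shape[1]
    sq = np.sum((X - mu) ** 2, axis=1)
    return np.exp(-0.5 * sq / var) / (2.0 * np.pi * var) ** (d / 2.0)

def em_gmm(X, k, n_iter=50):
    """Standard EM for a K-component GMM with isotropic covariances."""
    n, d = X.shape
    alpha = np.full(k, 1.0 / k)                         # mixing weights alpha_k
    mu = X[np.linspace(0, n - 1, k, dtype=int)].copy()  # spread-out initial means
    var = np.full(k, X.var())                           # initial variances
    for _ in range(n_iter):
        # E-step: responsibilities r[i, j] proportional to alpha_j * N(x_i | mu_j, var_j).
        r = np.stack([alpha[j] * gaussian_pdf(X, mu[j], var[j])
                      for j in range(k)], axis=1)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and variances from responsibilities.
        nk = r.sum(axis=0)
        alpha = nk / n
        mu = (r.T @ X) / nk[:, None]
        var = np.array([(r[:, j] * np.sum((X - mu[j]) ** 2, axis=1)).sum()
                        / (d * nk[j]) for j in range(k)])
    return alpha, mu, var, r
```

The locality problems the paper targets are visible here: the result depends on the initial means, and k is fixed throughout, which is precisely what the clonal selection mechanism is meant to address.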
| File Format | PDF; HTM/HTML |
| Alternate Webpage(s) | https://download.atlantis-press.com/article/25847082.pdf |
| Alternate Webpage(s) | https://doi.org/10.2991/nceece-15.2016.214 |
| Language | English |
| Access Restriction | Open |
| Content Type | Text |
| Resource Type | Article |