Alternate representation of distance matrices for characterization of protein structure
Content Provider | CiteSeerX |
---|---|
Abstract | The most suitable method for the automated classification of protein structures remains an open problem in computational biology. To classify a protein structure with any accuracy, an effective representation must be chosen. Here we present two methods of representing protein structure. The first represents the distances between the Cα atoms of a protein as a two-dimensional matrix and models the resulting surface with Zernike polynomials. The second uses a wavelet-based approach: we convert the distances between a protein’s Cα atoms into a one-dimensional signal, which is then decomposed using a discrete wavelet transformation. Using the Zernike coefficients and the approximation coefficients of the wavelet decomposition as feature vectors, we test the effectiveness of our representations with two different classifiers on a dataset of more than 600 proteins taken from the 27 most-populated SCOP folds. We find that the wavelet decomposition greatly outperforms the Zernike model. With the wavelet representation, we achieve an accuracy of approximately 56%, roughly 12% higher than results reported on a similar but less challenging dataset. In addition, we can couple our structure-based feature vectors with several sequence-based properties to increase accuracy by another 5-7%. Finally, we use a multi-stage classification strategy on the combined features to increase performance to 78%, an improvement in accuracy of more than 15-20% and 34% over the highest reported sequence-based and structure-based classification results, respectively. |
File Format | |
Access Restriction | Open |
Subject Keyword | Two-dimensional Matrix, Feature Vector, Several Sequence-based Property, Zernike Coefficient, Wavelet Decomposition, SCOP Fold, Suitable Method, Distance Matrix, Zernike Model, Effective Representation, Automated Classification, Protein Atom, Zernike Polynomial, Open Problem, Combined Feature, Wavelet-based Approach, Computational Biology, Discrete Wavelet Transformation, Protein Structure, Structure-based Feature Vector, One-dimensional Signal, Different Classifier, Less-challenging Dataset, Structure-based Classification Result, Alternate Representation, Multi-stage Classification Strategy, Approximation Coefficient, Wavelet Representation |
Content Type | Text |
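
The wavelet-based representation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions because the abstract does not specify them: the distance matrix is flattened by reading its strict upper triangle row by row, a Haar wavelet is used, and odd-length signals are padded by repeating the last sample. The function names and the toy coordinates are hypothetical.

```python
import math

def distance_matrix(coords):
    """Pairwise Euclidean distances between 3-D points (e.g. C-alpha atoms)."""
    return [[math.dist(p, q) for q in coords] for p in coords]

def upper_triangle_signal(matrix):
    """Flatten the strict upper triangle row by row (an assumed ordering)."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]

def haar_approximation(signal, levels):
    """Approximation coefficients after `levels` rounds of a Haar DWT."""
    approx = list(signal)
    for _ in range(levels):
        if len(approx) < 2:
            break
        if len(approx) % 2:  # odd length: repeat the last sample (assumed padding)
            approx.append(approx[-1])
        # Haar low-pass step: scaled sum of each adjacent pair
        approx = [(approx[k] + approx[k + 1]) / math.sqrt(2)
                  for k in range(0, len(approx), 2)]
    return approx

# Toy coordinates for illustration only, not real protein data.
coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 4.0, 0.0)]
signal = upper_triangle_signal(distance_matrix(coords))
features = haar_approximation(signal, levels=1)
```

Here `features` plays the role of the feature vector that would be passed to a classifier. Proteins differ in length, so a real pipeline would also need some way to bring the vectors to a common size; the abstract does not say how this was done.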