TY - JOUR
T1 - Automatic classification of protein structure using the maximum contact map overlap metric
AU - Andonov, Rumen
AU - Djidjev, Hristo
AU - Klau, Gunnar W.
AU - Le Boudic-Jamin, Mathilde
AU - Wohlers, Inken
PY - 2015
Y1 - 2015
N2 - In this work, we propose a new distance measure for comparing two protein structures based on their contact map representations. We show that our novel measure, which we refer to as the maximum contact map overlap (max-CMO) metric, satisfies all properties of a metric on the space of protein representations. Having a metric in that space allows one to avoid pairwise comparisons on the entire database and, thus, to significantly accelerate exploring the protein space compared to non-metric spaces. We show on a gold standard superfamily classification benchmark set of 6759 proteins that our exact k-nearest neighbor (k-NN) scheme classifies up to 224 out of 236 queries correctly and on a larger, extended version of the benchmark with 60,850 additional structures, up to 1361 out of 1369 queries. Our k-NN classification thus provides a promising approach for the automatic classification of protein structures based on flexible contact map overlap alignments.
AB - In this work, we propose a new distance measure for comparing two protein structures based on their contact map representations. We show that our novel measure, which we refer to as the maximum contact map overlap (max-CMO) metric, satisfies all properties of a metric on the space of protein representations. Having a metric in that space allows one to avoid pairwise comparisons on the entire database and, thus, to significantly accelerate exploring the protein space compared to non-metric spaces. We show on a gold standard superfamily classification benchmark set of 6759 proteins that our exact k-nearest neighbor (k-NN) scheme classifies up to 224 out of 236 queries correctly and on a larger, extended version of the benchmark with 60,850 additional structures, up to 1361 out of 1369 queries. Our k-NN classification thus provides a promising approach for the automatic classification of protein structures based on flexible contact map overlap alignments.
KW - K-nearest neighbor classification
KW - Maximum contact map overlap
KW - Protein space metric
KW - SCOP
KW - Superfamily classification
UR - http://www.scopus.com/inward/record.url?scp=84952308582&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84952308582&partnerID=8YFLogxK
U2 - 10.3390/a8040850
DO - 10.3390/a8040850
M3 - Article
AN - SCOPUS:84952308582
SN - 1999-4893
VL - 8
SP - 850
EP - 869
JO - Algorithms
JF - Algorithms
IS - 4
ER -