Automatic classification of protein structure using the maximum contact map overlap metric

Rumen Andonov, Hristo Djidjev, Gunnar W. Klau, Mathilde Le Boudic-Jamin, Inken Wohlers

Research output: Contribution to Journal › Article › Academic › peer-review

Abstract

In this work, we propose a new distance measure for comparing two protein structures based on their contact map representations. We show that our novel measure, which we refer to as the maximum contact map overlap (max-CMO) metric, satisfies all properties of a metric on the space of protein representations. Having a metric in that space allows one to avoid pairwise comparisons on the entire database and, thus, to significantly accelerate exploring the protein space compared to non-metric spaces. We show on a gold standard superfamily classification benchmark set of 6759 proteins that our exact k-nearest neighbor (k-NN) scheme classifies up to 224 out of 236 queries correctly and, on a larger, extended version of the benchmark with 60,850 additional structures, up to 1361 out of 1369 queries. Our k-NN classification thus provides a promising approach for the automatic classification of protein structures based on flexible contact map overlap alignments.
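The abstract does not spell out how the metric property lets the search skip comparisons. A common mechanism is triangle-inequality filtering with precomputed distances to a few reference ("pivot") structures, in the spirit of LAESA-style search. The sketch below assumes that mechanism and is not the authors' implementation. `knn_classify_with_pivots`, `pivot_ids`, and `pivot_table` are hypothetical names. `dist` stands in for max-CMO, whose computation (an alignment optimization problem) is not shown.

```python
import heapq
from collections import Counter
from typing import Callable, Hashable, Sequence, TypeVar

T = TypeVar("T")


def knn_classify_with_pivots(
    query: T,
    database: Sequence[T],
    labels: Sequence[Hashable],
    dist: Callable[[T, T], float],
    pivot_ids: Sequence[int],
    pivot_table: Sequence[Sequence[float]],
    k: int,
) -> tuple[Hashable, list[int]]:
    """Exact k-NN classification that skips comparisons excluded by the triangle inequality.

    Hypothetical helper: assumes dist is a metric, at least one pivot, and k >= 1.
    pivot_table[p][i] must hold the precomputed dist(database[pivot_ids[p]], database[i]).
    Returns the majority label and the indices of the k nearest items, nearest first.
    """
    # Query-to-pivot distances: the only comparisons that are always performed.
    q_to_pivot = [dist(query, database[i]) for i in pivot_ids]

    # For any metric, |d(q, p) - d(p, x)| <= d(q, x); take the tightest bound over pivots.
    lower = [
        max(abs(qp - row[i]) for qp, row in zip(q_to_pivot, pivot_table))
        for i in range(len(database))
    ]

    # Visit candidates by increasing lower bound; keep a max-heap of the k best (-dist, index).
    best: list[tuple[float, int]] = []
    for i in sorted(range(len(database)), key=lower.__getitem__):
        if len(best) == k and lower[i] >= -best[0][0]:
            break  # all remaining candidates are provably no closer than the current k-th
        d = dist(query, database[i])
        if len(best) < k:
            heapq.heappush(best, (-d, i))
        elif d < -best[0][0]:
            heapq.heapreplace(best, (-d, i))

    ids = [i for _, i in sorted((-neg_d, i) for neg_d, i in best)]
    # Majority vote; Counter keeps first-seen order, so ties go to the nearer neighbor's label.
    label = Counter(labels[i] for i in ids).most_common(1)[0][0]
    return label, ids
```

The pivot table costs one distance computation per (pivot, database entry) pair. It is computed once and reused for every query. Any query-time savings come from the early `break`, which is only sound because the distance satisfies the triangle inequality. How many pivots to use and how to choose them are further assumptions not taken from the source.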

Original language: English
Pages (from-to): 850-869
Number of pages: 20
Journal: Algorithms
Volume: 8
Issue number: 4
DOIs: https://doi.org/10.3390/a8040850
Publication status: Published - 2015

Keywords

  • K-nearest neighbor classification
  • Maximum contact map overlap
  • Protein space metric
  • SCOP
  • Superfamily classification

Cite this

Andonov, R., Djidjev, H., Klau, G. W., Le Boudic-Jamin, M., & Wohlers, I. (2015). Automatic classification of protein structure using the maximum contact map overlap metric. Algorithms, 8(4), 850-869. https://doi.org/10.3390/a8040850
Andonov, Rumen ; Djidjev, Hristo ; Klau, Gunnar W. ; Le Boudic-Jamin, Mathilde ; Wohlers, Inken. / Automatic classification of protein structure using the maximum contact map overlap metric. In: Algorithms. 2015 ; Vol. 8, No. 4. pp. 850-869.
@article{1791ed06b99b4409883692c628736326,
title = "Automatic classification of protein structure using the maximum contact map overlap metric",
abstract = "In this work, we propose a new distance measure for comparing two protein structures based on their contact map representations. We show that our novel measure, which we refer to as the maximum contact map overlap (max-CMO) metric, satisfies all properties of a metric on the space of protein representations. Having a metric in that space allows one to avoid pairwise comparisons on the entire database and, thus, to significantly accelerate exploring the protein space compared to non-metric spaces. We show on a gold standard superfamily classification benchmark set of 6759 proteins that our exact k-nearest neighbor (k-NN) scheme classifies up to 224 out of 236 queries correctly and, on a larger, extended version of the benchmark with 60,850 additional structures, up to 1361 out of 1369 queries. Our k-NN classification thus provides a promising approach for the automatic classification of protein structures based on flexible contact map overlap alignments.",
keywords = "K-nearest neighbor classification, Maximum contact map overlap, Protein space metric, SCOP, Superfamily classification",
author = "Rumen Andonov and Hristo Djidjev and Klau, {Gunnar W.} and {Le Boudic-Jamin}, Mathilde and Inken Wohlers",
year = "2015",
doi = "10.3390/a8040850",
language = "English",
volume = "8",
pages = "850--869",
journal = "Algorithms",
issn = "1999-4893",
publisher = "MDPI AG",
number = "4",

}

Andonov, R, Djidjev, H, Klau, GW, Le Boudic-Jamin, M & Wohlers, I 2015, 'Automatic classification of protein structure using the maximum contact map overlap metric', Algorithms, vol. 8, no. 4, pp. 850-869. https://doi.org/10.3390/a8040850

Automatic classification of protein structure using the maximum contact map overlap metric. / Andonov, Rumen; Djidjev, Hristo; Klau, Gunnar W.; Le Boudic-Jamin, Mathilde; Wohlers, Inken.

In: Algorithms, Vol. 8, No. 4, 2015, p. 850-869.


TY - JOUR

T1 - Automatic classification of protein structure using the maximum contact map overlap metric

AU - Andonov, Rumen

AU - Djidjev, Hristo

AU - Klau, Gunnar W.

AU - Le Boudic-Jamin, Mathilde

AU - Wohlers, Inken

PY - 2015

Y1 - 2015

N2 - In this work, we propose a new distance measure for comparing two protein structures based on their contact map representations. We show that our novel measure, which we refer to as the maximum contact map overlap (max-CMO) metric, satisfies all properties of a metric on the space of protein representations. Having a metric in that space allows one to avoid pairwise comparisons on the entire database and, thus, to significantly accelerate exploring the protein space compared to non-metric spaces. We show on a gold standard superfamily classification benchmark set of 6759 proteins that our exact k-nearest neighbor (k-NN) scheme classifies up to 224 out of 236 queries correctly and, on a larger, extended version of the benchmark with 60,850 additional structures, up to 1361 out of 1369 queries. Our k-NN classification thus provides a promising approach for the automatic classification of protein structures based on flexible contact map overlap alignments.

AB - In this work, we propose a new distance measure for comparing two protein structures based on their contact map representations. We show that our novel measure, which we refer to as the maximum contact map overlap (max-CMO) metric, satisfies all properties of a metric on the space of protein representations. Having a metric in that space allows one to avoid pairwise comparisons on the entire database and, thus, to significantly accelerate exploring the protein space compared to non-metric spaces. We show on a gold standard superfamily classification benchmark set of 6759 proteins that our exact k-nearest neighbor (k-NN) scheme classifies up to 224 out of 236 queries correctly and, on a larger, extended version of the benchmark with 60,850 additional structures, up to 1361 out of 1369 queries. Our k-NN classification thus provides a promising approach for the automatic classification of protein structures based on flexible contact map overlap alignments.

KW - K-nearest neighbor classification

KW - Maximum contact map overlap

KW - Protein space metric

KW - SCOP

KW - Superfamily classification

UR - http://www.scopus.com/inward/record.url?scp=84952308582&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84952308582&partnerID=8YFLogxK

U2 - 10.3390/a8040850

DO - 10.3390/a8040850

M3 - Article

VL - 8

SP - 850

EP - 869

JO - Algorithms

JF - Algorithms

SN - 1999-4893

IS - 4

ER -