Multi-class superfamily prediction using 3D models enriched with physicochemical properties
In this paper, two new methods that address the multi-class superfamily prediction problem are presented. In the multi-class superfamily recognition problem, each amino acid sequence must be classified into one of the known structural classes (i.e., superfamilies). Most strategies proposed for predicting superfamilies rely on binary classifiers that detect remote homologs. The remote homology detection problem consists of finding a classifier that separates remote homologs from non-remote homologs. Current methods for multi-class superfamily recognition take the outputs (i.e., the scores) of the binary classifier for each SCOP superfamily in the data set and build a classification model (i.e., a multi-class classifier) from them. Unlike current methods, which represent a protein by its amino acid composition, in this research we represent a protein by the number of times that 3D models enriched with physicochemical properties occur in both its predicted contact map and its interaction matrix. We hypothesize that including both 3D information and physicochemical properties might have an impact on the accuracy of superfamily prediction. The two strategies that use these enriched 3D models, the single-MCS and the hierarchical-MCS methods, reach accuracies of 74% and 76%, respectively, on the SCOP 1.53 data set. In addition, tests on the SCOP 1.55 and SCOP 1.61 data sets are also presented.
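The general idea of turning per-superfamily binary classifier scores into a multi-class decision can be illustrated with a minimal sketch. It is only a simple baseline that picks the superfamily with the highest score; it is not the single-MCS or hierarchical-MCS method, whose details are not given here. The function names, the superfamily identifiers, and the score values are all hypothetical and chosen for illustration.

```python
from typing import Dict, List


def predict_superfamily(scores: Dict[str, float]) -> str:
    """Pick the superfamily whose binary classifier gave the highest score.

    `scores` maps a superfamily identifier to the output of the binary
    (remote homology) classifier trained for that superfamily.
    This arg-max rule is an assumed baseline, not the paper's method.
    """
    if not scores:
        raise ValueError("at least one superfamily score is required")
    return max(scores, key=scores.get)


def accuracy(predicted: List[str], actual: List[str]) -> float:
    """Fraction of proteins assigned to their true superfamily."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical scores for three proteins and three superfamilies.
example_scores = [
    {"a.1.1": 0.9, "b.1.1": -0.3, "c.1.1": 0.1},
    {"a.1.1": -0.5, "b.1.1": 0.4, "c.1.1": 0.2},
    {"a.1.1": 0.3, "b.1.1": 0.1, "c.1.1": 0.2},
]
true_labels = ["a.1.1", "b.1.1", "c.1.1"]

predictions = [predict_superfamily(s) for s in example_scores]
print(predictions, accuracy(predictions, true_labels))
```

A trained multi-class model built on the score vectors, as the abstract describes, would replace the arg-max rule in `predict_superfamily`; the accuracy computation would stay the same.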
- Superfamily prediction
- Physicochemical properties
- Binary classifiers
- SCOP superfamily
- 3D enriched models
Authors grant the journal and Universidad del Valle the economic rights over accepted manuscripts, but may make any reuse they deem appropriate for professional, educational, academic or scientific reasons, in accordance with the terms of the license granted by the journal to all its articles.
Articles will be published under the Creative Commons 4.0 BY-NC-SA licence (Attribution-NonCommercial-ShareAlike).