
Protein contact prediction by integrating deep multiple sequence alignments, coevolution and machine learning (Proteins: Structure, Function, and Bioinformatics)

 
 

19 October 2017 09:05:23

 
 


In this work, we evaluate the residue-residue contacts predicted by our three methods in the CASP12 experiment, focusing on the impact of multiple sequence alignment, residue coevolution, and machine learning on contact prediction. The first method (MULTICOM-NOVEL) uses only traditional features (sequence profile, secondary structure, and solvent accessibility) with deep learning to predict contacts and serves as a baseline. The second method (MULTICOM-CONSTRUCT) uses our new alignment algorithm to generate deep multiple sequence alignments from which coevolution-based features are derived; a neural network integrates these features to predict contacts. The third method (MULTICOM-CLUSTER) is a consensus combination of the predictions of the first two methods. We evaluated our methods on 94 CASP12 domains. On a subset of 38 free-modeling domains, our methods achieved an average precision of up to 41.7% for top L/5 long-range contact predictions. The comparison of the three methods shows that three factors drive the quality of predicted protein contacts: the quality and effective depth of the multiple sequence alignments, the coevolution-based features, and the machine learning integration of coevolution-based and traditional features. On the full CASP12 dataset, when the top L/5 predicted long-range contacts are evaluated, the coevolution-based features alone improve the average precision from 28.4% to 41.6%, and the machine learning integration of all the features further raises the precision to 56.3%. The correlation between the precision of contact prediction and the logarithm of the number of effective sequences in the alignments is 0.66.
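
For readers unfamiliar with the "top L/5 long-range" metric quoted above, the following is a minimal sketch of how such a precision is commonly computed. It is not the authors' evaluation code. It assumes the usual CASP conventions: L is the domain length, a long-range pair has a sequence separation of at least 24 residues, and a true contact is decided elsewhere (typically Cβ atoms within 8 Å). The function name, the `scores` dictionary layout, and the choice to divide by the number of pairs actually selected are all illustrative assumptions.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[int, int]  # residue indices (i, j) with i < j


def top_l5_long_range_precision(
    scores: Dict[Pair, float],
    true_contacts: Set[Pair],
    length: int,
    min_separation: int = 24,  # assumed CASP long-range threshold
) -> float:
    """Fraction of the top L/5 long-range predictions that are true contacts."""
    # Keep only pairs whose sequence separation counts as long-range.
    long_range = [
        (pair, score)
        for pair, score in scores.items()
        if pair[1] - pair[0] >= min_separation
    ]
    # Rank by predicted contact score, highest first.
    long_range.sort(key=lambda item: item[1], reverse=True)
    # Select the top L/5 pairs (at least one).
    top_n = max(1, length // 5)
    selected = [pair for pair, _ in long_range[:top_n]]
    if not selected:
        return 0.0
    hits = sum(1 for pair in selected if pair in true_contacts)
    # Assumption: normalize by the number of pairs actually selected.
    return hits / len(selected)
```

Calling the function with a predicted score map, the set of native contacts, and the domain length returns a value between 0 and 1; averaging that value over all evaluated domains gives a figure comparable in kind to the percentages reported in the abstract.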


 
Category: Biochemistry, Bioinformatics
 

