Protein Structure Determination by Combining Sparse NMR Data with Evolutionary Couplings
Project Description
Accurate protein structure determination by NMR is challenging for larger proteins (> 15-20 kDa), for which experimental data are often incomplete and ambiguous. However, the massive increase in evolutionary sequence information, coupled with maximum likelihood covariance analysis, now provides a rich complementary source of structural constraints. Exploiting this synergy, we have developed a hybrid approach that uses evolutionary couplings (ECs) together with sparse NMR data to determine accurate 3D protein structures. We demonstrate this hybrid "EC-NMR" method by determining accurate structures of eight proteins ranging in size from 64 to 370 residues. ECs can be combined with sparse NMR data obtained on deuterated, selectively protonated protein samples to provide structures that are more accurate and complete than those obtained using such sparse NMR data alone. This advance significantly expands the range of proteins for which accurate structures can be determined, beyond what is possible using either evolutionary coupling analysis or NMR spectroscopy data alone.
Downloads
Data sets (8): https://github.rpi.edu/RPIBioinformatics/EC-NMR/tree/master/data_zip
Software:
- EVfold
- ASDP
Tutorial
Sample Data. https://github.rpi.edu/RPIBioinformatics/EC-NMR/blob/master/sample_data.tar.gz
EC pairs are generated from sequence data. EC pairs can be calculated using the EVfold-plm pipeline available at evfold.org. ECs can also be identified using alternative software, including PSICOV [1], GREMLIN [2,3], or other methods [4], although these alternative methods have not been tested here. The EC pairs are sorted in descending order of coupling score so that the top L pairs, those with the highest coupling scores, can be extracted (L is conventionally the number of residues in the protein sequence).
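The top-L selection described above can be sketched as follows. This is a minimal illustration, not part of EVfold: it assumes the couplings have already been parsed into hypothetical `(residue_i, residue_j, score)` tuples, and it leaves reading the actual EVfold-plm output file to the reader.

```python
from typing import List, Tuple

# One evolutionary coupling: (residue_i, residue_j, coupling_score).
# This tuple layout is a hypothetical parsed form, not the EVfold file format.
ECPair = Tuple[int, int, float]


def top_l_ec_pairs(ec_pairs: List[ECPair], seq_length: int) -> List[ECPair]:
    """Return the seq_length EC pairs with the highest coupling scores.

    Sorting in descending order of score puts the strongest couplings first,
    so the first L entries form the top-L set.
    """
    ranked = sorted(ec_pairs, key=lambda pair: pair[2], reverse=True)
    return ranked[:seq_length]


if __name__ == "__main__":
    # Hypothetical couplings; keep the top L = 3.
    ecs = [(5, 40, 0.12), (3, 27, 0.85), (10, 52, 0.40), (8, 33, 0.61)]
    print(top_l_ec_pairs(ecs, seq_length=3))
    # -> [(3, 27, 0.85), (8, 33, 0.61), (10, 52, 0.4)]
```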
Resonance assignment table. The NMR resonance assignment table is prepared in either BMRB 2.x or 3.x format [5]. ASDP interprets the ambiguity code column, so this column must be prepared correctly: it is used to denote stereospecific assignments of Leu and Val isopropyl methyl groups and individual assignments of side-chain amide hydrogens (Asn and Gln).
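Before running ASDP, it can help to check which prochiral atoms are marked as stereospecifically assigned. The sketch below assumes the standard BMRB ambiguity-code convention (1 = unique/stereospecific, 2 = geminal ambiguity) and a hypothetical `Assignment` record for rows already parsed from the BMRB file; parsing the file itself is not shown, and whether ASDP reads the codes exactly this way is an assumption.

```python
from typing import Dict, List, NamedTuple, Set, Tuple

# Prochiral atoms whose individual (stereospecific) assignment is signalled
# through the ambiguity code. BMRB-style names; Leu/Val methyl protons are
# listed as HD1/HD2 and HG1/HG2.
STEREO_ATOMS: Dict[str, Set[str]] = {
    "LEU": {"CD1", "CD2", "HD1", "HD2"},
    "VAL": {"CG1", "CG2", "HG1", "HG2"},
    "ASN": {"HD21", "HD22"},
    "GLN": {"HE21", "HE22"},
}


class Assignment(NamedTuple):
    """One row of a parsed assignment table (hypothetical representation)."""
    seq_id: int
    res_name: str
    atom_name: str
    shift: float
    ambiguity_code: int


def stereo_status(rows: List[Assignment]) -> List[Tuple[int, str, str, bool]]:
    """Report whether each prochiral atom is assigned stereospecifically.

    BMRB convention: ambiguity code 1 = unique (stereospecific),
    code 2 = ambiguity of geminal atoms or geminal methyl groups.
    """
    report = []
    for row in rows:
        if row.atom_name in STEREO_ATOMS.get(row.res_name, set()):
            report.append((row.seq_id, row.res_name, row.atom_name,
                           row.ambiguity_code == 1))
    return report
```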
NOESY peak lists. Peak lists are generated from 2D, 3D, 4D, and/or pseudo-4D NOESY data using standard automated peak-picking programs, and generally should be manually edited to eliminate obvious noise peaks. These peak lists are prepared in XEASY format [6]. For pseudo-4D NOESY data [7], the pseudo chemical shifts for the indirect proton dimension should be labeled as 999 in the peak list.
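Setting the pseudo-4D indirect proton dimension to 999 can be scripted. This is a hedged sketch, assuming XEASY-style peak lines in which the peak number is followed by one chemical-shift field per dimension; the function name and its parameters are hypothetical, and continuation lines carrying alternative assignments are not handled.

```python
from typing import List


def mark_pseudo_dimension(lines: List[str], n_dims: int, pseudo_dim: int) -> List[str]:
    """Set the chemical shift of dimension `pseudo_dim` to 999.000.

    Assumes XEASY-style peak lines: peak number first, then one chemical-shift
    field per dimension (dimension 1 = first shift field). Header lines
    starting with '#' and blank lines are returned unchanged. Continuation
    lines listing alternative assignments are NOT handled, and the original
    column spacing is replaced by single spaces.
    """
    if not 1 <= pseudo_dim <= n_dims:
        raise ValueError("pseudo_dim must be between 1 and n_dims")
    out = []
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            out.append(line)
            continue
        fields = stripped.split()
        if len(fields) <= n_dims:
            raise ValueError(f"too few fields for {n_dims} dimensions: {line!r}")
        fields[pseudo_dim] = "999.000"  # fields[0] is the peak number
        out.append(" ".join(fields))
    return out
```

Because column alignment is not preserved, check that the program reading the edited list parses whitespace-separated fields before relying on this approach.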
Backbone dihedral angle restraints. Dihedral angle restraints may be generated automatically from backbone chemical shifts using TALOS-N [8] (or TALOS+ [9]), or defined by alternative automated and/or manual methods. When using the ASDP program, dihedral angle restraints should be prepared in CYANA format. For perdeuterated samples, the talosn command should be run with the -iso option to apply the appropriate deuterium isotope correction to the chemical shifts. The Talos2dyana.com script from the TALOS-N package can be used to generate restraints in CYANA format for the EC-NMR calculation.
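To show what the conversion to CYANA-style restraints amounts to, here is a hypothetical sketch that turns predicted backbone angles and their uncertainties into lower/upper bound lines. It is not a replacement for Talos2dyana.com: the `TalosPrediction` record, the width factor of 2 on the uncertainty, and the exact column layout are all assumptions, and ranges crossing ±180° are not wrapped.

```python
from typing import List, NamedTuple


class TalosPrediction(NamedTuple):
    """Hypothetical parsed TALOS-style prediction for one residue."""
    res_num: int
    res_name: str   # three-letter code, e.g. "LEU"
    phi: float      # predicted phi (degrees)
    psi: float      # predicted psi (degrees)
    dphi: float     # estimated uncertainty of phi (degrees)
    dpsi: float     # estimated uncertainty of psi (degrees)


def to_cyana_aco(preds: List[TalosPrediction], width_factor: float = 2.0) -> List[str]:
    """Format phi/psi restraints as 'resnum resname angle lower upper' lines.

    Bounds are prediction +/- width_factor * uncertainty (an assumed rule);
    no wrapping at +/-180 degrees is applied.
    """
    lines = []
    for p in preds:
        for name, value, err in (("PHI", p.phi, p.dphi), ("PSI", p.psi, p.dpsi)):
            lower = value - width_factor * err
            upper = value + width_factor * err
            lines.append(f"{p.res_num:5d} {p.res_name:<4s} {name:<4s} {lower:8.1f} {upper:8.1f}")
    return lines
```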
Residual dipolar coupling data. RDC data should be provided in the table format shown in the Sample Data. The RDC list supports multiple interatomic vectors measured in multiple alignment media, including N-H, N-CA (intraresidue), and N-C' (sequential) vectors, each with error and weight factors. The RDC file should also provide the alignment tensor magnitude (Da) and rhombicity (R), using the notation of programs such as PALES [10] and REDCAT [11].
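The Da and R values enter the standard expression for the RDC of an internuclear vector with polar angles θ and φ in the principal frame of the alignment tensor, D(θ, φ) = Da[(3cos²θ - 1) + (3/2)R sin²θ cos 2φ]. The sketch below simply evaluates that formula; it does not fit a tensor or reproduce how ASDP uses the RDC file, and it assumes Da is expressed for the vector type in question (for example N-H, in Hz).

```python
import math


def predicted_rdc(da: float, rh: float, theta_deg: float, phi_deg: float) -> float:
    """Evaluate D = Da * [(3 cos^2 theta - 1) + 1.5 * R * sin^2 theta * cos(2 phi)].

    da: tensor magnitude for this vector type (e.g. Hz for N-H).
    rh: rhombicity R (dimensionless, 0 <= R <= 2/3).
    theta_deg, phi_deg: polar angles of the vector in the alignment frame.
    """
    theta = math.radians(theta_deg)
    phi = math.radians(phi_deg)
    axial = 3.0 * math.cos(theta) ** 2 - 1.0
    rhombic = 1.5 * rh * math.sin(theta) ** 2 * math.cos(2.0 * phi)
    return da * (axial + rhombic)
```

A vector along the tensor z axis (θ = 0) gives the maximum axial value 2Da regardless of R.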
Parameter table for ASDP. When using the ASDP program, the par.tbl parameter table from the Sample Data should be used as the default parameter table.
Control file. For each project, ASDP requires a control file that specifies the protein name, sequence, input files, and instructions for running the structure calculations. An example control file is provided with the Sample Data. The flag EC= must be included in the control file, and the tolerance for the pseudo proton dimension should be set to 999.
Generation of EC-NMR structures with ASDP. Access to the ASDP software, together with a short tutorial, is available online at:
http://www-nmr.cabm.rutgers.edu/NMRsoftware/asdp/Quick_Starts.html
Additional instructions for using ASDP are available at:
http://www.nmr2.buffalo.edu/nesg.wiki/AutoStructure_Structure_Determination_Program
The following ASDP command and options are used to run EC-NMR calculations:
asdp -c control-file   # name of the control file (required)
     -o outputDir      # name of the output directory
     -m                # the data set is sparse NMR data (sample not fully protonated)
     -k 5.0            # calibration constant for converting NOE intensities to distances
     [-i]              # required only for perdeuterated samples; instructs the program
                       # to apply isotope shift corrections to Cα and Cβ chemical shifts
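The -k option refers to converting NOE intensities into distances. How ASDP uses the value 5.0 internally is not described here, so the sketch below only illustrates the generic r^-6 relationship that such calibrations build on: a constant C is fixed from a reference peak of known distance (hypothetical values), and each intensity I is converted to r = (C/I)^(1/6).

```python
def calibrate_constant(ref_intensity: float, ref_distance: float) -> float:
    """Return C in I = C / r**6 from a reference peak of known distance."""
    return ref_intensity * ref_distance ** 6


def intensity_to_distance(intensity: float, c: float) -> float:
    """Invert I = C / r**6 to estimate the distance r = (C / I) ** (1/6)."""
    if intensity <= 0:
        raise ValueError("intensity must be positive")
    return (c / intensity) ** (1.0 / 6.0)
```

Because of the sixth-root dependence, a peak 64 times weaker than the reference maps to only twice the reference distance, which is why intensity errors have a modest effect on the derived upper bounds.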
Refinement of EC-NMR structures with Rosetta. ASDP can use various programs to generate 3D structures from the NOESY-based distance restraints that it derives from the NOESY peak lists and chemical shift lists. For EC-NMR calculations, the program has been most thoroughly tested using CYANA for structure generation. Each of the resulting NMR structure models is then further energy-refined using the restrained Rosetta refinement protocol outlined in Mao et al. [12]. Detailed protocols for restrained Rosetta refinement are available online at:
http://www.nmr2.buffalo.edu/nesg.wiki/Rosetta_High_Resolution_Protein_Structure_Refinement_Protocol
The script getCC.pl from the ASDP-2.0 package is used to generate specific atom-atom Rosetta refinement constraints for every atom pair, within the residue pairs of the EC list, whose interatomic distance is ≤ 5 Å in all 20 models. An upper bound of 7 Å is used for each of these atom-atom constraints. The inputs to getCC.pl are the PDB file of the final models (.pdb in the final ASDP cycle) and the final EC pairs (.ec in the final ASDP cycle). The final.upl output from the final cycle is also added as Rosetta ambiguous restraints for Rosetta refinement. The distance upper bounds are loosened by 30% before conversion to the Rosetta constraint format. This can be done using a stand-alone version of Rosetta or, alternatively, with the Restrained Rosetta Refinement server available online at:
http://psvs-1_4-dev.nesg.org/consRosetta.html
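The selection rule and the 30% loosening described above can be sketched as follows. This is an independent illustration of the stated rules, not getCC.pl: it assumes each model has been parsed into a hypothetical `{(res_num, atom_name): (x, y, z)}` dictionary containing the same atoms in every model, and it stops short of writing actual Rosetta constraint syntax.

```python
import math
from itertools import product
from typing import Dict, List, Tuple

# Coordinates of one model (hypothetical parsed form of a PDB model).
Model = Dict[Tuple[int, str], Tuple[float, float, float]]


def ec_atom_pair_restraints(models: List[Model], ec_pairs: List[Tuple[int, int]],
                            cutoff: float = 5.0, upper: float = 7.0):
    """Select atom pairs in EC residue pairs that lie within `cutoff` in EVERY model.

    Each selected pair is given the fixed upper bound `upper`.
    Assumes all models contain the same atoms (missing atoms raise KeyError).
    """
    restraints = []
    for res_i, res_j in ec_pairs:
        atoms_i = sorted(a for (r, a) in models[0] if r == res_i)
        atoms_j = sorted(a for (r, a) in models[0] if r == res_j)
        for a_i, a_j in product(atoms_i, atoms_j):
            if all(math.dist(m[(res_i, a_i)], m[(res_j, a_j)]) <= cutoff for m in models):
                restraints.append((res_i, a_i, res_j, a_j, upper))
    return restraints


def loosen_upper_bounds(bounds: List[float], factor: float = 0.30) -> List[float]:
    """Increase each distance upper bound by `factor` (30% by default)."""
    return [b * (1.0 + factor) for b in bounds]
```

Requiring the distance cutoff in all models, rather than in an average structure, keeps only contacts that every model in the ensemble agrees on.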
Assessment of structure reliability. The NMR DP scores [13] reported by ASDP (/_DP.ovw) provide a global measure of how well the structures fit the NMR NOE data. Reliable models generally have DP scores > 0.73. NMR DP scores can also be computed independently of ASDP using the RPF-DP server available online at http://nmr.cabm.rutgers.edu/rpf/; the RPF-DP program can also be downloaded to run on local machines. Reliable EC-NMR structures also have structure quality Z-scores [14] > -2 for the Procheck (backbone), Procheck (all dihedral), Verify3D, MolProbity, and Prosa II knowledge-based structure quality assessment metrics. Structure quality Z-scores can be computed using the online Protein Structure Validation Software Suite (PSVS) server at http://psvs-1_5-dev.nesg.org/. Detailed instructions for using the PSVS server are available at http://www.nmr2.buffalo.edu/nesg.wiki/PSVS.
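The two rule-of-thumb criteria above can be combined into a simple screening check. In this sketch the thresholds come from the text, the Z-score is the usual (raw - mean) / standard deviation against a reference distribution, and the metric names and reference statistics are placeholders; in practice the Z-scores would be taken directly from the PSVS report.

```python
from typing import Dict

DP_THRESHOLD = 0.73   # reliable models generally have DP > 0.73
Z_THRESHOLD = -2.0    # and every structure quality Z-score > -2


def z_score(raw: float, ref_mean: float, ref_sd: float) -> float:
    """Standard Z-score of a raw quality score against a reference distribution."""
    return (raw - ref_mean) / ref_sd


def looks_reliable(dp_score: float, z_scores: Dict[str, float]) -> bool:
    """Return True if DP > 0.73 and every supplied Z-score is > -2."""
    return dp_score > DP_THRESHOLD and all(z > Z_THRESHOLD for z in z_scores.values())
```

This is only a screen: a model that passes both thresholds still warrants inspection of the individual PSVS and RPF reports.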
References
[1] Jones, D. T., Buchan, D. W., Cozzetto, D. & Pontil, M. PSICOV: precise structural contact prediction using sparse inverse covariance estimation on large multiple sequence alignments. Bioinformatics 28, 184-190, doi:10.1093/bioinformatics/btr638 (2012).
[2] Ovchinnikov, S., Kamisetty, H. & Baker, D. Robust and accurate prediction of residue-residue interactions across protein interfaces using evolutionary information. eLife 3, e02030, doi:10.7554/eLife.02030 (2014).
[3] Kamisetty, H., Ovchinnikov, S. & Baker, D. Assessing the utility of coevolution-based residue-residue contact predictions in a sequence- and structure-rich era. Proc Natl Acad Sci USA 110, 15674-15679, doi:10.1073/pnas.1314045110 (2013).
[4] de Juan, D., Pazos, F. & Valencia, A. Emerging methods in protein co-evolution. Nat Rev Genet 14, 249-261, doi:10.1038/nrg3414 (2013).
[5] Ulrich, E. L. et al. BioMagResBank. Nucleic Acids Res 36, D402-408, doi:10.1093/nar/gkm957 (2008).
[6] Bartels, C., Xia, T. H., Billeter, M., Güntert, P. & Wüthrich, K. The program XEASY for computer-supported NMR spectral analysis of biological macromolecules. J Biomol NMR 6, 1-10, doi:10.1007/BF00417486 (1995).
[7] Diercks, T., Coles, M. & Kessler, H. An efficient strategy for assignment of cross-peaks in 3D heteronuclear NOESY experiments. J Biomol NMR 15, 177-180, doi:10.1023/A:1008367912535 (1999).
[8] Shen, Y. & Bax, A. Protein backbone and sidechain torsion angles predicted from NMR chemical shifts using artificial neural networks. J Biomol NMR 56, 227-241, doi:10.1007/s10858-013-9741-y (2013).
[9] Shen, Y., Delaglio, F., Cornilescu, G. & Bax, A. TALOS+: a hybrid method for predicting protein backbone torsion angles from NMR chemical shifts. J Biomol NMR 44, 213-223, doi:10.1007/s10858-009-9333-z (2009).
[10] Zweckstetter, M. Prediction of sterically induced alignment in a dilute liquid crystalline phase: aid to protein structure determination by NMR. J Am Chem Soc 122, 3791-3792 (2000).
[11] Valafar, H. & Prestegard, J. H. REDCAT: a residual dipolar coupling analysis tool. J Magn Reson 167, 228-241, doi:10.1016/j.jmr.2003.12.012 (2004).
[12] Mao, B., Tejero, R., Baker, D. & Montelione, G. T. Protein NMR structures refined with Rosetta have higher accuracy relative to corresponding X-ray crystal structures. J Am Chem Soc 136, 1893-1906, doi:10.1021/ja409845w (2014).
[13] Huang, Y. J., Powers, R. & Montelione, G. T. Protein NMR recall, precision, and F-measure scores (RPF scores): structure quality assessment measures based on information retrieval statistics. J Am Chem Soc 127, 1665-1674, doi:10.1021/ja047109h (2005).
[14] Bhattacharya, A., Tejero, R. & Montelione, G. T. Evaluating protein structures determined by structural genomics consortia. Proteins 66, 778-795, doi:10.1002/prot.21165 (2007).