RPIBioinformatics/AlphaFold-NMR


AlphaFold-NMR Protocol (under construction)

Scripts and data corresponding to Huang, Ramelot, Spaman, Kobayashi, Montelione (2025) "Hidden Structural States of Proteins Revealed by Conformer Selection with AlphaFold-NMR"

[Figure: AlphaFold-NMR protocol flowchart]

No non-standard hardware is required.

Software Requirements

R Packages:

bio3d: http://thegrantlab.org/bio3d/
cluster 

Python Dependencies

numpy
pandas

RCI webserver for RCI and SCC calculation

https://www.randomcoilindex.ca/cgi-bin/rci_cgi_current.py

ASDP/RPF for Recall calculation in batch mode

https://github.rpi.edu/RPIBioinformatics/ASDP_public

RPF webserver for DoubleRecall analysis

https://montelionelab.chem.rpi.edu/rpf/

AFsample for enhanced sampling

https://github.com/bjornwallner/alphafoldv2.2.0

AlphaFold-NMR R and Python Scripts with Demo

Typical install times are several minutes.

*** All paths in the scripts must be changed to match your local environment.

1. AI Enhanced sampling using AFsample

  • run_afsample6000.sh
    • Modify the paths to match your local system.
    • Calculates 6000 models and relaxes all of them.

Runtime: calculating 6000 models can take one or more days, depending on the size of the sequence and the number of GPUs.

  Input sequence for AlphaFold: doc1_noN.fasta.
  We excluded the long disordered tails and non-native tags from the input FASTA sequence for AlphaFold modeling, to avoid potentially influencing the pTM and <pLDDT> scores.
     
  Commands: 
  
  # cd CDK2AP1-doc1 (set the working directory to CDK2AP1-doc1)

  # sbatch run_doc1_noN.sh (runs under Slurm)
  This command calculates and relaxes all 6000 models using run_afsample6000.sh.
  The output models are written to AF_models_dropout/doc1_noN, the pTM scores to AF_models_dropout/scores.sc, and the AlphaFold log to slurm-xxx.out.
  
  # python FilterAF2.py -log slurm-xxx.out -rel -inD AF_models_dropout/doc1_noN -outD filteredModels
  This command filters out bad models based on the AF log file (e.g. slurm-xxx.out). 

  Additional processing scripts: 
  
  Merge two chains into one chain for clustering analysis
  # python ../scripts/runMergedChain.py filteredModels mergedModels
  
  This command finds all PDB files in filteredModels, merges the two chains of each file (using mergeChain.py), and saves the results in the mergedModels directory.
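
To illustrate what merging two chains involves, here is a minimal sketch in plain Python. It is not the repository's mergeChain.py: the function name `merge_chains`, the choice of chain ID "A", and the consecutive renumbering of residues are all assumptions made for illustration. It relies only on the fixed-column PDB format (chain ID in column 22, residue number in columns 23-26, insertion code in column 27).

```python
def merge_chains(pdb_lines, merged_chain_id="A"):
    """Relabel all ATOM/HETATM records with one chain ID and renumber
    residues consecutively (hypothetical helper, not mergeChain.py)."""
    merged = []
    new_resnum = 0
    last_key = None
    for line in pdb_lines:
        record = line[:6].strip()
        if record in ("ATOM", "HETATM"):
            line = line.rstrip("\n").ljust(80)
            # A residue is identified by chain ID + residue number + insertion code.
            key = (line[21], line[22:27])
            if key != last_key:
                new_resnum += 1
                last_key = key
            # Overwrite chain ID and residue number, blank the insertion code.
            line = line[:21] + merged_chain_id + f"{new_resnum:4d}" + " " + line[27:]
            merged.append(line.rstrip())
        elif record in ("TER", "END"):
            # Drop terminators between chains; one TER/END pair is added at the end.
            continue
        else:
            merged.append(line.rstrip("\n"))
    merged.extend(["TER", "END"])
    return merged
```

A caller would read one model with `open(path).read().splitlines()`, pass the lines to `merge_chains`, and write the result to the output directory. Whether the real script renumbers residues or keeps the original numbering should be checked against mergeChain.py.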

Download pre-filtered AFsample models

https://zenodo.org/records/15015917 provides 5984 models, each with one merged chain. Unzip the archive and name the resulting directory CDK2AP1-doc1/ESmodels/ for the analysis below.

2. Clustering

  • dmPCAClustering.R --> output: pc_dm_pdbs.RData, cluster_pc_dm.csv (in RStudio, set the working directory to CDK2AP1-doc1 before running the R script)

We use Ward's method. The number of clusters is chosen by inspecting the dendrogram and the PC plots.

Runtime: clustering 6000 models can take hours, depending on the size of the sequence and number of GPUs
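
The script name suggests a PCA on inter-residue distance matrices followed by hierarchical clustering. Under that assumption, the sketch below shows in plain Python how one model could be turned into a distance-matrix feature vector (the upper triangle of its CA-CA distance matrix). The function names `ca_coordinates` and `distance_features` are hypothetical, and the actual analysis in dmPCAClustering.R is done in R with bio3d, so this only illustrates the idea.

```python
import math


def ca_coordinates(pdb_lines):
    """Return (x, y, z) tuples for every CA atom, read from fixed PDB columns."""
    coords = []
    for line in pdb_lines:
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            coords.append(
                (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            )
    return coords


def distance_features(coords):
    """Flatten the upper triangle of the pairwise distance matrix into a list."""
    n = len(coords)
    return [
        math.dist(coords[i], coords[j])
        for i in range(n)
        for j in range(i + 1, n)
    ]
```

Stacking one such vector per model gives a models-by-distances matrix on which PCA can be run. In Python, Ward clustering of the leading principal components could then be done with, for example, `scipy.cluster.hierarchy.linkage(..., method="ward")`.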

3. Scoring

Input files:

  • ESmodels: all PDB files; filenames of the form "relaxed*.pdb" or "relaxed*_new.pdb" are supported
  • NMRdata: input files to run RPF
  • RCI1.csv: output of the RCI webserver, with the sequence edited to match the merged sequence of the ESmodels

Scripts:

  • runSCC.py: calculates SCC scores for all models and writes them to the file scc.sc
   # python ../scripts/runSCC.py RCI1.csv ESmodels > scc.sc
  • runRPF.py and getRPF.py: calculate RPF scores for all models and write them to the file rpf.sc. This is the slow step; performance could be improved by outputting only the recall, precision, F-measure, and DP scores and skipping the others.
   working directory: NMRdata
   RPFcommand must be set in the runRPF.py script before running
   # python ../../scripts/runRPF.py control_RPF ../ESmodels rpfESmodels
   # python ../../scripts/getRPF.py rpfESmodels > rpf.sc

Runtime: minutes to hours for 6000 models, depending on the size of the protein sequence
Output: NMRdata/rpfESmodels and NMRdata/rpf.sc

  • getpLDDT.py: calculates pLDDT scores for all models and writes them to the file pLDDT.sc
   # python ../scripts/getpLDDT.py ESmodels > pLDDT.sc
  • getScores.py: combines all scores.
    This script works only for models named "relaxed****.pdb" from AFsample. If your model names are different, you will need to change the script.
   # python ../scripts/getScores.py scores.sc pLDDT.sc scc.sc rpf.sc cluster_pc_dm.csv > scores.all
   # python ../scripts/getScores.py > scores.all (by default, this command reads the files listed above)
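
As an illustration of the pLDDT step above: AlphaFold stores the per-residue pLDDT in the B-factor column of its PDB output, so an average pLDDT can be read directly from the model file. The sketch below is not getpLDDT.py; the function name `mean_plddt` is hypothetical, and averaging over CA atoms (one value per residue) is an assumption about how <pLDDT> is defined here.

```python
def mean_plddt(pdb_lines):
    """Average the B-factor column (pLDDT in AlphaFold models) over CA atoms."""
    values = [
        float(line[60:66])
        for line in pdb_lines
        if line.startswith("ATOM") and line[12:16].strip() == "CA"
    ]
    if not values:
        raise ValueError("no CA atoms found in the PDB lines")
    return sum(values) / len(values)
```

Applied to every file in ESmodels and printed as one "model name, score" line per model, this would give a table similar in spirit to pLDDT.sc. The exact column layout expected by getScores.py should be taken from the repository scripts.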

4. State combination

  • selectModels.py: selects models based on p(model|NMR) scores and creates a directory with the top 5 models from each cluster
 # sh ../scripts/selectModels.py scores.all ESmodels/ selectedModels/ > selectedModels.txt 
 scores.all: output from step 3 - Scoring
   
 plotScores.R: plots the scores used to select the top 5 models from each cluster

Output: selectedModels.txt and selectedModels/

  • State combination is based on the SEM plot of RMSFvsRCI and CCC scores.
example R code: CDK2AP1-doc1/RMSF_RCIplot.R 
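
The model-selection step in this section (top 5 models from each cluster) can be sketched as a simple group-and-sort. This is not selectModels.py: the function name `top_models_per_cluster` and the input layout (tuples of model name, cluster label, score) are assumptions for illustration, and how p(model|NMR) is computed from the individual scores is not shown.

```python
from collections import defaultdict


def top_models_per_cluster(rows, n_top=5):
    """Pick the n_top highest-scoring models in each cluster.

    rows: iterable of (model_name, cluster_label, score) tuples
          (hypothetical layout; adapt to the columns of scores.all).
    """
    by_cluster = defaultdict(list)
    for model, cluster, score in rows:
        by_cluster[cluster].append((score, model))

    selected = {}
    for cluster, scored in by_cluster.items():
        # Sort by descending score; break ties by model name for reproducibility.
        scored.sort(key=lambda item: (-item[0], item[1]))
        selected[cluster] = [model for _, model in scored[:n_top]]
    return selected
```

The returned dictionary maps each cluster label to its selected model names, which could then be copied into a directory such as selectedModels/.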

5. Validation by DoubleRecall analysis

For the selected states, obtain the RPF.zip file by running the RPF webserver (https://montelionelab.chem.rpi.edu/rpf/)

Other code that may be useful:

  • RCItools: tools we developed to generate SHIFTY input files for the RCI webserver
    • RPFtable2SHIFTY.py: converts the chemical shift file used by RPF (bmrbtable file) to SHIFTY format
    • nmrstar3toSHIFTY-fromBMRB.py: given a BMRB ID number, downloads the chemical shift assignments from the BMRB database and converts them to SHIFTY format
    • nmrstar3toSHIFTY.py: converts a local BMRB file in NMR-STAR 3.0 format to SHIFTY format
  • group all pdb files in one directory into one pdb file
  • R plots
    • pLDDT_RCIplot.R
