Permalink
Cannot retrieve contributors at this time
Name already in use
A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?
AlphaFold-NMR/README.md
Go to fileThis commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
66 lines (40 sloc)
2.66 KB
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
# AlphaFold-NMR | |
Scripts and data corresponding to Huang, Ramelot, Spaman, Kobayashi, Montelione (2025) "Hidden Structural States of Proteins Revealed by Conformer Selection with AlphaFold-NMR" | |
<img width="1161" alt="flowChart" src="https://media.github.rpi.edu/user/352/files/b6534f5c-e03d-4529-a4a5-bbc697c9db04"> | |
# Usage | |
Data and scripts for CDK2AP1-doc1 AF-NMR analysis: | |
1. AI enhanced sampling: CDK2AP1-doc1/EnhancedSampling/ | |
* doc1_noN.fasta: input fasta sequence. We exclude the long disordered tails and non-native tags from the input fasta sequence for AF modeling, to avoid potential influence on the pTM and \<pLDDT\> scores. | |
Commands: | |
> sbatch run_doc1_noN.sh (running with slrum) | |
This command calculates and relax all 6000 models using run_afsample6000.sh <br> | |
The output models are here: AF_models_dropout/doc1_noN. pTM score is reported here: AF_models_dropout/scores.sc and log from AF: slurm-xxx.out | |
> python FilterAF2.py -log slurm-xxx.out -rel -inD AF_models_dropout/doc1_noN -outD filteredModels | |
This command filters out bad models based on the AF log file (e.g. slurm-xxx.out). The python code is copied from here: https://github.rpi.edu/RPIBioinformatics/FilteringAF2_scripts | |
Additional processing scripts: | |
Merge two chains into one chain for clustering analysis | |
> sh runMergedChain.py filteredModels mergedModels | |
This command finds all pdb file in the fileredModels, merge two chains (using mergeChain.py) and save them in the mergedModels directory. | |
5984 models with one merged chain (CDK2AP1-doc1/ESmodels/) are used for the following analyses: | |
2. Clustering | |
R scripts for CDK2AP1: | |
CDK2AP1-doc1/Clustering/: | |
dmPCAClustering.R --> output: pc_dm_pdbs.RData, cluster_pc_dm.csv (in Rstudio, set the working dir to this dir before running the R script) | |
We found that "ward methods" gives largest agglomerative coefficient. Number of clusters --> by viusal inspection of "Dendrogram" and pc plots to identify number of well-seperated clusters. | |
3. Scoring | |
NMRdata: | |
* CDK2AP1-doc1/RPF/: directory to calculate RPF scores: | |
- Input/ | |
- | |
* CDK2AP1-doc1/RCI/: directory to calculate RCI and SCC scores | |
- RPFtable2SHIFTY.py: convert the chemical shift file used by RPF (bmrbtable file) to SHIFTY format | |
- nmrstar3toSHIFTY-fromBMRB.py: give the bmrb ID number, download chemical shift assignments from the BMRB database and convert to SHIFTY format | |
- nmrstar3toSHIFTY.py: convert the local bmrb file in nmrstart 3.0 format to SHIFTY format | |
Other scores: | |
* CDK2AP1-doc1/Scoring/: | |
- combine all scores: | |
- plots: | |
4. State combination | |
scripts for CDK2AP1: | |
5. Validation | |
scripts for CDK2AP1: |