Skip to content

RPIBioinformatics/FilteringAF2_scripts

FilteringAF2_scripts

Set of scripts to trim, filter, minimize AF2 models

Introduction

In general, AF2 calculations produce a final “relaxed” set of models/structures (anywhere from 1-5 up to 6000 or more, as with AF2-sample, AF2-alt, …). The scripts described here work on the unrelaxed and/or relaxed set of those models (the normal output of the AF2 relax process) and, most importantly, on the output file of the AF2 calculation, called slurm-xxxx.out if one uses the Slurm queueing system. The filtering process relies on information collected from that output. I strongly advise everyone to keep and always store these outputs, as they can offer a great deal of information for later procedures/analysis.

Some definitions

  • AF2 relax: the models coming out of AF2 contain only heavy atoms, so they are “AMBER relaxed” in the last step. This involves:

    • Look for missing residues and add them if missing (pdbfixer)
    • Look for missing heavy atoms (generally the OXT) and add them if missing (pdbfixer)
    • Add hydrogen atoms to the model according to selected pH (pdbfixer)
    • Perform a “short”, restrained minimization (relax) with OpenMM using the AMBER force field.
      • 100 steps, 2.39 tolerance, AMBER99SB force field
      • Heavy atoms are “restrained” with a force constant K of 10.0 kcal/mol/Å² (they can move a little, they are NOT constrained)
  • Filtering of models: during the “AF2 relax” procedure described above, AF2 reports useful information such as initial and final energies, missing atoms, and residues that were “violated” and/or “excluded”. Violations/exclusions mean that something was geometrically unfeasible or that hard clashes between atoms were present during the relax protocol. In that case, one may want to filter those models out of the set, keeping only the ones with good energies and no violations/exclusions. A more complete script, FilterAF2.py, can do both filtering and trimming at once on the full set.

  • Trimming of models: if the AF2 calculation included a sequence that produced long stretches of “not-well-defined” or unstructured tails or loops (usually with low pLDDT as well), one may want to trim those stretches for further studies. While this looks straightforward for a few models, it is cumbersome for 6000, so there is a script called TrimmModels.py that trims the whole selected set automatically.

Description of the scripts

Calling any of the scripts with no arguments (no flags) prints a short description and usage message. For each script below, the short help/usage text is shown first, followed by some comments gained from experimenting with it.

TrimmModels.py

 ** TrimmModels.py (RTT, 2024) 
 Python script to output AF2 coords trimmed using a well-defined range or pLDDT cutoff
 Well-defined OR pLDDT options are mutually exclusive, only one can be used 

  (1) outDir will hold: 
      - the trimmed models from inDir 
  (2) Current dir, from where the script was called, will hold: 
      - (a) multimodel file with trimmed models grouped together (ex. Trimmed_plDDT_45.pdb) 
      - (b) text file with a list of trimmed models (ex. ListToMini) can be used to minimize 

 USAGE:	 TrimmModels.py <ARGUMENTS>   

 ARGUMENTS:
	 -help             	 ( displays this text help message            )        
	 -inD     <string> 	 ( directory path to take input coords from   )        
	 -outD    <string> 	 ( directory path to put the trimmed  output  )        
	 -w[ell]  <string> 	 ( well defined to trim, ex: A20..A98,B2..B52 )        
	 -pl[DDT] <int>    	 ( plDDT cutoff to use in triming the models  )        
	 -file    <string> 	 ( files to trim, ex: unrelax                 )        
	 -deb     <int>    	 ( level of debug info, from 1-10             )      

 Examples:
	 TrimmModels.py -inD Unrelaxed -outD Trimmed -pl 55.0 -fi unrelax 
	 TrimmModels.py -well A20..A90,B2..B54 -inD Unrelaxed -outD Trimmed -fi unrelax 

This script trims models based on a well-defined range of residues (-well flag) or on pLDDT values (-pl flag).
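As an illustration of the -well option, here is a minimal sketch of how a range string such as A20..A98,B2..B52 could be parsed and applied to PDB coordinate lines. It is not the code of TrimmModels.py: the function names (`parse_well_defined`, `keep_line`, `atom_line`) are hypothetical, and the sketch assumes standard fixed-column PDB records (chain ID in column 22, residue number in columns 23-26).

```python
import re


def parse_well_defined(spec):
    """Parse 'A20..A98,B2..B52' into {chain: [(first, last), ...]}."""
    ranges = {}
    for part in spec.split(","):
        m = re.fullmatch(r"\s*([A-Za-z])(-?\d+)\.\.([A-Za-z])(-?\d+)\s*", part)
        if m is None:
            raise ValueError(f"cannot parse range {part!r}")
        chain, first = m.group(1), int(m.group(2))
        chain_end, last = m.group(3), int(m.group(4))
        # Assumption: a range never spans two chains and is given low..high.
        if chain != chain_end or first > last:
            raise ValueError(f"inconsistent range {part!r}")
        ranges.setdefault(chain, []).append((first, last))
    return ranges


def keep_line(line, ranges):
    """True for ATOM/HETATM lines whose residue falls inside a range."""
    if not line.startswith(("ATOM", "HETATM")):
        return False
    chain = line[21]            # PDB column 22
    resnum = int(line[22:26])   # PDB columns 23-26
    return any(lo <= resnum <= hi for lo, hi in ranges.get(chain, []))


def atom_line(serial, chain, resnum, bfac=50.0):
    """Build a fake CA record, only to have something to trim."""
    return ("ATOM  %5d  CA  ALA %1s%4d    %8.3f%8.3f%8.3f%6.2f%6.2f"
            % (serial, chain, resnum, 0.0, 0.0, 0.0, 1.0, bfac))


# Toy model: chain A residues 18-22 and chain B residues 1-3.
model = [atom_line(i, "A", i) for i in range(18, 23)]
model += [atom_line(100 + i, "B", i) for i in range(1, 4)]

ranges = parse_well_defined("A20..A98,B2..B52")
trimmed = [ln for ln in model if keep_line(ln, ranges)]
print(len(model), "atoms ->", len(trimmed), "atoms")
```

Because the ranges are fixed, every model trimmed this way keeps the same residues.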
NOTES:

  • Be aware that trimming on pLDDT can leave models with only a few residues when the AF2 calculation gave very low pLDDT values. The number of residues kept will also differ between models, since a residue can have a good pLDDT in one model and not in the others. I have seen cases (I believe with AF2_alt) where the maximum pLDDT in several models was as low as 45; this produced a trimmed model with only 10-20 residues out of the initial 340.
  • A safer way to proceed is to trim based on well-defined ranges, as this method always produces the same number of residues for all models.
  • The -file flag can be omitted, in which case every file in the -inD directory is processed. If used, it selects only the files whose names contain the text given after -file.
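The first note can be made concrete with a small sketch. AF2 writes the per-residue pLDDT into the B-factor column of its PDB files, so a pLDDT trim amounts to a cutoff on that column. The code below is an assumption about how such a trim might look, not the logic of TrimmModels.py: the helper names and the "keep if pLDDT >= cutoff" rule are hypothetical, and the pLDDT values are invented for the example.

```python
def atom_line(serial, chain, resnum, plddt):
    """Build a fake CA record with pLDDT stored in the B-factor column."""
    return ("ATOM  %5d  CA  ALA %1s%4d    %8.3f%8.3f%8.3f%6.2f%6.2f"
            % (serial, chain, resnum, 0.0, 0.0, 0.0, 1.0, plddt))


def trim_by_plddt(lines, cutoff):
    """Keep ATOM lines whose B-factor (PDB columns 61-66) is >= cutoff."""
    return [ln for ln in lines
            if ln.startswith("ATOM") and float(ln[60:66]) >= cutoff]


def count_residues(lines):
    """Number of distinct (chain, residue number) pairs in the lines."""
    return len({(ln[21], int(ln[22:26])) for ln in lines})


def make_model(plddts):
    """One residue per pLDDT value, all on chain A."""
    return [atom_line(i, "A", i, p) for i, p in enumerate(plddts, start=1)]


# Two models of the same five-residue sequence with different confidence.
model_1 = make_model([30.0, 62.5, 71.0, 80.2, 44.9])
model_2 = make_model([28.0, 40.1, 58.3, 52.0, 41.0])

cutoff = 55.0
kept_1 = count_residues(trim_by_plddt(model_1, cutoff))
kept_2 = count_residues(trim_by_plddt(model_2, cutoff))
print("model 1 keeps", kept_1, "residues; model 2 keeps", kept_2)
```

The two models end up with different residue counts for the same cutoff, which is the situation the notes warn about.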

Installation needs:

  • I believe it will run out of the box with only Python installed on the machine
  • In case someone has no Python locally, I have another version in Perl that works and produces the same results

FilterAF2.py

  ** FilterAF2.py (RTT, 2024) 

  Python script to extract/filter energy data from AF2 output and paste the energy along with 
       the name of the model on REMARK records for future reference in checking or cleaning 
       It creates outDir if requested. If not will dump results on current directory so, I  
       recommend to use always an outDir for holding the files (AF2_sample usually 6000)    


  (1) - If using slurm output (i.e from queue system) the final files are appended with af2 
      - If using local output (i.e. local relax) the final files are appended with rtt      
      - If not given -relax nor -unrelax, it will run automatically for both cases          
  (2) outDir will hold: 
      - the unrelax or relax models from inDir 
  (3) Current dir will hold: 
      - (a) multimodel file with all models grouped together (ex. Good_relax_af2.pdb)       
      - (b) multimodel file with all *filtered* out models (ex. Filtered_out_relax_rtt.pdb) 
      - (c) text file with a list of filtered out models (ex. Filteredout_unrelax_rtt.txt)  
      - (d) text file with a list of "good" models names  (ex. List_good_urelax_rtt.txt)  
  (4) If not -relax nor -unrelax given, both sets (relaxed and unrelaxed) worked in turn  

 USAGE:	 FilterAF2.py -log logFile -inD SomeDir -outD SomeOtherDir -relax|-unrelax  

 ARGUMENTS: 
	 -help          	 ( displays this text help message                        ) 
	 -log  <string> 	 ( log_file to use either from slurm (af2) or local (rtt) ) 
	 -inD  <string> 	 ( directory path to take input coords from    )             
	 -outD <string> 	 ( directory path to put the filtered output   )             
	 -well <string> 	 ( well defined to trim, ex: A20..A98,B2..B52  )             
	 -Ei   <float>  	 ( cutoff for initial E filtering [def 1.0e30] )             
	 -Ef   <float>  	 ( cutoff for final E filtering   [def 1.0e30] )             
	 -rel           	 ( pick only relaxed models to work wwith      )             
	 -unre          	 ( pick only unrelaxed models to work with     )             
	 -deb  <int>    	 ( level of debug info, from 1-10              )           

 Examples: 
	 FilterAF2.py -log Log -inD ../StoredModels -outD Filtered -relax 
	 FilterAF2.py -log Log -well A20..A90,B2..B54 -inD Inpurdir -outD Results -unrelax 

A more specialized script that filters calculated models, either relaxed or unrelaxed, based on the Slurm output, or on the local output if the relax process was run on a local machine.
This script can also trim the files on the fly (-well flag) after they have been selected by energy or by residue violations/exclusions. If neither Ei nor Ef is supplied (the default value is 1e30), filtering is done using only the information about violated/excluded residues.
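The selection rule described above can be sketched in a few lines. This is an illustration under assumptions, not the implementation in FilterAF2.py: the `RelaxRecord` class, the `is_good` function and the example values are hypothetical, and the sketch assumes a model is rejected when it has any violation or exclusion, or when an energy exceeds its cutoff.

```python
from dataclasses import dataclass


@dataclass
class RelaxRecord:
    """Per-model data collected from the relax output (hypothetical layout)."""
    model: str
    e_init: float
    e_final: float
    violations: int
    exclusions: int


def is_good(rec, ei_cut=1.0e30, ef_cut=1.0e30):
    """Apply the violation/exclusion test, then the two energy cutoffs.

    With the default cutoffs of 1e30 the energy tests pass for any
    realistic energy, so only violations/exclusions decide.
    """
    if rec.violations > 0 or rec.exclusions > 0:
        return False
    return rec.e_init <= ei_cut and rec.e_final <= ef_cut


# Invented example values.
records = [
    RelaxRecord("model_1", 15234.9, -4821.3, 0, 0),
    RelaxRecord("model_2", 9.8e8, -3990.5, 2, 1),
    RelaxRecord("model_3", 4.1e6, -1200.0, 0, 0),
]

good_default = [r.model for r in records if is_good(r)]
good_strict = [r.model for r in records if is_good(r, ei_cut=1.0e5)]
print(good_default)
print(good_strict)
```

With default cutoffs only model_2 is dropped (violations); adding an Ei cutoff of 1e5 also drops model_3.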

Running needs:

  • An output file to drive the filtering, either from Slurm or the output obtained by running the minimization locally from pkl files.
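To show what kind of information such an output file can carry, here is a hedged sketch that pulls energies and violation counts out of log text with a regular expression. The line format is an assumption modelled on what the AF2 relax step typically logs ("Einit … Efinal … num residue violations … num residue exclusions …"); the exact wording may differ between AF2 versions, so check your own slurm-xxxx.out before relying on it. The sample lines and the function name `parse_relax_log` are invented for the example.

```python
import re

NUM = r"(-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)"

# Assumed line layout; adjust the pattern to match your own log file.
RELAX_LINE = re.compile(
    r"Einit\s+" + NUM + r"\s+Efinal\s+" + NUM
    + r"\s+Time\s+\S+\s+s\s+num residue violations\s+(\d+)"
    + r"\s+num residue exclusions\s+(\d+)"
)


def parse_relax_log(text):
    """Return one dict per matching line, in the order found."""
    results = []
    for line in text.splitlines():
        m = RELAX_LINE.search(line)
        if m is None:
            continue
        results.append({
            "e_init": float(m.group(1)),
            "e_final": float(m.group(2)),
            "violations": int(m.group(3)),
            "exclusions": int(m.group(4)),
        })
    return results


# Made-up log excerpt, only for illustration.
sample_log = """\
some unrelated line
... Iteration completed: Einit 15234.87 Efinal -4821.33 Time 2.51 s num residue violations 0 num residue exclusions 0
... Iteration completed: Einit 982345678.12 Efinal -3990.45 Time 3.02 s num residue violations 2 num residue exclusions 1
"""

entries = parse_relax_log(sample_log)
for entry in entries:
    print(entry)
```

How each entry is matched to a model name depends on the surrounding log lines and is left out of this sketch.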

Installation needs:

  • I believe it will run out of the box with only Python installed on the machine, no special packages needed
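The help text mentions pasting the model name and energies onto REMARK records and grouping models into a multimodel file. A minimal sketch of that idea, using only the standard library, follows. The REMARK wording, the record number and the function names are hypothetical (the layout FilterAF2.py actually writes is not documented here); only the MODEL/ENDMDL/END framing follows the standard PDB format.

```python
def tag_model(lines, name, e_init, e_final):
    """Prefix coordinate lines with REMARK records (hypothetical layout)."""
    remarks = [
        f"REMARK   1 MODEL_NAME {name}",
        f"REMARK   1 EINIT {e_init:.2f} EFINAL {e_final:.2f}",
    ]
    return remarks + list(lines)


def group_models(models):
    """Wrap each model in MODEL/ENDMDL records and close with END."""
    out = []
    for number, lines in enumerate(models, start=1):
        out.append(f"MODEL     {number:4d}")  # serial in PDB columns 11-14
        out.extend(lines)
        out.append("ENDMDL")
    out.append("END")
    return out


# Two one-atom toy models with invented energies.
atom = ("ATOM      1  CA  ALA A   1    "
        "   0.000   0.000   0.000  1.00 50.00")
tagged = [
    tag_model([atom], "model_1_relaxed.pdb", 15234.87, -4821.33),
    tag_model([atom], "model_3_relaxed.pdb", 4100000.0, -1200.0),
]
multimodel = group_models(tagged)
print("\n".join(multimodel))
```

Keeping the energies next to the coordinates means a later cleaning step can read them back without the original log.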

OpenMM_mini.py

** OpenMmini.py (RTT, March 2024. V1.0) 

 A python script to perform an energy minimization on a PDB file

Optional Args: options between (), defaults between brackets on the right
  	Special flags                                                           
  		-af2                   defaults to emulate AF2 relax protocol           
  		-af2w                  same as af2 plus implicit water in the protocol  
  		-rtt                   defaults to RTT relax protocol                   
                                                                          
  		-inD <string>          directory to get the PDB files from              
  		-outD <string>         directory to write final mimized PDB files       
  		-list <string>         read from List the PDB files to minimize         
  		-st[eps] <int>         the number of steps for minimization,     [0]    
  		-tol[erance] <float>   tolerance in forces for minimization,     [10.0] 
  		-Ecut <float>          Epot cutoff to report a list of good E,   [0.0]  
  		-ff[ield] <string>     Force field to use (amber14, amber99)     [AMBER14] 
  		-wa[ter] <string>      Water model (implicit, amber14, amber99)  [no default]  
  		-box <float>           Size of cubic box of explicit water (nm), [1.43]        
  		-res[train] <string>   restrain atoms around original positions, [no default]  
  		-con[strain] <string>  Constrain atoms, no move at all           [no default]  
          				(backbone, heavy, N,CA,C,O  ) 
  		-kf  <float>           value of stiffness to restrain/constrain  [10.0]        
  		-NBm[ethod] <string>   nonbonded method to use.           [CutoffNonPeriodic]  
                         		(NoCutoff, CutoffNonPeriodic, CutoffPeriodic) 
  		-NBc[ut] <float>       nonbonded cut-off                         [20 A]
  		-gr[oup] <string>      File name for the multimodel file 

	Defaults for flags: (note than can be changed at command line)
  	              -af2        -af2w       -rtt            no flag
  	              ---------   ---------   -------------   -------------
  	Method        AF2         AF2W        RTT             Custom
  	Steps         100         100         0 (unlimited)   0 (unlimited)
  	tolerance     2.39        2.39        1.0             1.0
  	restrained    heavy       heavy       backbone        --
  	stiffness     10.0        10.0        5.0             5.0
  	NBmethod      NoCutoff    NoCutoff    NoCutoff        NoCutoff
  	ForceField    amber99sb   amber99sb   amber99sb       amber99sb
  	FFwater       --          implicit    --              --

Use examples:
  	OpenMM_mini.py -af2 -inD Unrelaxed -list ListToMini -outD Minimized -gr AllPDBS.pdb 
  	OpenMM_mini.py -restr heavy -wat implicit ../SomeDir/*.pdb
  	OpenMM_mini.py -steps 1000 -tolerance 2.39 -list Lista -inD ../SomeDir/
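The defaults table can be read as a set of presets that individual flags then override. The sketch below encodes the table as a dictionary and resolves settings that way; it is an illustration, not the code of the script, and the key names and the `resolve_settings` function are hypothetical. The unit conversion at the end assumes the tolerance is in kcal/mol and the stiffness in kcal/mol/Å² (as in AF2's relax settings) and converts them to the kJ and nm units OpenMM works in.

```python
# Values copied from the defaults table; key names are hypothetical.
PRESETS = {
    "af2": {"steps": 100, "tolerance": 2.39, "restrain": "heavy",
            "kf": 10.0, "nb_method": "NoCutoff",
            "forcefield": "amber99sb", "water": None},
    "af2w": {"steps": 100, "tolerance": 2.39, "restrain": "heavy",
             "kf": 10.0, "nb_method": "NoCutoff",
             "forcefield": "amber99sb", "water": "implicit"},
    "rtt": {"steps": 0, "tolerance": 1.0, "restrain": "backbone",
            "kf": 5.0, "nb_method": "NoCutoff",
            "forcefield": "amber99sb", "water": None},
    "custom": {"steps": 0, "tolerance": 1.0, "restrain": None,
               "kf": 5.0, "nb_method": "NoCutoff",
               "forcefield": "amber99sb", "water": None},
}

KCAL_TO_KJ = 4.184          # 1 kcal = 4.184 kJ
PER_A2_TO_PER_NM2 = 100.0   # 1 / A^2 = 100 / nm^2


def resolve_settings(preset="custom", **overrides):
    """Start from a preset and apply command-line style overrides."""
    settings = dict(PRESETS[preset])
    unknown = set(overrides) - set(settings)
    if unknown:
        raise KeyError(f"unknown settings: {sorted(unknown)}")
    settings.update(overrides)
    return settings


def to_openmm_units(settings):
    """Convert tolerance and stiffness, assuming kcal/mol based inputs."""
    return {
        "tolerance_kj": settings["tolerance"] * KCAL_TO_KJ,
        "kf_kj_per_nm2": settings["kf"] * KCAL_TO_KJ * PER_A2_TO_PER_NM2,
    }


af2 = resolve_settings("af2")
af2_long = resolve_settings("af2", steps=1000)
converted = to_openmm_units(af2)
print(af2_long["steps"], converted)
```

Under those unit assumptions, the AF2 tolerance of 2.39 corresponds to about 10 kJ/mol and the stiffness of 10.0 to about 4184 kJ/mol/nm².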

Installation needs: the following is needed to run OpenMM_mini.py

  • Python > 3.9
  • OpenMM 8.1.0
  • Pdbfixer 1.9
  • NumPy
  • math (part of the Python standard library, no separate install)

Relax_from_pkl.py (not included right now)

Script to relax a bunch of models from their pkl files using the features.pkl. This is a small adaptation of the regular Relax_from_pkl.py that I was using locally on my machine after a minimal local installation of AF2.
