FilteringAF2_scripts
Set of scripts to trim, filter, and minimize AF2 models
Introduction
In general, AF2 calculations produce a final “relaxed” set of models/structures (from only 1-5 up to 6000 or more, as with AF2-sample, AF2-alt, …). The scripts described here work on the unrelaxed and/or relaxed sets of those models (the normal AF2 relax output) and, most importantly, on the output of the AF2 calculation, called slurm-xxxx.out if one uses the Slurm queueing system. The filtering process relies on collecting information from that output. I wholeheartedly advise everyone to keep and always store these outputs, as they can offer a great deal of information for later procedures/analysis.
Some definitions
- AF2 relax: the models produced by AF2 contain only heavy atoms, so in the last step they are “AMBER relaxed”. This involves:
  - Looking for missing residues and adding them (pdbfixer)
  - Looking for missing heavy atoms (generally the OXT) and adding them (pdbfixer)
  - Adding hydrogen atoms to the model according to the selected pH (pdbfixer)
  - Performing a “short”, restrained minimization (relax) with OpenMM using an AMBER force field:
    - 100 steps, 2.39 tolerance, AMBER99SB force field
    - Heavy atoms are “restrained” with a force constant K of 10.0 kcal/mol/Å² (they can move a little; they are NOT constrained)
- Filtering of models: in the “AF2 relax” procedure described above, AF2 reports useful information such as initial and final energies, missing atoms, and residues that were “violated” and/or “excluded”. Violations/exclusions mean that something was geometrically unfeasible or that hard clashes between atoms were present during the relax protocol. In such cases one may want to filter those models out of the set, keeping only the ones with good energetics and no violations/exclusions. A more complete script, FilterAF2.py, can do both filtering and trimming at once on the full set.
- Trimming of models: if the AF2 calculation included a sequence that produced long stretches of “not well-defined” or “unstructured” tails or loops (usually also with low pLDDT), one may want to trim those stretches for further studies. While this is straightforward for a few models, it becomes cumbersome for 6000 models, so the script TrimmModels.py performs the trimming automatically for the whole selected set.
Description of the scripts
In all cases, calling a script with no arguments (no flags) prints a description and usage messages to help users. For each script, the short help/usage text is shown first, followed by some comments gained from experimenting with it.
TrimmModels.py
** TrimmModels.py (RTT, 2024)
Python script to output AF2 coords trimmed using a well-defined range or pLDDT cutoff
Well-defined OR pLDDT options are mutually exclusive; only one can be used
(1) outDir will hold:
- the trimmed models from inDir
(2) Current dir, from where the script was called, will hold:
- (a) multimodel file with trimmed models grouped together (ex. Trimmed_plDDT_45.pdb)
- (b) text file with a list of trimmed models (ex. ListToMini) that can be used for minimization
USAGE: TrimmModels.py <ARGUMENTS>
ARGUMENTS:
-help ( displays this text help message )
-inD <string> ( directory path to take input coords from )
-outD <string> ( directory path to put the trimmed output )
-w[ell] <string> ( well defined to trim, ex: A20..A98,B2..B52 )
-pl[DDT] <int> ( plDDT cutoff to use in trimming the models )
-file <string> ( files to trim, ex: unrelax )
-deb <int> ( level of debug info, from 1-10 )
Examples:
TrimmModels.py -inD Unrelaxed -outD Trimmed -pl 55.0 -fi unrelax
TrimmModels.py -well A20..A90,B2..B54 -inD Unrelaxed -outD Trimmed -fi unrelax
This script trims models based on a well-defined range of residues (-well flag) or on pLDDT values (-pl flag).
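Range-based trimming (-well) can be illustrated with a minimal, stdlib-only sketch. It assumes single-letter chain IDs and standard fixed-column PDB ATOM/HETATM records (chain in column 22, residue number in columns 23-26). The helper names `parse_ranges` and `trim_pdb_lines` are hypothetical and this is not the TrimmModels.py code itself.

```python
# Hypothetical sketch of range-based trimming (-well); not the TrimmModels.py code.
# Assumes single-letter chain IDs and standard fixed-column PDB ATOM records.
def parse_ranges(spec):
    """Parse 'A20..A98,B2..B52' into [('A', 20, 98), ('B', 2, 52)]."""
    ranges = []
    for item in spec.split(","):
        start, end = item.strip().split("..")
        if start[0] != end[0]:
            raise ValueError(f"range {item!r} spans two chains")
        ranges.append((start[0], int(start[1:]), int(end[1:])))
    return ranges

def trim_pdb_lines(lines, spec):
    """Keep ATOM/HETATM records inside the ranges; keep all other records."""
    ranges = parse_ranges(spec)
    kept = []
    for line in lines:
        if line.startswith(("ATOM", "HETATM")):
            chain, resseq = line[21], int(line[22:26])  # columns 22 and 23-26
            if not any(c == chain and lo <= resseq <= hi for c, lo, hi in ranges):
                continue  # residue outside every well-defined range
        kept.append(line)
    return kept

def atom_line(serial, chain, resseq):
    # Minimal fixed-column CA record; only chain and residue number matter here
    return (f"ATOM  {serial:5d}  CA  ALA {chain}{resseq:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}")

model = [atom_line(1, "A", 19), atom_line(2, "A", 20), atom_line(3, "A", 98),
         atom_line(4, "A", 99), atom_line(5, "B", 2), "END"]
for line in trim_pdb_lines(model, "A20..A98,B2..B52"):
    print(line[:26])
```

Because every model is cut at the same residue numbers, all trimmed models end up with the same residues, regardless of their individual pLDDT profiles.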
NOTES:
- Be aware that trimming on pLDDT can produce models with only a few residues when the AF2 calculation gives very low pLDDT values. Also, the number of trimmed residues will not be the same across a set of models, because a residue can have a good pLDDT value in one model and not in the others. I have seen cases (I believe with AF2_alt) where the maximum pLDDT in various models was as low as 45, which produced trimmed models with only 10-20 residues out of the 340 initial ones.
- A safer way to proceed is to trim based on well-defined ranges, as this method always produces the same number of residues for all models.
- The -file flag can be omitted, in which case every file in the -inD directory is processed. If used, only files whose names contain the given text are selected.
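The pLDDT caveat in the notes above can be seen in a minimal sketch. It assumes, as AF2 does when writing its PDB files, that each atom's B-factor column (columns 61-66) holds the pLDDT of its residue, and that a residue is kept when pLDDT ≥ cutoff (whether TrimmModels.py uses ≥ or > is an assumption here). The helper `residues_above` and the pLDDT values are hypothetical.

```python
# Hypothetical sketch of pLDDT-based trimming (-pl); not the TrimmModels.py code.
# Assumes, as in AF2 output PDBs, that each atom's B-factor (columns 61-66)
# holds the pLDDT of its residue.
def residues_above(lines, cutoff):
    """Return sorted (chain, resseq) keys of residues with pLDDT >= cutoff."""
    kept = set()
    for line in lines:
        if line.startswith("ATOM") and float(line[60:66]) >= cutoff:
            kept.add((line[21], int(line[22:26])))
    return sorted(kept)

def atom_line(serial, resseq, plddt, chain="A"):
    # Fixed-column CA record with occupancy 1.00 and pLDDT as the B-factor
    return (f"ATOM  {serial:5d}  CA  ALA {chain}{resseq:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{plddt:6.2f}")

# Two hypothetical models of the same 4-residue chain with different pLDDTs
model_1 = [atom_line(i, i, p) for i, p in enumerate([40.0, 62.5, 80.1, 51.0], start=1)]
model_2 = [atom_line(i, i, p) for i, p in enumerate([58.0, 61.0, 44.9, 70.2], start=1)]
print(residues_above(model_1, 55.0))  # [('A', 2), ('A', 3)]
print(residues_above(model_2, 55.0))  # [('A', 1), ('A', 2), ('A', 4)]
```

With the same cutoff, the two models keep different residues (two vs. three). This is why range-based trimming is the safer choice when all models must keep the same residues.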
Installation needs:
- I believe it will run nicely out of the box with only Python installed on the machine
- For those without Python locally, I have another version in Perl that works and produces the same results
FilterAF2.py
** FilterAF2.py (RTT, 2024)
Python script to extract/filter energy data from the AF2 output and paste the energy, along with
the name of the model, into REMARK records for future reference when checking or cleaning.
It creates outDir if requested; if not, it dumps the results in the current directory, so I
recommend always using an outDir to hold the files (AF2_sample usually produces 6000)
(1) - If using slurm output (i.e., from the queue system), the final files are appended with af2
- If using local output (i.e., local relax), the final files are appended with rtt
- If neither -relax nor -unrelax is given, it runs automatically for both cases
(2) outDir will hold:
- the unrelax or relax models from inDir
(3) Current dir will hold:
- (a) multimodel file with all models grouped together (ex. Good_relax_af2.pdb)
- (b) multimodel file with all *filtered* out models (ex. Filtered_out_relax_rtt.pdb)
- (c) text file with a list of filtered out models (ex. Filteredout_unrelax_rtt.txt)
- (d) text file with a list of "good" model names (ex. List_good_unrelax_rtt.txt)
(4) If neither -relax nor -unrelax is given, both sets (relaxed and unrelaxed) are processed in turn
USAGE: FilterAF2.py -log logFile -inD SomeDir -outD SomeOtherDir -relax|-unrelax
ARGUMENTS:
-help ( displays this text help message )
-log <string> ( log_file to use either from slurm (af2) or local (rtt) )
-inD <string> ( directory path to take input coords from )
-outD <string> ( directory path to put the filtered output )
-well <string> ( well defined to trim, ex: A20..A98,B2..B52 )
-Ei <float> ( cutoff for initial E filtering [def 1.0e30] )
-Ef <float> ( cutoff for final E filtering [def 1.0e30] )
-rel ( pick only relaxed models to work with )
-unre ( pick only unrelaxed models to work with )
-deb <int> ( level of debug info, from 1-10 )
Examples:
FilterAF2.py -log Log -inD ../StoredModels -outD Filtered -relax
FilterAF2.py -log Log -well A20..A90,B2..B54 -inD InputDir -outD Results -unrelax
A more specialized script that can filter calculated models, relaxed or unrelaxed, based on the Slurm output, or on the local output if the relax process was run on a local machine.
It can also trim the files on the fly (-well flag) after they have been selected by energy or by residue violations/exclusions. If no Ei and Ef are supplied (default value 1e30), the filtering is done using only the information about violated/excluded residues.
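The selection rule just described (energy cutoffs Ei/Ef plus violated/excluded residues) can be sketched as below. This is a hypothetical illustration, not the FilterAF2.py code. The `RelaxRecord` fields are assumptions about what would be parsed from the slurm/local log (the parsing itself is not shown, since the log format is not described here), and the use of `<=` at the cutoff is also an assumption.

```python
from dataclasses import dataclass

# Hypothetical sketch of the FilterAF2.py selection rule; the record fields are
# assumptions and would in practice be parsed from the slurm/local relax log.
@dataclass
class RelaxRecord:
    name: str
    e_initial: float  # energy before relax
    e_final: float    # energy after relax
    n_violated: int   # residues reported as violated
    n_excluded: int   # residues reported as excluded

def split_models(records, ei_cut=1.0e30, ef_cut=1.0e30):
    """Return (good, filtered_out) model names.

    With the default 1.0e30 cutoffs the energy tests pass for any realistic
    energy, so only violated/excluded residues decide the outcome.
    """
    good, filtered_out = [], []
    for r in records:
        ok = (r.e_initial <= ei_cut and r.e_final <= ef_cut
              and r.n_violated == 0 and r.n_excluded == 0)
        (good if ok else filtered_out).append(r.name)
    return good, filtered_out

records = [
    RelaxRecord("model_1", 1.2e4, -5.1e3, 0, 0),
    RelaxRecord("model_2", 9.8e5, -4.9e3, 0, 0),
    RelaxRecord("model_3", 1.5e4, -5.0e3, 2, 0),
]
print(split_models(records))                # (['model_1', 'model_2'], ['model_3'])
print(split_models(records, ei_cut=1.0e5))  # (['model_1'], ['model_2', 'model_3'])
```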
Running needs:
- An output file to base the filtering on, either from Slurm or from running the minimization locally from pkl files.
Installation needs:
- I believe it will run nicely out of the box with only Python installed on the machine, no special packages needed
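FilterAF2.py stores the model name and energies in REMARK records and groups models into multimodel files (ex. Good_relax_af2.pdb). A minimal sketch of that idea is shown below. The REMARK text layout and the helpers `tag_model` and `multimodel` are hypothetical, not the format the script actually writes. Keeping each REMARK inside its MODEL block is a choice made here so that every model carries its own tag. Strict PDB-format readers may expect REMARK records only in the file header.

```python
# Hypothetical sketch: tag a model with a REMARK line holding its name and
# energies, then group several models into one multimodel PDB text.
# The REMARK layout is an assumption, not the one written by FilterAF2.py.
def tag_model(name, e_initial, e_final, atom_lines):
    remark = f"REMARK {name} Einit {e_initial:.2f} Efinal {e_final:.2f}"
    return [remark] + list(atom_lines)

def multimodel(models):
    """models: list of line lists; returns the text of a multimodel PDB file."""
    out = []
    for number, lines in enumerate(models, start=1):
        out.append(f"MODEL     {number:4d}")  # serial number in columns 11-14
        # drop per-file END/ENDMDL/MODEL records so blocks do not nest
        out.extend(l for l in lines if not l.startswith(("END", "MODEL")))
        out.append("ENDMDL")
    out.append("END")
    return "\n".join(out) + "\n"

atoms = ["ATOM      1  CA  ALA A   1       0.000   0.000   0.000", "END"]
text = multimodel([tag_model("model_1", 12000.0, -5100.0, atoms),
                   tag_model("model_2", 15000.0, -4900.0, atoms)])
print(text)
```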
OpenMM_mini.py
** OpenMM_mini.py (RTT, March 2024. V1.0)
A Python script to perform an energy minimization on a PDB file
Optional Args: options between (), defaults between brackets on the right
Special flags
-af2 defaults to emulate AF2 relax protocol
-af2w same as af2 plus implicit water in the protocol
-rtt defaults to RTT relax protocol
-inD <string> directory to get the PDB files from
-outD <string> directory to write final minimized PDB files
-list <string> read from List the PDB files to minimize
-st[eps] <int> the number of steps for minimization, [0]
-tol[erance] <float> tolerance in forces for minimization, [10.0]
-Ecut <float> Epot cutoff to report a list of good E, [0.0]
-ff[ield] <string> Force field to use (amber14, amber99) [AMBER14]
-wa[ter] <string> Water model (implicit, amber14, amber99) [no default]
-box <float> Size of cubic box of explicit water (nm), [1.43]
-res[train] <string> restrain atoms around original positions, [no default]
-con[strain] <string> Constrain atoms, no move at all [no default]
(backbone, heavy, N,CA,C,O )
-kf <float> value of stiffness to restrain/constrain [10.0]
-NBm[ethod] <string> nonbonded method to use. [CutoffNonPeriodic]
(NoCutoff, CutoffNonPeriodic, CutoffPeriodic)
-NBc[ut] <float> nonbonded cut-off [20 A]
-gr[oup] <string> File name for the multimodel file
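The -restrain, -constrain and -kf flags can be illustrated with a small, stdlib-only sketch. It assumes that "backbone" means N, CA, C, O, that "heavy" means every non-hydrogen atom, that a comma-separated value lists atom names, and that a restraint is a harmonic term E = 0.5·kf·|r − r0|² per selected atom with kf in kcal/mol/Å² (the form used, as far as I know, by the AF2 relax step). The helpers `select_atoms` and `restraint_energy` are hypothetical and are not the OpenMM_mini.py code. A constrained atom, by contrast, does not move at all.

```python
# Hypothetical sketch: choose atoms for a -restrain/-constrain value and compute
# the harmonic restraint energy E = 0.5 * kf * |r - r0|^2 over those atoms.
# Selection rules and names are assumptions, not taken from OpenMM_mini.py.
BACKBONE = {"N", "CA", "C", "O"}

def select_atoms(atoms, spec):
    """atoms: list of (atom_name, element); spec: 'backbone', 'heavy' or 'N,CA,...'."""
    if spec == "heavy":
        return [i for i, (_, elem) in enumerate(atoms) if elem != "H"]
    names = BACKBONE if spec == "backbone" else {s.strip() for s in spec.split(",")}
    return [i for i, (name, _) in enumerate(atoms) if name in names]

def restraint_energy(positions, reference, indices, kf=10.0):
    """Sum of 0.5*kf*d^2 over the selected atoms (kf assumed in kcal/mol/A^2)."""
    total = 0.0
    for i in indices:
        d2 = sum((a - b) ** 2 for a, b in zip(positions[i], reference[i]))
        total += 0.5 * kf * d2
    return total

# One residue with hydrogens: N, H, CA, HA, CB, C, O
atoms = [("N", "N"), ("H", "H"), ("CA", "C"), ("HA", "H"),
         ("CB", "C"), ("C", "C"), ("O", "O")]
print(select_atoms(atoms, "backbone"))  # [0, 2, 5, 6]
print(select_atoms(atoms, "heavy"))     # [0, 2, 4, 5, 6]

reference = [(float(i), 0.0, 0.0) for i in range(len(atoms))]
moved = [(x + 0.1, y, z) for x, y, z in reference]  # every atom shifted 0.1 A
backbone = select_atoms(atoms, "backbone")
print(round(restraint_energy(moved, reference, backbone), 6))  # 0.2
```

A small displacement only costs a finite energy penalty, which is what "they can move a little" means for restrained atoms.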
Defaults for flags (note that they can be changed on the command line):
              -af2        -af2w       -rtt            no flag
              ---------   ---------   -------------   -------------
Method        AF2         AF2W        RTT             Custom
Steps         100         100         0 (unlimited)   0 (unlimited)
Tolerance     2.39        2.39        1.0             1.0
Restrained    heavy       heavy       backbone        --
Stiffness     10.0        10.0        5.0             5.0
NBmethod      NoCutoff    NoCutoff    NoCutoff        NoCutoff
ForceField    amber99sb   amber99sb   amber99sb       amber99sb
FFwater       --          implicit    --              --
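The defaults table describes presets that individual flags can override. A minimal sketch of that pattern with Python's argparse follows, using a subset of the table's values. The option parsing and the `build_settings` helper are assumptions for illustration, not the actual parser of OpenMM_mini.py.

```python
import argparse

# Hypothetical sketch of presets + command-line overrides; values come from the
# defaults table, but the parser is an assumption, not OpenMM_mini.py's own.
PRESETS = {
    "af2":    {"steps": 100, "tolerance": 2.39, "restrain": "heavy", "kf": 10.0, "water": None},
    "af2w":   {"steps": 100, "tolerance": 2.39, "restrain": "heavy", "kf": 10.0, "water": "implicit"},
    "rtt":    {"steps": 0, "tolerance": 1.0, "restrain": "backbone", "kf": 5.0, "water": None},
    "custom": {"steps": 0, "tolerance": 1.0, "restrain": None, "kf": 5.0, "water": None},
}

def build_settings(argv):
    parser = argparse.ArgumentParser(description="preset + override sketch")
    presets = parser.add_mutually_exclusive_group()
    for name in ("af2", "af2w", "rtt"):
        presets.add_argument(f"-{name}", dest="preset", action="store_const", const=name)
    parser.add_argument("-steps", type=int)
    parser.add_argument("-tolerance", type=float)
    parser.add_argument("-kf", type=float)
    parser.set_defaults(preset="custom")  # no preset flag -> "Custom" column
    args = parser.parse_args(argv)
    settings = dict(PRESETS[args.preset])  # copy so the preset table is untouched
    for key in ("steps", "tolerance", "kf"):
        value = getattr(args, key)
        if value is not None:  # only flags given explicitly override the preset
            settings[key] = value
    return settings

print(build_settings(["-af2", "-steps", "1000"]))
# {'steps': 1000, 'tolerance': 2.39, 'restrain': 'heavy', 'kf': 10.0, 'water': None}
```

Copying the preset before applying overrides leaves the table unchanged between calls. Only values typed on the command line replace the preset ones.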
Usage examples:
OpenMM_mini.py -af2 -inD Unrelaxed -list ListToMini -outD Minimized -gr AllPDBS.pdb
OpenMM_mini.py -restr heavy -wat implicit ../SomeDir/*.pdb
OpenMM_mini.py -steps 1000 -tolerance 2.39 -list Lista -inD ../SomeDir/
Installation needs: the following is needed to run OpenMM_mini.py
- Python > 3.9
- OpenMM 8.1.0
- Pdbfixer 1.9
- Numpy
- Math
Relax_from_pkl.py (not included right now)
Script to relax a set of models from their pkl files using features.pkl. It is a small adaptation of the regular Relax_from_pkl.py that I was using locally on my machine after a minimal local installation of AF2.