# How to run AlphaFold and RoseTTAFold on CCI
Get an account with the PMAR project and set up your login: https://github.rpi.edu/RPIBioinformatics/PMAR/blob/main/CCIForms.md
1. ssh PMARxxxx@blp02.ccni.rpi.edu and authenticate with Google Authenticator.
2. ssh nplfen01 to access the NPL GPUs. (Check https://docs.cci.rpi.edu/status/ for utilization across all clusters.)
3. Create or upload your FASTA file on nplfen01 (e.g. T1088.fasta).
4. If you modified .bashrc for our first version of the AlphaFold setup, you will need to remove that setup from .bashrc (example included):
```
* Edit .bashrc. Everything below "# User specific aliases and functions" that was added for AlphaFold can be deleted.
* Log out and back in. You will now have a clean environment.
This assumes everything was added below that comment, since it is the default last line of the .bashrc file.
```
Example: https://github.rpi.edu/RPIBioinformatics/PMAR/blob/main/.bashrc
5. Create two files: the FASTA file (e.g. T1088.fasta) and the sbatch script (e.g. run_T1088_af2.sh for AlphaFold, run_T1088_rf.sh for RoseTTAFold).
* To run AlphaFold, add the following at the beginning of your script (e.g. run_T1088_af2.sh):
```
source /gpfs/u/barn/PMAR/shared/etc/212_alphaFOLD
```
* To run RoseTTAFold, add the following at the beginning of your script (e.g. run_T1088_rf.sh):
```
source /gpfs/u/barn/PMAR/shared/etc/rosettaFOLD
```
6. Submit the job with one of the following commands:
```
# AlphaFold
sbatch run_T1088_af2.sh
# RoseTTAFold
sbatch run_T1088_rf.sh
```
Example scripts:
* https://github.rpi.edu/RPIBioinformatics/PMAR/blob/main/run_T1088_af2.sh
* https://github.rpi.edu/RPIBioinformatics/PMAR/blob/main/run_T1088_rf.sh
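The .bashrc cleanup described in step 4 can be sketched as a small shell helper. This is only an illustration, under the same assumption the step makes: everything below the "# User specific aliases and functions" comment was added for the old AlphaFold setup. The function name `clean_bashrc` is hypothetical and is not part of the PMAR repository. It keeps a `.bak` copy so that anything removed by mistake can be restored.

```shell
#!/usr/bin/env bash
# Sketch: trim a .bashrc back to the default marker line, keeping a backup.
set -euo pipefail

# clean_bashrc FILE: hypothetical helper, not part of the PMAR repository.
# Assumes everything below the marker line belongs to the old setup.
clean_bashrc() {
    local rc="$1"
    local marker='# User specific aliases and functions'

    # Refuse to touch the file if the marker line is missing.
    if ! grep -q "^${marker}" "$rc"; then
        echo "marker not found in $rc; nothing changed" >&2
        return 1
    fi

    # Keep a backup next to the original.
    cp -p -- "$rc" "$rc.bak"

    # Copy everything up to and including the first marker line, then stop.
    sed "/^${marker}/q" "$rc.bak" > "$rc"
}

# Usage (uncomment to apply to your own file):
# clean_bashrc "$HOME/.bashrc"
```

If you added anything unrelated to AlphaFold below the marker, copy it back from the `.bak` file afterwards, then log out and back in.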
For the AlphaFold example, the output is located in ~/scratch-shared/alphafoldout/PMARhnyn/T1088. I created the PMARhnyn folder, and all my AlphaFold output goes under ~/scratch-shared/alphafoldout/PMARhnyn. You will need to create your own output folder under ~/scratch-shared/alphafoldout, or set -o to another location.
For the RoseTTAFold example, the output is written to a folder called T1088rf_out in the home directory. You may also put the results under ~/scratch-shared.
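To make the pieces above concrete, here is a minimal sketch of a generator for an AlphaFold sbatch script. The linked run_T1088_af2.sh is the authoritative example; this sketch makes several assumptions that are not confirmed by this README. The `#SBATCH` values (one node, one GPU, a six-hour limit) are placeholders. `run_alphafold.sh` is assumed to be on the PATH after sourcing the environment file. `AF2_DATA_DIR` is a hypothetical variable standing in for the data directory on CCI. The `-t` date is only an example, and the output folder is assumed to be named after `$USER`. The helper name `write_af2_job` is also hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print a minimal AlphaFold sbatch script for one target.
set -euo pipefail

# write_af2_job TARGET OUT_ROOT: hypothetical helper. Prints an sbatch script
# for TARGET.fasta whose results go under OUT_ROOT.
write_af2_job() {
    local target="$1" out_root="$2"
    cat <<EOF
#!/bin/bash
#SBATCH --job-name=${target}_af2
#SBATCH --nodes=1
#SBATCH --gres=gpu:1
#SBATCH --time=06:00:00

# Environment setup from step 5.
source /gpfs/u/barn/PMAR/shared/etc/212_alphaFOLD

# Make sure the output folder exists before AlphaFold writes to it.
mkdir -p "${out_root}"

# AF2_DATA_DIR is a placeholder: set it to the data directory on CCI.
# The -t date below is only an example value.
run_alphafold.sh -d "\${AF2_DATA_DIR}" -o "${out_root}" -f "${target}.fasta" -t 2022-01-01
EOF
}

# Usage (single quotes keep $HOME and $USER unexpanded until the job runs):
# write_af2_job T1088 '$HOME/scratch-shared/alphafoldout/$USER' > run_T1088_af2.sh
# sbatch run_T1088_af2.sh
```

Generating the script from a function keeps the target name and output folder in one place, which helps when submitting several targets.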
# Non-Docker version of AlphaFold installed on CCI
GitHub site for the Docker version:
https://github.com/deepmind/alphafold
Running run_alphafold.sh with no arguments prints:
```
Please make sure all required parameters are given
Usage: /gpfs/u/home/PMAR/PMARrbtj/barn-shared/alphafold/run_alphafold.sh <OPTIONS>
Required Parameters:
-d <data_dir> Path to directory of supporting data
-o <output_dir> Path to a directory that will store the results.
-f <fasta_path> Path to a FASTA file containing sequence. If a FASTA file contains multiple sequences, then it will be folded as a multimer
-t <max_template_date> Maximum template release date to consider (ISO-8601 format - i.e. YYYY-MM-DD). Important if folding historical test sets
Optional Parameters:
-g <use_gpu> Enable NVIDIA runtime to run with GPUs (default: true)
-n <openmm_threads> OpenMM threads (default: all available cores)
-a <gpu_devices> Comma separated list of devices to pass to 'CUDA_VISIBLE_DEVICES' (default: 0)
-m <model_preset> Choose preset model configuration - the monomer model, the monomer model with extra ensembling, monomer model with pTM head, or multimer model (default: 'monomer')
-c <db_preset> Choose preset MSA database configuration - smaller genetic database config (reduced_dbs) or full genetic database config (full_dbs) (default: 'full_dbs')
-p <use_precomputed_msas> Whether to read MSAs that have been written to disk. WARNING: This will not check if the sequence, database or configuration have changed (default: 'false')
-l <is_prokaryote> Optional for multimer system, not used by the single chain system. A boolean specifying true where the target complex is from a prokaryote, and false where it is not, or where the origin is unknown. This value determine the pairing method for the MSA (default: 'None')
-b <benchmark> Run multiple JAX model evaluations to obtain a timing that excludes the compilation time, which should be more indicative of the time required for inferencing many proteins (default: 'false')
```
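Two of the required parameters are easy to get wrong before a long job starts: `-t` expects a YYYY-MM-DD date, and the number of sequences in the `-f` FASTA file decides whether the input is folded as a multimer. The following sketch checks both before submission. The helper names (`count_sequences`, `valid_template_date`, `preflight`) are hypothetical and not part of run_alphafold.sh. The date check only validates the shape of the string, not whether it is a real calendar date.

```shell
#!/usr/bin/env bash
# Sketch: pre-flight checks to run before calling run_alphafold.sh.
set -euo pipefail

# count_sequences FASTA: number of header lines (lines starting with '>').
# grep -c exits non-zero when the count is 0, so that status is ignored.
count_sequences() {
    grep -c '^>' -- "$1" || true
}

# valid_template_date DATE: succeeds only for the YYYY-MM-DD shape used by -t.
valid_template_date() {
    [[ "$1" =~ ^[0-9]{4}-[0-9]{2}-[0-9]{2}$ ]]
}

# preflight FASTA DATE: hypothetical helper; prints a summary or an error.
preflight() {
    local fasta="$1" cutoff="$2"
    if [[ ! -r "$fasta" ]]; then
        echo "cannot read FASTA file: $fasta" >&2
        return 1
    fi
    if ! valid_template_date "$cutoff"; then
        echo "max_template_date must look like YYYY-MM-DD: $cutoff" >&2
        return 1
    fi
    local n
    n=$(count_sequences "$fasta")
    if (( n == 0 )); then
        echo "no '>' header lines found in $fasta" >&2
        return 1
    fi
    # More than one sequence means the input will be folded as a multimer.
    echo "$fasta: $n sequence(s), template cutoff $cutoff"
}

# Usage:
# preflight T1088.fasta 2022-01-01 && sbatch run_T1088_af2.sh
```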
# RoseTTAFold
https://github.com/RosettaCommons/RoseTTAFold
# AlphaFold Colab (free service from Google)
A slightly simplified version of AlphaFold v2.1.0 that uses no templates and a selected portion of the BFD database:
https://colab.research.google.com/github/deepmind/alphafold/blob/main/notebooks/AlphaFold.ipynb