
# How to run AlphaFold and RoseTTAFold on CCI

Get an account on the PMAR project and set up your login by following https://github.rpi.edu/RPIBioinformatics/PMAR/blob/main/CCIForms.md

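1. Log in to the landing pad with ssh PMARxxxx@blp02.ccni.rpi.edu, authenticating with Google Authenticator.
2. From there, ssh nplfen01 to reach the NPL front-end node, which gives access to the NPL GPUs. (Check https://docs.cci.rpi.edu/status/ for the current utilization of all clusters.)
3. Create or upload your FASTA file on nplfen01 (e.g. T1088.fasta).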
4. If you modified your .bashrc for our first version of the AlphaFold setup, remove that setup from .bashrc:
   * Edit .bashrc and delete the AlphaFold lines below "# User specific aliases and functions". This assumes everything was added below that comment, since it is the default last line of .bashrc.
   * Log out and back in; you will then have a clean environment.

   Example .bashrc: https://github.rpi.edu/RPIBioinformatics/PMAR/blob/main/.bashrc

5. Create two files: the FASTA file (e.g. T1088.fasta) and an sbatch script (e.g. run_T1088_af2.sh for AlphaFold, run_T1088_rf.sh for RoseTTAFold).
   * To run AlphaFold, add this line near the top of your script (e.g. run_T1088_af2.sh), after any #SBATCH lines, since sbatch stops reading #SBATCH directives at the first command:
   ```
      source /gpfs/u/barn/PMAR/shared/etc/212_alphaFOLD
   ```
   * To run RoseTTAFold, add this line near the top of your script (e.g. run_T1088_rf.sh), likewise after any #SBATCH lines:
   ```
      source /gpfs/u/barn/PMAR/shared/etc/rosettaFOLD
   ```
6. Submit the job:
   ```
      sbatch run_T1088_af2.sh   # AlphaFold
      sbatch run_T1088_rf.sh    # RoseTTAFold
   ```
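Steps 3, 5 and 6 can be done from the nplfen01 shell. The sketch below writes a toy FASTA file and a minimal AlphaFold sbatch script with heredocs; it is not the linked run_T1088_af2.sh. Several parts are assumptions: the sequence is a placeholder rather than the real T1088 target, the Slurm resources (one GPU, six hours) are guesses, /path/to/alphafold_data is a hypothetical stand-in for the database directory passed with -d, and the run_alphafold.sh path is copied from the usage message further down and may differ for the 212_alphaFOLD setup. Only flags listed in that usage message are used.

```shell
#!/bin/bash
# Sketch only: writes the two files from step 5 for a toy target.
# The sequence, Slurm resources and DATA_DIR below are placeholders/assumptions.

# File 1: a single-sequence FASTA file (placeholder sequence, NOT the real T1088).
cat > T1088.fasta <<'EOF'
>T1088 placeholder sequence
MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ
EOF

# File 2: the sbatch script. Quoting 'EOF' keeps $HOME, $USER and $(date ...)
# unexpanded here, so they are evaluated when the job runs.
cat > run_T1088_af2.sh <<'EOF'
#!/bin/bash
# Assumed resources: one GPU and a 6-hour limit; adjust for your target.
#SBATCH --job-name=T1088_af2
#SBATCH --gres=gpu:1
#SBATCH --time=06:00:00
#SBATCH --output=T1088_af2_%j.out

# sbatch stops reading #SBATCH lines at the first command, so source after them.
source /gpfs/u/barn/PMAR/shared/etc/212_alphaFOLD

# Script path copied from the usage message in this README; it may differ.
RUN_AF=/gpfs/u/home/PMAR/PMARrbtj/barn-shared/alphafold/run_alphafold.sh
DATA_DIR=/path/to/alphafold_data   # HYPOTHETICAL: use the value from the example script
OUT_DIR=$HOME/scratch-shared/alphafoldout/$USER

mkdir -p "$OUT_DIR"
# -t is today's date, so templates released up to now may be used.
bash "$RUN_AF" -d "$DATA_DIR" -o "$OUT_DIR" -f "$SLURM_SUBMIT_DIR/T1088.fasta" \
  -t "$(date +%F)" -m monomer -c full_dbs
EOF

chmod +x run_T1088_af2.sh
```

After checking the generated file, submit it with sbatch run_T1088_af2.sh as in step 6.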

Example scripts:

* https://github.rpi.edu/RPIBioinformatics/PMAR/blob/main/run_T1088_af2.sh
* https://github.rpi.edu/RPIBioinformatics/PMAR/blob/main/run_T1088_rf.sh

For the AlphaFold example, the output is in ~/scratch-shared/alphfoldout/PMARhnyn/T1088. I created the PMARhnyn folder, and all my AlphaFold output goes under ~/scratch-shared/alphfoldout/PMARhnyn. You will need to create your own output folder under ~/scratch-shared/alphafoldout, or point -o somewhere else.
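Creating your own output folder can look like the sketch below. Naming the folder after your login (whoami), as was done with PMARhnyn, is only a convention assumed here.

```shell
#!/bin/bash
# Create a personal AlphaFold output folder named after your CCI login,
# following the ~/scratch-shared/alphafoldout/<login> layout described above.
OUT_DIR="$HOME/scratch-shared/alphafoldout/$(whoami)"
mkdir -p "$OUT_DIR"

# Pass this folder to run_alphafold.sh with -o; results for T1088.fasta
# then land in a T1088 subfolder inside it.
echo "-o $OUT_DIR"
```

Within that T1088 subfolder, AlphaFold writes its top-ranked model as ranked_0.pdb.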

For the RoseTTAFold example, the output is written to a folder called T1088rf_out in your home directory. You can also put the results under ~/scratch-shared.


# AlphaFold (non-Docker version) installed on CCI

GitHub site for the Docker version:
https://github.com/deepmind/alphafold

Running run_alphafold.sh with no arguments prints its usage:
```
Please make sure all required parameters are given
Usage: /gpfs/u/home/PMAR/PMARrbtj/barn-shared/alphafold/run_alphafold.sh <OPTIONS>
Required Parameters:
-d <data_dir>         Path to directory of supporting data
-o <output_dir>       Path to a directory that will store the results.
-f <fasta_path>       Path to a FASTA file containing sequence. If a FASTA file contains multiple sequences, then it will be folded as a multimer
-t <max_template_date> Maximum template release date to consider (ISO-8601 format - i.e. YYYY-MM-DD). Important if folding historical test sets
Optional Parameters:
-g <use_gpu>          Enable NVIDIA runtime to run with GPUs (default: true)
-n <openmm_threads>   OpenMM threads (default: all available cores)
-a <gpu_devices>      Comma separated list of devices to pass to 'CUDA_VISIBLE_DEVICES' (default: 0)
-m <model_preset>     Choose preset model configuration - the monomer model, the monomer model with extra ensembling, monomer model with pTM head, or multimer model (default: 'monomer')
-c <db_preset>        Choose preset MSA database configuration - smaller genetic database config (reduced_dbs) or full genetic database config (full_dbs) (default: 'full_dbs')
-p <use_precomputed_msas> Whether to read MSAs that have been written to disk. WARNING: This will not check if the sequence, database or configuration have changed (default: 'false')
-l <is_prokaryote>    Optional for multimer system, not used by the single chain system. A boolean specifying true where the target complex is from a prokaryote, and false where it is not, or where the origin is unknown. This value determine the pairing method for the MSA (default: 'None')
-b <benchmark>        Run multiple JAX model evaluations to obtain a timing that excludes the compilation time, which should be more indicative of the time required for inferencing many proteins (default: 'false')
```
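Before submitting, it can help to print the exact command you are about to run. The helper below, af2_cmd, is hypothetical (not part of the CCI install). It uses only flags from the usage message above and picks -m multimer when the FASTA file holds more than one sequence, assuming, as in upstream AlphaFold, that multi-sequence inputs should be paired with the multimer preset. The script path is the one printed in the usage message.

```shell
#!/bin/bash
# Hypothetical dry-run helper (not part of the CCI install): print the
# run_alphafold.sh command for a FASTA file instead of running it.
af2_cmd() {
  local fasta=$1 data_dir=$2 out_dir=$3
  local run_af=/gpfs/u/home/PMAR/PMARrbtj/barn-shared/alphafold/run_alphafold.sh
  local nseq preset

  if [ ! -f "$fasta" ]; then
    echo "af2_cmd: no such file: $fasta" >&2
    return 1
  fi

  # Each FASTA record starts with a '>' header line.
  # (grep -c exits 1 when the count is 0, so ignore its status.)
  nseq=$(grep -c '^>' "$fasta" || true)
  if [ "$nseq" -gt 1 ]; then
    preset=multimer   # several chains: fold as a complex
  else
    preset=monomer    # single chain (the run_alphafold.sh default)
  fi

  # -t: today's date (YYYY-MM-DD), so templates released up to now are allowed.
  echo "bash $run_af -d $data_dir -o $out_dir -f $fasta -t $(date +%F) -m $preset -c full_dbs"
}
```

For example, af2_cmd T1088.fasta /path/to/alphafold_data ~/scratch-shared/alphafoldout/$USER prints the command to paste into your sbatch script. When rerunning the same target, adding -p true reuses MSAs already on disk, but as the usage text warns, it does not check whether the sequence, databases or configuration changed.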

# RoseTTAFold

https://github.com/RosettaCommons/RoseTTAFold

# AlphaFold Colab (free service from Google)
A slightly simplified version of AlphaFold v2.1.0 that uses no templates and only a selected portion of the BFD database.

https://colab.research.google.com/github/deepmind/alphafold/blob/main/notebooks/AlphaFold.ipynb