SpecDB: A Relational Database for archiving biomolecular NMR Spectra Data
SpecDB is a relational database for storing and distributing biomolecular NMR experimental data. SpecDB stores the raw free induction decay (FID) from an NMR experiment in a structured way that links each FID record to its biomolecular sample and experimental metadata.
Repository Organization
├── LICENSE
├── README.md
├── TUTORIAL.md
├── sample
│   ├── sample.db
│   ├── sample_forms/
│   └── sample_sessions/
├── specdb
│   ├── specdb
│   └── specdb.py
└── sql
    ├── specdb.sql
    └── template.str
Above is the tree layout for SpecDB's repository structure.
sample/
: location of sample data for the tutorial
sample/sample.db
: sample SQLite SpecDB database
sample/sample_forms/
: contains example forms for different data types to enter into SpecDB
sample/sample_sessions
: contains example Bruker data collection sessions
specdb
: location of the specdb Python library and command line interface (CLI)
specdb/specdb
: the specdb CLI
specdb/specdb.py
: the specdb library. Users will typically only interact with the CLI
sql/
: directory where the SpecDB SQLite schema resides
sql/specdb.sql
: the SpecDB SQLite schema definitions
template.str
: the minimal NMR-STAR file SpecDB attempts to write
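Because sample/sample.db is an ordinary SQLite file, its contents can be inspected with Python's standard sqlite3 module before any SpecDB tooling is installed. The following is a minimal sketch, not part of SpecDB: the helper name list_tables is hypothetical, and the path assumes the script is run from the repository root. It makes no assumption about which tables the SpecDB schema defines; it only lists whatever tables the file contains.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path):
    """Return the names of all tables in a SQLite database file, sorted."""
    # Open read-only through a file: URI so that a mistyped path raises an
    # error instead of silently creating a new, empty database.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


if __name__ == "__main__":
    # Assumed location, relative to the repository root.
    for table in list_tables("sample/sample.db"):
        print(table)
```

Opening the file read-only keeps the sample database untouched while exploring it.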
Getting Started
The goal of SpecDB is to capture and organize the time-domain data generated by biomolecular NMR experiments. For the time-domain data to be useful in downstream applications, the experiment's metadata must be captured as well, including the protein sequence, buffer information, the specific pulse sequence performed, and more. SpecDB provides users with a set of tools for entering sample and experiment metadata into a SQLite database.
This Getting Started guide covers two common scenarios for users installing SpecDB: (1) on a machine where they have install permissions, and (2) on a shared cluster where they do not have install permissions.
To operate SpecDB on a machine where you do have install permissions, we recommend the following steps.
- Clone this repository:
  git clone https://github.rpi.edu/RPIBioinformatics/SpecDB.git
- To make SpecDB operate as a command line tool, the PATH environment variable needs to be amended:
  export PATH=$PATH:{location of SpecDB}/SpecDB/specdb/
  - If a bash profile file is being amended, be sure to source the profile after the PATH environment variable is edited.
- The required third-party modules for SpecDB are pandas, ruamel.yaml, and pynmrstar. To download the three libraries, run:
  pip3 install pandas ruamel.yaml pynmrstar
- After the pip install, SpecDB should be operational. Verify that SpecDB is on the PATH and that the libraries are installed by running specdb --help. The output should be the following:
usage: specdb [-h] {create,forms,insert,summary,query,backup,restore} ...
Command line tool for interfacing with SpecDB
positional arguments:
{create,forms,insert,summary,query,backup,restore}
command description
create instantiate a new database
forms generate a template form for requested table tables to
generate forms for are: `user`, `project`, `target`
`construct`, `expression`, `batch`, `buffer`, `pst`,
`spectrometer`, and a JSON for a session
insert insert a single json file into SpecDB
summary make a summary report for SpecDB database. if ran with
no table provided, then a summary for every table is
made.
query query records from SpecDB summary table. If no
--output is given then results are simply print to
screen
backup perform incremental backup. specdb configure must be
ran first
restore perform database restoration from a SpecDB backup
optional arguments:
-h, --help show this help message and exit
example command lines:
specdb create --db my.db --backup /backups/my.backup.db
specdb query --db my.db --sql "SELECT * FROM table users LIMIT 10"
specdb insert --file specdb.yaml --env my.db --write
specdb backup --db my.db --backup /backups/my.backup.db
specdb forms --table user project --num 3 1
specdb summary --table user --db my.db
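The two conditions checked in the last install step (the specdb command is on the PATH, and the three libraries are installed) can also be tested from Python. The sketch below is illustrative only and is not shipped with SpecDB; the function names module_available and check_install are hypothetical. It relies solely on the standard library calls shutil.which and importlib.util.find_spec.

```python
import importlib.util
import shutil

# The three third-party modules named in the install steps.
REQUIRED_MODULES = ("pandas", "ruamel.yaml", "pynmrstar")


def module_available(name):
    """Return True if the import system can locate the module `name`."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # A dotted name such as "ruamel.yaml" raises ModuleNotFoundError
        # when its parent package is not installed.
        return False


def check_install(command="specdb", modules=REQUIRED_MODULES):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    if shutil.which(command) is None:
        problems.append(f"'{command}' was not found on PATH")
    for name in modules:
        if not module_available(name):
            problems.append(f"Python module '{name}' is not installed")
    return problems


if __name__ == "__main__":
    issues = check_install()
    for issue in issues:
        print(issue)
    if not issues:
        print("SpecDB command and required modules were found")
```

A script like this only confirms that the pieces can be found; running specdb --help remains the direct check that the CLI itself starts.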
To operate SpecDB on a shared cluster without install permissions, first follow the cluster's standard operating procedures for software installation, if any exist. If virtual environments are an option, the following steps work for SpecDB:
- Make a Python virtual environment with venv:
  python3 -m venv {name of your environment}
- To activate the environment:
  source {name of your environment}/bin/activate
- Perform the pip install described above.
- To exit the virtual environment, run:
  deactivate
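The environment creation step can also be done from Python through the standard library venv module, which may be convenient in a cluster setup script. This is a minimal sketch under stated assumptions: the helper name create_env and the environment name specdb-env are hypothetical, and the printed activation path assumes a POSIX layout (bin/activate).

```python
import venv
from pathlib import Path


def create_env(env_dir, with_pip=True):
    """Create a virtual environment at env_dir and return its path."""
    env_path = Path(env_dir)
    # Roughly what `python3 -m venv <env_dir>` does; with_pip=True asks
    # venv to bootstrap pip into the new environment.
    venv.create(env_path, with_pip=with_pip)
    return env_path


if __name__ == "__main__":
    env = create_env("specdb-env")  # hypothetical environment name
    # Activation still happens in the shell, not in Python.
    print(f"source {env}/bin/activate")
```

After activating the environment in the shell, the pip install step is performed inside it as usual.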
Acknowledgements
The functions to perform the incremental backup are taken from the following repository: https://github.com/nokibsarkar/sqlite3-incremental-backup.git.