SpecDB Tutorial

The purpose of the following tutorial is to provide examples of how to use SpecDB. There are seven levels to the tutorial. (1) SpecDB subcommands and help menus. (2) how to instantiate a database, (3) how to make forms, (4) how to insert forms, (5) viewing what is in the database, (6) updating records in the database, and (7) querying information from the database.

1. SpecDB help menus and subcommands

The first entry point to look for guidance on SpecDB functions is to use the help menus. If specdb --help results in the help menu for SpecDB, then it is installed correctly. SpecDB has seven sub commands, each listed below and the command line arguments each take:

specdb create --db --backup
specdb forms --table --num
specdb insert --form --db --write
specdb summary --table --db
specdb query --sql --db --out --star --indices
specdb backup --db --objects --shafile
specdb restore --backup --shafile --objects

The subcommands listed above in the logical order the commands are used in. Each subcommand has a separate help menu from specdb --help that can be accessed, (e.g specdb forms --help). Users first need to create a SpecDB SQLite database file with create. Next, users need to populate the database with information. The forms command makes the forms for the data fields needed for the SpecDB schema. With a filled form, users use insert to insert the form into their database. To verify/check what they inserted, users can use summary to investigate the contents of any SpecDB table. Users can pull data out of the database with query. With query users provide a SQL SELECT statement on the SpecDB summary view to pull data out of the database. query also allows for users to extract records from SpecDB by row-ids in the summary table. Commands backup and restore are for the incremental backup operations.

2. Instantiating a new SpecDB database

A SpecDB database can be created with specdb create. The location of the database .db file needs to be known by the users using the database, or an environment variable can be set. In this tutorial we are going to create a database for the sample data stored in this repository ./sample/.

$ specdb create --db sample.db --backup backup.db
db configured and initalized
Done

3. Making forms

Now that a database has been made, we can start adding data to the database. To add data we need to populate a text file form to be read into the database. SpecDB uses YAML to build forms. Examples can be seen in /sample/sample_forms/

Running specdb forms --table user results in the user form being printed to screen. The output should be exactly as seen below.

user:                    # provide information about a user, REQUIRED: `user_id`
  0:
    user_id:                                      #  no spaces, must be unique, len <= 8, Ex: KJF
    given_name:                                   #  len <= 64, Ex: Keith
    family_name:                                  #  len <= 64, Ex: Fraga
    middle_initials:                              #  len <= 16, Ex: J
    department_and_institution:                   #  len <= 128, Ex: MCB UCD
    country:                                      #  len <= 32, Ex: USA
    state_province:                               #  len <= 32, Ex: NY
    city:                                         #  len <= 32, Ex: Troy
    postal_code:                                  #  len <= 32, Ex: 12180
    role:                                         #  len <= 64, Ex: postdoc
    organization_type:                            #  len <= 64, Ex: academic
    email_address:                                #  no spaces, must have `@`, len <= 64, Ex: user@mail.com

To fill the form, the output of the specdb forms command can be redirected to any file. (Note: the file extension is not strict, tested with txt, yaml, yml. )

$ specdb forms --table user > user.yml

Inspection of the resulting form shows that name of the table this form was created for was created for: user. On the same line lists the required items need to be filled into this form for insertion into SpecDB. All items not indicated as required are optional. (Note: optional SpecDB terms may be required for BMRB deposition however).

In the sample/sample_forms/ directory is a users yaml form pre filled out with some user information as an example. Each line in the form file has a comment describing the restrictions on allowable values for that data item, as a well as an example of what can be used for that respective field. Viewing the sample/sample_forms/users.yaml we see there are actually multiple users provided. The specdb forms command has the --num option for specifying the number of forms to make for a requested table. In the case of sample/sample_forms/users.yaml we requested 3. To make the template to do something like users.yaml is the following command:

$ specdb forms --table user --num 3
user:                    # provide information about a user, REQUIRED: `user_id`
  0:
    user_id:                                      #  no spaces, must be unique, len <= 8, Ex: KJF
    given_name:                                   #  len <= 64, Ex: Keith
    family_name:                                  #  len <= 64, Ex: Fraga
    middle_initials:                              #  len <= 16, Ex: J
    department_and_institution:                   #  len <= 128, Ex: MCB UCD
    country:                                      #  len <= 32, Ex: USA
    state_province:                               #  len <= 32, Ex: NY
    city:                                         #  len <= 32, Ex: Troy
    postal_code:                                  #  len <= 32, Ex: 12180
    role:                                         #  len <= 64, Ex: postdoc
    organization_type:                            #  len <= 64, Ex: academic
    email_address:                                #  no spaces, must have `@`, len <= 64, Ex: user@mail.com
  1:
    user_id:                                      #  no spaces, must be unique, len <= 8, Ex: KJF
    given_name:                                   #  len <= 64, Ex: Keith
    family_name:                                  #  len <= 64, Ex: Fraga
    middle_initials:                              #  len <= 16, Ex: J
    department_and_institution:                   #  len <= 128, Ex: MCB UCD
    country:                                      #  len <= 32, Ex: USA
    state_province:                               #  len <= 32, Ex: NY
    city:                                         #  len <= 32, Ex: Troy
    postal_code:                                  #  len <= 32, Ex: 12180
    role:                                         #  len <= 64, Ex: postdoc
    organization_type:                            #  len <= 64, Ex: academic
    email_address:                                #  no spaces, must have `@`, len <= 64, Ex: user@mail.com
  2:
    user_id:                                      #  no spaces, must be unique, len <= 8, Ex: KJF
    given_name:                                   #  len <= 64, Ex: Keith
    family_name:                                  #  len <= 64, Ex: Fraga
    middle_initials:                              #  len <= 16, Ex: J
    department_and_institution:                   #  len <= 128, Ex: MCB UCD
    country:                                      #  len <= 32, Ex: USA
    state_province:                               #  len <= 32, Ex: NY
    city:                                         #  len <= 32, Ex: Troy
    postal_code:                                  #  len <= 32, Ex: 12180
    role:                                         #  len <= 64, Ex: postdoc
    organization_type:                            #  len <= 64, Ex: academic
    email_address:                                #  no spaces, must have `@`, len <= 64, Ex: user@mail.com

This output can be redirected to any file and filled in the with appropriate information.

Forms can be made to include information for different tables in the same form. A very common use for this is inserting a buffer into SpecDB. There are two tables in the SpecDB schema that describe buffers, buffer and buffer_components. buffer provides the name of the buffer as the buffer_id, and the buffer's pH. The buffer_components table allows for variable number of components to be added to the buffer. For example, if a buffer with 3 components is needed to be added to SpecDB, the following command would generate the correct form:

buffer:                  # store all buffers used, their id and their pH, REQUIRED: `buffer_id`, `buffer_pH`, NOTE: the buffer components go in the `buffer_components` table
  0:
    buffer_id:                                    #  text identifier for the buffer, must be unique, no spaces, len <= 32, Ex: hn4071
    buffer_ph:                                    #  pH of buffer, must be a number, Ex: 8.1
    buffer_comment:                               #  free text field for any notes about the buffer, len <= 128, Ex: made by KJF 12/04
buffer_components:       # describe the component(s) of a buffer, REQUIRED: `buffer_id`, `buffer_component`, `buffer_component_value`, `buffer_component_unit`
  0:
    buffer_id:                                    #  `buffer_id` this component goes to, must already be in buffer table, Ex: hn4071
    buffer_component:                             #  name of the component, len <= 64, Ex: NaCl
    buffer_component_value:                       #  the numeric concentration value of the component, Ex: 100.0
    buffer_component_unit:                        #  unit of concentration, one of (`mM`, `% (v/v)`, `mg/mL`), Ex: mM
  1:
    buffer_id:                                    #  `buffer_id` this component goes to, must already be in buffer table, Ex: hn4071
    buffer_component:                             #  name of the component, len <= 64, Ex: NaCl
    buffer_component_value:                       #  the numeric concentration value of the component, Ex: 100.0
    buffer_component_unit:                        #  unit of concentration, one of (`mM`, `% (v/v)`, `mg/mL`), Ex: mM
  2:
    buffer_id:                                    #  `buffer_id` this component goes to, must already be in buffer table, Ex: hn4071
    buffer_component:                             #  name of the component, len <= 64, Ex: NaCl
    buffer_component_value:                       #  the numeric concentration value of the component, Ex: 100.0
    buffer_component_unit:                        #  unit of concentration, one of (`mM`, `% (v/v)`, `mg/mL`), Ex: mM

It is important to note that if the --num option is provided, that the number of iterations to take match the number of tables requested. In the above case, the buffer form was created once because of the 1 after the --num and three buffer_components were made because of the 3 after the 1 in the --num options. The number of options in --table and --num are in direct, sequential correspondence with each other. If no --num options are provided it is assumed that all tables are produced just once.

Inspecting sample/sample_forms/complete_sample.yaml will find all the information required to describe a biomolecular NMR sample. It is recommended that users use specdb forms to create the forms when they need them because users can define multiple entities at a time, and one general form will not suffice. However, it is instructive to see all the metadata items that are tracked in SpecDB by looking at complete_sample.yaml.

To follow along with the sample forms provided in the repository, perform the following commands:

$ specdb forms --table user --num 3 > user.yaml
$ specdb forms --table project > project.yaml
$ specdb forms --table buffer buffer_components --num 1 3 > buffer.yaml
$ specdb forms --table target construct expression purification_batch pst batch_components > sample.yaml
$ specdb forms --table spectrometer > spectrometer.yaml
$ specdb forms --table session > session.yaml

4. Inserting forms

In this section, users can practice with the pre-filled forms in sample/sample_forms/. To first insert users in a group, the following insert command will insert the user information provided in user.yaml. We will be working off the sample forms in the RPISpecDB repository. To follow-along, just provide the path to where the repository is installed locally.

$ specdb insert --db sample.db --form RPISpecDB/sample/sample_forms/users.yaml
information provided is new
{
	"user_id": "KJF",
	"given_name": "Keith",
	"family_name": "Fraga",
	"middle_initials": "J",
	"department_and_institution": "UC MCB",
	"country": null,
	"state_province": null,
	"city": null,
	"postal_code": null,
	"role": null,
	"organization_type": null,
	"email_address": "kjfraga@ucdavis.edu"
}
Must set --write to insert provided information to database
Aborting

sample.db is the database created in section (2) of the tutorial. Running the above commnd results in a Aborting error. This means that SpecDB attempted the insert, but found that the requested information was new and could not complete the insertion. This check is in place to ensure users truly want to insert the requested information. Sometimes the provided information may be new, but new and unique because of a typo and the actual information being inserted is already in the database. To overwrite this check, the --write flag must be used. Essentially the --write flag is meant to check with the user if they really want to insert the provided information to the database.

$ specdb insert --db sample.db --form RPISpecDB/sample/sample_forms/users.yaml --write
Inserted data from json file at RPISpecDB/sample/sample_forms/users.yaml

The last line indicates the data was inserted. Running the specdb insert command on the buffers.yaml with the --write flag results in the following.

$ specdb insert --db sample.db --form RPISpecDB/sample/sample_forms/buffers.yaml --write
Inserted data from json file at RPISpecDB/sample/sample_forms/buffers.yaml

Now that the buffer is inserted, we can attempt to insert our information about a sample. To learn more about the relationships and definitions of each of the tables in the SpecDB manuscript refer to the manuscript. Each table, from target to pst needs to be inserted to fully describe a sample. However, typically for each table there is a subset of required data items, so not every data item is required for insertion.

$ specdb insert --db sample.db --form RPISpecDB/sample/sample_forms/sample.yaml --write
SQLite error on insert on table target
Err Message:
FOREIGN KEY constraint failed
check the template form for instructions and examples
ensure all ids this table relates to (i.e constructs relate to targets) are inserter already
Aborting

This insertion results in a different error from before. Here, the error was not because the information was new because the --write was on. The error is coming from SQLite, that a foreign key constraint failed. A foreign key is a relationship between two tables. These foreign keys need to be present in the two tables for SpecDB to allow an insert. The error message is not clear on what column the constraint failure came on, but it says the error came from the target table and to inspect the form for what is required for a target. Using forms we can see what is required for target.

$ specdb forms --table target
target:                  # molecular target information, REQUIRED: `target_id`
  0:
    target_id:                                    #  text identifier for target, must be unique, no spaces, len <= 32, Ex: Db0515
    target_comment:                               #  free field comment, anything to note about target, len <= 128, Ex: from UW-Madison
    target_sequence:                              #  target's molecular seq., any type of code/alphabet, no spaces, len <= 1024, Ex: MGSHHHHILVAM
    organism_source:                              #  organism name for target source, can indicate if target is synthetic, len <= 128, Ex: synthetic
    gene_name:                                    #  gene name, len <= 64, Ex: SpikeCoV2
    project_id:                                   #  `project_id` target is a member of, must be in project table, Ex: SpikeFraga
    target_preparer:                              #  `user_id` that assigned the target, must be in user table, Ex: KJF

There are relationships between the user table and the project table. Inspecting what we tried to add for the target in sample.yaml we see:

target:                  # molecular target information, REQUIRED: `target_id`
  0:
    target_id: Db0515                             #  text identifier for target, must be unique, no spaces, len <= 32, Ex: Db0515
    target_comment:                               #  free field comment, anything to note about target, len <= 128, Ex: from UW-Madison
    target_sequence: MGPLIEVLA                    #  target's molecular seq., any type of code/alphabet, no spaces, len <= 1024, Ex: MGSHHHHILVAM
    organism_source: synthetic                    #  organism name for target source, can indicate if target is synthetic, len <= 128, Ex: synthetic
    gene_name:                                    #  gene name, len <= 64, Ex: SpikeCoV2
    project_id: DBh                               #  `project_id` target is a member of, must be in project table, Ex: SpikeFraga
    target_preparer: GTM                          #  `user_id` that assigned the target, must be in user table, Ex: KJF

We inserted the user information for GTM but not the project id DBh this target is associated with. We can solve this by inserting the project information using the sample/sample_forms/project.yaml form.

$ specdb insert --db sample.db --form RPISpecDB/sample/sample_forms/project.yaml --write
Inserted data from json file at RPISpecDB/sample/sample_forms/project.yaml

Then re-doing the sample insert we get:

specdb insert --db sample.db --form RPISpecDB/sample/sample_forms/sample.yaml --write
Inserted data from json file at RPISpecDB/sample/sample_forms/sample.yaml

To insert FIDs into SpecDB, we first need to insert the spectrometer information.

$ specdb insert --db sample.db --form RPISpecDB/sample/sample_forms/spectrometer.yaml --write
Inserted data from json file at RPISpecDB/sample/sample_forms/spectrometer.yaml

Inserting FIDs into SpecDB are typically organized into sessions. When FIDs are recorded from a Bruker instrument, they are recorded into sub-directories of a data collection session directory. We mimicked this situation in sample/sample_sessions/. In example1/, there are two subdirectories, each with a time domain dataset. There is also a specdb.session.yaml. This file can be creatd with specdb forms --table session. The session yaml describe the spectrometer this data collection was done on, the user who conducted the session, and the sample tube the data was collected of. Contents of this session is displayed below:

session:                 # describe a data collection session, REQUIRED: `session_preparer`, `spectrometer_id`, `pst_id`
  0:
    session_preparer: KJF                         #  `user_id` of the person who prepared and collected the session, must already be in user table, Ex: KJF
    spectrometer_id: Hu800                        #  `spectrometer_id` session was collected at, must already be in spectrometer table, Ex: Hu800
    pst_id: Db0515A.000                           #  `pst_id` the session is for

All three items in this session are required.

Performing the insert on this session yaml file results in:

$ specdb insert --db sample.db --form RPISpecDB/sample/sample_sessions/example1/specdb.session.yaml --write
a ./1/uxnmr.par
a ./1/prosol_History
a ./1/pulseprogram
a ./1/format.temp
a ./1/audita.txt
a ./1/pdata/1/proc
a ./1/pdata/1/title
a ./1/pdata/1/procs
a ./1/fid
a ./1/scon2
a ./1/acqu
a ./1/acqus
a ./2/uxnmr.par
a ./2/prosol_History
a ./2/pulseprogram
a ./2/acqu2s
a ./2/format.temp
a ./2/audita.txt
a ./2/acqu2
a ./2/pdata/1/proc
a ./2/pdata/1/proc2
a ./2/pdata/1/title
a ./2/pdata/1/proc2s
a ./2/pdata/1/procs
a ./2/cpdprg3
a ./2/ser
a ./2/scon2
a ./2/acqu
a ./2/acqus
Inserted data from form file at RPISpecDB/sample/sample_sessions/example1/specdb.session.yaml

SpecDB is finding what sub directories that have time domain data, finding the parameter files, and inserting the time domain data into the database. The lines that get printed with a ./* reflect the files that SpecDB is finding and placing into a tar archive to ultimately be inserted into SpecDB. The session yaml file changes, reflecting all the information the specdb insert found:

session:                 # describe a data collection session, REQUIRED: pst_id (if not none, just use 'NONE')
  0:
    session_preparer: TAR                         #  `user_id` of the person who prepared and collected the session, must already be in user table, Ex: KJF
    spectrometer_id: Hu800                        #  `spectrometer_id` session was collected at, must already be in spectrometer table, Ex: Hu800
    pst_id: Db0515A.000                           #  `pst_id` the session is for
time_domain_dataset:     # record information about a time domain dataset, REQUIRED: `subdir_name`
  0:
    subdir_name: '1'                              #  path to where fid/ser file is, can be a relative path, len <= 128, Ex: 1/
    pst_id: Db0515A.000                           #  `pst_id` of sample, must already be in pst table, Ex: Db0515.001
    pulse_sequence_nickname:                      #  nickname for pulse sequence, controlled vocabulary
    probe_id:                                     #  `probe_id` for the probe, if blank probe information pulled from fid directory
  1:
    subdir_name: '2'                              #  path to where fid/ser file is, can be a relative path, len <= 128, Ex: 1/
    pst_id: Db0515A.000                           #  `pst_id` of sample, must already be in pst table, Ex: Db0515.001
    pulse_sequence_nickname:                      #  nickname for pulse sequence, controlled vocabulary
    probe_id:                                     #  `probe_id` for the probe, if blank probe information pulled from fid directory

Other points to make note of, SpecDB does search for processing scripts and processing parameter files in a pdata/ directories. However, no processed data is archived in a SpecDB instance. The sample session example1/ did not have a nmrpipe processing scripts. The session example3 does. Inserting example3 looks like:

$ specdb insert --db sample.db --form RPISpecDB/sample/sample_sessions/example3/specdb.yaml --write
a ./10_Chsqc-800/uxnmr.par
a ./10_Chsqc-800/pulseprogram
a ./10_Chsqc-800/.DS_Store
a ./10_Chsqc-800/acqu2s
a ./10_Chsqc-800/proc.com
a ./10_Chsqc-800/format.temp
a ./10_Chsqc-800/audita.txt
a ./10_Chsqc-800/acqu2
a ./10_Chsqc-800/pdata/1/proc
a ./10_Chsqc-800/pdata/1/proc2
a ./10_Chsqc-800/pdata/1/title
a ./10_Chsqc-800/pdata/1/proc2s
a ./10_Chsqc-800/pdata/1/procs
a ./10_Chsqc-800/cpdprg2
a ./10_Chsqc-800/overflow_log_dru1.txt
a ./10_Chsqc-800/uxnmr.info
a ./10_Chsqc-800/ser
a ./10_Chsqc-800/fid.com
a ./10_Chsqc-800/scon2
a ./10_Chsqc-800/acqu
a ./10_Chsqc-800/acqus
a ./10_Chsqc-800/specpar
Inserted data from form file at /Users/kfraga/RESEARCH/RPISpecDB/sample/sample_sessions/example3/specdb.yaml

The *.com scripts are nmrpipe scripts that were used to process that specific FID, which are saved separated in SpecDB.

SpecDB also allows users to insert default processing scripts for pulse sequences used in their research group. To insert a default processing script, you can use specdb forms to inspect what is required.

$ specdb forms --table default_processing_scripts
default_processing_scripts: # record a default nmrpipe processing script for every pulse sequence
  0:
    pulse_sequence_name:                          #  name of pulse sequence, from Bruker it is the value of the PULPROG tag
    nickname:                                     #  user provided nickname, references the pulse_sequence_nicknames table
    default_processing:                           #  path to text file of a nmrpipe processing script that read a FID for this pulse sequence type

You can use specdb summary view the possible pulse sequence nicknames to use:

$ specdb summary ---specdb summary --db sample.db --table pulse_sequence_names

  id             nickname
  55      13C-(1H) hetNOE
  54      15N-(1H) hetNOE
  80  15N-1H solid-sta...
   2               1D 13C
   3               1D 15N
  66          1D 15N CPMG
  57            1D 15N T1
  63         1D 15N T1rho
  60            1D 15N T2
   1                1D 1H
   4               1D 31P
  90  1H-detected (H)N...
  61           2D  15N T2
  18    2D (HB)CB(CGCD)HD
  71      2D 13C-13C DARR
  88  2D 13C-detected ...
  89  2D 13C-detected NCA
  84      2D 13C/13C PDSD
  85     2D 13C/13C TOBSY
  67          2D 15N CPMG
  58            2D 15N T1
  64         2D 15N T1rho
  82     2D 15N-13C TEDOR
   5       2D 1H-13C HMQC
   9       2D 1H-13C HSQC
  11  2D 1H-13C HSQC a...
  12  2D 1H-13C HSQC a...
  10  2D 1H-13C HSQC/HMQC
   6       2D 1H-15N HSQC
   7  2D 1H-15N HSQC N...
   8      2D 1H-15N TROSY
  16        2D 1H-1H COSY
  17       2D 1H-1H NOESY
  14       2D 1H-1H ROESY
  13       2D 1H-1H TOCSY
  70              2D DARR
  15          2D DQF-COSY
  87          2D HBCBCGCD
  86      2D HBCBCGCDCEHE
  36      3D (C)CCH-TOCSY
  38  3D 13C-separated...
  68          3D 15N CPMG
  59            3D 15N T1
  65         3D 15N T1rho
  62            3D 15N T2
  56  3D 15N,13C sim N...
  37  3D 15N-separated...
  41      3D 1H-13C NOESY
  42  3D 1H-13C NOESY ...
  43  3D 1H-13C NOESY ...
  39      3D 1H-15N NOESY
  40      3D 1H-15N TOCSY
  23           3D C(CO)NH
  21          3D CBCACONH
  22            3D CBCANH
  33         3D CCH-TOCSY
  34          3D H(CCO)NH
  30        3D HBHA(CO)NH
  26            3D HBHANH
  47             3D HCACO
  46         3D HCCH-COSY
  35        3D HCCH-TOCSY
  28          3D HN(CA)CO
  31          3D HN(CO)CA
  32        3D HN(CO)CACB
  48        3D HN(COCA)CB
  25              3D HNCA
  27            3D HNCACB
  29            3D HNCACO
  24              3D HNCO
  44              3D HNHA
  45              3D HNHB
  77             3D NCACX
  76             3D NCOCX
  91             3D RNCSA
  92             3D ROCSA
  50  4D 13C-separated...
  49          4D CC NOESY
  51     5D APSY CBCACONH
  52     5D APSY HACACONH
  53        5D HN(CA)CONH
  72              CANcoCX
  83             CN TEDOR
  75                CONCA
  69                 DARR
  81             FS-REDOR
  19             JR NOESY
  20        L-PROSY NOESY
  73                NCACX
  74                NCOCX
  78                 PAIN
  79                  PAR

Section 5 below discusses more how to use specdb summary. To submit a default processing script then you can make default_processing_script yaml file and use specdb insert to insert it. The default_processing tag is expecting a filesystem path to where a text file for processing script wants to submit. There is an example default processing script yaml in the repo at sample/sample_forms/

$ $ specdb insert --db sample.db --form RPISpecDB/sample/sample_forms/default.processing.yaml --write 
Inserted data from form file at RPISpecDB/sample/sample_forms/default.processing.yaml

5. View what is in the database

Each time we inserted a form into the database, we did not check that the provided information actually is stored in the database. If the Inserted into message comes up after the insert then no errors in the insert arrived. We can view a summary of the current entries in a table in the database by using the specdb summary command. With summary we need to provide the name of the table we want to summarize and in what database file we want to look into.

$ specdb summary --db sample.db --table user

  id user_id given_name family_name middle_initial department_and country state_province  city postal_code  \
   3     GTM    Gaetano  Montelione           None           None    None           None  None        None   
   2     KJF      Keith       Fraga              J            RPI    None           None  None        None   
   1    NONE       None        None           None           None    None           None  None        None   
   4     TAR    Theresa     Ramelot              A           None    None           None  None        None   

     role organization_t        email_address  
     None           None      monteg3@rpi.edu  
  postdoc           None  kjfraga@ucdavis.edu  
     None           None                 None  
     None           None      ramelt2@rpi.edu

A simple table is printed to screen of the current items in the database. Try it on any other table.

6. Update a record in SpecDB

It is often the case that information already in the database needs to be amended or added to. Suppose we want to update a user's information. The strategy is to pull out the information already in the database into a yaml form, edit the form, write it back to the database with the write flag. SpecDB detects when in a form only a table id is provided, like user_id, and the rest of the items are blank. In that case, SpecDB understands that what is being requested is the rest of information for that user. We are going to use this functionality to update a record.

First, make an empty user form.

$ specdb forms --table user > utest.yml

Add only the id of the user we want to edit. In this case we are only adding KJF to the user_id tag.

$ cat utest.yml 
user:                    # provide information about a user, REQUIRED: `user_id`
  0:
    user_id: KJF                                     #  no spaces, must be unique, len <= 8, Ex: KJF
    given_name:                                   #  len <= 64, Ex: Keith
    family_name:                                  #  len <= 64, Ex: Fraga
    middle_initials:                              #  len <= 16, Ex: J
    department_and_institution:                   #  len <= 128, Ex: MCB UCD
    country:                                      #  len <= 32, Ex: USA
    state_province:                               #  len <= 32, Ex: NY
    city:                                         #  len <= 32, Ex: Troy
    postal_code:                                  #  len <= 32, Ex: 12180
    role:                                         #  len <= 64, Ex: postdoc
    organization_type:                            #  len <= 64, Ex: academic
    email_address:                                #  no spaces, must have `@`, len <= 64, Ex: user@mail.com

Using specdb insert to pull the information for user KJF out from the database.

$ specdb insert --db sample.db --form utest.yml 
Inserted data from form file at utest.yml
$ cat utest.yml 
user:                    # provide information about a user, REQUIRED: `user_id`
  0:
    user_id: KJF                                  #  no spaces, must be unique, len <= 8, Ex: KJF
    given_name: Keith                             #  len <= 64, Ex: Keith
    family_name: Fraga                            #  len <= 64, Ex: Fraga
    middle_initials: J                            #  len <= 16, Ex: J
    department_and_institution: RPI               #  len <= 128, Ex: MCB UCD
    country:                                      #  len <= 32, Ex: USA
    state_province:                               #  len <= 32, Ex: NY
    city:                                         #  len <= 32, Ex: Troy
    postal_code:                                  #  len <= 32, Ex: 12180
    role: postdoc                                 #  len <= 64, Ex: postdoc
    organization_type:                            #  len <= 64, Ex: academic
    email_address: kjfraga@ucdavis.edu            #  no spaces, must have `@`, len <= 64, Ex: user@mail.com

As you can see, the other tags defined for user KJF have been pulled in. We can then edit utest.yml to change the first name from Keith to Kevin. (Any example would suffice). Re-insert utest.yml. First without --write, then with.

$ specdb insert --db sample.db --form utest.yml 
requested data is different than the database
input information:
{
	"user_id": "KJF",
	"given_name": "Kevin",
	"family_name": "Fraga",
	"middle_initials": "J",
	"department_and_institution": "RPI",
	"country": null,
	"state_province": null,
	"city": null,
	"postal_code": null,
	"role": "postdoc",
	"organization_type": null,
	"email_address": "kjfraga@ucdavis.edu"
}
database information:
{
	"user_id": "KJF",
	"given_name": "Keith",
	"family_name": "Fraga",
	"middle_initials": "J",
	"department_and_institution": "RPI",
	"country": null,
	"state_province": null,
	"city": null,
	"postal_code": null,
	"role": "postdoc",
	"organization_type": null,
	"email_address": "kjfraga@ucdavis.edu"
}
Aborting
$ specdb insert --db sample.db --form utest.yml --write
Inserted data from form file at utest.yml

Using summary, we can see the change.

$ specdb summary --db sample.db --table user

  id user_id given_name family_name middle_initial department_and country state_province  city postal_code  \
   3     GTM    Gaetano  Montelione           None           None    None           None  None        None   
   5     KJF      Kevin       Fraga              J            RPI    None           None  None        None   
   1    NONE       None        None           None           None    None           None  None        None   
   4     TAR    Theresa     Ramelot              A           None    None           None  None        None   

     role organization_t        email_address  
     None           None      monteg3@rpi.edu  
  postdoc           None  kjfraga@ucdavis.edu  
     None           None                 None  
     None           None      ramelt2@rpi.edu

7. Query information from SpecDB

To query information from SpecDB, some FID data must be inserted first. The specdb query command only accepts queries against the summary view in the database (not to be confused with specdb summary). To view the contents of the summary view, we can use specdb summary.

$ specdb summary --db sample.db --table summary

  id       experiment_dat pulse_sequence pulse_sequence          md5checksum  session_id session_prepar  \
  11  2010-09-26 21:33...           None           zgpr  f1d80ff78a50d207...           1            TAR   
  12  2010-09-26 22:16...           None   hsqcetf3gpsi  e3c4e84d0ae67e8c...           1            TAR   
  13  2021-03-01 19:01...           None       hsqcetgp  335f5a0ef63cb690...           2            KJF   

  temperature buffer_id       pst_id pst_preparer batch_id batch_preparer sample_sequenc expression_id  \
     293.0000   h7.4n50  Db0515A.000          TAR  Db0515A           None       GPLIEVLA       Db0515A   
     293.0000   h7.4n50  Db0515A.000          TAR  Db0515A           None       GPLIEVLA       Db0515A   
     297.9658   h7.4n50  Db0515A.000          TAR  Db0515A           None       GPLIEVLA       Db0515A   

 expression_pre construct_id construct_prep target_id target_prepare target_sequenc project  \
            TAR      Db0515A            KJF    Db0515            GTM      MGPLIEVLA     DBh   
            TAR      Db0515A            KJF    Db0515            GTM      MGPLIEVLA     DBh   
            TAR      Db0515A            KJF    Db0515            GTM      MGPLIEVLA     DBh   

       construct_sequ spectrometer_i    field probe_id              probehd tube_type           zipped_dir  
  MGHHHHHHHSMGPLIEVLA          Hu800  800 MHz     None  1.7_mm_CPTCI_1H-...  NMR tube  b'\x1f\x8b\x08\x...  
  MGHHHHHHHSMGPLIEVLA          Hu800  800 MHz     None  1.7_mm_CPTCI_1H-...  NMR tube  b'\x1f\x8b\x08\x...  
  MGHHHHHHHSMGPLIEVLA          Hu800  800 MHz     None  5_mm_CPTCI_1H-13...  NMR tube  b'\x1f\x8b\x08\x...

In this example we are going to query for FIDs produced from an HSQC. This can be accomplished with the following query:

$ specdb query --sql "SELECT * FROM summary WHERE pulse_sequence LIKE '%hsqc%'" --db sample.db 

  id       experiment_dat pulse_sequence pulse_sequence          md5checksum  session_id session_prepar  \
  12  2010-09-26 22:16...           None   hsqcetf3gpsi  e3c4e84d0ae67e8c...           1            TAR   
  13  2021-03-01 19:01...           None       hsqcetgp  335f5a0ef63cb690...           2            KJF   

  temperature buffer_id       pst_id pst_preparer batch_id batch_preparer sample_sequenc expression_id  \
     293.0000   h7.4n50  Db0515A.000          TAR  Db0515A           None       GPLIEVLA       Db0515A   
     297.9658   h7.4n50  Db0515A.000          TAR  Db0515A           None       GPLIEVLA       Db0515A   

 expression_pre construct_id construct_prep target_id target_prepare target_sequenc project  \
            TAR      Db0515A            KJF    Db0515            GTM      MGPLIEVLA     DBh   
            TAR      Db0515A            KJF    Db0515            GTM      MGPLIEVLA     DBh   

       construct_sequ spectrometer_i    field probe_id              probehd tube_type           zipped_dir  
  MGHHHHHHHSMGPLIEVLA          Hu800  800 MHz     None  1.7_mm_CPTCI_1H-...  NMR tube  b'\x1f\x8b\x08\x...  
  MGHHHHHHHSMGPLIEVLA          Hu800  800 MHz     None  5_mm_CPTCI_1H-13...  NMR tube  b'\x1f\x8b\x08\x...

Without specifying an output directory, specdb query produces a table of records that were found in the query. This helps users dissect and refine their query to the records they ultimately want. Specifying an output directory results in those the FIDs being copied into the directory the user specifies.

$ specdb query --sql "SELECT * FROM summary WHERE pulse_sequence LIKE '%hsqc%'" --db sample.db --out query_results
$ tree query_results/
query_results/
├── TAR_Db0515A.000_20100926
│   └── 2
│       ├── acqu
│       ├── acqu2
│       ├── acqu2s
│       ├── acqus
│       ├── audita.txt
│       ├── cpdprg3
│       ├── format.temp
│       ├── pdata
│       │   └── 1
│       │       ├── proc
│       │       ├── proc2
│       │       ├── proc2s
│       │       ├── procs
│       │       └── title
│       ├── prosol_History
│       ├── pulseprogram
│       ├── scon2
│       ├── ser
│       └── uxnmr.par
└── TAR_Db0515A.000_20210301
    └── 10_Chsqc-800
        ├── acqu
        ├── acqu2
        ├── acqu2s
        ├── acqus
        ├── audita.txt
        ├── cpdprg2
        ├── fid.com
        ├── format.temp
        ├── overflow_log_dru1.txt
        ├── pdata
        │   └── 1
        │       ├── proc
        │       ├── proc2
        │       ├── proc2s
        │       ├── procs
        │       └── title
        ├── proc.com
        ├── pulseprogram
        ├── scon2
        ├── ser
        ├── specpar
        ├── uxnmr.info
        └── uxnmr.par

The above command made a directory query_results/ and places the query results in this folder. The naming of the subdirectories underneath results goes {user_id}_{pst_id}_{date}. Then underneath this results subdirectory is the FID that was found as a result of the query. Note that the whole directory the ser file was in was also saved, keeping the auxillary files that were made in during the ser file collection.

To produce a NMR-STAR file for this result, just use the --star command to produce a STAR file in the directory where the ser file is.

$ specdb query --sql "SELECT * FROM summary WHERE pulse_sequence LIKE '%hsqc%'" --db sample.db --out query_results --star
2022-06-21 14:49:41,207,207 WARNING  [parser.py:161] Loop with no data on line: 33
2022-06-21 14:49:41,207,207 WARNING  [parser.py:161] Loop with no data on line: 72
2022-06-21 14:49:41,208,208 WARNING  [parser.py:161] Loop with no data on line: 84
2022-06-21 14:49:41,208,208 WARNING  [parser.py:161] Loop with no data on line: 93
2022-06-21 14:49:41,208,208 WARNING  [parser.py:161] Loop with no data on line: 130
2022-06-21 14:49:41,208,208 WARNING  [parser.py:161] Loop with no data on line: 161
2022-06-21 14:49:41,208,208 WARNING  [parser.py:161] Loop with no data on line: 200
2022-06-21 14:49:41,209,209 WARNING  [parser.py:161] Loop with no data on line: 237
2022-06-21 14:49:41,209,209 WARNING  [parser.py:161] Loop with no data on line: 280
2022-06-21 14:49:41,209,209 WARNING  [parser.py:161] Loop with no data on line: 304
2022-06-21 14:49:41,209,209 WARNING  [parser.py:161] Loop with no data on line: 390
2022-06-21 14:49:41,210,210 WARNING  [parser.py:161] Loop with no data on line: 403
2022-06-21 14:49:41,604,604 WARNING  [parser.py:161] Loop with no data on line: 33
2022-06-21 14:49:41,605,605 WARNING  [parser.py:161] Loop with no data on line: 72
2022-06-21 14:49:41,605,605 WARNING  [parser.py:161] Loop with no data on line: 84
2022-06-21 14:49:41,605,605 WARNING  [parser.py:161] Loop with no data on line: 93
2022-06-21 14:49:41,605,605 WARNING  [parser.py:161] Loop with no data on line: 130
2022-06-21 14:49:41,605,605 WARNING  [parser.py:161] Loop with no data on line: 161
2022-06-21 14:49:41,606,606 WARNING  [parser.py:161] Loop with no data on line: 200
2022-06-21 14:49:41,606,606 WARNING  [parser.py:161] Loop with no data on line: 237
2022-06-21 14:49:41,606,606 WARNING  [parser.py:161] Loop with no data on line: 280
2022-06-21 14:49:41,606,606 WARNING  [parser.py:161] Loop with no data on line: 304
2022-06-21 14:49:41,607,607 WARNING  [parser.py:161] Loop with no data on line: 390
2022-06-21 14:49:41,607,607 WARNING  [parser.py:161] Loop with no data on line: 403
$ tree query_results/
query_results/
├── TAR_Db0515A.000_20100926
│   └── 2
│       ├── acqu
│       ├── acqu2
│       ├── acqu2s
│       ├── acqus
│       ├── audita.txt
│       ├── cpdprg3
│       ├── fid.star
│       ├── format.temp
│       ├── pdata
│       │   └── 1
│       │       ├── proc
│       │       ├── proc2
│       │       ├── proc2s
│       │       ├── procs
│       │       └── title
│       ├── prosol_History
│       ├── pulseprogram
│       ├── scon2
│       ├── ser
│       └── uxnmr.par
└── TAR_Db0515A.000_20210301
    └── 10_Chsqc-800
        ├── acqu
        ├── acqu2
        ├── acqu2s
        ├── acqus
        ├── audita.txt
        ├── cpdprg2
        ├── fid.com
        ├── fid.star
        ├── format.temp
        ├── overflow_log_dru1.txt
        ├── pdata
        │   └── 1
        │       ├── proc
        │       ├── proc2
        │       ├── proc2s
        │       ├── procs
        │       └── title
        ├── proc.com
        ├── pulseprogram
        ├── scon2
        ├── ser
        ├── specpar
        ├── uxnmr.info
        └── uxnmr.par

Note that now NMR-STAR files have been created for those FIDs, fid.star. Also note, that the pdata directories have also been preserved and keep the same directory structure as from their original sources.

Users can also query FIDs from a SpecDB instance by row ids in the summary table. Investigating the help menu for specdb query reveals arguments for providing row ids.

$ specdb query --help
usage: specdb query [-h] [--sql <str> [<str> ...]] [--star] --db <path>
                    [--out <path>] [--indices <str> [<str> ...]]

optional arguments:
  -h, --help            show this help message and exit
  --sql <str> [<str> ...]
                        query using sql syntax. The query can be on any table.
                        If no --output format is given, results are printed to
                        screen.
  --star                whether to write a NMR-STAR file for each FID in the
                        query results
  --db <path>           path to specific specdb database file to use. use
                        `specdb create` to create a new database file
  --out <path>          directory to place results of the query
  --indices <str> [<str> ...]
                        provide a list of row ids in the summary table to
                        collect users can provide a list of ids directly on
                        the command line space separated, or in a .csv file
                        with all ids comma separated on first line

We can use --indices to pull out the same data as before. First run the query tool with no output directory to determine the ids you would want:

$ specdb query --sql "SELECT * FROM summary WHERE pulse_sequence LIKE '%hsqc%'" --db sample.db

  id       experiment_dat pulse_sequence pulse_sequence          md5checksum  session_id session_prepar  \
  12  2010-09-26 22:16...           None   hsqcetf3gpsi  e3c4e84d0ae67e8c...           1            TAR   
  13  2021-03-01 19:01...           None       hsqcetgp  335f5a0ef63cb690...           2            KJF   

  temperature buffer_id       pst_id pst_preparer batch_id batch_preparer sample_sequenc expression_id  \
     293.0000   h7.4n50  Db0515A.000          TAR  Db0515A           None       GPLIEVLA       Db0515A   
     297.9658   h7.4n50  Db0515A.000          TAR  Db0515A           None       GPLIEVLA       Db0515A   

 expression_pre construct_id construct_prep target_id target_prepare target_sequenc project  \
            TAR      Db0515A            KJF    Db0515            GTM      MGPLIEVLA     DBh   
            TAR      Db0515A            KJF    Db0515            GTM      MGPLIEVLA     DBh   

       construct_sequ spectrometer_i    field probe_id              probehd tube_type           zipped_dir  
  MGHHHHHHHSMGPLIEVLA          Hu800  800 MHz     None  1.7_mm_CPTCI_1H-...  NMR tube  b'\x1f\x8b\x08\x...  
  MGHHHHHHHSMGPLIEVLA          Hu800  800 MHz     None  5_mm_CPTCI_1H-13...  NMR tube  b'\x1f\x8b\x08\x...

Perhaps we only want the ser file from the 2010 date, so we can run the following command:

$ specdb query --db sample.db --indices 12 --out only_12 --star
2022-06-21 15:02:16,230,230 WARNING  [parser.py:161] Loop with no data on line: 33
2022-06-21 15:02:16,230,230 WARNING  [parser.py:161] Loop with no data on line: 72
2022-06-21 15:02:16,231,231 WARNING  [parser.py:161] Loop with no data on line: 84
2022-06-21 15:02:16,231,231 WARNING  [parser.py:161] Loop with no data on line: 93
2022-06-21 15:02:16,231,231 WARNING  [parser.py:161] Loop with no data on line: 130
2022-06-21 15:02:16,231,231 WARNING  [parser.py:161] Loop with no data on line: 161
2022-06-21 15:02:16,231,231 WARNING  [parser.py:161] Loop with no data on line: 200
2022-06-21 15:02:16,232,232 WARNING  [parser.py:161] Loop with no data on line: 237
2022-06-21 15:02:16,232,232 WARNING  [parser.py:161] Loop with no data on line: 280
2022-06-21 15:02:16,232,232 WARNING  [parser.py:161] Loop with no data on line: 304
2022-06-21 15:02:16,233,233 WARNING  [parser.py:161] Loop with no data on line: 390
2022-06-21 15:02:16,233,233 WARNING  [parser.py:161] Loop with no data on line: 403
$ tree only_12/
only_12/
└── TAR_Db0515A.000_20100926
    └── 2
        ├── acqu
        ├── acqu2
        ├── acqu2s
        ├── acqus
        ├── audita.txt
        ├── cpdprg3
        ├── fid.star
        ├── format.temp
        ├── pdata
        │   └── 1
        │       ├── proc
        │       ├── proc2
        │       ├── proc2s
        │       ├── procs
        │       └── title
        ├── prosol_History
        ├── pulseprogram
        ├── scon2
        ├── ser
        └── uxnmr.par

The Loop with no data warnings are coming from pynmrstar. If we wanted both records 12 and 13, we just specify them both in the same specdb query statement.

$ specdb query --db sample.db --indices 12 13 --out both/ --star
2022-06-21 15:07:27,920,920 WARNING  [parser.py:161] Loop with no data on line: 33
2022-06-21 15:07:27,921,921 WARNING  [parser.py:161] Loop with no data on line: 72
2022-06-21 15:07:27,921,921 WARNING  [parser.py:161] Loop with no data on line: 84
2022-06-21 15:07:27,921,921 WARNING  [parser.py:161] Loop with no data on line: 93
2022-06-21 15:07:27,921,921 WARNING  [parser.py:161] Loop with no data on line: 130
2022-06-21 15:07:27,922,922 WARNING  [parser.py:161] Loop with no data on line: 161
2022-06-21 15:07:27,922,922 WARNING  [parser.py:161] Loop with no data on line: 200
2022-06-21 15:07:27,922,922 WARNING  [parser.py:161] Loop with no data on line: 237
2022-06-21 15:07:27,922,922 WARNING  [parser.py:161] Loop with no data on line: 280
2022-06-21 15:07:27,922,922 WARNING  [parser.py:161] Loop with no data on line: 304
2022-06-21 15:07:27,923,923 WARNING  [parser.py:161] Loop with no data on line: 390
2022-06-21 15:07:27,923,923 WARNING  [parser.py:161] Loop with no data on line: 403
2022-06-21 15:07:28,350,350 WARNING  [parser.py:161] Loop with no data on line: 33
2022-06-21 15:07:28,351,351 WARNING  [parser.py:161] Loop with no data on line: 72
2022-06-21 15:07:28,351,351 WARNING  [parser.py:161] Loop with no data on line: 84
2022-06-21 15:07:28,351,351 WARNING  [parser.py:161] Loop with no data on line: 93
2022-06-21 15:07:28,351,351 WARNING  [parser.py:161] Loop with no data on line: 130
2022-06-21 15:07:28,352,352 WARNING  [parser.py:161] Loop with no data on line: 161
2022-06-21 15:07:28,352,352 WARNING  [parser.py:161] Loop with no data on line: 200
2022-06-21 15:07:28,352,352 WARNING  [parser.py:161] Loop with no data on line: 237
2022-06-21 15:07:28,352,352 WARNING  [parser.py:161] Loop with no data on line: 280
2022-06-21 15:07:28,353,353 WARNING  [parser.py:161] Loop with no data on line: 304
2022-06-21 15:07:28,353,353 WARNING  [parser.py:161] Loop with no data on line: 390
2022-06-21 15:07:28,353,353 WARNING  [parser.py:161] Loop with no data on line: 403
$ tree both/
both/
├── TAR_Db0515A.000_20100926
│   └── 2
│       ├── acqu
│       ├── acqu2
│       ├── acqu2s
│       ├── acqus
│       ├── audita.txt
│       ├── cpdprg3
│       ├── fid.star
│       ├── format.temp
│       ├── pdata
│       │   └── 1
│       │       ├── proc
│       │       ├── proc2
│       │       ├── proc2s
│       │       ├── procs
│       │       └── title
│       ├── prosol_History
│       ├── pulseprogram
│       ├── scon2
│       ├── ser
│       └── uxnmr.par
└── TAR_Db0515A.000_20210301
    └── 10_Chsqc-800
        ├── acqu
        ├── acqu2
        ├── acqu2s
        ├── acqus
        ├── audita.txt
        ├── cpdprg2
        ├── fid.com
        ├── fid.star
        ├── format.temp
        ├── overflow_log_dru1.txt
        ├── pdata
        │   └── 1
        │       ├── proc
        │       ├── proc2
        │       ├── proc2s
        │       ├── procs
        │       └── title
        ├── proc.com
        ├── pulseprogram
        ├── scon2
        ├── ser
        ├── specpar
        ├── uxnmr.info
        └── uxnmr.par

8. Incremental Backup of a SpecDB database

To perform an incremental backup, you first need to run the specdb backup tool. By default, specdb backup will make a objects/ directory to store the database pages if none is specified, and a backup.txt to store the sha256 hashes of the database. Then to restore the database (i.e backup it into a new SQLite database), you use specdb restore.

$ specdb restore --backup backup.db --shafile backup.txt --objects objects/

After one makes changes to the original sample.db database, using specdb backup will result in modifications to the objects/ directory and backup.txt. Performing specdb restore will only make the incremental changes to become current with the backed up state of sample.db.

SpecDB/TUTORIAL.md