Skip to content
Permalink
master
Switch branches/tags

Name already in use

A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?
Go to file
 
 
Cannot retrieve contributors at this time
# SpecDB Tutorial
The purpose of the following tutorial is to provide examples of how to use SpecDB. There are seven levels to the tutorial. _(1)_ SpecDB subcommands and help menus. _(2)_ how to instantiate a database, _(3)_ how to make forms, _(4)_ how to insert forms, _(5)_ viewing what is in the database, _(6)_ updating records in the database, and _(7)_ querying information from the database.
## 1. SpecDB help menus and subcommands
The first entry point to look for guidance on SpecDB functions is to use the help menus. If `specdb --help` results in the help menu for SpecDB, then it is installed correctly. SpecDB has seven sub commands, each listed below and the command line arguments each take:
1. `specdb create --db --backup`
2. `specdb forms --table --num`
3. `specdb insert --form --db --write`
4. `specdb summary --table --db`
5. `specdb query --sql --db --out --star --indices`
6. `specdb backup --db --objects --shafile`
7. `specdb restore --backup --shafile --objects`
The subcommands listed above in the logical order the commands are used in. Each subcommand has a separate help menu from `specdb --help` that can be accessed, (e.g `specdb forms --help`). Users first need to create a SpecDB SQLite database file with `create`. Next, users need to populate the database with information. The `forms` command makes the forms for the data fields needed for the SpecDB schema. With a filled form, users use `insert` to insert the form into their database. To verify/check what they inserted, users can use `summary` to investigate the contents of any SpecDB table. Users can pull data out of the database with `query`. With `query` users provide a SQL SELECT statement on the SpecDB summary view to pull data out of the database. `query` also allows for users to extract records from SpecDB by row-ids in the `summary` table. Commands `backup` and `restore` are for the incremental backup operations.
## 2. Instantiating a new SpecDB database
A SpecDB database can be created with `specdb create`. The location of the database _.db_ file needs to be known by the users using the database, or an environment variable can be set. In this tutorial we are going to create a database for the sample data stored in this repository `./sample/`.
```
$ specdb create --db sample.db --backup backup.db
db configured and initalized
Done
```
## 3. Making forms
Now that a database has been made, we can start adding data to the database. To add data we need to populate a text file form to be read into the database. SpecDB uses YAML to build forms. Examples can be seen in `/sample/sample_forms/`
Running `specdb forms --table user` results in the user form being printed to screen. The output should be exactly as seen below.
```
user: # provide information about a user, REQUIRED: `user_id`
0:
user_id: # no spaces, must be unique, len <= 8, Ex: KJF
given_name: # len <= 64, Ex: Keith
family_name: # len <= 64, Ex: Fraga
middle_initials: # len <= 16, Ex: J
department_and_institution: # len <= 128, Ex: MCB UCD
country: # len <= 32, Ex: USA
state_province: # len <= 32, Ex: NY
city: # len <= 32, Ex: Troy
postal_code: # len <= 32, Ex: 12180
role: # len <= 64, Ex: postdoc
organization_type: # len <= 64, Ex: academic
email_address: # no spaces, must have `@`, len <= 64, Ex: user@mail.com
```
To fill the form, the output of the `specdb forms` command can be redirected to any file. (Note: the file extension is not strict, tested with `txt`, `yaml`, `yml`. )
```
$ specdb forms --table user > user.yml
```
Inspection of the resulting form shows that name of the table this form was created for was created for: `user`. On the same line lists the required items need to be filled into this form for insertion into SpecDB. All items not indicated as required are optional. (Note: optional SpecDB terms may be required for BMRB deposition however).
In the `sample/sample_forms/` directory is a users yaml form pre filled out with some user information as an example. Each line in the form file has a comment describing the restrictions on allowable values for that data item, as a well as an example of what can be used for that respective field. Viewing the `sample/sample_forms/users.yaml` we see there are actually multiple users provided. The `specdb forms` command has the `--num` option for specifying the number of forms to make for a requested table. In the case of `sample/sample_forms/users.yaml` we requested 3. To make the template to do something like `users.yaml` is the following command:
```
$ specdb forms --table user --num 3
user: # provide information about a user, REQUIRED: `user_id`
0:
user_id: # no spaces, must be unique, len <= 8, Ex: KJF
given_name: # len <= 64, Ex: Keith
family_name: # len <= 64, Ex: Fraga
middle_initials: # len <= 16, Ex: J
department_and_institution: # len <= 128, Ex: MCB UCD
country: # len <= 32, Ex: USA
state_province: # len <= 32, Ex: NY
city: # len <= 32, Ex: Troy
postal_code: # len <= 32, Ex: 12180
role: # len <= 64, Ex: postdoc
organization_type: # len <= 64, Ex: academic
email_address: # no spaces, must have `@`, len <= 64, Ex: user@mail.com
1:
user_id: # no spaces, must be unique, len <= 8, Ex: KJF
given_name: # len <= 64, Ex: Keith
family_name: # len <= 64, Ex: Fraga
middle_initials: # len <= 16, Ex: J
department_and_institution: # len <= 128, Ex: MCB UCD
country: # len <= 32, Ex: USA
state_province: # len <= 32, Ex: NY
city: # len <= 32, Ex: Troy
postal_code: # len <= 32, Ex: 12180
role: # len <= 64, Ex: postdoc
organization_type: # len <= 64, Ex: academic
email_address: # no spaces, must have `@`, len <= 64, Ex: user@mail.com
2:
user_id: # no spaces, must be unique, len <= 8, Ex: KJF
given_name: # len <= 64, Ex: Keith
family_name: # len <= 64, Ex: Fraga
middle_initials: # len <= 16, Ex: J
department_and_institution: # len <= 128, Ex: MCB UCD
country: # len <= 32, Ex: USA
state_province: # len <= 32, Ex: NY
city: # len <= 32, Ex: Troy
postal_code: # len <= 32, Ex: 12180
role: # len <= 64, Ex: postdoc
organization_type: # len <= 64, Ex: academic
email_address: # no spaces, must have `@`, len <= 64, Ex: user@mail.com
```
This output can be redirected to any file and filled in the with appropriate information.
Forms can be made to include information for different tables in the same form. A very common use for this is inserting a buffer into SpecDB. There are two tables in the SpecDB schema that describe buffers, `buffer` and `buffer_components`. `buffer` provides the name of the buffer as the `buffer_id`, and the buffer's pH. The `buffer_components` table allows for variable number of components to be added to the buffer. For example, if a buffer with 3 components is needed to be added to SpecDB, the following command would generate the correct form:
```$ specdb forms --table buffer buffer_components --num 1 3
buffer: # store all buffers used, their id and their pH, REQUIRED: `buffer_id`, `buffer_pH`, NOTE: the buffer components go in the `buffer_components` table
0:
buffer_id: # text identifier for the buffer, must be unique, no spaces, len <= 32, Ex: hn4071
buffer_ph: # pH of buffer, must be a number, Ex: 8.1
buffer_comment: # free text field for any notes about the buffer, len <= 128, Ex: made by KJF 12/04
buffer_components: # describe the component(s) of a buffer, REQUIRED: `buffer_id`, `buffer_component`, `buffer_component_value`, `buffer_component_unit`
0:
buffer_id: # `buffer_id` this component goes to, must already be in buffer table, Ex: hn4071
buffer_component: # name of the component, len <= 64, Ex: NaCl
buffer_component_value: # the numeric concentration value of the component, Ex: 100.0
buffer_component_unit: # unit of concentration, one of (`mM`, `% (v/v)`, `mg/mL`), Ex: mM
1:
buffer_id: # `buffer_id` this component goes to, must already be in buffer table, Ex: hn4071
buffer_component: # name of the component, len <= 64, Ex: NaCl
buffer_component_value: # the numeric concentration value of the component, Ex: 100.0
buffer_component_unit: # unit of concentration, one of (`mM`, `% (v/v)`, `mg/mL`), Ex: mM
2:
buffer_id: # `buffer_id` this component goes to, must already be in buffer table, Ex: hn4071
buffer_component: # name of the component, len <= 64, Ex: NaCl
buffer_component_value: # the numeric concentration value of the component, Ex: 100.0
buffer_component_unit: # unit of concentration, one of (`mM`, `% (v/v)`, `mg/mL`), Ex: mM
```
It is important to note that if the `--num` option is provided, that the number of iterations to take match the number of tables requested. In the above case, the `buffer` form was created once because of the `1` after the `--num` and three `buffer_components` were made because of the `3` after the `1` in the `--num` options. The number of options in `--table` and `--num` are in direct, sequential correspondence with each other. If no `--num` options are provided it is assumed that all tables are produced just once.
Inspecting `sample/sample_forms/complete_sample.yaml` will find all the information required to describe a biomolecular NMR sample. It is recommended that users use `specdb forms` to create the forms when they need them because users can define multiple entities at a time, and one general form will not suffice. However, it is instructive to see all the metadata items that are tracked in SpecDB by looking at `complete_sample.yaml`.
To follow along with the sample forms provided in the repository, perform the following commands:
```
$ specdb forms --table user --num 3 > user.yaml
$ specdb forms --table project > project.yaml
$ specdb forms --table buffer buffer_components --num 1 3 > buffer.yaml
$ specdb forms --table target construct expression purification_batch pst batch_components > sample.yaml
$ specdb forms --table spectrometer > spectrometer.yaml
$ specdb forms --table session > session.yaml
```
## 4. Inserting forms
In this section, users can practice with the pre-filled forms in `sample/sample_forms/`. To first insert users in a group, the following insert command will insert the user information provided in `user.yaml`. We will be working off the sample forms in the RPISpecDB repository. To follow-along, just provide the path to where the repository is installed locally.
```
$ specdb insert --db sample.db --form RPISpecDB/sample/sample_forms/users.yaml
information provided is new
{
"user_id": "KJF",
"given_name": "Keith",
"family_name": "Fraga",
"middle_initials": "J",
"department_and_institution": "UC MCB",
"country": null,
"state_province": null,
"city": null,
"postal_code": null,
"role": null,
"organization_type": null,
"email_address": "kjfraga@ucdavis.edu"
}
Must set --write to insert provided information to database
Aborting
```
`sample.db` is the database created in section (2) of the tutorial. Running the above commnd results in a _Aborting_ error. This means that SpecDB attempted the insert, but found that the requested information was new and could not complete the insertion. This check is in place to ensure users truly want to insert the requested information. Sometimes the provided information may be new, but new and unique because of a typo and the actual information being inserted is already in the database. To overwrite this check, the `--write` flag must be used. Essentially the `--write` flag is meant to check with the user if they really want to insert the provided information to the database.
```
$ specdb insert --db sample.db --form RPISpecDB/sample/sample_forms/users.yaml --write
Inserted data from json file at RPISpecDB/sample/sample_forms/users.yaml
```
The last line indicates the data was inserted. Running the `specdb insert` command on the `buffers.yaml` with the `--write` flag results in the following.
```
$ specdb insert --db sample.db --form RPISpecDB/sample/sample_forms/buffers.yaml --write
Inserted data from json file at RPISpecDB/sample/sample_forms/buffers.yaml
```
Now that the buffer is inserted, we can attempt to insert our information about a sample. To learn more about the relationships and definitions of each of the tables in the SpecDB manuscript refer to the manuscript. Each table, from `target` to `pst` needs to be inserted to fully describe a sample. However, typically for each table there is a subset of required data items, so not every data item is required for insertion.
```
$ specdb insert --db sample.db --form RPISpecDB/sample/sample_forms/sample.yaml --write
SQLite error on insert on table target
Err Message:
FOREIGN KEY constraint failed
check the template form for instructions and examples
ensure all ids this table relates to (i.e constructs relate to targets) are inserter already
Aborting
```
This insertion results in a different error from before. Here, the error was not because the information was new because the `--write` was on. The error is coming from SQLite, that a foreign key constraint failed. A foreign key is a relationship between two tables. These foreign keys need to be present in the two tables for SpecDB to allow an insert. The error message is not clear on what column the constraint failure came on, but it says the error came from the `target` table and to inspect the form for what is required for a target. Using `forms` we can see what is required for `target`.
```
$ specdb forms --table target
target: # molecular target information, REQUIRED: `target_id`
0:
target_id: # text identifier for target, must be unique, no spaces, len <= 32, Ex: Db0515
target_comment: # free field comment, anything to note about target, len <= 128, Ex: from UW-Madison
target_sequence: # target's molecular seq., any type of code/alphabet, no spaces, len <= 1024, Ex: MGSHHHHILVAM
organism_source: # organism name for target source, can indicate if target is synthetic, len <= 128, Ex: synthetic
gene_name: # gene name, len <= 64, Ex: SpikeCoV2
project_id: # `project_id` target is a member of, must be in project table, Ex: SpikeFraga
target_preparer: # `user_id` that assigned the target, must be in user table, Ex: KJF
```
There are relationships between the `user` table and the `project` table. Inspecting what we tried to add for the target in `sample.yaml` we see:
```
target: # molecular target information, REQUIRED: `target_id`
0:
target_id: Db0515 # text identifier for target, must be unique, no spaces, len <= 32, Ex: Db0515
target_comment: # free field comment, anything to note about target, len <= 128, Ex: from UW-Madison
target_sequence: MGPLIEVLA # target's molecular seq., any type of code/alphabet, no spaces, len <= 1024, Ex: MGSHHHHILVAM
organism_source: synthetic # organism name for target source, can indicate if target is synthetic, len <= 128, Ex: synthetic
gene_name: # gene name, len <= 64, Ex: SpikeCoV2
project_id: DBh # `project_id` target is a member of, must be in project table, Ex: SpikeFraga
target_preparer: GTM # `user_id` that assigned the target, must be in user table, Ex: KJF
```
We inserted the user information for `GTM` but not the project id `DBh` this target is associated with. We can solve this by inserting the project information using the `sample/sample_forms/project.yaml` form.
```
$ specdb insert --db sample.db --form RPISpecDB/sample/sample_forms/project.yaml --write
Inserted data from json file at RPISpecDB/sample/sample_forms/project.yaml
```
Then re-doing the sample insert we get:
```
specdb insert --db sample.db --form RPISpecDB/sample/sample_forms/sample.yaml --write
Inserted data from json file at RPISpecDB/sample/sample_forms/sample.yaml
```
To insert FIDs into SpecDB, we first need to insert the spectrometer information.
```
$ specdb insert --db sample.db --form RPISpecDB/sample/sample_forms/spectrometer.yaml --write
Inserted data from json file at RPISpecDB/sample/sample_forms/spectrometer.yaml
```
Inserting FIDs into SpecDB are typically organized into sessions. When FIDs are recorded from a Bruker instrument, they are recorded into sub-directories of a data collection session directory. We mimicked this situation in `sample/sample_sessions/`. In `example1/`, there are two subdirectories, each with a time domain dataset. There is also a `specdb.session.yaml`. This file can be creatd with `specdb forms --table session`. The session yaml describe the spectrometer this data collection was done on, the user who conducted the session, and the sample tube the data was collected of. Contents of this session is displayed below:
```
session: # describe a data collection session, REQUIRED: `session_preparer`, `spectrometer_id`, `pst_id`
0:
session_preparer: KJF # `user_id` of the person who prepared and collected the session, must already be in user table, Ex: KJF
spectrometer_id: Hu800 # `spectrometer_id` session was collected at, must already be in spectrometer table, Ex: Hu800
pst_id: Db0515A.000 # `pst_id` the session is for
```
All three items in this session are required.
Performing the insert on this session yaml file results in:
```
$ specdb insert --db sample.db --form RPISpecDB/sample/sample_sessions/example1/specdb.session.yaml --write
a ./1/uxnmr.par
a ./1/prosol_History
a ./1/pulseprogram
a ./1/format.temp
a ./1/audita.txt
a ./1/pdata/1/proc
a ./1/pdata/1/title
a ./1/pdata/1/procs
a ./1/fid
a ./1/scon2
a ./1/acqu
a ./1/acqus
a ./2/uxnmr.par
a ./2/prosol_History
a ./2/pulseprogram
a ./2/acqu2s
a ./2/format.temp
a ./2/audita.txt
a ./2/acqu2
a ./2/pdata/1/proc
a ./2/pdata/1/proc2
a ./2/pdata/1/title
a ./2/pdata/1/proc2s
a ./2/pdata/1/procs
a ./2/cpdprg3
a ./2/ser
a ./2/scon2
a ./2/acqu
a ./2/acqus
Inserted data from form file at RPISpecDB/sample/sample_sessions/example1/specdb.session.yaml
```
SpecDB is finding what sub directories that have time domain data, finding the parameter files, and inserting the time domain data into the database. The lines that get printed with `a ./*` reflect the files that SpecDB is finding and placing into a tar archive to ultimately be inserted into SpecDB. The session yaml file changes, reflecting all the information the `specdb insert` found:
```
session: # describe a data collection session, REQUIRED: pst_id (if not none, just use 'NONE')
0:
session_preparer: TAR # `user_id` of the person who prepared and collected the session, must already be in user table, Ex: KJF
spectrometer_id: Hu800 # `spectrometer_id` session was collected at, must already be in spectrometer table, Ex: Hu800
pst_id: Db0515A.000 # `pst_id` the session is for
time_domain_dataset: # record information about a time domain dataset, REQUIRED: `subdir_name`
0:
subdir_name: '1' # path to where fid/ser file is, can be a relative path, len <= 128, Ex: 1/
pst_id: Db0515A.000 # `pst_id` of sample, must already be in pst table, Ex: Db0515.001
pulse_sequence_nickname: # nickname for pulse sequence, controlled vocabulary
probe_id: # `probe_id` for the probe, if blank probe information pulled from fid directory
1:
subdir_name: '2' # path to where fid/ser file is, can be a relative path, len <= 128, Ex: 1/
pst_id: Db0515A.000 # `pst_id` of sample, must already be in pst table, Ex: Db0515.001
pulse_sequence_nickname: # nickname for pulse sequence, controlled vocabulary
probe_id: # `probe_id` for the probe, if blank probe information pulled from fid directory
```
Other points to make note of, SpecDB does search for processing scripts and processing parameter files in a `pdata/` directories. However, no processed data is archived in a SpecDB instance. The sample session `example1/` did not have a `nmrpipe` processing scripts. The session `example3` does. Inserting `example3` looks like:
```
$ specdb insert --db sample.db --form RPISpecDB/sample/sample_sessions/example3/specdb.yaml --write
a ./10_Chsqc-800/uxnmr.par
a ./10_Chsqc-800/pulseprogram
a ./10_Chsqc-800/.DS_Store
a ./10_Chsqc-800/acqu2s
a ./10_Chsqc-800/proc.com
a ./10_Chsqc-800/format.temp
a ./10_Chsqc-800/audita.txt
a ./10_Chsqc-800/acqu2
a ./10_Chsqc-800/pdata/1/proc
a ./10_Chsqc-800/pdata/1/proc2
a ./10_Chsqc-800/pdata/1/title
a ./10_Chsqc-800/pdata/1/proc2s
a ./10_Chsqc-800/pdata/1/procs
a ./10_Chsqc-800/cpdprg2
a ./10_Chsqc-800/overflow_log_dru1.txt
a ./10_Chsqc-800/uxnmr.info
a ./10_Chsqc-800/ser
a ./10_Chsqc-800/fid.com
a ./10_Chsqc-800/scon2
a ./10_Chsqc-800/acqu
a ./10_Chsqc-800/acqus
a ./10_Chsqc-800/specpar
Inserted data from form file at /Users/kfraga/RESEARCH/RPISpecDB/sample/sample_sessions/example3/specdb.yaml
```
The `*.com` scripts are `nmrpipe` scripts that were used to process that specific FID, which are saved separated in SpecDB.
SpecDB also allows users to insert default processing scripts for pulse sequences used in their research group. To insert a default processing script, you can use `specdb forms` to inspect what is required.
```
$ specdb forms --table default_processing_scripts
default_processing_scripts: # record a default nmrpipe processing script for every pulse sequence
0:
pulse_sequence_name: # name of pulse sequence, from Bruker it is the value of the PULPROG tag
nickname: # user provided nickname, references the pulse_sequence_nicknames table
default_processing: # path to text file of a nmrpipe processing script that read a FID for this pulse sequence type
```
You can use `specdb summary` view the possible pulse sequence nicknames to use:
```
$ specdb summary ---specdb summary --db sample.db --table pulse_sequence_names
id nickname
55 13C-(1H) hetNOE
54 15N-(1H) hetNOE
80 15N-1H solid-sta...
2 1D 13C
3 1D 15N
66 1D 15N CPMG
57 1D 15N T1
63 1D 15N T1rho
60 1D 15N T2
1 1D 1H
4 1D 31P
90 1H-detected (H)N...
61 2D 15N T2
18 2D (HB)CB(CGCD)HD
71 2D 13C-13C DARR
88 2D 13C-detected ...
89 2D 13C-detected NCA
84 2D 13C/13C PDSD
85 2D 13C/13C TOBSY
67 2D 15N CPMG
58 2D 15N T1
64 2D 15N T1rho
82 2D 15N-13C TEDOR
5 2D 1H-13C HMQC
9 2D 1H-13C HSQC
11 2D 1H-13C HSQC a...
12 2D 1H-13C HSQC a...
10 2D 1H-13C HSQC/HMQC
6 2D 1H-15N HSQC
7 2D 1H-15N HSQC N...
8 2D 1H-15N TROSY
16 2D 1H-1H COSY
17 2D 1H-1H NOESY
14 2D 1H-1H ROESY
13 2D 1H-1H TOCSY
70 2D DARR
15 2D DQF-COSY
87 2D HBCBCGCD
86 2D HBCBCGCDCEHE
36 3D (C)CCH-TOCSY
38 3D 13C-separated...
68 3D 15N CPMG
59 3D 15N T1
65 3D 15N T1rho
62 3D 15N T2
56 3D 15N,13C sim N...
37 3D 15N-separated...
41 3D 1H-13C NOESY
42 3D 1H-13C NOESY ...
43 3D 1H-13C NOESY ...
39 3D 1H-15N NOESY
40 3D 1H-15N TOCSY
23 3D C(CO)NH
21 3D CBCACONH
22 3D CBCANH
33 3D CCH-TOCSY
34 3D H(CCO)NH
30 3D HBHA(CO)NH
26 3D HBHANH
47 3D HCACO
46 3D HCCH-COSY
35 3D HCCH-TOCSY
28 3D HN(CA)CO
31 3D HN(CO)CA
32 3D HN(CO)CACB
48 3D HN(COCA)CB
25 3D HNCA
27 3D HNCACB
29 3D HNCACO
24 3D HNCO
44 3D HNHA
45 3D HNHB
77 3D NCACX
76 3D NCOCX
91 3D RNCSA
92 3D ROCSA
50 4D 13C-separated...
49 4D CC NOESY
51 5D APSY CBCACONH
52 5D APSY HACACONH
53 5D HN(CA)CONH
72 CANcoCX
83 CN TEDOR
75 CONCA
69 DARR
81 FS-REDOR
19 JR NOESY
20 L-PROSY NOESY
73 NCACX
74 NCOCX
78 PAIN
79 PAR
```
Section 5 below discusses more how to use `specdb summary`. To submit a default processing script then you can make `default_processing_script` yaml file and use `specdb insert` to insert it. The `default_processing` tag is expecting a filesystem path to where a text file for processing script wants to submit. There is an example default processing script yaml in the repo at `sample/sample_forms/`
```
$ $ specdb insert --db sample.db --form RPISpecDB/sample/sample_forms/default.processing.yaml --write
Inserted data from form file at RPISpecDB/sample/sample_forms/default.processing.yaml
```
## 5. View what is in the database
Each time we inserted a form into the database, we did not check that the provided information actually is stored in the database. If the `Inserted into` message comes up after the `insert` then no errors in the insert arrived. We can view a summary of the current entries in a table in the database by using the `specdb summary` command. With `summary` we need to provide the name of the table we want to summarize and in what database file we want to look into.
```
$ specdb summary --db sample.db --table user
id user_id given_name family_name middle_initial department_and country state_province city postal_code \
3 GTM Gaetano Montelione None None None None None None
2 KJF Keith Fraga J RPI None None None None
1 NONE None None None None None None None None
4 TAR Theresa Ramelot A None None None None None
role organization_t email_address
None None monteg3@rpi.edu
postdoc None kjfraga@ucdavis.edu
None None None
None None ramelt2@rpi.edu
```
A simple table is printed to screen of the current items in the database. Try it on any other table.
## 6. Update a record in SpecDB
It is often the case that information already in the database needs to be amended or added to. Suppose we want to update a user's information. The strategy is to pull out the information already in the database into a yaml form, edit the form, write it back to the database with the write flag. SpecDB detects when in a form only a table id is provided, like `user_id`, and the rest of the items are blank. In that case, SpecDB understands that what is being requested is the rest of information for that user. We are going to use this functionality to update a record.
First, make an empty user form.
```
$ specdb forms --table user > utest.yml
```
Add only the id of the user we want to edit. In this case we are only adding `KJF` to the `user_id` tag.
```
$ cat utest.yml
user: # provide information about a user, REQUIRED: `user_id`
0:
user_id: KJF # no spaces, must be unique, len <= 8, Ex: KJF
given_name: # len <= 64, Ex: Keith
family_name: # len <= 64, Ex: Fraga
middle_initials: # len <= 16, Ex: J
department_and_institution: # len <= 128, Ex: MCB UCD
country: # len <= 32, Ex: USA
state_province: # len <= 32, Ex: NY
city: # len <= 32, Ex: Troy
postal_code: # len <= 32, Ex: 12180
role: # len <= 64, Ex: postdoc
organization_type: # len <= 64, Ex: academic
email_address: # no spaces, must have `@`, len <= 64, Ex: user@mail.com
```
Using `specdb insert` to pull the information for user `KJF` out from the database.
```
$ specdb insert --db sample.db --form utest.yml
Inserted data from form file at utest.yml
$ cat utest.yml
user: # provide information about a user, REQUIRED: `user_id`
0:
user_id: KJF # no spaces, must be unique, len <= 8, Ex: KJF
given_name: Keith # len <= 64, Ex: Keith
family_name: Fraga # len <= 64, Ex: Fraga
middle_initials: J # len <= 16, Ex: J
department_and_institution: RPI # len <= 128, Ex: MCB UCD
country: # len <= 32, Ex: USA
state_province: # len <= 32, Ex: NY
city: # len <= 32, Ex: Troy
postal_code: # len <= 32, Ex: 12180
role: postdoc # len <= 64, Ex: postdoc
organization_type: # len <= 64, Ex: academic
email_address: kjfraga@ucdavis.edu # no spaces, must have `@`, len <= 64, Ex: user@mail.com
```
As you can see, the other tags defined for user `KJF` have been pulled in. We can then edit `utest.yml` to change the first name from Keith to Kevin. (Any example would suffice). Re-insert `utest.yml`. First without `--write`, then with.
```
$ specdb insert --db sample.db --form utest.yml
requested data is different than the database
input information:
{
"user_id": "KJF",
"given_name": "Kevin",
"family_name": "Fraga",
"middle_initials": "J",
"department_and_institution": "RPI",
"country": null,
"state_province": null,
"city": null,
"postal_code": null,
"role": "postdoc",
"organization_type": null,
"email_address": "kjfraga@ucdavis.edu"
}
database information:
{
"user_id": "KJF",
"given_name": "Keith",
"family_name": "Fraga",
"middle_initials": "J",
"department_and_institution": "RPI",
"country": null,
"state_province": null,
"city": null,
"postal_code": null,
"role": "postdoc",
"organization_type": null,
"email_address": "kjfraga@ucdavis.edu"
}
Aborting
$ specdb insert --db sample.db --form utest.yml --write
Inserted data from form file at utest.yml
```
Using `summary`, we can see the change.
```
$ specdb summary --db sample.db --table user
id user_id given_name family_name middle_initial department_and country state_province city postal_code \
3 GTM Gaetano Montelione None None None None None None
5 KJF Kevin Fraga J RPI None None None None
1 NONE None None None None None None None None
4 TAR Theresa Ramelot A None None None None None
role organization_t email_address
None None monteg3@rpi.edu
postdoc None kjfraga@ucdavis.edu
None None None
None None ramelt2@rpi.edu
```
## 7. Query information from SpecDB
To query information from SpecDB, some FID data must be inserted first. The `specdb query` command only accepts queries against the summary view in the database (not to be confused with `specdb summary`). To view the contents of the summary view, we can use `specdb summary`.
```
$ specdb summary --db sample.db --table summary
id experiment_dat pulse_sequence pulse_sequence md5checksum session_id session_prepar \
11 2010-09-26 21:33... None zgpr f1d80ff78a50d207... 1 TAR
12 2010-09-26 22:16... None hsqcetf3gpsi e3c4e84d0ae67e8c... 1 TAR
13 2021-03-01 19:01... None hsqcetgp 335f5a0ef63cb690... 2 KJF
temperature buffer_id pst_id pst_preparer batch_id batch_preparer sample_sequenc expression_id \
293.0000 h7.4n50 Db0515A.000 TAR Db0515A None GPLIEVLA Db0515A
293.0000 h7.4n50 Db0515A.000 TAR Db0515A None GPLIEVLA Db0515A
297.9658 h7.4n50 Db0515A.000 TAR Db0515A None GPLIEVLA Db0515A
expression_pre construct_id construct_prep target_id target_prepare target_sequenc project \
TAR Db0515A KJF Db0515 GTM MGPLIEVLA DBh
TAR Db0515A KJF Db0515 GTM MGPLIEVLA DBh
TAR Db0515A KJF Db0515 GTM MGPLIEVLA DBh
construct_sequ spectrometer_i field probe_id probehd tube_type zipped_dir
MGHHHHHHHSMGPLIEVLA Hu800 800 MHz None 1.7_mm_CPTCI_1H-... NMR tube b'\x1f\x8b\x08\x...
MGHHHHHHHSMGPLIEVLA Hu800 800 MHz None 1.7_mm_CPTCI_1H-... NMR tube b'\x1f\x8b\x08\x...
MGHHHHHHHSMGPLIEVLA Hu800 800 MHz None 5_mm_CPTCI_1H-13... NMR tube b'\x1f\x8b\x08\x...
```
In this example we are going to query for FIDs produced from an HSQC. This can be accomplished with the following query:
```
$ specdb query --sql "SELECT * FROM summary WHERE pulse_sequence LIKE '%hsqc%'" --db sample.db
id experiment_dat pulse_sequence pulse_sequence md5checksum session_id session_prepar \
12 2010-09-26 22:16... None hsqcetf3gpsi e3c4e84d0ae67e8c... 1 TAR
13 2021-03-01 19:01... None hsqcetgp 335f5a0ef63cb690... 2 KJF
temperature buffer_id pst_id pst_preparer batch_id batch_preparer sample_sequenc expression_id \
293.0000 h7.4n50 Db0515A.000 TAR Db0515A None GPLIEVLA Db0515A
297.9658 h7.4n50 Db0515A.000 TAR Db0515A None GPLIEVLA Db0515A
expression_pre construct_id construct_prep target_id target_prepare target_sequenc project \
TAR Db0515A KJF Db0515 GTM MGPLIEVLA DBh
TAR Db0515A KJF Db0515 GTM MGPLIEVLA DBh
construct_sequ spectrometer_i field probe_id probehd tube_type zipped_dir
MGHHHHHHHSMGPLIEVLA Hu800 800 MHz None 1.7_mm_CPTCI_1H-... NMR tube b'\x1f\x8b\x08\x...
MGHHHHHHHSMGPLIEVLA Hu800 800 MHz None 5_mm_CPTCI_1H-13... NMR tube b'\x1f\x8b\x08\x...
```
Without specifying an output directory, `specdb query` produces a table of records that were found in the query. This helps users dissect and refine their query to the records they ultimately want. Specifying an output directory results in those the FIDs being copied into the directory the user specifies.
```
$ specdb query --sql "SELECT * FROM summary WHERE pulse_sequence LIKE '%hsqc%'" --db sample.db --out query_results
$ tree query_results/
query_results/
├── TAR_Db0515A.000_20100926
│   └── 2
│   ├── acqu
│   ├── acqu2
│   ├── acqu2s
│   ├── acqus
│   ├── audita.txt
│   ├── cpdprg3
│   ├── format.temp
│   ├── pdata
│   │   └── 1
│   │   ├── proc
│   │   ├── proc2
│   │   ├── proc2s
│   │   ├── procs
│   │   └── title
│   ├── prosol_History
│   ├── pulseprogram
│   ├── scon2
│   ├── ser
│   └── uxnmr.par
└── TAR_Db0515A.000_20210301
└── 10_Chsqc-800
├── acqu
├── acqu2
├── acqu2s
├── acqus
├── audita.txt
├── cpdprg2
├── fid.com
├── format.temp
├── overflow_log_dru1.txt
├── pdata
│   └── 1
│   ├── proc
│   ├── proc2
│   ├── proc2s
│   ├── procs
│   └── title
├── proc.com
├── pulseprogram
├── scon2
├── ser
├── specpar
├── uxnmr.info
└── uxnmr.par
```
The above command made a directory `query_results/` and places the query results in this folder. The naming of the subdirectories underneath `results` goes `{user_id}_{pst_id}_{date}`. Then underneath this results subdirectory is the FID that was found as a result of the query. Note that the whole directory the `ser` file was in was also saved, keeping the auxillary files that were made in during the `ser` file collection.
To produce a NMR-STAR file for this result, just use the `--star` command to produce a STAR file in the directory where the `ser` file is.
```
$ specdb query --sql "SELECT * FROM summary WHERE pulse_sequence LIKE '%hsqc%'" --db sample.db --out query_results --star
2022-06-21 14:49:41,207,207 WARNING [parser.py:161] Loop with no data on line: 33
2022-06-21 14:49:41,207,207 WARNING [parser.py:161] Loop with no data on line: 72
2022-06-21 14:49:41,208,208 WARNING [parser.py:161] Loop with no data on line: 84
2022-06-21 14:49:41,208,208 WARNING [parser.py:161] Loop with no data on line: 93
2022-06-21 14:49:41,208,208 WARNING [parser.py:161] Loop with no data on line: 130
2022-06-21 14:49:41,208,208 WARNING [parser.py:161] Loop with no data on line: 161
2022-06-21 14:49:41,208,208 WARNING [parser.py:161] Loop with no data on line: 200
2022-06-21 14:49:41,209,209 WARNING [parser.py:161] Loop with no data on line: 237
2022-06-21 14:49:41,209,209 WARNING [parser.py:161] Loop with no data on line: 280
2022-06-21 14:49:41,209,209 WARNING [parser.py:161] Loop with no data on line: 304
2022-06-21 14:49:41,209,209 WARNING [parser.py:161] Loop with no data on line: 390
2022-06-21 14:49:41,210,210 WARNING [parser.py:161] Loop with no data on line: 403
2022-06-21 14:49:41,604,604 WARNING [parser.py:161] Loop with no data on line: 33
2022-06-21 14:49:41,605,605 WARNING [parser.py:161] Loop with no data on line: 72
2022-06-21 14:49:41,605,605 WARNING [parser.py:161] Loop with no data on line: 84
2022-06-21 14:49:41,605,605 WARNING [parser.py:161] Loop with no data on line: 93
2022-06-21 14:49:41,605,605 WARNING [parser.py:161] Loop with no data on line: 130
2022-06-21 14:49:41,605,605 WARNING [parser.py:161] Loop with no data on line: 161
2022-06-21 14:49:41,606,606 WARNING [parser.py:161] Loop with no data on line: 200
2022-06-21 14:49:41,606,606 WARNING [parser.py:161] Loop with no data on line: 237
2022-06-21 14:49:41,606,606 WARNING [parser.py:161] Loop with no data on line: 280
2022-06-21 14:49:41,606,606 WARNING [parser.py:161] Loop with no data on line: 304
2022-06-21 14:49:41,607,607 WARNING [parser.py:161] Loop with no data on line: 390
2022-06-21 14:49:41,607,607 WARNING [parser.py:161] Loop with no data on line: 403
$ tree query_results/
query_results/
├── TAR_Db0515A.000_20100926
│   └── 2
│   ├── acqu
│   ├── acqu2
│   ├── acqu2s
│   ├── acqus
│   ├── audita.txt
│   ├── cpdprg3
│   ├── fid.star
│   ├── format.temp
│   ├── pdata
│   │   └── 1
│   │   ├── proc
│   │   ├── proc2
│   │   ├── proc2s
│   │   ├── procs
│   │   └── title
│   ├── prosol_History
│   ├── pulseprogram
│   ├── scon2
│   ├── ser
│   └── uxnmr.par
└── TAR_Db0515A.000_20210301
└── 10_Chsqc-800
├── acqu
├── acqu2
├── acqu2s
├── acqus
├── audita.txt
├── cpdprg2
├── fid.com
├── fid.star
├── format.temp
├── overflow_log_dru1.txt
├── pdata
│   └── 1
│   ├── proc
│   ├── proc2
│   ├── proc2s
│   ├── procs
│   └── title
├── proc.com
├── pulseprogram
├── scon2
├── ser
├── specpar
├── uxnmr.info
└── uxnmr.par
```
Note that now NMR-STAR files have been created for those FIDs, `fid.star`. Also note, that the `pdata` directories have also been preserved and keep the same directory structure as from their original sources.
Users can also query FIDs from a SpecDB instance by row ids in the summary table. Investigating the help menu for `specdb query` reveals arguments for providing row ids.
```
$ specdb query --help
usage: specdb query [-h] [--sql <str> [<str> ...]] [--star] --db <path>
[--out <path>] [--indices <str> [<str> ...]]
optional arguments:
-h, --help show this help message and exit
--sql <str> [<str> ...]
query using sql syntax. The query can be on any table.
If no --output format is given, results are printed to
screen.
--star whether to write a NMR-STAR file for each FID in the
query results
--db <path> path to specific specdb database file to use. use
`specdb create` to create a new database file
--out <path> directory to place results of the query
--indices <str> [<str> ...]
provide a list of row ids in the summary table to
collect users can provide a list of ids directly on
the command line space separated, or in a .csv file
with all ids comma separated on first line
```
We can use `--indices` to pull out the same data as before. First run the query tool with no output directory to determine the ids you would want:
```
$ specdb query --sql "SELECT * FROM summary WHERE pulse_sequence LIKE '%hsqc%'" --db sample.db
id experiment_dat pulse_sequence pulse_sequence md5checksum session_id session_prepar \
12 2010-09-26 22:16... None hsqcetf3gpsi e3c4e84d0ae67e8c... 1 TAR
13 2021-03-01 19:01... None hsqcetgp 335f5a0ef63cb690... 2 KJF
temperature buffer_id pst_id pst_preparer batch_id batch_preparer sample_sequenc expression_id \
293.0000 h7.4n50 Db0515A.000 TAR Db0515A None GPLIEVLA Db0515A
297.9658 h7.4n50 Db0515A.000 TAR Db0515A None GPLIEVLA Db0515A
expression_pre construct_id construct_prep target_id target_prepare target_sequenc project \
TAR Db0515A KJF Db0515 GTM MGPLIEVLA DBh
TAR Db0515A KJF Db0515 GTM MGPLIEVLA DBh
construct_sequ spectrometer_i field probe_id probehd tube_type zipped_dir
MGHHHHHHHSMGPLIEVLA Hu800 800 MHz None 1.7_mm_CPTCI_1H-... NMR tube b'\x1f\x8b\x08\x...
MGHHHHHHHSMGPLIEVLA Hu800 800 MHz None 5_mm_CPTCI_1H-13... NMR tube b'\x1f\x8b\x08\x...
```
Perhaps we only want the `ser` file from the 2010 date, so we can run the following command:
```
$ specdb query --db sample.db --indices 12 --out only_12 --star
2022-06-21 15:02:16,230,230 WARNING [parser.py:161] Loop with no data on line: 33
2022-06-21 15:02:16,230,230 WARNING [parser.py:161] Loop with no data on line: 72
2022-06-21 15:02:16,231,231 WARNING [parser.py:161] Loop with no data on line: 84
2022-06-21 15:02:16,231,231 WARNING [parser.py:161] Loop with no data on line: 93
2022-06-21 15:02:16,231,231 WARNING [parser.py:161] Loop with no data on line: 130
2022-06-21 15:02:16,231,231 WARNING [parser.py:161] Loop with no data on line: 161
2022-06-21 15:02:16,231,231 WARNING [parser.py:161] Loop with no data on line: 200
2022-06-21 15:02:16,232,232 WARNING [parser.py:161] Loop with no data on line: 237
2022-06-21 15:02:16,232,232 WARNING [parser.py:161] Loop with no data on line: 280
2022-06-21 15:02:16,232,232 WARNING [parser.py:161] Loop with no data on line: 304
2022-06-21 15:02:16,233,233 WARNING [parser.py:161] Loop with no data on line: 390
2022-06-21 15:02:16,233,233 WARNING [parser.py:161] Loop with no data on line: 403
$ tree only_12/
only_12/
└── TAR_Db0515A.000_20100926
└── 2
├── acqu
├── acqu2
├── acqu2s
├── acqus
├── audita.txt
├── cpdprg3
├── fid.star
├── format.temp
├── pdata
│   └── 1
│   ├── proc
│   ├── proc2
│   ├── proc2s
│   ├── procs
│   └── title
├── prosol_History
├── pulseprogram
├── scon2
├── ser
└── uxnmr.par
```
The `Loop with no data` warnings are coming from `pynmrstar`. If we wanted both records `12` and `13`, we just specify them both in the same `specdb query` statement.
```
$ specdb query --db sample.db --indices 12 13 --out both/ --star
2022-06-21 15:07:27,920,920 WARNING [parser.py:161] Loop with no data on line: 33
2022-06-21 15:07:27,921,921 WARNING [parser.py:161] Loop with no data on line: 72
2022-06-21 15:07:27,921,921 WARNING [parser.py:161] Loop with no data on line: 84
2022-06-21 15:07:27,921,921 WARNING [parser.py:161] Loop with no data on line: 93
2022-06-21 15:07:27,921,921 WARNING [parser.py:161] Loop with no data on line: 130
2022-06-21 15:07:27,922,922 WARNING [parser.py:161] Loop with no data on line: 161
2022-06-21 15:07:27,922,922 WARNING [parser.py:161] Loop with no data on line: 200
2022-06-21 15:07:27,922,922 WARNING [parser.py:161] Loop with no data on line: 237
2022-06-21 15:07:27,922,922 WARNING [parser.py:161] Loop with no data on line: 280
2022-06-21 15:07:27,922,922 WARNING [parser.py:161] Loop with no data on line: 304
2022-06-21 15:07:27,923,923 WARNING [parser.py:161] Loop with no data on line: 390
2022-06-21 15:07:27,923,923 WARNING [parser.py:161] Loop with no data on line: 403
2022-06-21 15:07:28,350,350 WARNING [parser.py:161] Loop with no data on line: 33
2022-06-21 15:07:28,351,351 WARNING [parser.py:161] Loop with no data on line: 72
2022-06-21 15:07:28,351,351 WARNING [parser.py:161] Loop with no data on line: 84
2022-06-21 15:07:28,351,351 WARNING [parser.py:161] Loop with no data on line: 93
2022-06-21 15:07:28,351,351 WARNING [parser.py:161] Loop with no data on line: 130
2022-06-21 15:07:28,352,352 WARNING [parser.py:161] Loop with no data on line: 161
2022-06-21 15:07:28,352,352 WARNING [parser.py:161] Loop with no data on line: 200
2022-06-21 15:07:28,352,352 WARNING [parser.py:161] Loop with no data on line: 237
2022-06-21 15:07:28,352,352 WARNING [parser.py:161] Loop with no data on line: 280
2022-06-21 15:07:28,353,353 WARNING [parser.py:161] Loop with no data on line: 304
2022-06-21 15:07:28,353,353 WARNING [parser.py:161] Loop with no data on line: 390
2022-06-21 15:07:28,353,353 WARNING [parser.py:161] Loop with no data on line: 403
$ tree both/
both/
├── TAR_Db0515A.000_20100926
│   └── 2
│   ├── acqu
│   ├── acqu2
│   ├── acqu2s
│   ├── acqus
│   ├── audita.txt
│   ├── cpdprg3
│   ├── fid.star
│   ├── format.temp
│   ├── pdata
│   │   └── 1
│   │   ├── proc
│   │   ├── proc2
│   │   ├── proc2s
│   │   ├── procs
│   │   └── title
│   ├── prosol_History
│   ├── pulseprogram
│   ├── scon2
│   ├── ser
│   └── uxnmr.par
└── TAR_Db0515A.000_20210301
└── 10_Chsqc-800
├── acqu
├── acqu2
├── acqu2s
├── acqus
├── audita.txt
├── cpdprg2
├── fid.com
├── fid.star
├── format.temp
├── overflow_log_dru1.txt
├── pdata
│   └── 1
│   ├── proc
│   ├── proc2
│   ├── proc2s
│   ├── procs
│   └── title
├── proc.com
├── pulseprogram
├── scon2
├── ser
├── specpar
├── uxnmr.info
└── uxnmr.par
```
## 8. Incremental Backup of a SpecDB database
To perform an incremental backup, you first need to run the `specdb backup` tool. By default, `specdb backup` will make a `objects/` directory to store the database pages if none is specified, and a `backup.txt` to store the sha256 hashes of the database. Then to restore the database (i.e backup it into a new SQLite database), you use `specdb restore`.
```
$ specdb restore --backup backup.db --shafile backup.txt --objects objects/
```
After one makes changes to the original `sample.db` database, using `specdb backup` will result in modifications to the `objects/` directory and `backup.txt`. Performing `specdb restore` will only make the incremental changes to become current with the backed up state of `sample.db`.