another update for the reviews

RPIBioinformatics · May 9, 2022 · fbb1f42 · fbb1f42
1 parent b5f321a
commit fbb1f42

Showing 9 changed files with 756 additions and 7 deletions.
diff --git a/TUTORIAL.md b/TUTORIAL.md
@@ -4,7 +4,7 @@ The purpose of the following tutorial is to provide examples of how to use SpecD
 
 ## 1. SpecDB help menus and subcommands
 
-The first entry point to look for guidance on SpecDB functions is to use the help menus. If `specdb help` results in the help menu for SpecDB, then it is installed correctly. SpecDB has seven sub commands, each listed below and the command line arguments each take:  
+The first place to look for guidance on SpecDB functions is the help menus. If `specdb --help` prints the SpecDB help menu, then SpecDB is installed correctly. SpecDB has seven subcommands, listed below with the command line arguments each takes:  
 
 1. `specdb create --db --backup`  
 2. `specdb forms --table --num`  
@@ -14,7 +14,7 @@ The first entry point to look for guidance on SpecDB functions is to use the hel
 6. `specdb backup --db --backup`  
 7. `specdb restore --backup --backup`  
 
-The subcommands listed above in the logical order the commands are used in. Each subcommand has a separate help menu from `specdb --help` that can be accessed, (e.g `specdb forms --help`). Users first need to create a SpecDB SQLite database file with `create`. Next, users need to populate the database with information, the `forms` command make the forms for the data fields needed for the SpecDB schema. With a filled form, users use `insert` to insert insert the form into their database. To verify/check what they inserted, users can use `summary` to investigate the contents of any SpecDB table. Users can pull data out of the database with `query`. With `query` users provide a SQL SELECT statement on the SpecDB summary view to pull data out of the database. Commands `backup` and `restore` are for the incremental backup operations. 
+The subcommands are listed above in the logical order in which they are used. Each subcommand has its own help menu, separate from `specdb --help` (e.g. `specdb forms --help`). Users first create a SpecDB SQLite database file with `create`. Next, users populate the database with information; the `forms` command makes the forms for the data fields needed for the SpecDB schema. With a filled form, users run `insert` to insert the form into their database. To verify what they inserted, users can run `summary` to investigate the contents of any SpecDB table. Users can pull data out of the database with `query` by providing a SQL SELECT statement on the SpecDB summary view. Commands `backup` and `restore` perform the incremental backup operations. 
 
 ## 2. Instantiating a new SpecDB database
 
@@ -131,7 +131,7 @@ buffer_components: # describe the component(s) of a buffer, REQUIRED: `buffer_id
 
 It is important to note that if the `--num` option is provided, that the number of iterations to take match the number of tables requested. In the above case, the `buffer` form was created once because of the `1` after the `--num` and three `buffer_components` were made because of the `3` after the `1` in the `--num` options. The number of options in `--table` and `--num` are in a one-to-one correspondence with each other. If no `--num` options are provided it is assumed that all tables are produced just once.  
 
-Inspecting `sample/sample_forms/complete_sample.yaml` will find all the information required to describe a biomolecular NMR sample. It is recommended that users use `specdb forms` to create the forms they when they need because users can define multiple entities at a time, and one general form will not suffice. However, it is instructive to see all the metadata items that are tracked in SpecDB by looking at `complete_sample.yaml`. 
+Inspecting `sample/sample_forms/complete_sample.yaml` shows all the information required to describe a biomolecular NMR sample. It is recommended that users run `specdb forms` to create forms as they need them, because users can define multiple entities at a time and one general form will not suffice. However, it is instructive to see all the metadata items that are tracked in SpecDB by looking at `complete_sample.yaml`. 
 
 To follow along with the sample forms provided in the repository, perform the following commands:
 

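The one-to-one correspondence between `--table` and `--num` described in the tutorial can be sketched as follows. This is only an illustration, not SpecDB's implementation: the helper name `pair_tables_with_counts` is hypothetical, and the default of one form per table follows the tutorial text.

```python
def pair_tables_with_counts(tables, nums=None):
    """Pair each requested table with the number of forms to produce for it.

    Hypothetical helper for illustration; not part of SpecDB.
    """
    if nums is None:
        # No --num given: every requested table is produced exactly once.
        nums = [1] * len(tables)
    if len(nums) != len(tables):
        raise ValueError(
            f"--num has {len(nums)} values but --table has {len(tables)}")
    return list(zip(tables, nums))


# Mirrors the tutorial example: one `buffer` form, three `buffer_components`.
pairs = pair_tables_with_counts(["buffer", "buffer_components"], [1, 3])
print(pairs)
```

Raising on a length mismatch makes the correspondence explicit instead of silently truncating the longer list, which is what a bare `zip` would do.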
diff --git a/cli/specdb b/cli/specdb
@@ -88,7 +88,7 @@ sdb_query = sdb_subs.add_parser('query',
 		("query records from SpecDB summary table. If no --output is given"),
 		(" then results are simply print to screen"))))
 sdb_query.add_argument('--sql', nargs='+', type=str, metavar='<str>',
-	required=True, help=''.join((
+	required=False, default=False, help=''.join((
 		('query using sql syntax. The query can be on any table.'),
 		(' If no --output format is given, results are printed to screen.'))))
 sdb_query.add_argument('--star', action='store_true', required=False,
@@ -99,6 +99,12 @@ sdb_query.add_argument('--db', type=str, metavar='<path>', required=True,
 		(' use `specdb create` to create a new database file'))))
 sdb_query.add_argument('--out', type=str, metavar='<path>', required=True,
 	help='directory to place results of the query')
+sdb_query.add_argument('--indices', type=str, metavar='<str>', nargs='+',
+	required=False, default=False, help=''.join((
+		("provide a list of row ids in the summary table to collect\n"),
+		("users can provide a list of ids directly on the command line space "),
+		("separated, or in a .csv file with all ids comma separated "),
+		("on first line"))))
 
 # backup level parser
 sdb_backup = sdb_subs.add_parser('backup',
@@ -178,7 +184,12 @@ elif sdb.command == 'summary':
 	Summary.summary(db=sdb.db, table=sdb.table)
 
 elif sdb.command == 'query':
-	Query.query(db=sdb.db, sql=sdb.sql[0], star=sdb.star, output_dir=sdb.out)
+	Query.query(
+		db=sdb.db,
+		sql=sdb.sql[0] if sdb.sql else False,
+		indices=sdb.indices[0] if sdb.indices else False,
+		star=sdb.star,
+		output_dir=sdb.out)
 
 elif sdb.command == 'backup':
 	Backup.backup(db=sdb.db, object_dir=sdb.objects, backup_file=sdb.shafile)

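An `nargs='+'` option declared with `default=False` yields a list when the flag is supplied and `False` when it is omitted, so taking element `[0]` needs a guard. A minimal, standalone sketch of that behaviour follows; the parser and the helper `first_or_none` are hypothetical and exist only for this example.

```python
import argparse

# Standalone parser that mimics the two optional list-valued flags.
parser = argparse.ArgumentParser(prog="specdb-query-sketch")
parser.add_argument("--sql", nargs="+", type=str,
                    required=False, default=False)
parser.add_argument("--indices", nargs="+", type=str,
                    required=False, default=False)


def first_or_none(values):
    """Return the first item of an nargs='+' list, or None if the flag was omitted."""
    return values[0] if values else None


# Only --indices is given; --sql keeps its default of False.
args = parser.parse_args(["--indices", "1", "2", "3"])
sql = first_or_none(args.sql)
indices = first_or_none(args.indices)
print(sql, indices, args.indices)
```

Note that `[0]` keeps only the first value; if every space-separated id is needed, the whole list (`args.indices`) would have to be passed on instead.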
diff --git a/specdb/Backup.py b/specdb/Backup.py
@@ -0,0 +1,105 @@
+#!/usr/bin/env python3
+
+"""
+Module for backing up and restoring a SpecDB database
+"""
+
+SQLITE_PAGE_SIZE_INDEX  = 16
+SQLITE_HEADER_LENGTH    = 16
+SQLITE_PAGE_COUNT_INDEX = 28
+
+import hashlib
+from hashlib import sha256
+import os
+import sqlite3
+import sys
+
+def backup(db=None, object_dir='objects', backup_file='backup.txt'):
+	"""
+	This function performs the incremental backup.
+	It is taken with slight modifications from the following GitHub
+	repository: https://github.com/nokibsarkar/sqlite3-incremental-backup.git
+	
+	For this function, we implemented the python version of the backup at 
+	sqlite3-incremental-backup/python/sqlite3backup/python 
+	
+	Much credit to GitHub user nokibsarkar
+	
+	Parameters
+	----------
+	+ db				path to sqlite database to backup
+	+ object_dir		path to objects directory where all pages will reside
+	+ backup_file		file to save sha256 hashes of pages
+	
+	Returns
+	-------
+	True	if backup successful
+	"""
+
+	page_size = 0
+	# Open the database.
+	with open(db, "rb") as db_file_object:
+		assert(
+			db_file_object.read(SQLITE_HEADER_LENGTH) == b"SQLite format 3\x00")
+		db_file_object.seek(SQLITE_PAGE_SIZE_INDEX, os.SEEK_SET)
+		page_size = int.from_bytes(db_file_object.read(2), 'little') * 256
+		db_file_object.seek(SQLITE_PAGE_COUNT_INDEX, os.SEEK_SET)
+		page_count = int.from_bytes(db_file_object.read(4), 'big')
+
+	pages = []
+	with open(db, "rb") as db_file_object:
+		for page_number in range(page_count):
+			db_file_object.seek(page_number * page_size, os.SEEK_SET)
+			page = db_file_object.read(page_size)
+			page_hash = sha256(page).hexdigest()
+			directory, filename = page_hash[:2], page_hash[2:]
+			file_path = os.path.join(object_dir, directory, filename)
+			if not os.path.exists(file_path): # store each distinct page once
+				os.makedirs(os.path.dirname(file_path), exist_ok=True)
+				with open(file_path, "wb") as file_object:
+					file_object.write(page)
+			pages.append(page_hash)
+
+	# Write the ordered list of page hashes to the backup file.
+	with open(backup_file, 'w') as fp:
+		fp.write('\n'.join(pages))
+
+	return True
+
+
+def restore(backup=None, backup_file=None, object_dir=None):
+	"""
+	This function restores a database from an incremental backup.
+	It is taken with slight modifications from the following GitHub
+	repository: https://github.com/nokibsarkar/sqlite3-incremental-backup.git
+	
+	For this function, we implemented the python version of the backup at 
+	sqlite3-incremental-backup/python/sqlite3backup/python 
+	
+	Much credit to GitHub user nokibsarkar
+	
+	Parameters
+	----------
+	+ backup			path to sqlite database to restore to
+	+ object_dir		path to objects directory where all pages reside
+	+ backup_file 		file to read sha256 hashes of pages from
+	
+	Returns
+	-------
+	True	if restore successful
+	"""
+
+	# Read the pages from the backup file
+	with open(backup_file, 'r') as fp:
+		pages = fp.read().split('\n')
+
+	# Open the database.
+	with open(backup, "wb") as db_file_object:
+		# Iterate through the pages and write them to the database.
+		for page in pages:
+			path = os.path.join(object_dir, page[:2], page[2:])
+			with open(path, "rb") as file_object:
+				db_file_object.write(file_object.read())
+
+	# Restoration is complete
+	return True
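The header fields that `backup` relies on can be checked against SQLite itself. In the SQLite file format, the page size is a 2-byte big-endian integer at offset 16 (the value 1 stands for 65536) and the page count is a 4-byte big-endian integer at offset 28. The sketch below, which is separate from the module above, reads both from a throwaway database and hashes each page; the helper names `read_page_layout` and `page_hashes` are hypothetical.

```python
import os
import sqlite3
import tempfile
from hashlib import sha256


def read_page_layout(db_path):
    """Return (page_size, page_count) parsed from the SQLite file header."""
    with open(db_path, "rb") as fh:
        header = fh.read(100)
    assert header[:16] == b"SQLite format 3\x00"
    page_size = int.from_bytes(header[16:18], "big")
    if page_size == 1:
        # The format encodes a 65536-byte page as the value 1.
        page_size = 65536
    page_count = int.from_bytes(header[28:32], "big")
    return page_size, page_count


def page_hashes(db_path):
    """Return the SHA-256 hex digest of every page, in file order."""
    page_size, page_count = read_page_layout(db_path)
    hashes = []
    with open(db_path, "rb") as fh:
        for _ in range(page_count):
            hashes.append(sha256(fh.read(page_size)).hexdigest())
    return hashes


# Build a small throwaway database to inspect.
db_path = os.path.join(tempfile.mkdtemp(), "demo.db")
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1000)])
conn.commit()
expected_size = conn.execute("PRAGMA page_size").fetchone()[0]
expected_count = conn.execute("PRAGMA page_count").fetchone()[0]
conn.close()

page_size, page_count = read_page_layout(db_path)
hashes = page_hashes(db_path)
print(page_size, page_count, len(set(hashes)))
```

Because pages are stored under their hash, a later backup only has to write pages whose digest is not already present in the objects directory, which is what makes the scheme incremental.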
diff --git a/specdb/Forms.py b/specdb/Forms.py
@@ -73,7 +73,6 @@ def collect_schema_comments(tables=None, num=None):
 					form_dic[table_name].yaml_add_eol_comment(
 						comment, column, column=35)
 
-	print('form_dic num', num)
 	return form_dic 
 
 def forms(table=None, num=None, input_dict=None):
@@ -105,7 +104,7 @@ def forms(table=None, num=None, input_dict=None):
 
 	#print(table)
 	form_dic = collect_schema_comments(tables=table, num=num)
-	print(json.dumps(form_dic,indent=2))
+	#print(json.dumps(form_dic,indent=2))
 	yaml = ruamel.yaml.YAML()
 	yaml.preserve_quotes = True
 

diff --git a/specdb/Insert.py b/specdb/Insert.py
@@ -783,6 +783,38 @@ def insert(file=None, db=None, write=False):
 
 			conn.commit()
 
+	if 'default_processing_scripts' in record:
+
+		for index, scripts in record['default_processing_scripts'].items():
+
+			print(scripts)
+
+			try:
+				processing_script_path = os.path.abspath(
+					scripts['default_processing'])
+			except (KeyError, TypeError):
+				print('cannot get default processing script path')
+				print(f"given: {scripts!r}")
+				print('Aborting')
+				sys.exit(1)
+
+			with open(processing_script_path, 'rb') as fp: 
+				fbytes = fp.read()
+
+			scripts['default_processing'] = fbytes
+
+			status, value = insert_logic(
+				table='default_processing_scripts',
+				dic=scripts,
+				cursor=c,
+				write=write
+			)
+
+			scripts['default_processing'] = processing_script_path
+			print(status, value)
+
+
+
 	# now insert sessions if present
 	if 'session' in record:
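The added block reads the processing script as bytes and hands those bytes to the insert logic. A minimal sketch of storing file contents as a BLOB with the standard `sqlite3` module is shown below. It assumes a simplified two-column table; the real `default_processing_scripts` schema and `insert_logic` are not reproduced here, and the script file is a stand-in created for the example.

```python
import os
import sqlite3
import tempfile

# Hypothetical, simplified table; the real SpecDB schema has its own columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE default_processing_scripts ("
    "id INTEGER PRIMARY KEY, default_processing BLOB)")

# Stand-in processing script written to a temporary directory.
script_path = os.path.join(tempfile.mkdtemp(), "proc.com")
with open(script_path, "wb") as fp:
    fp.write(b"#!/bin/csh\necho processing\n")

# Read the script in binary mode so the stored bytes match the file exactly.
with open(script_path, "rb") as fp:
    script_bytes = fp.read()

# A parameterized insert lets sqlite3 bind the bytes object as a BLOB.
conn.execute(
    "INSERT INTO default_processing_scripts (default_processing) VALUES (?)",
    (script_bytes,))
conn.commit()

stored = conn.execute(
    "SELECT default_processing FROM default_processing_scripts").fetchone()[0]
print(stored == script_bytes)
```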