NNCP: Nearest Neighbor Constraint Propagation
The idea of NNCP is to assess how well predicted chemical shifts can be used to assign an NMR spectrum.
There are two sources of experimental peaks.
- (1) Constructing synthetic peaks from an assignment list
- (2) Using raw peak lists from an experiment
In (1), we read in an assignment list and construct the peaks 'expected' in a spectrum, given the pulse sequence and the atomic connectivities. (1) is the ideal case because it does not have to deal with missing peaks or noise peaks.
In (2), we use peaks from a raw peak list. However, these peak lists are often unassigned, so we cross-check against the assignment list to work out the mapping between the assignment and the peak list.
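For case (1), constructing the expected peaks can be sketched for an HNCA, which correlates the amide H and N of residue i with CA of residue i and CA of residue i-1. This is a minimal sketch under assumptions: the assignment list is assumed to be already parsed into a hypothetical `{residue: {atom: shift}}` dictionary, and `expected_hnca_peaks` is an illustrative name, not part of the nncp package.

```python
from typing import Dict, List, Tuple

# Hypothetical parsed assignment list: residue number -> {atom name: shift in ppm}.
Assignment = Dict[int, Dict[str, float]]

def expected_hnca_peaks(assignments: Assignment) -> List[Tuple[int, float, float, float]]:
    """Return (residue, H, N, CA) peaks expected in an HNCA spectrum.

    Each residue with an amide H and N gives up to two peaks:
    one to its own CA (i) and one to the preceding residue's CA (i-1).
    """
    peaks = []
    for res in sorted(assignments):
        shifts = assignments[res]
        if "H" not in shifts or "N" not in shifts:
            continue  # no amide proton (e.g. proline): no peak for this residue
        h, n = shifts["H"], shifts["N"]
        if "CA" in shifts:
            peaks.append((res, h, n, shifts["CA"]))  # intra-residue CA(i)
        prev = assignments.get(res - 1, {})
        if "CA" in prev:
            peaks.append((res, h, n, prev["CA"]))  # sequential CA(i-1)
    return peaks
```

Because the peaks come straight from the assignment list, every peak is present and there is no noise, which is why (1) is the idealized case.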
Workflow: construct the 'expected' peaks, perform nearest-neighbor matching against the observed peaks, then assess how 'far' the assignment list and the peak list are from each other (good matching vs. bad matching).
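The matching and "how far" steps can be sketched as a per-dimension scaled Euclidean distance plus a single summary score. This is an illustrative sketch, not the NNCP implementation: the function names, the per-dimension `scale` values, and the use of the mean match distance as the score are all assumptions.

```python
import math
from typing import List, Sequence, Tuple

Peak = Tuple[float, ...]

def nearest_neighbor_match(expected: Sequence[Peak], observed: Sequence[Peak],
                           scale: Sequence[float]) -> List[Tuple[int, int, float]]:
    """For each expected peak, find the closest observed peak.

    Each dimension's difference is divided by its scale (e.g. a larger
    scale for 15N than for 1H) before taking the Euclidean norm.
    Returns (expected index, observed index, distance) triples.
    """
    matches = []
    for i, e in enumerate(expected):
        best_j, best_d = -1, math.inf
        for j, o in enumerate(observed):
            d = math.sqrt(sum(((a - b) / s) ** 2 for a, b, s in zip(e, o, scale)))
            if d < best_d:
                best_j, best_d = j, d
        matches.append((i, best_j, best_d))
    return matches

def mean_match_distance(matches: List[Tuple[int, int, float]]) -> float:
    """One 'how far apart' score: small values mean good matching."""
    return sum(d for _, _, d in matches) / len(matches)
```

This greedy version lets two expected peaks share one observed peak. If a strict one-to-one matching is wanted, a Hungarian solver such as SciPy's `linear_sum_assignment` on the same distance matrix is one alternative.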
If other spectra are available, they can be used during matching: find peaks that are consistent across NHSQC, CHSQC, HNCO and HNCA (via tolerance matching?).
The XEASY peak-list format appears to include intensity information.
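One way tolerance matching across spectra could look: attach each triple-resonance peak (e.g. HNCA or HNCO, as H, N, C) to every NHSQC root whose H and N shifts agree within a tolerance. This is a sketch under assumptions: `group_by_root` is a hypothetical helper, and the default tolerances (0.03 ppm for 1H, 0.3 ppm for 15N) are illustrative values, not values from NNCP.

```python
from typing import Dict, List, Sequence, Tuple

def group_by_root(hsqc: Sequence[Tuple[float, float]],
                  triple: Sequence[Tuple[float, float, float]],
                  tol_h: float = 0.03, tol_n: float = 0.3) -> Dict[int, List[float]]:
    """Group carbon shifts of (H, N, C) peaks under the NHSQC roots they match.

    A peak matches a root when both its H and N shifts lie within the
    tolerances. Returns root index -> list of matched carbon shifts.
    """
    groups: Dict[int, List[float]] = {i: [] for i in range(len(hsqc))}
    for h, n, c in triple:
        for i, (rh, rn) in enumerate(hsqc):
            if abs(h - rh) <= tol_h and abs(n - rn) <= tol_n:
                groups[i].append(c)
    return groups
```

A peak that matches no root lands in no group, and a peak that matches several roots lands in all of them, so ambiguous or unexplained peaks remain visible instead of being silently resolved.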
Nearest-Neighbor Approach to NMR spectral assignment
nncp repo
nncp module
from nncp.assign import NNCP
assign
- assigners.py
- nearest.py
- chainer.py
NNCP.assign
Keep the sample-data scripts that build the data separate from the assignment model.
from nncp.pipeline
from nncp.datasets.datalib import gs_pipeline, ss_pipeline_
from nncp.assignment import NNCP
plectin_assign = NNCP(sequence, threshold=0.2, threshold_schedule_=myramp, match_schedule_=matchramp)
plectin_assign.gs_data_ = Dataset(peaklist, sequence)
plectin_assign.gs_data_.reorder()
plectin_assign.make_assignment(reporter=True)
plectin_assign.score_assign_
plectin_assign.export_assignments_()
plectin_assign.exhaustive_assignments_()
Make the code modular: separate what is NNCP-specific from what is sample-data-specific.
reporter_metrics_ = ['all']
Open questions: how to handle XEASY peak-list reading, and how to report errors. First of all, the gs spin-system construction is stringent: it assumes the values being matched are exactly equal. How should mismatches be reported?
Maybe just include them, but then mask them out completely later.
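The "include them, then mask them out" option can be sketched on a distance matrix: flagged rows stay in the data, so they can still be reported, but are set to +inf so they never win a nearest-neighbor comparison. The assumption here is that rows of the matrix correspond to spin systems; `mask_rows` is a hypothetical helper.

```python
import math
from typing import List, Set

def mask_rows(dmat: List[List[float]], bad_rows: Set[int]) -> List[List[float]]:
    """Return a copy of the distance matrix with flagged rows set to +inf.

    The original matrix is left unchanged, so the unmasked distances
    remain available for error reporting.
    """
    return [[math.inf] * len(row) if i in bad_rows else list(row)
            for i, row in enumerate(dmat)]
```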
Issues
- handling of uncaptured / singleton NH roots in spins_systems
  - they are ignored at the moment
  - these errors can be handled better later
NNCP should not be in charge of building the distance matrix. Build it elsewhere and pass the matrix to NNCP, so the user can specify the scale themselves.
Helper functions to diagnose the chemical shift predictions.
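Building the matrix outside NNCP could look like the sketch below: rows are predicted shifts per residue, columns are observed spin systems, and the user supplies one scale per atom type. `scaled_distance_matrix` and the example scale values are assumptions for illustration; NNCP would only receive the finished matrix.

```python
import math
from typing import Dict, List, Sequence

def scaled_distance_matrix(rows: Sequence[Dict[str, float]],
                           cols: Sequence[Dict[str, float]],
                           scale: Dict[str, float]) -> List[List[float]]:
    """Distance between every row and column over their shared atom types.

    Each shift difference is divided by the user-supplied scale for that
    atom type. A pair with no shared atom types gets +inf.
    """
    dmat = []
    for r in rows:
        line = []
        for c in cols:
            shared = [a for a in scale if a in r and a in c]
            if not shared:
                line.append(math.inf)
            else:
                line.append(math.sqrt(sum(((r[a] - c[a]) / scale[a]) ** 2 for a in shared)))
        dmat.append(line)
    return dmat

# Illustrative scale only: 1H differences weigh ten times more than 15N/13C.
example_scale = {"H": 0.1, "N": 1.0, "CA": 1.0}
```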
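A diagnostic helper of this kind might report the root-mean-square error of the predictions separately for each atom type, against a reference assignment. This is a hypothetical sketch: `per_atom_rmse` and the `{residue: {atom: shift}}` layout are assumptions, not existing NNCP functions.

```python
import math
from typing import Dict, List

Shifts = Dict[int, Dict[str, float]]  # residue -> atom type -> shift in ppm

def per_atom_rmse(predicted: Shifts, reference: Shifts) -> Dict[str, float]:
    """RMSE of predicted vs. reference shifts, one value per atom type.

    Only (residue, atom) pairs present in both inputs are compared.
    """
    sq: Dict[str, List[float]] = {}
    for res, atoms in reference.items():
        for atom, ref in atoms.items():
            pred = predicted.get(res, {}).get(atom)
            if pred is not None:
                sq.setdefault(atom, []).append((pred - ref) ** 2)
    return {atom: math.sqrt(sum(v) / len(v)) for atom, v in sq.items()}
```

A per-atom breakdown makes it easier to see whether, say, poor CA predictions rather than N predictions are what limit the matching.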
Write the results out to files.
Logs: dmat, assignment, initial ramp, iter, mask (stored as a compact mask).
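A sketch of writing results and a compact mask, assuming assignments are a residue-to-spin-system dictionary and the mask is a boolean matrix in which set cells are a minority. Storing the shape plus the coordinates of the set cells is one possible "compact mask", not necessarily the intended one; all function names here are hypothetical.

```python
import csv
import json
from typing import Dict, List

def write_assignments(assignments: Dict[int, str], fh) -> None:
    """Write residue -> spin-system assignments as CSV rows to an open text file."""
    writer = csv.writer(fh)
    writer.writerow(["residue", "spin_system"])
    for res in sorted(assignments):
        writer.writerow([res, assignments[res]])

def compact_mask(mask: List[List[bool]]) -> str:
    """Serialize a boolean mask as its shape plus the coordinates of True cells."""
    coords = [(i, j) for i, row in enumerate(mask) for j, v in enumerate(row) if v]
    shape = (len(mask), len(mask[0]) if mask else 0)
    return json.dumps({"shape": shape, "true": coords})

def expand_mask(text: str) -> List[List[bool]]:
    """Inverse of compact_mask: rebuild the full boolean matrix."""
    data = json.loads(text)
    n_rows, n_cols = data["shape"]
    mask = [[False] * n_cols for _ in range(n_rows)]
    for i, j in data["true"]:
        mask[i][j] = True
    return mask
```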
plectin.endgame
msrb.endgame
msrb.export
msrb.report
Endgame:
- Locate all glycines. If only one glycine is left unassigned, assign it.
- Then locate the smallest distance between a chain end and another chain end or a proline, all within 0.2.
- At each step, ignore the smallest and the largest chain end; each candidate pair must include at least one chain end.
- Sort the left ends plus prolines, sort the right ends plus prolines, zip them together and find the distance between them.
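Two of the endgame steps can be sketched as small functions: the lone-glycine rule, and finding the closest right-end/left-end pair within the 0.2 threshold. Assumptions: end vectors are already scaled, keys name the chain (or proline) owning each end, and the pair search compares all pairs rather than zipping sorted lists. The "ignore the smallest and largest chain end" rule is not modeled, and all names are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

Vec = Tuple[float, ...]

def lone_glycine(open_gly_positions: List[int],
                 gly_spin_systems: List[str]) -> Optional[Tuple[int, str]]:
    """If exactly one glycine position and one glycine-like spin system remain, pair them."""
    if len(open_gly_positions) == 1 and len(gly_spin_systems) == 1:
        return open_gly_positions[0], gly_spin_systems[0]
    return None

def closest_link(right_ends: Dict[str, Vec], left_ends: Dict[str, Vec],
                 threshold: float = 0.2) -> Optional[Tuple[str, str, float]]:
    """Closest (right end, left end) pair with distance <= threshold, or None.

    A chain is never linked to its own left end, which would close a circle.
    """
    best = None
    for r_name, r_vec in right_ends.items():
        for l_name, l_vec in left_ends.items():
            if r_name == l_name:
                continue
            d = math.dist(r_vec, l_vec)
            if d <= threshold and (best is None or d < best[2]):
                best = (r_name, l_name, d)
    return best
```

Taking only the single closest link per step, then recomputing, keeps each decision conservative at the cost of more iterations.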
Place each chain in its domain of possible assignments, i.e. the sequential positions it could occupy. When you get to the end, you need to work from both ends and look for paths shared between the two directions.
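The domain of possible sequential positions for a chain can be sketched as pattern matching against the sequence. Assumptions: a chain is written as a string of one-letter residue types where known (e.g. 'G' for a spin system identified as glycine) and '?' where unknown; '?' is not allowed to match proline, since proline has no amide proton and so cannot give an HN-rooted spin system. `position_domain` is a hypothetical helper.

```python
from typing import List

def position_domain(fragment: str, sequence: str) -> List[int]:
    """0-based start positions where the fragment's residue-type pattern fits.

    '?' matches any residue except proline; any other letter must match exactly.
    """
    hits = []
    for start in range(len(sequence) - len(fragment) + 1):
        window = sequence[start:start + len(fragment)]
        if all((f == "?" and s != "P") or f == s for f, s in zip(fragment, window)):
            hits.append(start)
    return hits
```

For example, a three-residue chain whose middle spin system looks like a glycine, `"?G?"`, fits `"MAGKGSG"` only at starts 1 and 3; a chain with a single remaining start position is effectively assigned.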
Find the shortest gap that can be filled by only one sequence of gs's, assign it, then re-assign. Ignore the glycine rule here.
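Selecting the shortest uniquely fillable gap can be sketched as a filter plus a minimum. Assumption: each gap is given as (first position, last position, candidate fillings), where a filling is a tuple of spin-system ids; how the candidates are enumerated is left out, and `shortest_unique_gap` is a hypothetical name.

```python
from typing import List, Optional, Tuple

Gap = Tuple[int, int, List[Tuple[str, ...]]]  # (first pos, last pos, candidate fillings)

def shortest_unique_gap(gaps: List[Gap]) -> Optional[Gap]:
    """Return the shortest gap with exactly one candidate filling, or None."""
    unique = [g for g in gaps if len(g[2]) == 1]
    if not unique:
        return None
    return min(unique, key=lambda g: g[1] - g[0])
```

After assigning the returned gap, the candidate fillings of the remaining gaps would be recomputed before the next call, which corresponds to the "assign that, re-assign" loop.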
Save/load: if you run into an issue, save the state and load it back.
Endgame:
- Recalculate the scale.
- Find all connectivities from the ends of chains, ignoring the smallest end and the largest end to prevent circular assignments.
- Stop if you run into a proline, if you run into a chained gs, or if nothing is left.
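The stop conditions can be sketched as a loop that extends a chain from its right end. Assumptions: `next_of` returns the best connected spin system for a given one (or None), `prolines` and `chained` are sets of ids, and only the stop conditions are modeled, not the rescaling or the smallest/largest-end rule. All names are hypothetical.

```python
from typing import Callable, List, Optional, Set

def extend_chain(chain: List[str], next_of: Callable[[str], Optional[str]],
                 prolines: Set[str], chained: Set[str]) -> List[str]:
    """Extend a chain from its right end until a stop condition is met."""
    chain = list(chain)
    while True:
        nxt = next_of(chain[-1])
        if nxt is None:
            break  # nothing left to connect
        if nxt in prolines:
            break  # ran into a proline
        if nxt in chained or nxt in chain:
            break  # ran into an already chained gs (also guards against cycles)
        chain.append(nxt)
    return chain
```

For example, with links `a -> b -> c -> d -> e` and `d` marked as a proline, extending `["a"]` stops at `["a", "b", "c"]`.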