RPIBioinformatics/nncp

NNCP: Nearest Neighbor Constraint Propagation

The idea of NNCP is to assess how well predicted chemical shifts can be used to assign an NMR spectrum.

There are two sources of experimental peaks.

  1. Constructing synthetic peaks from an assignment list
  2. Using raw peak lists from an experiment

In (1), we read in an assignment list and construct the peaks 'expected' to be in a spectrum given the pulse sequence and atomic connectivities. Case (1) is the ideal case because missing peaks and noise peaks are not considered.
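
A minimal sketch of case (1), assuming the assignment list has already been parsed into a mapping from (residue number, atom name) to chemical shift. The function name `expected_nhsqc_peaks` and the data layout are hypothetical and not part of nncp. Only the NHSQC is shown, where each residue with both an amide H and an amide N shift contributes one peak.

```python
def expected_nhsqc_peaks(assignments):
    """Build synthetic (H, N) peaks from an assignment mapping.

    `assignments` maps (residue_number, atom_name) -> chemical shift in ppm.
    Hypothetical layout for illustration; not the nncp data model.
    """
    peaks = {}
    residues = sorted({res for res, _ in assignments})
    for res in residues:
        h = assignments.get((res, "H"))
        n = assignments.get((res, "N"))
        # A residue missing either amide shift (e.g. proline) gives no NHSQC peak.
        if h is not None and n is not None:
            peaks[res] = (h, n)
    return peaks


example = {
    (1, "H"): 8.25, (1, "N"): 120.1,
    (2, "N"): 135.0,  # no amide proton recorded, so no peak is expected
    (3, "H"): 7.90, (3, "N"): 115.4,
}
peaks = expected_nhsqc_peaks(example)
print(peaks)
```

Other spectra would need their own connectivity rules (for example, which residue the carbon dimension belongs to), which this sketch does not attempt.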

In (2), we use peaks from a raw peak list. These peak lists are often not assigned, so we cross-check with the assignment list to work out the mapping between the assignments and the peaks.

Construct the 'expected' peaks, perform nearest neighbor matching, and assess how 'far' the assignment list and peak list are from each other (good matching vs. bad matching).
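
The matching step could look roughly like the sketch below. It assumes peaks are plain (H, N) tuples and that each dimension is divided by a per-dimension scale before a Euclidean distance is taken. The function name, the data layout, and the default scale values are illustrative assumptions, not values taken from nncp.

```python
import math


def nearest_neighbor_match(expected, observed, scale=(0.1, 1.0)):
    """For each expected peak, find the closest observed peak.

    `expected` maps a label -> (H, N) shift; `observed` is a list of (H, N) peaks.
    Each dimension is divided by `scale` (placeholder values) before the
    Euclidean distance is computed.
    Returns label -> (index of nearest observed peak, scaled distance).
    """
    matches = {}
    for label, peak in expected.items():
        best_index, best_dist = None, math.inf
        for i, obs in enumerate(observed):
            d = math.sqrt(sum(((a - b) / s) ** 2
                              for a, b, s in zip(peak, obs, scale)))
            if d < best_dist:
                best_index, best_dist = i, d
        matches[label] = (best_index, best_dist)
    return matches


expected = {1: (8.25, 120.1), 3: (7.90, 115.4)}
observed = [(7.91, 115.5), (8.24, 120.0), (9.50, 110.0)]
matches = nearest_neighbor_match(expected, observed)
for label, (index, dist) in matches.items():
    print(label, index, round(dist, 3))
```

Note that nothing here stops two expected peaks from choosing the same observed peak; the distances returned are what one would inspect to judge a good or bad matching.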

In the matching, we can use other spectra if present. Find consistent peaks between NHSQC, CHSQC, HNCO, and HNCA by tolerance matching?
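
One way tolerance matching between spectra might work is sketched below for the NHSQC/HNCO pair only, assuming HNCO peaks are (H, N, CO) tuples that share their first two dimensions with the NHSQC. The function name and the tolerance values are placeholders chosen for illustration.

```python
def group_by_root(nhsqc_peaks, hnco_peaks, tol_h=0.02, tol_n=0.2):
    """Collect the HNCO peaks whose H and N shifts agree with each NHSQC root.

    Tolerances are in ppm and are illustrative defaults, not nncp settings.
    Returns NHSQC peak index -> list of matching HNCO peaks.
    """
    groups = {i: [] for i in range(len(nhsqc_peaks))}
    for i, (h, n) in enumerate(nhsqc_peaks):
        for peak in hnco_peaks:
            if abs(peak[0] - h) <= tol_h and abs(peak[1] - n) <= tol_n:
                groups[i].append(peak)
    return groups


nhsqc = [(8.25, 120.1), (7.90, 115.4)]
hnco = [(8.26, 120.0, 176.2), (7.90, 115.5, 174.8), (9.10, 110.0, 175.0)]
groups = group_by_root(nhsqc, hnco)
print(groups)
```

A root with an empty list, or an HNCO peak that lands in no group, is the kind of inconsistency this check is meant to surface.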

The XEASY format appears to include the intensity information.

Nearest-Neighbor Approach to NMR spectral assignment

nncp repo

nncp module

from nncp.assign import NNCP

assign

  • assigners.py
  • nearest.py
  • chainer.py

NNCP.assign

Keep the sample data scripts that make up the data separate from the assignment model.

from nncp.pipeline

from nncp.datasets.datalib import gs_pipeline, ss_pipeline_

from nncp.assignment import NNCP

plectin_assign = NNCP(threshold=0.2, threshold_schedule_=myramp, match_schedule_=matchramp, sequence=sequence)

plectin_assign.gs_data_ = Dataset(peaklist, sequence)
plectin_assign.gs_data_.reorder()
plectin_assign.make_assignment(reporter=True)

plectin_assign.score_assign_
plectin_assign.export_assignments_()
plectin_assign.exhaustive_assignments_()

Making the code modular: decide what is nncp specific and what is sample data specific.

reporter_metrics_ = ['all']

How to handle xeasy peak list reading, and how to report errors? First of all, it is stringent: the gs spin system construction assumes they are exactly equal. How to report errors?

maybe just include them but then mask them out completely later

Issues

  • handling of uncaptured / singleton NH roots in spins_systems
    • they are ignored at the moment
    • can do a better handling of those errors later

nncp should not be in charge of making the distance matrix. Do that somewhere else and give the matrix to nncp, so the user can specify the scale themselves.
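
A rough sketch of that separation, assuming the matrix is a plain list of rows (predicted shifts) by columns (observed peaks). Both function names are hypothetical: `build_distance_matrix` stands for the user-side code that owns the scale, and `row_minima_below` stands in for a consumer that only ever sees the finished matrix.

```python
import math


def build_distance_matrix(predicted, observed, scale):
    """User-side step: scaled Euclidean distances, rows = predicted, cols = observed."""
    return [
        [math.sqrt(sum(((p - o) / s) ** 2 for p, o, s in zip(pred, obs, scale)))
         for obs in observed]
        for pred in predicted
    ]


def row_minima_below(dmat, threshold):
    """Consumer-side step: knows nothing about shifts or scales, only the matrix."""
    pairs = []
    for i, row in enumerate(dmat):
        j = min(range(len(row)), key=row.__getitem__)
        if row[j] <= threshold:
            pairs.append((i, j))
    return pairs


predicted = [(8.0, 120.0), (7.5, 110.0)]
observed = [(8.0, 120.0), (7.6, 110.0)]
dmat = build_distance_matrix(predicted, observed, scale=(0.1, 1.0))
print(row_minima_below(dmat, threshold=1.5))
print(row_minima_below(dmat, threshold=0.5))
```

With this split, changing the scale only touches the call to `build_distance_matrix`; the consumer is unchanged.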

Helper functions to diagnose the chemical shift predictions.

write the results out to files

logs: dmat assignment initial ramp iter mask -- compact mask

plectin.endgame

msrb.endgame msrb.export msrb.report

endgame

  • locate all glycines
  • if only one glycine is left unassigned, assign it
  • then locate the smallest distance between a chain end and another chain end and proline, all within .2
  • each step, ignore the smallest and largest chain end
  • at least one chain end
  • sort left ends + pros
  • sort right ends + pros
  • zip them together and find the distance between them

Place the sequential position in their domain of possible assignments. When you get to the end, you need to go from both ends: shared paths between the two directions.

Find the shortest gap with only one sequence of gs's, assign that, then re-assign. Ignore the glycine thing.
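
One possible reading of the "sort the ends, zip them, find the shortest gap" notes above is sketched here in terms of sequence positions. It assumes already-assigned chains are non-overlapping, inclusive (start, end) position pairs and that a proline acts as both a left and a right end. The function name and data layout are hypothetical, and the notes may instead intend a chemical shift distance.

```python
def sequence_gaps(chains, prolines):
    """List the unassigned stretches between neighbouring chain ends.

    `chains` holds inclusive (start, end) sequence positions; `prolines`
    holds single positions, treated as zero-length chains.
    Returns (right_end, next_left_end, gap_length) tuples.
    """
    lefts = sorted([start for start, _ in chains] + list(prolines))
    rights = sorted([end for _, end in chains] + list(prolines))
    # The smallest left end and the largest right end have no partner,
    # so they are dropped before zipping.
    return [(right, left, left - right - 1)
            for right, left in zip(rights[:-1], lefts[1:])]


chains = [(1, 10), (14, 20), (25, 30)]
prolines = [22]
gaps = sequence_gaps(chains, prolines)
shortest = min(gaps, key=lambda gap: gap[2])
print(gaps)
print(shortest)
```

Here the gap between positions 20 and 22 has length 1, so it would be the first candidate to fill.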

save / load

  • if you run into an issue, save it and reload it back

endgame

  • recalculate the scale
  • find all connectivities from the ends of chains
  • ignore the smallest end and the largest end to prevent circular assignments
  • if you run into a proline, stop
  • if you run into a chained gs, stop
  • if there is nothing left, stop
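
The save / load idea could be prototyped with the standard library's `pickle`, as in this sketch. The shape of the `state` dictionary and the file name are made up for illustration; nothing here reflects how nncp actually stores its state.

```python
import os
import pickle
import tempfile


def save_state(state, path):
    """Write the current assignment state to disk."""
    with open(path, "wb") as fh:
        pickle.dump(state, fh)


def load_state(path):
    """Read a previously saved state back in."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


# Hypothetical state: iteration counter, current threshold, partial assignment.
state = {"iteration": 12, "threshold": 0.2, "assigned": {5: 17, 6: 18}}
path = os.path.join(tempfile.mkdtemp(), "nncp_state.pkl")
save_state(state, path)
restored = load_state(path)
print(restored == state)
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.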
