When a user starts the Dataset Creation interface, the main window appears. It has four tabs, or screens, representing the four major steps in creating a complete AutoAssign dataset, arranged from left to right. The rightmost button at the top of each screen takes you to the next screen, or step, of the process. At the end of the process, the user will have AutoAssign-formatted peak lists and a control file describing the dataset.
1. Peak List Conversion Tab
A user will see the Peak List Conversion tab when the interface is started. The peak list conversion process reformats the given peak lists into AutoAssign format. The reformatted files are then used to obtain registration and tolerance values in the subsequent steps/tabs, by running scripts in the background.
Note: A user should open only one Dataset Creation GUI window, because the interface does not support multiple processes running at the same time.
Working Directory [text field]: a base directory in which the interface creates a subdirectory named with a time stamp. This subdirectory stores the intermediate files, logs, and completed dataset generated during the process. The default is the current working directory.
Auto Fill [button]: when a user wants to reuse the same peak lists, or the same type of configuration from a previous analysis, this function imports the previously used data. The user only needs to specify the time-stamped subdirectory, created by the interface, that holds the previous configuration information. Note: The configuration file should NOT be edited by a user. This option allows easy regeneration of a dataset from raw peak lists, so that the user can move back and forth between a spectral visualization tool such as Sparky and AutoAssign to improve the quality of the peak lists.
Add Experiment [button]: at startup, the GUI displays only 5 peak lists, indexed #1 to #5. When a user needs to add more experiments or peak lists for analysis, this function displays an additional blank peak list panel.
Clear All Entries [button]: this function clears the populated data fields displayed on the screens.
Convert Peak Lists [button]: this function runs a set of AutoAssign perl scripts to convert the peak lists entered by a user. It generates intermediate files used to compute registration/tolerance values in the subsequent steps. All script executions run by the interface are logged in the event.log file, found in the time-stamped subdirectory along with the other intermediate files.
Exit [button]: this button closes the interface.
1a. Peak List Information
Type [drop down menu, required]: the experiment type, such as NH-HSQC. Depending on the choice, the panel displays additional information that the user needs to provide about the experiment. To leave a particular peak list out, choose "N/A" from the Type menu; when "N/A" is selected, the GUI ignores any other information displayed in that peak list panel.
Filename [text field, required]: the file name and directory of the peak list.
Format [drop down menu, required]: the peak list format the file was produced in.
Select Columns [drop down menu, required]: the column number in the given peak list for each label (H, N, intensity, etc.). For example, if the first column in your peak list is the intensity, choose one (1) for Intensity. Clicking the "View File" button displays the contents of the peak list in a table; this view is only for checking the contents while selecting columns and rows. Currently, only one (1) Note column is accepted. The Note column is the mechanism for pulling peak-based assignment constraints in from the raw peak lists. These assignment constraints are normally entered in a "Notes" field in the spectral visualization tool used to generate the raw peak lists.
Select Rows [drop down menu, optional]: the top/bottom rows to delete.
Phase [drop down menu, optional]: the type of phase with which the experiment was conducted.
Flip Intensity [radio button, optional]: whether to convert the sign of the intensities (from positive to negative or from negative to positive). A user can choose this option depending on how the experiment data were collected. It only applies a global sign flip to all values in the intensity column.
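The Select Columns, Select Rows, and Flip Intensity options above amount to a per-row transformation of the raw peak list. The following is a minimal Python sketch of that idea only, not the AutoAssign perl conversion scripts. The function name convert_rows, the 1-based column mapping dictionary, and the output field order (index, shifts, intensity, label, taken from the peak list format described later in this document) are illustrative assumptions.

```python
def convert_rows(lines, columns, skip_top=0, skip_bottom=0, flip_intensity=False):
    """Reorder a raw peak list's columns into AutoAssign field order.

    lines:   raw text lines of the source peak list
    columns: hypothetical mapping of field -> 1-based column number, e.g.
             {"index": 1, "dims": [2, 3], "intensity": 4, "label": 5}
    skip_top / skip_bottom: header and footer rows to delete ("Select Rows")
    flip_intensity: negate every intensity value ("Flip Intensity")
    """
    body = lines[skip_top:max(skip_top, len(lines) - skip_bottom)]
    out = []
    for line in body:
        fields = line.split()
        if not fields:
            continue  # ignore blank lines

        def col(n):
            return fields[n - 1]  # 1-based column number -> 0-based list index

        intensity = float(col(columns["intensity"]))
        if flip_intensity:
            intensity = -intensity  # global sign flip on the intensity column
        dims = [col(n) for n in columns["dims"]]
        out.append(" ".join([col(columns["index"]), *dims,
                             f"{intensity:.10g}", col(columns["label"])]))
    out.append("*")  # AutoAssign peak lists must end with an asterisk
    return out
```

For example, a list whose first column is the index, second the label, third the intensity, and fourth and fifth the H and N shifts would use columns={"index": 1, "dims": [4, 5], "intensity": 3, "label": 2}.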
2. Registration Tab
A user is taken to the Registration tab when the "Convert Peak Lists" button is clicked, which indicates that the peak list conversion process is complete. The main objectives of this window are to: 1) allow a user to calculate registration values, and 2) apply those values to shift peaks, creating new, temporary peak list files.
Calculate Registration [button]: this function calculates registration values based on the information provided on the Peak List Conversion tab. It opens a progress monitor window showing how many calculations are complete. If an error occurs, it may be recorded in the .reg file(s) in the time-stamped subdirectory.
View Registration [button]: the interface creates an HTML page consolidating the generated .reg files and tries to open it in a browser. If an error occurs and the scripts generate improper .reg files, an empty page might be produced; in that case, go back and check your peak lists, or look for more information in event.log or the .reg files. The "Weighted Registration" values are used to register the peak lists. The "Full Std" values are used in calculating the tolerances included in the control file.
Revert to Original Value [button]: this function refills the registration value text fields with the originally calculated registration values.
Apply Registration [button]: this function applies the registration values to each peak list, creating new peak list files with a _shifted.pks extension.
Registration Status [text field]: shows "N/A" until an action is taken. It indicates how far the calculation has progressed and, once a user shifts peaks with a registration value, reports that shifting has been completed.
Global Shift Values [text field]: values by which the corresponding resonances are shifted globally across all peak list files, including the root file. For example, if you specify values for H and N, the H and N column values in all peak list files, including the NH-HN root file, will be shifted.
Registration Results [text field, required]: displays the calculation result, indicating which experiment is the root for each resonance. A user is allowed to modify the registration value used for shifting.
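Applying registration and global shift values to a peak list can be sketched as adding per-resonance offsets to each shift column and writing the result under a _shifted.pks name. This is a minimal Python sketch under stated assumptions: the helper names, the peak dictionary layout, and in particular the additive sign convention are hypothetical, not taken from the AutoAssign scripts.

```python
import os


def shifted_name(path):
    """hnca.pks -> hnca_shifted.pks (mirrors the _shifted.pks naming above)."""
    stem, _ext = os.path.splitext(path)
    return stem + "_shifted.pks"


def apply_shifts(peaks, dim_names, registration, global_shift):
    """Return copies of `peaks` with each named dimension shifted.

    peaks:        list of dicts such as {"index": 126, "shifts": [8.871, 110.859]}
    dim_names:    resonance name of each shift column, e.g. ["H", "N"]
    registration: per-resonance offsets (ppm) for this peak list
    global_shift: per-resonance offsets (ppm) applied to every peak list
    ASSUMPTION: offsets are simply added to the observed shifts; check the
    sign convention of the values shown by View Registration.
    """
    shifted = []
    for peak in peaks:
        new_shifts = [
            value + registration.get(name, 0.0) + global_shift.get(name, 0.0)
            for name, value in zip(dim_names, peak["shifts"])
        ]
        shifted.append({**peak, "shifts": new_shifts})  # original dict untouched
    return shifted
```

Keeping the originals untouched parallels the interface, which writes new _shifted.pks files rather than overwriting the converted peak lists.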
A user is asked to provide additional information to complete the control file. The values entered on this tab are used in the property, tolerance, and sequence sections of the control file.
Open Sequence File [button]: this function opens a window for a user to select a sequence file.
Save Properties/Create Control File [button]: this function saves the dataset property information into an internal data structure and uses it to produce a control file. Click this button to produce an initial control file; the result is displayed on the next tab.
Sequence: Starting Residue Number [text field, required]: the residue number at which the given sequence starts.
Sequence [text field, required]: the amino acid sequence of the given protein.
Protein Name [text field, required]: the name of the protein. The default value is XYZ.
Deuteration [drop down menu]: the type of deuteration.
Percent (%) [text field]: the percent deuteration. Valid values range from 0 to 100.
Analysis Properties: these drop down menus and radio buttons specify whether to run AutoAssign in a given analysis mode. A list of all property keywords can be found in the Control File Format.
Match Tolerance Settings [text field, required]: the matching tolerance for each resonance. These tolerances are used in grouping peaks into spin systems and in linking nearest-neighbor spin systems together into segments.
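The role of the match tolerances can be illustrated with a simple fixed-window comparison: a peak joins a root peak's spin system when every shared root dimension agrees within tolerance, and two spin systems can be linked when one system's sequential (i-1) shifts agree with the other's intraresidue shifts. This minimal Python sketch is a simplification; AutoAssign's own matching also involves the std-based settings listed under "Properties:", and the function names and the "-1" resonance naming are assumptions.

```python
def within_tolerance(a, b, tolerances):
    """True if a and b share at least one toleranced resonance and every
    shared resonance differs by no more than its match tolerance (ppm)."""
    shared = set(a) & set(b) & set(tolerances)
    return bool(shared) and all(abs(a[r] - b[r]) <= tolerances[r] for r in shared)


def group_with_roots(roots, peaks, tolerances):
    """Collect, for each root peak, the peaks that match it in the shared
    (root) dimensions; a peak may match more than one root."""
    groups = {i: [] for i in range(len(roots))}
    for peak in peaks:
        for i, root in enumerate(roots):
            if within_tolerance(root, peak, tolerances):
                groups[i].append(peak)
    return groups


def can_link(gs, preceding, tolerances):
    """Compare gs's sequential shifts (names ending in '-1') with the
    intraresidue shifts of a candidate preceding spin system."""
    sequential = {r[:-2]: v for r, v in gs.items() if r.endswith("-1")}
    intra = {r: v for r, v in preceding.items() if not r.endswith("-1")}
    return within_tolerance(sequential, intra, tolerances)
```

Ambiguous matches (a peak falling inside the window of two roots) are kept on purpose, since resolving such ambiguity is what the later assignment steps do.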
A user can review the control file generated by the interface, manually edit its content, and then save it. To save a revised control file, select the "Rename Control File" button. A user should check the content and format of the control file before submitting it to the AutoAssign server. At the end, a user can open the control file by selecting the "Open Control File" button in this tab, or by selecting the "Open Control File" option in the File menu of the main AutoAssign window.
View Quality Report [button]: this function opens a window in which a user can view the quality assessment report on the dataset. It contains a summary across the dataset and information on each detectable spin system. This report is extremely valuable for improving the quality of the dataset: a user can use the "Error List" as a guide to improving the raw peak lists, and the detailed information on each spin system can guide the user in refining all their peak lists simultaneously, one spin system at a time.
Read Control File [button]: this function allows a user to import an existing control file for viewing.
Revert to Original [button]: this function restores the original control file to the screen.
Rename Control File [button]: this function allows a user to save the control file under a different name.
Open Control File [button]: this function opens a control file for analysis. The default control file is the one with the original values; once a user saves the control file, the saved file becomes the default.
Control File [text area]: displays the control file generated from the user input. A user should review its content to make sure that all information is correct. See the AutoAssign help page for more details.
Peak lists use the following format, with one peak per line:
<Index> <dim1> <dim2> [dimX] <Intensity> <Label>[.PeakNotes]
...
*
Individual fields are separated by spaces and/or tabs. Comments are indicated by a "#" sign at the beginning of a line. An asterisk marks the end of a peak file; this asterisk is required. Extended comments may be placed after the asterisk.
- The index field (first field) should list an integer that, along with the spectrum name, provides a unique handle for the peak. Thus in the example below, the first peak might be identified as "hnca 126".
- Peak resonance frequencies should be given in ppm, and values for frequencies and intensities may be in floating point or exponential format.
- The intensity of the peak should follow the frequency fields.
- The last field provides a label that is included in various output routines and helps the user identify the peak. A period can be used to add a comment or note to this field, and such a note can be used to add assignment constraints, as in the second peak ("HNCA.;g45") of the following example peak list:
#Index Xppm Yppm Zppm Intensity Label.notes
126 8.871 110.859 50.247 3242120 HNCA
125 8.870 110.898 62.529 724463 HNCA.;g45
73 8.744 116.161 56.614 2287600 HNCA
:
:
145 9.153 112.004 57.415 2788050 HNCA
*
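Because the format is line-oriented and whitespace-delimited, a reader for it is short. The following minimal Python sketch follows the rules stated above ("#" comment lines, the index first, the intensity and label last, an optional note after the first period of the label, and a required "*" terminator). The function name and the returned dictionary layout are illustrative assumptions, not part of AutoAssign.

```python
def parse_peak_list(text):
    """Parse an AutoAssign-style peak list into a list of peak dicts."""
    peaks = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment line
        if line.startswith("*"):
            return peaks  # end of peak data; anything after is a comment
        fields = line.split()
        if len(fields) < 5:  # index, at least two shifts, intensity, label
            raise ValueError(f"too few fields: {raw!r}")
        label, _, note = fields[-1].partition(".")  # note follows first period
        peaks.append({
            "index": int(fields[0]),
            "shifts": [float(v) for v in fields[1:-2]],  # ppm, one per dimension
            "intensity": float(fields[-2]),
            "label": label,
            "note": note,
        })
    raise ValueError("peak list is missing the terminating '*' line")
```

The peak handle described above (spectrum name plus index, e.g. "hnca 126") can then be built from the spectrum name and each dictionary's "index" entry.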
There are nine types of assignment constraints: RACs, GACs, SACs, TACs (i-TACs and s-TACs), LACs, MACs, HMACs, and PACs. They allow the user to override the resonance interpretation, resonance grouping, sidechain classification, spin system typing, spin system linking, and segment mapping steps performed by AutoAssign. This formulation of assignment constraints allows both gentle and hard use. Gentle use lets the user guide the assignment process without forcing errors into the assignments. Hard use is as it sounds: the user can force AutoAssign to absolutely trust certain information that the user provides, which can in turn force errors into the assignments if that information is not 100% correct. In general, it is safer to limit a choice to a few possibilities than to force the program to accept the single most probable one. Assignment constraints are listed in the peak note of a peak in a given peak list. Associating assignment constraints directly with peaks allows the user to enter them with their spectral visualization tool of choice (e.g., Sparky) into the peak lists generated by that tool (e.g., the Notes field in Sparky).
Assignment constraints are given in a simple mnemonic that starts with a ";" followed by a single character identifying the type of assignment constraint. This is followed by assignment constraint specific information. The text is case insensitive. Any text after an underscore "_" character is ignored. The general form of the assignment constraints given in a peak note field is as follows:
peak_label.;<adghilmprs>[-]<list>...[_]ignored
where
;r[#]<rn>[,rn]... - RAC to limit the possible resonance interpretations in a peak's dimension. The dimension specification is optional when using 3-dimensional peaks.
;g<##>[,##]... - GAC to include the peak in a specific GS. The group_id is an arbitrary positive number chosen by the user. All peaks in the GS must be given the same group_id.
;a - SAC to specify that the given GS is not a sidechain GS and should be in the usable list of GSs.
;d - SAC to specify that the given GS is a sidechain GS and should not be in the usable list of GSs.
;i[-]<aa>[aa]... - i-TAC to only include (or exclude) the specified amino acids as possible ones for the intraresidue ladder of the given GS.
;s[-]<aa>[aa]... - s-TAC to only include (or exclude) the specified amino acids as possible ones for the sequential residue ladder of the given GS.
;l<n/c>[-]<##>[,##]... - LAC to include (or exclude) the possible neighbors in the n/c direction with a given link_id. The link_id is an arbitrary number given by the user. LACs are used in pairs of ;ln[-]## and ;lc[-]##.
;m[-]<ss>[,ss]... - MAC to include (or exclude) the specified sequence site as a possible mapping for the given GS.
;h<ss> - HMAC to immediately assign the given GS to the given SS.
;p[-] - PAC to keep (or delete) this peak even if it is a duplicate of another peak.
An example is:
126 8.871 110.859 50.247 3242120 HNCA.;g5;iad;a;m-a32,d50;rca-1;ln50_this_is_ignored_text
where
126 - peak index.
8.871 - peak amide hydrogen dimension.
110.859 - peak amide nitrogen dimension.
50.247 - peak aliphatic carbon dimension.
3242120 - peak intensity
HNCA - peak label.
Assignment Constraints:
;g5 - GAC indicates that this peak should be grouped into group 5.
;iad - i-TAC indicates that the intra ladder can only be amino acid types A or D.
;a - SAC indicates that this is not a sidechain GS.
;m-a32,d50 - MAC indicates that the GS containing this peak should not be mapped to A32 nor D50.
;rca-1 - RAC indicates that the carbon dimension (3rd dimension) is limited to sequential CA resonances. With an explicit dimension declaration, this would be written ";r3ca-1".
;ln50 - n-LAC indicates that the n-linked neighbor GS must be identified by link_id 50 (i.e., the n-linked neighbor GS must carry a c-LAC of ";lc50").
_this_is_ignored_text - ignored text in the peak note.
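The note syntax above (constraints after the label's first period, each introduced by ";" and a case-insensitive letter, with everything after "_" ignored) can be split mechanically before each constraint is interpreted. This minimal Python sketch only classifies tokens by their leading letter; the function name and the CONSTRAINT_KINDS table are illustrative, derived from the letter list above, and no constraint body is validated here.

```python
# Leading letter -> constraint type, from the list of constraint forms above.
CONSTRAINT_KINDS = {
    "r": "RAC", "g": "GAC", "a": "SAC", "d": "SAC", "i": "i-TAC",
    "s": "s-TAC", "l": "LAC", "m": "MAC", "h": "HMAC", "p": "PAC",
}


def split_constraints(label_field):
    """Split 'LABEL.;g5;iad;...' into (label, [(kind, token), ...])."""
    label, _, note = label_field.partition(".")
    note = note.split("_", 1)[0].lower()  # drop ignored text; case-insensitive
    constraints = []
    # Text before the first ';' is a free-form comment, so skip element 0.
    for token in note.split(";")[1:]:
        if not token:
            continue
        kind = CONSTRAINT_KINDS.get(token[0])
        if kind is None:
            raise ValueError(f"unknown assignment constraint: ;{token}")
        constraints.append((kind, token))
    return label, constraints
```

Applied to the example note above, this yields GAC "g5", i-TAC "iad", SAC "a", MAC "m-a32,d50", RAC "rca-1", and LAC "ln50", in that order.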
Resonance Assignment Constraint (RAC):
A resonance assignment constraint limits the possible resonance interpretations for the chemical shift or dimension of a peak. For example, in an HNCACB type experiment the aliphatic carbon dimension can represent chemical shifts for the intra and sequential CA and CB resonances of a GS. The general form of a RAC is as follows:
;r[#]<rn>[,rn]...
where
";r" indicates that this is a RAC.
"#" indicates the dimension (number) referred to for a given peak. This is not needed for 3-dimensional peak lists.
"rn" indicates a list of possible resonances. For an HNCACB peak, the possible resonances are: ca, cb, ca-1, and cb-1.
In the following example for an HNCACB peak:
;rca,ca-1
A RAC is given that limits the aliphatic carbon dimension to intra and sequential CA resonances.
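A RAC token can be parsed with a single regular expression: ";r", an optional dimension number, then a comma-separated resonance list. This is a minimal Python sketch; the function name parse_rac and its return shape are assumptions made for illustration.

```python
import re

# ';r', optional dimension digits, then a comma-separated resonance list.
RAC_PATTERN = re.compile(r"^;?r(\d*)([a-z0-9,\-]+)$", re.IGNORECASE)


def parse_rac(token):
    """Return (dimension or None, [resonances]) for a RAC such as ';r3ca-1'."""
    match = RAC_PATTERN.match(token.strip())
    if match is None:
        raise ValueError(f"not a RAC: {token!r}")
    dim_text, resonances = match.groups()
    dim = int(dim_text) if dim_text else None  # optional for 3-D peak lists
    return dim, [r.lower() for r in resonances.split(",") if r]
```

With this sketch, ";rca,ca-1" gives (None, ["ca", "ca-1"]) and ";r3ca-1" gives (3, ["ca-1"]).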
Typing Assignment Constraint (TAC):
A typing assignment constraint limits the possible amino acids allowed in the typing of the intra or sequential ladder of a GS (remember, a GS represents a dipeptide spin system). The general form of a TAC is as follows:
;<i/s>[-]<aa>[aa]...
where
";i" indicates an intra TAC or i-TAC that limits the possible amino acid types for the intra ladder of a GS.
";s" indicates a sequential TAC or s-TAC that limits the possible amino acid types for the sequential ladder of a GS.
"-" (minus sign) indicates that the TAC is excluding the following amino acids from the list of possible ones.
"aa" indicates an amino acid type.
In the following example:
;iilv;sg
A pair of TACs is given: an i-TAC limiting intra ladder typing to I, L, and V, and an s-TAC limiting sequential ladder typing to G.
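The include/exclude semantics of a TAC reduce to set operations over one-letter amino acid codes. This minimal Python sketch assumes the 20 standard one-letter codes as the candidate set; the function names are hypothetical.

```python
STANDARD_AA = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


def parse_tac(token):
    """Parse ';iilv' or ';s-g' into (ladder, exclude, set_of_amino_acids)."""
    body = token.lstrip(";").upper()
    ladder = {"I": "intra", "S": "sequential"}.get(body[:1])
    if ladder is None:
        raise ValueError(f"not a TAC: {token!r}")
    body = body[1:]
    exclude = body.startswith("-")  # a minus sign excludes the listed types
    return ladder, exclude, set(body.lstrip("-"))


def allowed_types(token, candidates=STANDARD_AA):
    """Amino acid types still allowed for the ladder named by the TAC."""
    ladder, exclude, residues = parse_tac(token)
    if exclude:
        return ladder, sorted(set(candidates) - residues)
    return ladder, sorted(set(candidates) & residues)
```

For the example above, ";iilv" leaves I, L, and V for the intra ladder and ";sg" leaves only G for the sequential ladder, while an excluding form such as ";i-g" leaves the other 19 types.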
Linking Assignment Constraint (LAC):
A linking assignment constraint limits which neighbor GSs the given GS can be linked to. The general form of a LAC is as follows:
;l<n/c>[-]<##>[,##]...
where
";l" indicates that this is a LAC.
"n" indicates that this is an n-linked LAC, or n-LAC, which limits the possible n-linked GS neighbors.
"c" indicates that this is a c-linked LAC, or c-LAC, which limits the possible c-linked GS neighbors.
"-" (minus sign) indicates that the following link_ids should be excluded.
"##" indicates a link_id (positive number) identifying possible links that the GS can have (or not have) in the identified direction.
Normally, LACs come in pairs that associate a link_id in both the n-linked and c-linked directions. Given this form of assignment constraint, it is usually easier to indicate an excluding LAC (one with a minus sign). In the following example:
;ln45,46
An n-LAC is given that limits the n-linking of the given GS to other GSs that have c-LAC link_ids of 45 or 46.
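One reading of the pairing rule above can be expressed as a check of whether a GS may be n-linked to a candidate neighbor. This minimal Python sketch encodes that reading as an explicit assumption: an including n-LAC requires the neighbor to carry a c-LAC with one of the listed link_ids, an excluding n-LAC forbids that, and a GS without an n-LAC is unconstrained. The function names are hypothetical and this is not AutoAssign's linking code.

```python
def parse_lac(token):
    """Parse ';ln45,46' or ';lc-12' into (direction, exclude, {link_ids})."""
    body = token.lstrip(";").lower()
    if len(body) < 2 or body[0] != "l" or body[1] not in "nc":
        raise ValueError(f"not a LAC: {token!r}")
    direction, rest = body[1], body[2:]
    exclude = rest.startswith("-")
    ids = {int(x) for x in rest.lstrip("-").split(",") if x}
    return direction, exclude, ids


def n_link_allowed(gs_lacs, neighbor_lacs):
    """May a GS be n-linked to a neighbor GS, given each one's LAC tokens?

    ASSUMPTION: only the neighbor's including c-LACs are consulted.
    """
    neighbor_c_ids = set()
    for tok in neighbor_lacs:
        direction, exclude, ids = parse_lac(tok)
        if direction == "c" and not exclude:
            neighbor_c_ids |= ids
    for tok in gs_lacs:
        direction, exclude, ids = parse_lac(tok)
        if direction != "n":
            continue
        overlap = bool(ids & neighbor_c_ids)
        if (exclude and overlap) or (not exclude and not overlap):
            return False
    return True
```

Under this reading, a GS carrying ";ln45,46" may be n-linked to a neighbor carrying ";lc46" but not to one carrying ";lc47".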
Mapping Assignment Constraint (MAC):
A mapping assignment constraint limits the possible sequence sites (SS) that a GS can map to. MACs are generally safer to use for excluding possible SSs, because a user can usually tell where a GS does not belong. Even if the user limits a GS mapping to only one SS, AutoAssign may not map it there if other GSs can also be mapped to the same SS. The general form of a MAC is as follows:
;m[-]<ss>[,ss]...
where
";m" indicates that this is a MAC.
"-" (minus sign) indicates that the following SSs should be excluded as possible mapping sites for the given GS.
"ss" indicates a sequence site to limit mapping for.
In the following example:
;m-r30,r35
A MAC is given that prevents the GS from being mapped to R30 or R35.
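A sequence site such as "r30" combines a one-letter residue code with a residue number, so checking a MAC against a candidate site is a set-membership test. This minimal Python sketch uses hypothetical function names and does not check the site against the protein sequence.

```python
import re

SITE = re.compile(r"^([a-z])(\d+)$", re.IGNORECASE)


def parse_site(text):
    """'r30' -> ('R', 30): one-letter residue code followed by residue number."""
    match = SITE.match(text.strip())
    if match is None:
        raise ValueError(f"bad sequence site: {text!r}")
    return match.group(1).upper(), int(match.group(2))


def mac_allows(token, site):
    """Does a MAC like ';m-r30,r35' permit mapping the GS to `site`?"""
    body = token.lstrip(";").lower()
    if not body.startswith("m"):
        raise ValueError(f"not a MAC: {token!r}")
    body = body[1:]
    exclude = body.startswith("-")
    sites = {parse_site(s) for s in body.lstrip("-").split(",") if s}
    listed = parse_site(site) in sites
    return not listed if exclude else listed
```

For the example above, mac_allows(";m-r30,r35", "R30") is False while any other site, such as K31, is allowed.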
Hammer Mapping Assignment Constraint (HMAC):
A hammer mapping assignment constraint immediately maps the given GS to a specific SS. This is a very forceful assignment constraint that should be used with extreme caution; it has been added because sometimes nothing quite works except a hammer. The general form of a HMAC is as follows:
;h<ss>
where
";h" indicates that this is a HMAC.
"ss" indicates the sequence site to which the given GS is mapped.
In the following example:
;he35
A HMAC is given that immediately maps the GS to E35.
"Properties:"
is an optional section and can have different
keywords to alter basic AutoAssign behavior. The keywords are as
follows:
Examples of valid "Properties:" sections are:
Properties: deuterated
Properties: deuterated_50 override override1.aao
Properties: deuterated_75
Properties: deuterated_25
Properties: ILV_deuterated_90
Properties: no_sequential_intras
Properties: ignore_constraints
Properties: alt_root HNcoCA
Properties: root_matching_std_units 5.0
Properties: link_matching_std_units 3.5
Properties: max_matching_multiplier 2.0
Properties: min_matching_multiplier 0.5
Properties: tolerance_bound_grouping
Properties: keep_duplicate_peaks
Properties: min_spin_system_peaks 2
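A "Properties:" line is a sequence of keywords, some of which take a following value. In this minimal Python sketch, the set of keywords that take an argument is an assumption inferred only from the examples above (override, alt_root, and the numeric matching settings); all other keywords are treated as flags. It is an illustrative reader, not AutoAssign's own parser.

```python
# Keywords that take one argument, as inferred from the examples above (assumption).
TAKES_ARGUMENT = {
    "override", "alt_root", "root_matching_std_units", "link_matching_std_units",
    "max_matching_multiplier", "min_matching_multiplier", "min_spin_system_peaks",
}


def parse_properties(line):
    """Parse 'Properties: kw [arg] kw ...' into a dict of keyword -> value.

    Flag keywords map to True; argument keywords map to the next token.
    """
    head, sep, rest = line.partition(":")
    if not sep or head.strip().lower() != "properties":
        raise ValueError(f"not a Properties line: {line!r}")
    tokens = rest.split()
    props, i = {}, 0
    while i < len(tokens):
        keyword = tokens[i]
        if keyword in TAKES_ARGUMENT:
            if i + 1 >= len(tokens):
                raise ValueError(f"{keyword} needs a value")
            props[keyword] = tokens[i + 1]  # kept as text, e.g. "5.0"
            i += 2
        else:
            props[keyword] = True
            i += 1
    return props
```

For instance, "Properties: deuterated_50 override override1.aao" yields {"deuterated_50": True, "override": "override1.aao"}.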
"Tolerances:"
specifies default intraresidue (in the root dimensions, for inclusion in a spin system) and sequential match tolerances for the atom types listed. THESE TOLERANCES ARE NOW SUPERSEDED BY THE VALUES IN THE "STD" SECTION. All atoms detected in the spectra should be listed here, even if that atom type does not participate in any "matching" per se. In this example, the sequential carbonyl frequency is detected in the HNCO spectrum, but there is no corresponding experiment to detect intraresidue CO frequencies. Even so, the CO atom is listed here, with a default tolerance arbitrarily set to 0.0.
"Spectra:"
marks the start of the spectra section, which specifies the peak lists used in this dataset. The section begins with the keyword "Spectra:" on a line by itself. Each peak list file is then specified as follows. The section ends with a ":" on a line by itself.
line 1: <name> <ref-spec> <file> <intraresidue> <sequential> <through-space> [phase: {}] [NH2_PHASE] {
- The "name" field specifies a string to be used in referring to this spectrum
- The "ref-spec" field specifies the name of a second spectrum to use in referencing the root frequencies of this spectrum. If the spectrum does not require referencing and is itself a reference spectrum, this field should be specified as "ROOT". Finally, if the spectrum does not detect both root dimensions (HN and N15 in this example) the field should specify "nil".
- The "file" field should specify the name of the peak file, which must reside in the same directory as the table file.
- The "intraresidue", "sequential", and "through-space" fields should specify boolean values of 0 or 1 reflecting the type of interactions detected in this experiment.
- The optional "phase" field specifies how negative intensities should be interpreted for the non amide dimension. In this example, AutoAssign interprets all peaks with negative intensities in the intraresidue hcabb and cacb experiments as belonging to Gly spin systems. The expression "{CA {G}}" furthermore specifies that the dimension containing the named atom "CA" is what generated the negative intensity and that only these peaks should be considered as candidates for Glycine C-alpha chemical shifts. Alternatively, if the cacb spectrum had specified the phase as: {CB {ACDEFHIKLMNQRSTVWY}}, then only those peaks having negative intensities would be considered for C-beta shifts, and all C-alpha peaks would be constrained to have positive intensities.
- The optional NH2_PHASE keyword identifies this experiment as having NH2 phase. All negative peaks from the experiment tag the non-overlapped spin systems they are grouped into as sidechain spin systems.
- Line 1 is then terminated with a left brace to indicate the start of the individual dimension specifications.
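Line 1 of a spectrum entry can be read with a brace-aware tokenizer, because the phase expression (for example "{CA {G}}") contains spaces and nested braces. In this minimal Python sketch, the function names are hypothetical, and the exact "phase: {...}" spelling and the sample lines used in the check below are assumptions based on the field list above.

```python
def tokenize(text):
    """Split on whitespace, keeping each {...} group (possibly nested) whole."""
    tokens, current, depth = [], [], 0
    for ch in text:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced '}'")
        if ch.isspace() and depth == 0:
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if depth:
        raise ValueError("unbalanced '{'")
    if current:
        tokens.append("".join(current))
    return tokens


def flag(value):
    """The interaction fields must be the boolean values 0 or 1."""
    if value not in ("0", "1"):
        raise ValueError(f"expected 0 or 1, got {value!r}")
    return value == "1"


def parse_spectrum_header(line):
    """Parse line 1: name ref-spec file intra seq through [phase: {...}] [NH2_PHASE] {"""
    line = line.strip()
    if not line.endswith("{"):
        raise ValueError("line 1 must end with '{'")
    tokens = tokenize(line[:-1])  # drop the brace that opens the dimension block
    if len(tokens) < 6:
        raise ValueError("expected at least six fields")
    name, ref_spec, filename, intra, seq, through = tokens[:6]
    spec = {"name": name, "ref_spec": ref_spec, "file": filename,
            "intraresidue": flag(intra), "sequential": flag(seq),
            "through_space": flag(through), "phase": None, "nh2_phase": False}
    rest, i = tokens[6:], 0
    while i < len(rest):
        word = rest[i]
        if word.lower() == "phase:" and i + 1 < len(rest):
            spec["phase"] = rest[i + 1]  # raw brace expression, e.g. "{CA {G}}"
            i += 2
        elif word.upper() == "NH2_PHASE":
            spec["nh2_phase"] = True
            i += 1
        else:
            raise ValueError(f"unexpected token {word!r}")
    return spec
```

The phase expression is returned as raw text, since its interpretation (which atom carries the negative intensity and which residue types it allows) is described above rather than being part of line 1's layout.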
Dimension Specifications:
For each dimension of a spectrum, the following information is provided on a single line:
{ <atom> <sw 0> <sw 1> <correction> <tolerance> <folded/unfolded> [intra] [seq] [any] [phase: {}] [print_order val] [print_ref val] }
<atom>
May be a single atom name or a list of atoms delimited by curly braces. The atom names should correspond to those listed for "Tolerances:".
<sw 0> and <sw 1>
Should specify the lower and upper bounds of the sweepwidth, respectively, in ppm.
<correction>
Is used to specify an external correction to the frequencies in that dimension. For example, if the reference molecule used for the C13 dimensions is non-standard, a correction value should be specified to adjust the chemical shift frequencies in those dimensions.
<tolerance>
A match tolerance is again specified for atoms in this dimension. This is now deprecated and not used; however, a value must still be placed here.
<folded/unfolded>
This field uses a string to indicate whether or not the values in this dimension are folded. If so, AutoAssign will use the specified sweepwidths for this spectrum to locate another (unfolded) spectrum which "contains" the sweepwidth of this spectrum and attempt to "unfold" the values.
[intra] [seq] [any]
These are the "intraresidue", "sequential", and "through-space" keywords that override the values given for the spectrum as a whole.
[phase: {}]
The optional "phase" field specifies how negative intensities should be interpreted for this dimension.
[print_order val]
The value is an integer indicating the order of the dimension when printing assigned peak lists.
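A dimension line can be read with the same brace-aware tokenization, taking the six required fields in the order listed above and then any optional keywords. In this minimal Python sketch, the function names are hypothetical, and it assumes each line is passed without the braces that enclose the spectrum's whole dimension block. print_ref is kept as text because its value type is not described here.

```python
def tokenize(text):
    """Split on whitespace, keeping each {...} group (possibly nested) whole."""
    tokens, current, depth = [], [], 0
    for ch in text:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced '}'")
        if ch.isspace() and depth == 0:
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if depth:
        raise ValueError("unbalanced '{'")
    if current:
        tokens.append("".join(current))
    return tokens


def parse_dimension(line):
    """Parse one dimension specification line into a dict."""
    tokens = tokenize(line)
    if len(tokens) < 6:
        raise ValueError("expected at least six fields")
    atom, sw0, sw1, correction, tolerance, folding = tokens[:6]
    if folding.lower() not in ("folded", "unfolded"):
        raise ValueError(f"expected folded/unfolded, got {folding!r}")
    dim = {
        # "{CA CB}" -> ["CA", "CB"]; a bare name -> [name]
        "atoms": atom.strip("{}").split() if atom.startswith("{") else [atom],
        "sweepwidth": (float(sw0), float(sw1)),  # lower, upper bound in ppm
        "correction": float(correction),
        "tolerance": float(tolerance),  # deprecated, but must be present
        "folded": folding.lower() == "folded",
        "interactions": set(), "phase": None, "print_order": None, "print_ref": None,
    }
    rest, i = tokens[6:], 0
    while i < len(rest):
        word = rest[i].lower()
        if word in ("intra", "seq", "any"):
            dim["interactions"].add(word)  # overrides the spectrum-level flags
            i += 1
        elif word in ("phase:", "print_order", "print_ref") and i + 1 < len(rest):
            key = word.rstrip(":")
            value = rest[i + 1]
            dim[key] = int(value) if key == "print_order" else value
            i += 2
        else:
            raise ValueError(f"unexpected token {rest[i]!r}")
    return dim
```

For example, a hypothetical line "{CA CB} 20.0 75.0 0.0 0.4 unfolded intra print_order 3" yields atoms ["CA", "CB"], a 20.0 to 75.0 ppm sweepwidth, and print order 3.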