Structure analysis and validation

Non-validated results can be meaningless. Structure analysis tools serve not only to inspect the final structure but also to monitor it at any time during its determination. Reasonable models are mostly correct and refinable, whereas unreasonable ones need to be improved before they can enter further steps in a structure determination pipeline.

Validation page

Energy based analysis

Operates in two pairs of modes ("GRP_RESI", "GRP_ATOM") and ("BY_FORCE", "BY_ENERG"). In the "RESIDUE" mode parameters are analysed and averaged over all atoms of each residue, whereas in the "ATOM" mode they are atom based, that is, each atom has an independent record. The "BY_FORCE" mode is useful for all energy terms except electrostatics and density, where the size of the gradient ("FORCE" or "ENERGY" derivative) does not say anything about the deviation from the minimum. For these two terms the "BY_ENERG" mode is more appropriate for validating the structure.
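The difference between the two modes can be illustrated with a small numeric sketch. It is not taken from the program: the functional forms (a harmonic bond and a point-charge pair), the function names and all constants are assumptions chosen only for illustration. For a harmonic term the gradient grows with the deviation from the ideal value and vanishes at the minimum, whereas for a point-charge pair the gradient never vanishes and so cannot mark a deviation.

```python
def harmonic_bond(r, r0=1.53, k=300.0):
    """Return (energy, gradient size) of a harmonic term k*(r - r0)**2.

    r0 and k are hypothetical values, not program parameters.
    """
    energy = k * (r - r0) ** 2
    gradient = 2.0 * k * (r - r0)
    return energy, abs(gradient)


def coulomb_pair(r, q1=0.4, q2=-0.4, c=332.0):
    """Return (energy, gradient size) of a point-charge pair c*q1*q2/r.

    The charges and the conversion constant c are illustrative assumptions.
    """
    energy = c * q1 * q2 / r
    gradient = -c * q1 * q2 / r ** 2
    return energy, abs(gradient)


# Harmonic term: the gradient size reflects the distance from the minimum.
for r in (1.53, 1.58, 1.63):
    print("bond   ", r, harmonic_bond(r))

# Point charges: the gradient is nonzero at every distance.
for r in (3.0, 5.0, 10.0):
    print("coulomb", r, coulomb_pair(r))
```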

A single click on "ANA_ENER" performs the analysis and displays the color coded results. The color spans from blue to yellow, that is, from the minimal to the maximal value of the investigated term. The maximal value means the worst. At the end the procedure centers on the worst residue or atom, depending on the mode.

B-value analysis

Operates in two modes ("GRP_RESI", "GRP_ATOM"). B-values are either the average value for residues or atom based.

The results are displayed in the same color code as for the ENERGY analysis, and the RESIDUE or ATOM with the highest B-value is moved into the center of the display.

Ramachandran plot

The Ramachandran plot can be toggled using "RAMACHAN", whereas "RAMA_PRO" displays the positions of residues as crosses in the plot.

When the PHI and PSI bonds are rotated, the residue cross moves over the plot, provided "HIS_RAMA" is activated for that residue.

Bond and angle statistics

"ANA_BOND" writes the statistics for each bonding parameter and the average sum of deviations. "ANA_BOND" does the same, however, for "ANGLES".

List generation

"GEN_LIST" is configurable via a GUI (right mouse button click) and will grow into a validation tool that can replace all the other features of interactive validation.

It performs the analysis and generates a list of centers, which can be stepped through from the "worst" to the "best" by pressing "g" (for go) or "CENT_NEX". See also MAIN_MENU:center.html.
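The idea of a worst-to-best list of centers can be sketched as follows. This is only an illustration under assumptions: the function name, the record labels and the scores are hypothetical, and a higher score is taken to mean a worse record.

```python
def worst_to_best(scores):
    """Return record labels ordered from the highest (worst) to the lowest score."""
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [label for label, _ in ordered]


# Hypothetical per-residue scores.
scores = {"ALA 12": 3.1, "GLY 45": 9.7, "SER 101": 0.4}

# Each call to next() corresponds to one "go to the next center" step.
centers = iter(worst_to_best(scores))
print(next(centers))  # GLY 45
print(next(centers))  # ALA 12
```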

Commands: Analyzing structures, force fields and maps

Structure analysis and validation means that the values of a particular property ("ENERGY", "TEMPERATURE" factor, "MAP", force field "PARAMETER") are scanned, and the "AVERAGE", standard deviation ("SIGMA"), "MINIMAL" and "MAXIMAL" values and the number of repetitions for each record are stored and assigned to "ATOMS", "RESIDUES", force field "PARAMETER" records or even a MAP.

The outcome of "ANALYSIS" can be written to a file or terminal ("WRITE", "SHOW"), displayed as a "GRAPH" object or as a color coded 3-dimensional "IMAGE" of a molecular structure.
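The kind of record described above can be sketched in a few lines. The sketch makes assumptions that are not taken from the program: the input layout, the function and key names are hypothetical, and the standard deviation is computed as the population value (division by the number of values).

```python
import math
from collections import defaultdict


def analyze(values, by_residue=True):
    """Collect statistics per record.

    values: iterable of (residue_id, atom_name, value) tuples.
    by_residue: True gives one record per residue, False one per atom.
    """
    groups = defaultdict(list)
    for resid, atom, value in values:
        key = resid if by_residue else (resid, atom)
        groups[key].append(value)

    records = {}
    for key, vals in groups.items():
        n = len(vals)
        mean = sum(vals) / n
        sigma = math.sqrt(sum((v - mean) ** 2 for v in vals) / n)
        records[key] = {
            "average": mean,
            "sigma": sigma,
            "minimal": min(vals),
            "maximal": max(vals),
            "count": n,
        }
    return records


# Hypothetical temperature factors for three atoms of two residues.
data = [(1, "N", 10.0), (1, "CA", 20.0), (2, "N", 30.0)]
print(analyze(data)[1])
print(len(analyze(data, by_residue=False)))
```

Switching by_residue corresponds to the choice between residue based and atom based records.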

The properties that provide input for the analysis are atomic "TEMPERATURE" factors, "CHARGES", "INTERNAL" coordinates, "ENERGY", "POINT" values and "BOND" and "ANGLE" force field "PARAMETERS" and "MAPS". Any ENERGY term or combination of terms can be taken into consideration.

Results can be written to a file using a SHOW ANALYSIS command, whereas WRITE ANALYSIS will store raw results, which can be reused for later work or presentation.

I find the residue based "BOND" and "ANGLE" energy analysis most useful for revealing the regions with the highest tension in the models, and the atom based "VDW" forces for uncovering bad contacts in the structure.

For interactive use see MAIN_MENU:analysis.html.

For increased functionality see MAIN_COM:anal.html.

An analysis demo

The demo shown, though it dates from the mid-1990s, uses the thrombin molecule to perform a per residue analysis of its temperature factors and energy:


> read file>doc/mol_images/protein/throm_ppac_fin.xpl atom xpl
> delete atom sele segm name PPAC end
> <>utils/protein.bond
> <>utils/get_top_par_19
> <DEF_ALL


> image init forward
> <>doc/analysis/load_anal
> return

Clicking the menu items will ERASE the image, QUIT the session, initialize the analysis arrays (ANAL_INI), perform temperature factor (ANA_TEMP) or energy (ANA_ENER) analysis, display the temperature factor analysis in the form of a GRAPH (GRAP_TEM) or show the analysis results as a color coded 3-dimensional object (IMAG_ANA).

Before (re)starting the analysis the analysis arrays need to be initialized. The ANAL_INI command


> analyze initialize residue

initializes the arrays on the residue basis, meaning that each residue gets an analysis record. The results can be written:


> write anal

The temperature factor analysis is performed by calling the file MAIN_DOC:analysis/anal_temp.com:


> analyze select .not. atom name H* end temperature

and the energy analysis by calling the file >doc/analysis/anal_ener.com:


> anal sele .not atom name H* end energy resid 1

It applies all energy terms that are currently turned on in an energy calculation that glides through the chain: for each residue in turn, the atoms of the neighboring (+ and - 1) residues are included in the interaction of the gliding residue.
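The gliding selection can be pictured with the following sketch. It shows only which residues would enter the calculation for each position of the window; the function name and the flank parameter are hypothetical, and the energy calculation itself is left out.

```python
def gliding_windows(residue_ids, flank=1):
    """Yield (central residue, residues included) along an ordered chain.

    flank is the number of neighbors taken on each side; the window is
    truncated at the chain ends.
    """
    for i, center in enumerate(residue_ids):
        lo = max(0, i - flank)
        hi = min(len(residue_ids), i + flank + 1)
        yield center, residue_ids[lo:hi]


# Hypothetical chain of four residues.
for center, included in gliding_windows([10, 11, 12, 13]):
    print(center, included)
```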

The file <>doc/analysis/anal_graph_temp.com brings the temperature analysis to the screen. The procedure first defines the variables that are used for defining intervals for values and coordinate system origin for the graph. The frame base line is drawn at VALUE_BASE.


> set vari VALUE_BASE = 0.0
> set vari RESID_BASE = 0.0
> set vari VALUE_MIN = 0.0
> set vari VALUE_MAX = 80.0
> set vari RESID_BEG = 0.0
> set vari RESID_END = 300.0
> set vari FRAME_STEP = 10.0
> set vari RESID_STEP = 0.1

A dial set is defined to enable the manipulation of the GRAPH object.


> set dial 1 graph object 1 tran
> set dial 4 graph object 1 scal
> set dial 5 graph object 1 char

Within the GRAPH module, all the data necessary to bring the analysis to the screen are given first, including the frame and scale presentation.


> graph
>
> analyze scale x-axis 1. scale y-axis 0.2 \
> center x-axis  RESID_BASE cent y-axis VALUE_BASE \
> step x-axis RESID_STEP * 10.0 step y-axis FRAME_STEP \
> range x-axis 0. 30. range y-axis VALUE_MIN VALUE_MAX
> analyze color 80 frame
> analyze step x-axis RESID_STEP range x-axis RESID_BEG RESID_END
> analyze color 180 average label modul 25  select all end

Here the AVERAGE, RMS and MINIMAL values are displayed one by one as histograms and the MAXIMAL value as a LINE. The residues with quite high AVERAGE temperature factor values are LABELED with their sequence numbers.


> analyze show
> analyze color 110 hist average
> analyze color 140 hist rms
> analyze color 170 hist minimal
> analyze color 200 line maximal
> analyze color 130 average label range -200. 1.4 select segment number 1 end
> analyze color 220 average label range 40. 300. select segment name * end
> exit
> return

The file >doc/analysis/image_anal.com displays the bonds of residues in color codes based on their AVERAGE values, through the whole interval from blue (64) to yellow (160) with a step of 6 colors.


> set dial 1 init
> image cent calc
> set col sele segm name * end anal aver color 64 160 6
> ima sele segm name * .a .not atom name H* end set bond
> return
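How a value might be turned into one of the color indices between 64 and 160 can be sketched as below. The linear mapping, the rounding to the nearest step and the function name are assumptions made for illustration; the program's actual scheme may differ.

```python
def color_index(value, vmin, vmax, first=64, last=160, step=6):
    """Map value linearly from [vmin, vmax] to a color index in [first, last].

    The result is restricted to first, first + step, ..., last.
    """
    if vmax <= vmin:
        return first
    fraction = (value - vmin) / (vmax - vmin)
    fraction = min(1.0, max(0.0, fraction))  # clamp values outside the range
    n_steps = (last - first) // step
    return first + step * round(fraction * n_steps)


# Hypothetical AVERAGE values on a 0 to 80 scale.
for value in (0.0, 40.0, 80.0):
    print(value, color_index(value, 0.0, 80.0))
```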

Ramachandran plot

The UTILS menu block on page 7 (MAIN_MENU:utils.html) contains menu items RAMACHAN and RAMA_PRO. Clicking RAMACHAN displays the Ramachandran diagram (frame, contour levels etc.) and defines dials to manipulate its size and position on the screen, whereas RAMA_PRO displays the Ramachandran plot of the WORK_SEGM protein as crosses and sequence IDs.

If you want to monitor a particular residue, click it first and then click on HIS_RAMA. A white cross will appear within the Ramachandran diagram. If you change its Phi and Psi angles, the cross will follow the changes in the Ramachandran diagram.
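The position of a cross in the diagram is given by two dihedral angles: Phi over the atoms C(i-1), N(i), CA(i), C(i) and Psi over N(i), CA(i), C(i), N(i+1). A minimal sketch of a dihedral angle calculation follows. It is not the program's code; the function names are hypothetical and the coordinates are made up for the example.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Return the dihedral angle in degrees defined by four points."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Made-up coordinates: the outer bonds are perpendicular when viewed
# along the central bond.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Recomputing the two angles after every rotation and redrawing the cross at (Phi, Psi) would give the behaviour described above.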