Crystallographic refinement

Refinement is defined as optimization of all atomic model coordinates and B-factors against a kind of a map derived from experimental data. Besides the experimental target function also empirical force field parameters keep the model chemically reasonable.

After your refinement ".cmds" macros have been configured refine your model by clicking "REFINE" "REFINE_B" (see MAIN_MENU:refine.html).

Pressing "e" in the image window EXITS minimizations after the completion of the current step.

Target functions

MAIN calculates positional and B-value from a series of target functions.

Six target functions are used:

All target function are scaled against the chemistry terms so that the ratio between target functions and chemistry terms is fixed and updated every 12 cycles. The defalut value is 1.5, which set the ratio between target functions and chemistry to 2 : 3 (overweighting the chemistry term).

The maximum likelihood refinement targets require use of TEST reflection set and R-free is mandatory.

Density target

This would be called refinement toward a map or real space refinement. In MAIN terminology "REFINE" differs from the "MINIMIZE" item, "REFINE" affects all atoms, whereas MINIMIZE optimizes positions of a selected, local group defined by the key "active" (MAIN_MENU:nice_sel.html).

When your model crystallographic R-value is still high in high 20-ties or higher, a refinement of your complete model toward a 2Fobs-Fcalc, 3Fobs-2Fcalc map or a similar (MIR, phased combined or density modified) can significantly improve your R-value.

Optimization of NCS operators for density averaging before applying cyclic averaging (which uses Fourier transformed maps) is a mandatory step.

Refinement against a difference (Fo-Fc)

It is one of the oldest forms of refinement. Nowadays it makes sense to apply it for multi crystal averaging and for high resolution structures as a short cut, since it is derivatives are based on a few points of electron density only.

R-factor calculation

Crystallographic R-value coming out of MAIN is a result of a fitting procedure, where Fcalc are scaled to Fobs independently for each resolution shell. The default number of shells is 10. The scales may be also averaged. Averaging is turned on automatically when bulk solvent correction term is turned on.

The overall reciprocal space B-factor is calculated independently for each R-value calculation during refinement procedures. You can adjust it by an independent calculation.

Bulk solvent correction

The bulk solvent correction procedure is based on the flat model of bulk solvent method (Fokine and Urzhumtsev (2002) Acta Cryst. D58, 1387-1392) using optimized parameters. As such it is also implemented in RefMac.

Turning it on by configuring the "bulk_solv_corr.cmds" using "MAIN_CONF:create_bulk_solv_corr.pl" and then clicking the "BULK_SOL" item in the MAIN_MENU:map_calc.html calculates partial structure factor of the model envelope, which are included in refinement.

Currently only the envelope is supported as generated by the atomic model "MAK_MASK" in MAIN_MENU:dens_mod.html.

After turning on or off the nulk solvent correction , the "refine.cmds" and "refine_b.cmds" files have to be updated.

Positional refinement

Use "main_config" tools (MAIN_CONF:menu_refine.sh, MAIN_CONF:create_refine.pl and MAIN_CONF:menu_refine_b.sh, MAIN_CONF:create_refine_b.pl) to create and adjust your local "refine.cmds" macro and "refine_b.cmds" as it defines groups and forces for non-crystallographic similarity constraints as well.

After the "refine.cmds" has been configured click "REFINE" and run it.

Configure "refine.cmds"


> create_refine.pl
 -h|--help)    prints this message with available options and current status
 -s|--step)    sets number of minimization steps (per cycle) [30]
 -k|--kick)    sets starting and final maximal xyz kick [0.0-0.33]
               current = [0 - 0]
 -c|--cool)    sets number of cooling cycles  [0]
 -b|--bramp)   sets minimal and maximal starting atomic B-value ramp [5 60]


 -t|--target)  target function for refinement [MAP/DIFF/LSQ/ML/ML_EST/ML_APR]
               current = [ML]
 -a|--atoms)   missing atoms for estimation of ML [100]
 -r|--ratio)   chemistry / X-ray or density target force ratio [0.8 - 2.0]
               current = [1.5]
 -m|--map)     MAP for DENSITY refinement [1]
 -x|--xmap)    MAP for X-DIFF refinement [3]
 -q|--qmap)    MAP for rec. space refinement [3]


 -n|--ncs)     Non-Cryst-Similarity NCS energy flag [ON/OFF]
               current = [OFF]
 -g|--gncs)    Non-Cryst-Similarity NCS geometry force = [5]


 -f|--free)    r-free flag [ON/OFF] current [ON]
 -o|--output)  set file name of the created macro [refine.cmds]
    --doit)    create the macro

The "MAP/DIFF/LSQ/ML/ML_EST/ML_APR" are target functions "-t|--target" that can be used in refinement. The "ML_EST" requires also an estimate of missing number of atoms "-a|--atoms". Pavel Afinine said that 25% error in estimate is still good enough.

Maps used in refinement are defined by "-m|--map", "-x|--xmap", "-q|--qmap", whereas the strength of the target terms is controlled via the ratio between traget and chemistry terms. Geometry violations printed out during each step of refinemeny will tell you if your RATIO is too low or too high. (See also ANA_BOND and ANA_ANGL in MAIN_MENU:analysis.html).

Before refinement it makes sense to "KICK_ALL" the atomic coordinates to jump out of the current state (heat the atoms to several 1000 K) and then let the minimizer to cool down the molecule in a "fast cooling" procedure. Multiple kicks "-k|--kick" can be applied in a number of cooling cycles "-c|--cool", each cycle with the maximal along the line of the two specified kick values.

Refinement output writes the "step" number, "stepsize", total energy "summ", average "gradient" and values for each particular energy term turned on. "Density" is the map constraint term, whereas "r-value" is only listed, its value changes only when structure factors are recalculated.


 step:     15      stepsize: 0.63802867E-13
 summ:    -6507.4916 grad:       24.2193 bond:      169.3255 angl:      663.3933
 dihe:     1206.1670 impr:      135.8787 vdw :     -236.2183 elec:    -4877.6440
 dens:    -3568.3938 rval:       37.0655

B-value refinement

Use "main_config" tools (MAIN_CONF:menu_refine_b.sh, MAIN_CONF:create_refine_b.pl) to create and adjust your local "refine_b.cmds" macro.

After the "refine_b.cmds" has been configured click "REFINE_B" and run it.

Configure "refine_b.cmds"


  -h|--help)    prints this message with available options and current status
-v|--over)    overall B-value refinement  steps [-1] OFF [>0] on = [0]
-s|--step)    ind. B-value refinement  steps [-1] OFF [>0] on = [60]


  -k|--kick)    sets starting and final maximal B-kick [0-30] current = [2.0  2.0]
  -c|--cool)    sets number of cooling cycles = [1]


  -d|--define)  define group organization [SINGLE/ATOM/RESIDUE/SEGMENT] = [SINGLE]
-l|--angle)   B-value angle inter-group constrain force [-1.0] off [> 0.0] on = [0.008]
-p|--bond)    B-value bond inter-group constrain force [-1.0] off [> 0.0] on = [0.01]
-g|--group)   B-value intra-group constrain force [-1.0] hard [> 0.0] soft = [-1.0]
  -b|--bramp)   bramp interval of valid B-values = [5] [80]


  -t|--target)  target function for refinement [LSQ/ML/ML_EST/ML_APR]
                current [LSQ]
  -a|--atoms)   missing atoms for estimation of ML [100]
  -r|--rec)     reciprocal space refinement force [800]
  -q|--qmap)    MAP for rec. space refinement [3]


  -n|--ncs)     Non-Cryst-Similarity NCS B-value flag [ON/OFF] = [OFF]
  -f|--fncs)    B-value NCS force  = [0.004]


  -o|--output)  set file name of the created macro [refine_b.cmds]
     --doit)    create the macro

For B-value refinements only four "LSQ/ML/ML_EST/ML_APR" out of six target functions "-t|--target" can be used. "MAP" and "DIFF" make no sense. The "ML_EST" requires also an estimate of missing number of atoms "-a|--atoms". Pavel Afinine said that 25% error in estimate is still good enough.

Before refinement atomic B-factors can be kicked. Similarly as in positional refinement, consecutive kicks can be applied in cycles.

Overall B-value refinement

Overall B-value refinement is based on Wilson plot statistics comparing Fobs and Fcalc in separate shells.


> mini b-val overall sele segm name SEGMENTS end step 3

Individual B-value refinement

For B-value optimizations GROUPS have to be DEFINED. The group definitions are located at the top of a MAIN_CMDS:refine_b.cmds macro.

The smallest group is an atom and the largest are all atoms. Between these two extremes any number of groups composed from any combination of atoms is valid. Currently GROUPS can be defined through SINGLE, ATOM, RESIDUE and SEGMENT command words. The GROUP SINGLE command defines single atoms group - each selected atom is a separate group. The GROUP ATOM defines all selected atoms as a group. The GROUP RESIDUE command DEFINES each selected residue as a group and the GROUP SEGMENT does the same on the segment level.


> define init group
> define group atom select all end
> define group single select all end

The MINIMAL and MAXIMAL atomic B-value specify in which limits can B-factors of individual atoms be optimized. Once defined, these are hard limits.

The X-TARGET SCALE defines the force constant by which the individual atoms are pulled in the direction suggested by the measured data.


> ener x-targ ( on scale 330. )

Oppose to the measured data, which try to scatter atomic B-values, several restraining terms try to make neighboring atomic B-values similar to each other. The ANGLE, BOND, GROUP force constants pull together B-values of groups that are linked by covalent BONDS or ANGLES. The smallest group (by default) is an individual atom. A GROUP restraint is hard when the GROUP force is set to a negative value. Otherwise atoms are pulled towards the arithmetic middle of each group. Negative BOND and ANGLE force constants turns these restrains off.


> ener minimal = 5.0 \
> maximal = 80. \
> b-val bond = 0.004 \
> b-val angle = 0.003 \
> b-val group = -1.0

Besides, non-crystallographic-similarity, NCS, constraints can be applied. These constraints refer to the same groups which are used in positional refinement. See MAIN_DOC:nmol/refine.cmds. The NCS B-FORCE constant pulls the B-values of similar atoms towards their arithmetic middle.


> define constr ncs b-force 0.004

Before start of minimization B-values of selected atoms can be kicked in order to shift refinement from the current (local minimum) position.


> set temp kick = 4. sele segm name SEGMENTS .and. weight 0.1 1.1 end

B-value for refinement with an ENERGY command and then start MINIMIZATION. Command below submits 10 cycles of B-VALUE refinement. Energies will be WRITTEN each iteration step.


> mini b-val indiv  sele segm name SEGMENTS end step 10 writ 1

An equilibrated assignment of B-VALUE scales is essential for a succesfull refinement. Values as BOND 1.0 for example result in an equal B-value for all atoms. After refining a structure check the B-VALUES sigmas with a SHOW TEMPERATURE command, and if necessary increase or release the force of constraining terms.

See also to the MAIN_DOC:1mol/1mol.html example.

Fast cooling versus slow cooling refinement

Random displacement of atomic positions and B-values from their current value is called "kicking". So instead of heating the system to 10 000K or more doing some simulation at these temperatures and then gradually lowering the temperature bath, the system is distorted in one step using a random number generator and then optimized. As refinement approaches its end, kick amplitudes are gradually decreased. Resulting initial geometry distortions can be so bigger than usually reached by high temperature dynamics. The distortions are removed in approximately 10 steps of minimization.

Kicking protocols are available through MAIN_CONF:create_refine.pl.

Kicking allows allows B-value manipulation, whereas molecular dynamics allows only atomic coordinate treatment.


 set coor kick 0.2 sele ... end
 set temp kick 5. sele ... end

Applying NCS constraints

Non-Crystallographic-Symilarity Constraints can be applied between equivalent molecular parts of in principle unlimited number of equivalent regions within a single crystal as well as among several crystal forms.

Refinement using NCS in a single crystal form

You are advised to let the "main_config" build the initial "refine.cmds" file and specify the appropriate NCS constraints.

When approaching the end of structure determination you are advised to edit the file "define_key_out.com" in order to remove the non-super imposable residues from the NCS constrains forces.


 key out sele .not all end
< define_key_out.com

The "define_key_out.com" macro would look something like this:


 key out sele ( seq  x1 x4 y16 : y27 ) \
         end

The lines below show how it is possible to impose stronger NCS constraints on main chain atoms (FORCE 20.) and slightly weaker on the side chain atoms (FORCE 5.) An example MAIN_DOC:nmol/refine.cmds file is in the MAIN_DOC:nmol/ directory.


 define init constrain ncs
 define constraint ncs \
    sele ( segm name MOLA .a .not out ) .a atom name CA N C O H end
 define constraint ncs
    sele ( segm name MOLB .a .not out ) .a atom name CA N C O H end
 define constraint ncs force 20.
 define constraint ncs b-force 0.04


 define constraint ncs next
 define constraint ncs \
             sele segm name MOLC .a .not atom name CA N C O H end
 define constraint ncs \
             sele segm name MOLD .a .not atom name CA N C O H end
 define constraint ncs force 5.
 define constraint ncs b-force 0.01

It does make sense to verify the success of the NCS groups creation. Since equivalent atoms are the consecutive ones in the both selections the NCS keys should contain at least the same number of atoms:


 show key


SHOW> KEY:   9 int_ncs__1 SELECTED  254 OF TOTAL 6176
SHOW> KEY:  10 int_ncs__2 SELECTED  254 OF TOTAL 6176

The next more sever test (good for debugging) is to create PAIR lists and see, whether the equivalent atoms match by their atomic, residue and sequence names or not.


 make pair init select int_ncs__1 end select int_ncs__2 end
 write pair

If you want to be sure the default initial guess will work well calculate the superposition parameters before the minimizer starts to work:


 define constraint ncs auto_guess


RMS_FIT> PARAMETERS X-ROT: 191.373 Y-ROT": 111.292 X-ROT"":  35.276
XYZ-TRAN:  88.258   6.726  15.660 RMS-FIT   .452"XYZ-TRAN:  88.258   6.726  15.660 RMS-FIT   .4520

The "RMS-FIT" value should look reasonable. Values above 3. are very likely to mean that the selections are not properly defined and maybe that the procedure did not converge properly.

Instead of leaving all 6 degrees of freedom (3 rotation axis and translation component) one can also fix some of the parameters (with flag exual 0), by providing initial guess and fixing the rotational axis (in thes case the 2-fold). Use of the matrix defined by polar angles is certainly the choice here:


 rms coor all eigen \
  sele segm name MOLB end sele segm name MOLA end


 show rms


 defin constr ncs polar \
       set_gues 2 RESULT_0 RESULT_1 180.0 RESULT_3 RESULT_4 RESULT_5 \
       flags 1 1 0 1 1 1

The energy term is turn on with the NCS ON command


 energy ncs on

Refinement using NCS between two crystal forms

When refining a molecular model in two or more crystal forms means that each model should interact with its own diffraction data and own unit cell. Converted to MAIN language it means, that each molecule should interact with its own crystal form difference Fourier map during a refinement procedure and that all energy interactions between the molecular models, besides the NCS term, should be excluded.

Here only the differences will be described in detail, complete example files listed below are only briefly presented. You can find the here mentioned and described command files in the directory MAIN_DOC:2crys/. The "MAIN_DOC:2crys/refine.cmds is actually there only to switch between individual "MAIN_DOC:2crys/refine_work.com" and simultaneous "MAIN_DOC:2crys/refine_both.com" refinment procedure. Only the later macro is described here.

The NCS term DEFINITIONS between a single or several crystal forms look the same as for the single crystal form cases. It is however necessary to distinguish the models from each crystal form, because

In the example selections KEYs for each crystal form are defined. Molecule MOLH is in hexagonal and molecule MOLO in an orthorhombic cell.


> key MOL_HEX sele segment name MOLH end
> key MOL_ORT sele segment name MOLO end

With the DEFINE CONSTRAINT INTERACTIONS we prevent atoms from cross interactions between the models in the two different crystal forms. These however requires un update of the bonding energy term lists (BOND, ANGLE, DIHEDRAL and IMPROPER terms).


> define init constr inter
> define constr inter sele MOL_HEX end sele  MOL_HEX end
> define constr inter sele MOL_ORT end sele  MOL_ORT end
> <DEF_ALL WORK_SEGM

Hereby we limit the interactions of each model with its difference density. The maps should, however, be already created before.


> set vari MAP_HEX = 1
> set vari MAP_ORT = 2
> define init constr map
> define constr map MAP_HEX sele MOL_HEX end
> define constr map MAP_ORT sele MOL_ORT end

Now we define scaling factors for each map and can start the minimizer. Since MAIN can not deal with two reflection sets simultaneously, we need to call a MACRO which will during a MINIMIZE command calculate for each crystal form its difference Fourier map.


> energy dens map MAP_HEX dens scale 180.
> energy dens map MAP_ORT dens scale 220.
>
> minimize sele all end step 100 writ 5 phase 15 gain 90000 \
> macro phase_ncs.com

The "MAIN_DOC:2crys/read.com", "MAIN_DOC:2crys/get_data_hex.com", "MAIN_DOC:2crys/get_data_ortho.com" and "MAIN_DOC:2crys/phase_ncs.com" files do the whole work.

Here only the "MAIN_DOC:2crys/phase_ncs.com" performing the two difference Fourier maps calculation is described, first the hexagonal and then the orthorhombic one. Reflection file, cell constants and symmetry operators are read for each calculation separately, since MAIN is currently only able to deal with a single space group at a time.


> <get_data_hex
> read file FILE_CELL cell
> read file FILE_SYMM symm
> read file FILE_REFL reflect init resol RESOL_HEX 100. limit 0 0 0 friedel
> set vari MAP_CELL = MAP_HEX
> make map MAP_CELL zero
> make map MAP_CELL set -1 1 -9999.
>
> make map MAP_CELL atom dens \
> atom symm select MOL_HEX end
> make map MAP_CELL rescale
> <calc_a_fofc_map MAP_CELL
>
> <get_data_ortho
> read file FILE_CELL cell
> read file FILE_SYMM symm
> read file FILE_REFL reflect init resol RESOL_ORT 100. limit 0 0 0 friedel
> set vari MAP_CELL = MAP_ORT
> make map MAP_CELL zero
> make map MAP_CELL set -1 1 -9999.
>
> make map MAP_CELL atom dens \
> atom symm select MOL_ORT end
> make map MAP_CELL rescale
> <calc_a_fofc_map MAP_CELL
> return

The "MAIN_DOC:2crys/calc_a_fofc_map.com" generates a difference 'Fobs-Fcalc' map.


> subroutine int MAP_LOCAL
> make map MAP_LOCAL set -10000 -9900 0.0
> fourier map MAP_LOCAL
> reflect shell 12  r-values
> refl set ampl phase fwork = fcalc * 1
> refl set ampl scale fcalc   fwork = fobs * 1 - fcalc * 1
> make map MAP_LOCAL zero
> make map MAP_LOCAL convert complex
> reflect fill-map MAP_LOCAL sele defined end
> fourier map MAP_LOCAL back
> !make map MAP_LOCAL rescale
> return

Partial model refinement

A refinement procedure works well when (positional) derivatives are as correct as possible. They depend upon correctness of structure factor amplitudes and phases. Wrong phases result in meaningless positional gradients, which are in MAIN calculated from difference (Fobs-Fcalc) density maps. When model is incomplete the model structure factors are quite far away from their final form regarding both, amplitudes and phases.

The idea here is that we try to fulfill the partial model density with density information obtained from other sources. These can be an MIR map, a solvent flattened map, an averaged map, a 2Fobs-Fcalc map,


... actually any kind of map which contains additional density
information about the structure and is not included in the current partial structure model available. From this map the structure factors are calculated that guide the current atom models with help of a minimizer to their final place. This procedure is essentially a density modification method, but since it happens through MINIMIZE command it was placed under refinement. After refinement is completed you should look into your map MAP_FO, since it may contain substantial differences (improvements) when compared with other kind of maps (obtained by phase combination or density modification procedures).

The following procedure tries to reduce structure factor errors by combining partial model density map with MIR density map into a map used for structure factor calculation. From these Fcalc-s a difference (Fobs-Fcalc) map is calculated and later on used for model positional derivatives calculation within MINIMIZATION procedure.

The following procedure (dens_comb.cmds) requires 4 unit cell maps to be held in the memory (MAP_ORIG -the MIR map), a work space map (MAP_FOFC) and the (MAP_FO). It sets the variables for the whole run (dens_comb.com).

First define the energy term lists and set the crystallographic weights and some (maybe) undefined b-values.


> <DEF_ALL
> define impr by topo init
> set charge sele resi name ARG LYS ASP GLU .a .not -
> atom name C N CA O end   0.0
> set weigh sele atom name H* end 0.0
> set temp sele all end = 28.0

Copy the density of the MAP_ORIG map (MIR) into the MAP_FO and specify the number of shells used for the r-value calculation (scaling of Fcalc to Fobs).


> make map MAP_FO zero
> make map MAP_FO set -1 1 9999.
> make map MAP_FO from MAP_ORIG copy
> refl shel 15

Set the variables to control the density combination run: density energy scale factor (DENS_SCAL), number of minimization steps (MIN_STEP), number of minimization cycles after which a maps will be calculated, radius of the molecular model envelope (RAD_MASK), radius of the model map which will be considered to replace the current density, additional weight for the Fcalc map (WEIGHT_FC). The WEIGT_FC is an empirical factor. It should scale the model Fcalc map so that it does not disappear in the MIR map noise as well that this map would dominate over the MIR map. My criterion is to calculate the combined map so that the contours of the model fcalc map nicely fulfill the MIR density. This contour level is later on used in the procedure. The useful interval might be between 60 to 140% of the contour level. My advise is: try and see.


> set vari DENS_SCAL = 800.
> set vari MIN_STEP = 120
> set vari RE_PHASE = 15
> set vari RAD_SOLV = 4.5
> set vari RAD_COMB = 2.8
> set vari WEIGHT_FC = 1.1

The key "model_part" includes atoms model that will be refined and included into density combination procedure, while the key "model_mask" includes actually all the atoms places at the location where the model might be. Thereby we expand the model region and include a solvent flattening cycle too.


> key model_part select segm name DM_* end
> key model_mask select segm name D* end


> ener dens map MAP_FO dens scal DENS_SCAL
> set vari DONE = 0
> minimize sele work_set end step MIN_STEP writ 1 \
> phase steps RE_PHASE gain 90000 \
> phase macro dens_comb.cmds


> return

The MAIN_CMDS:dens_comb.cmds macro combines the MIR and partial model maps and calculated the difference map for the MINIMIZER. For description see the chapters "Density modification" section "Density combination".