Use of non-crystallographic similarity (NCS)
Multiplicity of equivalent molecules in an asymmetric unit of a crystal is a blessing for crystal structure determination. In such cases the electron density defining the molecular structures can be averaged, and atomic positions and B-values of equivalent atoms can be subjected to additional constraints (NCS, Non-Crystallographic-Similarity constraints). Density averaging reduces model bias and makes it possible to interpret electron density maps with rather poor initial phase information as well as to solve structures with low completeness.
Before continuing with this chapter you are advised to become acquainted with the chapter "1 molecule case" (MAIN_DOC:1mol/1mol.html), which describes the other MAIN tools that also underlie cases with multiple equivalent subunits.
There are a few assumptions in this chapter and in the macros mentioned herein that you should bear in mind:
equivalent molecules.
The cathepsin L - p41 fragment complex (Guncar et al., 1999, EMBO J. 18, 793-803) serves to explain the MAIN philosophy and most of the tools dealing with non-crystallographic similarity. The complete case with all data and macros is available in "../cases/catl_p41". The macros shown were all configured and created using the "main_config" utilities.
> create_main_config.pl -m MOLA MOLB IA IB -g "MOLA MOLB" "IA IB" --doit
creating ".main"
creating read.com
creating save_file.cmds
creating re_image.cmds
creating symmetry.cmds
creating symmetry_ca.cmds
creating refine.cmds
creating refine_b.cmds
creating gen_solvent.com
creating make_masks.cmds
creating dm_prep.cmds
creating dm_next.cmds
creating dm_loop.com
1 strategy EACH group MOLA MOLB
2 strategy EACH group IA IB
creating load_4mol.com
creating rms_fit.cmds
creating create_all_others.cmds
MAIN configuration files have now been generated.
Now type "mainps" to start your MAIN session or, if you are not happy with the defaults, see the chapter below and adjust your input data using "menu_read.sh" and other scripts.
The macros are now created and an interactive MAIN session can be started by invoking "mainps":
> mainps
9.2 Non-default setup
This case has two enzyme molecules with segment names "MOLA" and "MOLB" and two inhibitors attached to them with segment names "IA" and "IB".
Changing the averaging strategy
Currently there are two averaging strategies between which MAIN switches by default: "EACH" and "ONE".
Typing "create_main_config.pl" without any parameters writes out the current MAIN configuration setup (as do the other configuration scripts, such as the "create_*.pl" Perl scripts and the "menu_dens_mod.sh" shell script):
create_main_config.pl
-g|--group ) defines NCS groups: specify the list of segments belonging to groups
               for each group embrace the list in ""
               1 EACH | MOLA MOLB
               2 EACH | IA IB
-s|--strategy) defines averaging scripts strategy - specify parameters for each group separately [1 EACH]
               EACH: average each molecule separately (default)
               ONE: average one and distribute the averaged density to others
               LINK: use operators from another group [ 2 LINK 1 ]
               WHOLE: the whole group builds a single mask for proper (spherical) symmetry
Strategy "EACH" means that the density for each member of the group will be calculated as the average of all others, whereas "ONE" means that the density will be averaged only for the first member of each group and then distributed to all equivalent ones by map rotation and translation. "LINK" will use the geometrical operators from the linked group, and "WHOLE" will do the averaging within a single mask defined by all members of the group. By default the strategy "EACH" is applied as long as the number of group members is smaller than 5, whereas "ONE" is chosen for groups with a higher number of members.
For example, to choose "ONE" for the first group and "EACH" for the second, type the group number followed by the strategy after the strategy specifier "-s":
create_main_config.pl -s 1 ONE 2 EACH
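The default choice between "EACH" and "ONE" described above can be written down as a small rule. The following Python sketch only illustrates that rule; the function name default_strategy is hypothetical and is not part of MAIN or of "main_config", and the threshold of 5 is taken from the text, not from the MAIN sources.

```python
def default_strategy(n_members: int) -> str:
    """Return the averaging strategy that the text describes as the default.

    Assumption: the switch happens exactly at 5 group members, as stated
    in the paragraph above.
    """
    if n_members < 1:
        raise ValueError("a group needs at least one member")
    return "EACH" if n_members < 5 else "ONE"


if __name__ == "__main__":
    for n in (2, 4, 5, 60):
        print(n, default_strategy(n))
```

An explicit "-s" specification, as in the command above, overrides this default for the named group.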
When an inhibitor or cofactor is attached to an enzyme or to some other part of a well-known structure, there is no need to change the well-defined part each time you actually want to update only the ligand part.
The "LINK" strategy will cover this in the future; for now, however, one still needs to do some typing explicitly.
The ligand is presumably bound to the enzyme with the same orientation in all equivalent complexes, so it makes sense to use the superposition parameters from the enzymatic parts only and not to calculate them from (sometimes even partial) ligand models.
So one can edit the created rms_fit.cmds file and simply copy mol_MOLA_to_MOLB.com into mol_IA_to_IB.com and correspondingly create the file mol_IB_to_IA.com. (One can also change the rms_fit.cmds macro so that it writes the inhibitor superposition files mol_IB_to_IA.com and mol_IA_to_IB.com based on superpositions of "MOLA" and "MOLB".)
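The copying step can also be scripted outside MAIN. The sketch below is a minimal illustration in Python, assuming that the mol_X_to_Y.com files hold only the rotation and translation parameters and can therefore be reused unchanged under the ligand file names; the function link_ligand_operators is hypothetical and not a MAIN utility.

```python
import shutil
from pathlib import Path


def link_ligand_operators(directory, enzyme=("MOLA", "MOLB"), ligand=("IA", "IB")):
    """Copy the enzyme superposition macros to the ligand file names.

    Assumption: the macro files do not depend on the segment names, so a
    plain file copy is sufficient.
    Returns the names of the files that were written.
    """
    directory = Path(directory)
    copied = []
    # Both directions are needed: first -> second and second -> first.
    for (e_from, e_to), (l_from, l_to) in (
        (enzyme, ligand),
        (enzyme[::-1], ligand[::-1]),
    ):
        source = directory / f"mol_{e_from}_to_{e_to}.com"
        target = directory / f"mol_{l_from}_to_{l_to}.com"
        shutil.copyfile(source, target)
        copied.append(target.name)
    return copied
```

The copies have to be refreshed whenever the enzyme superposition files are recalculated, otherwise the ligand operators go out of date.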
Summary of differences between 1mol and nmol case
The creation procedure is essentially the same as in the elementary real case chapter (MAIN_DOC:1mol/1mol.html); however, several macros differ and some additional ones are created. The differences reflect the relations between the topologically identical segments "MOLA" and "MOLB".
There is only one difference between the 1 molecule startup file (MAIN_DOC:1mol/read.com) and this one (read.com): in order to enable manipulation of several topologically equivalent molecules, an additional menu block (load_4mol.com) is loaded.
Different segments have different colors
The re_image.cmds macro displays all molecules. Crystallographically weighted atoms are colored close to yellow, whereas the ones with occupancy 0.0 are colored green.
The symmetry mates (symmetry.cmds and symmetry_ca.cmds) are also differently colored. Each different segment gets a new color close to red regardless of the crystallographic weight of the atoms.
Refining with NCS constraints helps. It can also improve your parameters (geometrical superposition of maps) for electron density averaging. Here NCS constraints are applied to atomic positions.
It is important, especially towards the end of refinement, that you recognize which parts of the various molecules differ and exclude them from the NCS constraints. Their sequence IDs are to be included in the selection key "out". The refine.cmds macro must therefore be modified using a text editor. The NCS groups are defined only once; B-factor refinement (refine_b.cmds) simply uses the keys defined for positional refinement. The most appropriate way is to use the SEQUENCE keyword. In order to use an equivalent selection of residues throughout the session in the macros rms_fit.cmds, refine.cmds and refine_b.cmds, it is advisable to write the "define_key_out.com" macro:
key out sele .not all end
! include all the non equivalent residues here
! key out sele ( seq x1 x2 .or seq X7 .or seq 167 : 190 ) \
! end
<define_key_out.com
! defining groups
define init constr ncs
define constr ncs sele atom name CA N C O H CB %G* %D* \
  .a ( segm name MOLA .a .not out ) end
define constr ncs sele atom name CA N C O H CB %G* %D* \
  .a ( segm name MOLB .a .not out ) end
define constr ncs force 5.
define constr ncs b-force 0.004
ener ncs on
Having several identical subunits allows you to utilize the benefits of electron density averaging. The averaging procedure is based on superimposed regions of molecular masks, but do not forget that the masks may not be complete. It therefore makes sense to keep regions of density in the cyclic procedure even when you are not sure to which molecule they belong. See MAIN_DOC:1mol/1mol.html and MAIN_MENU:map_atom.html for instructions. In addition, you may skeletonize these regions or build dummy models for mask creation and superposition (MAIN_DOC:mol_repl/mol_repl.html).
The same menu items and files with the same names that run density modification procedures for 1 molecule (MAIN_DOC:1mol/1mol.html) are also used when electron density averaging is involved. The procedures are invoked via items of the menu block "DENS_MOD" (MAIN_MENU:dens_mod.html). The only new item here is "RMS_FIT", which calls the rms_fit.cmds macro that calculates RMS-fit-based superposition matrices for all possible combinations of segments within each group.
Generation of superposition parameters (RMS_FIT)
The item "RMS_FIT" from the "DENS_MOD" menu block calls the macro rms_fit.cmds. As the number of files increases with the number of molecules in each group, it may make sense to store the rotational and translational parameter files in some other directory.
For details see MAIN_MENU:dens_mod.html.
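To see how quickly the number of parameter files grows, the file names can be enumerated. The Python sketch below assumes that one mol_X_to_Y.com file is written for every ordered pair of segments within a group, which matches the names mol_MOLA_to_MOLB.com and mol_MOLB_to_MOLA.com used in this chapter; the function superposition_files is hypothetical and is not part of MAIN.

```python
from itertools import permutations


def superposition_files(groups):
    """List the mol_X_to_Y.com names for every ordered pair in each group.

    groups -- list of groups, each a list of segment names
    Assumption: one file per ordered pair, i.e. n * (n - 1) files for a
    group of n segments.
    """
    names = []
    for segments in groups:
        for src, dst in permutations(segments, 2):
            names.append(f"mol_{src}_to_{dst}.com")
    return names


if __name__ == "__main__":
    for name in superposition_files([["MOLA", "MOLB"], ["IA", "IB"]]):
        print(name)
```

Under this assumption the present case needs four files, whereas a single group of five segments would already need twenty.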
The following options of the density modification procedure affect the macro:
[rot_tran/rt] directory for rotation and translational macros:
If you have no model (MIR case) you still need these parameters; the rotational and translational parts will then probably have to be constructed from a self rotation function, heavy atom positions or density map positions (see MAIN_DOC:intro/intro.html).
For each molecule (with a unique segment name) a separate mask is created and saved to a file. Overlaps with other molecules as well as with their symmetry equivalents are taken care of. For details about the parameters see MAIN_MENU:dens_mod.html.
As masks based on molecular models are precalculated and saved to files, it may make sense to use some other directory to store them. Let "main_config" create and modify the file make_masks.cmds. The following options of the density modification procedure affect the macro:
[atom/a] mask atom maximal radius: 6.0
[mask_dir/m] directory for mask files:
The dm_loop.com macro deals with the protein region of electron density maps.
A whole averaging cycle is encoded in the dm_loop.com macro, which you do not really wish to edit. Normally "main_config" or the shortcut "MAIN_CONF:menu_dens_mod.sh" should be used for its creation and modification.
Averaging parts involve copying density into the mask,
! first copy density into mask of MOLA
set vari FILE_MASK = mask_MOLA.xmap
read file FILE_MASK map xpl over MAP_WORK
make map MAP_WORK from MAP_FROM copy
adding rotated densities to the "MOLA" mask one by one,
<mol_MOLA_to_MOLB.com
<?MAIN_UTILS:add_a_map FILE_MASK MAP_ADD MAP_FROM MAP_WORK
rescaling of the resulting sum of the density maps and building the unit cell using the symmetry operators.
make map MAP_WORK rescale
make map MAP_TO from MAP_WORK cell
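The three steps of the averaging part (copy, add, rescale) can be mimicked on plain lists of numbers. The Python sketch below is only a conceptual illustration and not how MAIN stores or processes maps: it assumes that all densities have already been rotated and translated onto the same mask grid points, and it models "rescale" as division by the number of summed maps, which is an assumption. The function average_in_mask is hypothetical.

```python
def average_in_mask(own_density, rotated_densities):
    """Average density values sampled at the same mask grid points.

    own_density       -- values of the target molecule inside its mask
    rotated_densities -- one list per equivalent molecule, already
                         rotated and translated onto the same grid points
    Assumption: "rescale" is modelled as division by the number of maps.
    """
    total = list(own_density)              # copy density into the mask
    for density in rotated_densities:      # add rotated densities one by one
        if len(density) != len(total):
            raise ValueError("all maps must cover the same mask points")
        total = [t + d for t, d in zip(total, density)]
    n_maps = 1 + len(rotated_densities)
    return [t / n_maps for t in total]     # rescale the resulting sum


if __name__ == "__main__":
    print(average_in_mask([1.0, 2.0], [[3.0, 4.0]]))
```

Building the unit cell from the averaged mask density with the symmetry operators is not covered by this sketch.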
In a case with a large number of repetitions of local subunits (more than five), it makes sense to average only one molecule and then distribute the averaged density of that monomer to the others. The resulting dm_loop.com differs from the "EACH" case:
! distribute MOLA density into mask of MOLB
set vari FILE_MASK = mask_MOLB.xmap
read file FILE_MASK map xpl over MAP_WORK
make map MAP_WORK set 9000 100000 0.0
<mol_MOLB_to_MOLA.com
<?MAIN_UTILS:add_a_map FILE_MASK MAP_ADD MAP_TO MAP_WORK
make map MAP_WORK rescale
make map MAP_TO from MAP_WORK cell
The strategy "ONE" becomes computationally faster than the default strategy "EACH" as the number of molecules within a group increases, although it is less accurate because the distributed density is interpolated twice (interpolation of already interpolated density). I recommend using the "EACH" method for groups of up to 4 segments and "ONE" only for larger groups.
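The difference in cost can be estimated by counting map rotations per averaging cycle. The Python sketch below rests on a reading of the strategy descriptions in this chapter and not on the MAIN sources: it assumes that "EACH" rotates every other member into each of the n masks, and that "ONE" rotates the n - 1 other members into the first mask and then distributes the result to the n - 1 others. The function map_rotations_per_cycle is hypothetical.

```python
def map_rotations_per_cycle(n_members: int, strategy: str) -> int:
    """Estimate the number of map rotations in one averaging cycle.

    Assumptions: "EACH" needs n * (n - 1) rotations, "ONE" needs
    (n - 1) rotations for averaging plus (n - 1) for distribution.
    """
    if n_members < 1:
        raise ValueError("a group needs at least one member")
    if strategy == "EACH":
        return n_members * (n_members - 1)
    if strategy == "ONE":
        return 2 * (n_members - 1)
    raise ValueError(f"unsupported strategy: {strategy}")


if __name__ == "__main__":
    for n in (2, 5, 60):
        print(n, map_rotations_per_cycle(n, "EACH"), map_rotations_per_cycle(n, "ONE"))
```

Under these assumptions both strategies cost the same for two molecules, and the gap widens quadratically for larger groups.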
Auxiliary macros for model building
The "N_MOLECUL" menu block is described in MAIN_MENU:nmol.html; the load_4mol.com present here is only an example.
The idea of a working model is that during model building you modify only a single molecule and distribute the changes to the others. You can decide which molecule becomes your working model. The images of the other molecules are displayed superimposed on the working model as background images.
In this case there are only two possible working models, MOLA and MOLB. In more complicated cases their number may increase and with it the number of new menu items, but it is obvious that there are limits. It makes no sense to deal independently with 60 molecules within an icosahedron. However, "main_config" does not cover that yet.
The create_all_others.cmds macro creates all related segments from your current working model using the rotational and translational parameters as last calculated by the rms_fit.cmds procedure. The "PUT_TO" and "GET_FROM" commands will only work for topologically equivalent molecules, whereas this one simply enforces topological and geometrical equivalence.
Distribute parts of your model from one to another segment
The "PUT_TO" and "GET_FROM" commands transfer coordinates, occupancies (weights) and temperature factors between equivalent parts of the working model and the other molecules. Equivalence is based on sequence ID matching between the key "active" selection of the working model and the other molecule. "PUT" and "GET" indicate the direction of the data transfer.
Coordinates are transferred on a basis of a RMS superposition between the working and the other model.
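The matching by sequence ID can be illustrated with a few lines of Python. This is a conceptual sketch only and not the MAIN implementation: the function put_to, the dictionary layout (one record per sequence ID instead of one per atom) and the transform argument, which stands in for the RMS superposition, are all hypothetical.

```python
def put_to(working, other, active, transform):
    """Copy data for the residues in `active` from `working` into `other`.

    working, other -- dicts: sequence ID -> {"xyz": (x, y, z), "occ": float, "b": float}
    active         -- sequence IDs selected in the working model
    transform      -- callable mapping working-model coordinates onto the
                      other molecule (stands in for the RMS superposition)
    Returns the sequence IDs that were actually transferred.
    """
    transferred = []
    for seq_id in active:
        # Only residues present in both molecules are equivalent.
        if seq_id in working and seq_id in other:
            src = working[seq_id]
            other[seq_id] = {
                "xyz": transform(src["xyz"]),
                "occ": src["occ"],
                "b": src["b"],
            }
            transferred.append(seq_id)
    return transferred
```

The opposite direction ("GET") would correspond to swapping the two models and using the inverse transformation.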
Rotates maps of equivalent molecules to the map of the current working segment.
The macro "rotate_maps.cmds" is created on the fly at the moment when a work segment is defined.
If your case belongs here, study the files that you can get with "main_config" and learn the MAIN programming language, in particular the commands MAKE (MAIN_COM:make.html), REFLECT (MAIN_COM:reflect.html), FOURIER (MAIN_COM:fourier.html), ENERGY (MAIN_COM:energy.html) and SELECT (MAIN_COM:select.html) and some others.
For dealing with several crystal forms see (MAIN_DOC:2crys/2crys.html).
To use phase extension, modify the resolution limits by redefining the reflection key "WORK_REFL" in dm_next.cmds and then invoke about 4 cycles with dm_loop.com after each resolution range.
To use proper symmetry you have to provide your own "rot_tran" parameter files and generate your own masks.
If you don't know how to solve your problem use E-mail.