3. ELECTRON DENSITY MANIPULATION ROUTINES

3.1. Introduction

Sometimes initial phasing with heavy atom derivative(s) or a starting
molecular model used for molecular replacement yields a clear electron
density map allowing unambiguous interpretation of a molecular
structure.  If this is your case, congratulations on your luck, dear
MAIN manual reader: you can skip the rest of the chapter.  If, however,
your electron density is not unambiguous, I do encourage you to read
the rest.

Electron density can be manipulated by any means that will yield a more
interpretable electron density map, finally resulting in a refined
structure of a macromolecule.

The most widely used methods are electron density averaging and
solvent flattening (Bricogne, Wang); histograms and Sayre's equations
() are also used.  These methods
improve phases by including extra information from two sources:
the local symmetry within an asymmetric unit and the demarcation of
areas occupied by protein and solvent.  This information is used to
alter the electron density map, from which the whole crystal cell is
built and structure factors are calculated, beginning a new cycle.
The procedure is repeated until the apparent R-factor of the electron
density map converges (see flow chart in FIGURE 3.1.1.).  The aim is to
produce an interpretable electron density map.

It is assumed that the regions of the electron density map occupied by
protein have higher positive electron density than the solvent
regions.  The solvent-occupied regions inside the crystal have no
rigid structure, so they can be `flattened' (all grid points in the
solvent region are assigned a single density value).
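
A minimal Python sketch of the flattening step is given below.  It
assumes that the protein mask is already known and that the map is held
as a flat list of values.  The function name flatten_solvent and the
choice of the mean solvent density as the default value are assumptions
made for illustration only.

```python
def flatten_solvent(density, protein_mask, solvent_value=None):
    """Return a copy of `density` with all solvent points set to one value.

    density       -- flat list of density values, one per grid point
    protein_mask  -- flat list of booleans, True where the point is protein
    solvent_value -- value given to solvent points; if None, the mean of
                     the current solvent density is used (an assumption)
    """
    solvent = [d for d, inside in zip(density, protein_mask) if not inside]
    if solvent_value is None:
        solvent_value = sum(solvent) / len(solvent) if solvent else 0.0
    return [d if inside else solvent_value
            for d, inside in zip(density, protein_mask)]


# Toy map with five grid points; points 0 and 3 belong to the protein.
density = [0.9, 0.1, -0.2, 1.4, 0.3]
mask = [True, False, False, True, False]
flat = flatten_solvent(density, mask)
```

Protein points keep their density, while all solvent points share one
value.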

Solvent flattening, when successful, defines clear boundaries between
the solvent and protein occupied volumes and improves the electron
density.  

Electron density averaging, however, requires clear boundaries between
protein and solvent volumes.  In the case of multiple isomorphous
replacement phase evaluation it should therefore almost inevitably be
preceded by solvent flattening.

There are two types of non-crystallographic or local symmetry:
proper (also called spherical) and improper symmetry.
Molecules of an asymmetric unit are related by proper symmetry when
they can be superimposed upon each other by a single rotation about a
local symmetry axis, while in the case of improper local symmetry,
superposition of the molecules requires other operations
(usually a rotation combined with a translation; see FIGURE 3.1.2).
Therefore, procedures for proper and improper symmetry averaging
differ.  For proper symmetry averaging, equivalent areas do not have
to be separated, while for improper averaging it is necessary to
distinguish between them.
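
Following the definition given above, a transformation x' = R x + t can
be tested for a translation component along its rotation axis.  The
Python sketch below is an illustration only: the function names and the
tolerance of 0.5A are hypothetical, and the test follows the definition
used in this section rather than any routine of MAIN.

```python
import math


def rotation_axis(R):
    """Unit vector (up to sign) along the axis of the rotation matrix R."""
    cos_theta = (R[0][0] + R[1][1] + R[2][2] - 1.0) / 2.0
    denom = 2.0 * (1.0 - cos_theta)
    if denom < 1e-12:
        raise ValueError("identity rotation has no unique axis")
    # n n^T = (R + R^T - 2 cos(theta) I) / (2 (1 - cos(theta)))
    nnt = [[(R[i][j] + R[j][i] - (2.0 * cos_theta if i == j else 0.0)) / denom
            for j in range(3)] for i in range(3)]
    col = max(range(3), key=lambda i: nnt[i][i])
    norm = math.sqrt(nnt[col][col])
    return [nnt[i][col] / norm for i in range(3)]


def screw_translation(R, t):
    """Length of the translation component parallel to the rotation axis."""
    axis = rotation_axis(R)
    return abs(sum(axis[i] * t[i] for i in range(3)))


def is_proper(R, t, tolerance=0.5):
    """True when the operation is a pure rotation about some axis in space."""
    return screw_translation(R, t) <= tolerance


# Two-fold rotation about z, without and with a shift of 3A along z.
twofold_z = [[-1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, 1.0]]
pure = is_proper(twofold_z, [10.0, 4.0, 0.0])
screw = is_proper(twofold_z, [10.0, 4.0, 3.0])
```

A translation perpendicular to the axis only moves the axis in space,
so it does not make the operation improper.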

Besides improving the phases (and the electron density maps) of the
starting resolution range, it is possible to evaluate phases of higher
resolution reflections by gradually increasing the resolution range.
The procedure is called phase extension.  The larger the number of
molecules in an asymmetric unit, the better the results that can be
obtained with phase extension, provided that the initial phases are
sufficiently correct for the procedure to converge properly
(Bricogne, 1974; Podjarny, 1990).

To perform real space electron density averaging, an initial
set of phases, the equivalent areas and the geometric transformations
between them (rotation and translation parameters) are required.

In a molecular replacement procedure, the equivalent areas are defined
from the initial model placement. Transformations between them can be
easily constructed by superimposing the molecular models (as
demonstrated in the cases of cathepsin B and riboflavin synthase).

In the case of a single or multiple isomorphous replacement procedure,
it is possible to construct the transformation parameters from the
heavy atom positions when they fulfill the local symmetry conditions.
When they do not, it is necessary to construct the rotational
parameters by an autocorrelation of a Patterson map and then to find
the center of rotation and corresponding translational components by
autocorrelating electron density (as demonstrated in the case of
carbamoylsarcosine hydrolase).  To recognize the boundaries of
equivalent areas, it is recommended first to average the initial
electron density map, whereby the uncorrelated areas are expected to
smear out.  The borders (envelope) of the equivalent areas are then
found in this averaged density, either interactively at a graphical
display by going through all the layers of an asymmetric unit
(2-dimensional construction using a program such as X-CONTOUR
(Buchberger, 1990)) or by
preparing a 3-dimensional map representation of a whole asymmetric
unit and determining the equivalent parts from its 3-dimensional image
(this can be done with MAIN).  Both constructions were applied in the
case of carbamoylsarcosine hydrolase.

The theoretical basis for electron density averaging or real space
averaging was established in the late 1960s and early 1970s (Rossmann,
1972). Bricogne has written a review (Bricogne, 1974) of the method,
its application and limitations.  There it was shown that the
averaging in direct space is equivalent to the procedure in
reciprocal space and that averaging in direct space has advantages
over reciprocal space procedures.  The reason seems to lie in the greater
computational inaccuracies in reciprocal space calculations.  In
1976 he published a description of his program (Bricogne, 1976).
Besides Bricogne's program, other attempts have also been made, but
none of them (Johnson in Rayment et al., 1978; Nordman, 1980) 
is used so widely and in so many variations as Bricogne's.
Recently, Lawrence (Lawrence, 1991) reviewed the method and its applications
to {\it de novo} determined structures.

3.2. Programming concepts

The motivation for programming electron density averaging
routines arose from the work of my colleagues, who were trying
to average the electron density of human cathepsin B with Bricogne's
(Bricogne, 1976) program package.  The procedure was beset by
problems and errors that seemed to have no end.  Since some
routines dealing with electron density maps were already built into
MAIN, it seemed natural to include additional routines for electron
density averaging.  Programming of electron density averaging routines
was done gradually, by trying to improve and simplify already existing
procedures.

It deserves mention at this point that the development of computer
technology has made possible the design of simpler, more easily
understood and more generally applicable routines than those employed
in Bricogne's approach, which was designed for the computers of the 1970s.

3.2.1. Addressing problem

Bricogne (1976) describes the difficulty of electron density averaging
as 'an addressing problem in which, at first sight, an enormous file
has to be made randomly accessible.' His solution of the problem was
the double-sorting technique.  The masked grid points are first
transformed applying all local symmetry operations to the equivalent
positions in the electron density.  Each masked grid point is stored
in a file as a record, which includes the original grid point
coordinate and transformed coordinates.  These records are then sorted
according to their transformed coordinates.  This makes it possible to
access the electron density map layers sequentially, keeping only
two neighbouring layers in program memory at a time.  When the
transformed coordinates exceed the layer boundaries, the next layer
is read into program memory.  After all transformed points have
obtained densities, the records are sorted again according to their
original mask grid point positions.  After the second sorting all
points belonging to the same original mask address appear one after
another, so it is easy to average their density values.  This
double-sorting technique avoids the use of large parts of computer
memory on the one hand, but on the other occupies much more disk space
than the electron density maps themselves, since each grid point is no
longer represented as a single integer or real number in a map, but as
a record which includes the original and transformed coordinates in
addition to the density.  The critical point concerning disk space
requirements occurs while sorting the records.  (Applying the
double-sorting technique we were not able to average electron density
maps of the monoclinic form of riboflavin synthase including
reflections to 2.5A resolution.)
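
The double-sorting idea described above can be illustrated by the
following Python sketch.  It is not Bricogne's code: the record layout,
the function name double_sort_average and the use of the nearest grid
point instead of interpolation are simplifying assumptions, and the map
is held in a dictionary rather than read layer by layer from a file.

```python
from itertools import groupby


def double_sort_average(density, records):
    """Average density over symmetry-related points by sorting twice.

    density -- dict mapping a grid point (x, y, z) to its density value
    records -- list of (original_point, transformed_point) pairs
    """
    # First sort: by transformed coordinates, layer (z) first, so that
    # the map could be read sequentially one layer after another.
    by_target = sorted(records, key=lambda r: (r[1][2], r[1][1], r[1][0]))
    sampled = [(orig, density[target]) for orig, target in by_target]

    # Second sort: by original mask point, so that all values belonging
    # to the same point follow one another and can be averaged.
    sampled.sort(key=lambda r: r[0])
    averaged = {}
    for orig, group in groupby(sampled, key=lambda r: r[0]):
        values = [value for _, value in group]
        averaged[orig] = sum(values) / len(values)
    return averaged


# Toy example: two mask points, each related to one other position.
density = {(0, 0, 0): 1.0, (1, 0, 0): 3.0, (0, 0, 1): 5.0, (1, 0, 1): 9.0}
records = [((0, 0, 0), (0, 0, 0)), ((0, 0, 0), (1, 0, 1)),
           ((1, 0, 0), (1, 0, 0)), ((1, 0, 0), (0, 0, 1))]
result = double_sort_average(density, records)
```

Each record carries two coordinate triples in addition to the density,
which is why the record file is so much larger than the map itself.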

In my opinion the double-sorting routines became obsolete when
operating systems with virtual memory on magnetic disks became
available, though it is still believed that double sorting should
be used for larger structures (Podjarny, 1990).  This belief is
based on a lack of understanding of modern computers.  As soon as an
electron density map fits on a disk, it makes physically no difference
whether it is available to a program as an external file or as an
internal array stored in virtual memory.  For programming and,
astonishingly, also for the maximum disk storage requirements, the
differences are large.  Accessing data in an internal array simplifies
programming, while the double-sorting technique requires much more
disk space than several electron density maps.  Abolishing the
double-sorting procedure, which is also desirable for the sake of
programming and conceptual simplicity, efficiency and generality,
requires, however, another strategy.

The aim was therefore to break the electron density averaging
procedure into elementary operations and try to apply them
sequentially on a row of maps, only modifying a single map at a time
and never accessing more than two maps simultaneously.  For this,
at least two maps must be kept entirely in program memory at run
time.  In this way the access to grid points is also not completely
random: since molecular envelopes are relatively contiguous regions,
the program can at run time access the data almost as if they were
sorted.

3.2.2. How to store a map?  

A value can be stored in a computer in different
ways (FORTRAN notation is used): as a single byte or character*1, 2
bytes (integer*2), 4 bytes (integer*4, real*4), 8 bytes (integer*8,
real*8) ...  The data should be stored accurately enough to perform
the task.  For accuracy alone it seems that single-byte precision
suffices (see APPENDIX B).  (Bricogne's program stores map
values in bytes.)  The problem with single byte storage is that it
is not possible to add more than 2 maps together without danger of a
value overflow.  The double-sorting procedure solves the problem quite
elegantly.  Since equivalent points are already sorted as they
appear on the list one after another, it is possible to average
them employing a single real*4, integer*2 or integer*4 number.

To apply the same procedure in MAIN, there would in principle have to
be enough space to store all local symmetry maps at once so that they
can be accessed simultaneously.  Since my premise was to retain the
simplicity of the expression A = A + B, other approaches had to be
found.

The first solution was to average the character*1 maps used with MAIN
by an external program which can add multiple character*1
maps into an integer*2 map for later averaging.  For this, each single
map has to be written to disk as a file.  In a real*4 map, however, it
is possible to sum a nearly unlimited number of maps.  A simple
calculation shows that four character*1 maps take the same amount of
disk space as one real*4 map with the same number of grid points.
When more than four maps need to be averaged, the character*1 maps
require more disk space than a single real*4 averaged map.  Real*4
maps have another advantage: an external program for adding them is
no longer needed, which in addition reduces the input and output
operations.  Both options are now available in MAIN, which can deal
with character*1 and with real*4 maps.  Their use in
averaging is demonstrated in the cathepsin B case (see APPENDIX C).
For the averaging of the riboflavin synthase icosahedral structure
only the real*4 maps were applied.
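
The storage argument can be illustrated with a short Python sketch that
uses the standard array module as a stand-in for FORTRAN character*1
(typecode 'B', one byte) and real*4 (typecode 'f', four bytes) maps.
The function names add_into and disk_bytes and the toy values are
assumptions for illustration; MAIN itself is not written this way.

```python
from array import array


def add_into(accumulator, new_map):
    """In-place A = A + B for two flat maps of equal length."""
    for index, value in enumerate(new_map):
        accumulator[index] += value
    return accumulator


def disk_bytes(n_points, n_maps, bytes_per_point):
    """Disk space needed for `n_maps` maps of `n_points` grid points."""
    return n_points * n_maps * bytes_per_point


# Three tiny byte maps holding density values only.
byte_maps = [array('B', [200, 100, 30]),
             array('B', [180, 90, 40]),
             array('B', [220, 110, 50])]

# Summing in a byte map fails as soon as a sum exceeds 255.
overflowed = False
try:
    add_into(array('B', byte_maps[0]), byte_maps[1])
except OverflowError:
    overflowed = True

# Summing in a four-byte real map is safe.
total = array('f', [0.0] * 3)
for single_map in byte_maps:
    add_into(total, single_map)
average = [value / len(byte_maps) for value in total]
```

With these helpers, disk_bytes(n, 4, 1) equals disk_bytes(n, 1, 4) for
any number of grid points n, which is the break-even point mentioned
above.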

3.2.3. Molecular envelopes

In the case of proper averaging one molecular envelope (mask) suffices
for all molecules, while for improper averaging the procedure should
distinguish between different molecular areas.  For this reason the
concept of labeled masks was introduced by Bricogne.  In MAIN this is
solved by storing each molecule's mask in a separate map.  Since on
each map only a single operation can be performed at a time, there are
no differences in principle between improper and proper averaging
procedures.

3.2.4. Crystal cell generation

The usual way to generate a complete crystal cell from an asymmetric
unit was to write a procedure that applies the building rules.
Building rules tell which grid point in a cell is equivalent to which
one from the asymmetric unit.  This approach has several drawbacks
which can turn averaging (and solvent flattening as well) into a
complicated procedure full of errors.  First, different rules have to
be applied for each space group.  Second, in almost any
space group it is possible to choose different definitions of an
asymmetric unit.  A third complication sometimes arises from
differing numbers of grid points in a unit cell.  Such routines are
not available for every possible case and have to be programmed when
necessary.  Even when they are available, their correctness
has to be verified for each single case.  It happened quite often that
whole layers were left empty.  At this point real problems may
begin, especially for an unskilled programmer.  Therefore another
approach is applied in MAIN: a crystal cell is generated from an
asymmetric unit by applying crystal symmetry operations.  The
consequence of this generality is that there are no limitations on
the placement of an asymmetric unit and that the routines work for
any space group.  Moreover, the asymmetric unit may consist of several
independent parts, each one stored in a separate map.  The MAIN
approach has yet another advantage: since only empty points can be
modified, there is no danger of having multiple density in certain
regions.
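
The idea of building the cell by symmetry operations, filling only
empty points, can be sketched in Python as follows.  This is an
illustration under stated assumptions, not the MAIN routine: the map is
a dictionary of grid points, symmetry operations are given as integer
rotation matrices with fractional translations, and every image is
assumed to fall exactly on a grid point.

```python
from fractions import Fraction


def expand_to_cell(asu, grid, symops):
    """Fill a unit cell from an asymmetric unit by symmetry operations.

    asu    -- dict mapping grid point (i, j, k) to density
    grid   -- number of grid points along a, b and c
    symops -- list of (R, t): 3x3 integer matrix, fractional translation
    """
    cell = {}
    for R, t in symops:
        for point, value in asu.items():
            frac = [Fraction(point[a], grid[a]) for a in range(3)]
            image = []
            for row in range(3):
                coord = sum(R[row][col] * frac[col] for col in range(3))
                index = (coord + t[row]) * grid[row]
                if index.denominator != 1:
                    raise ValueError("image does not fall on a grid point")
                image.append(int(index) % grid[row])
            target = tuple(image)
            if target not in cell:      # only empty points are filled
                cell[target] = value
    return cell


# Space group P21 (unique axis b): x,y,z and -x,y+1/2,-z.
half = Fraction(1, 2)
P21 = [([[1, 0, 0], [0, 1, 0], [0, 0, 1]], [0, 0, 0]),
       ([[-1, 0, 0], [0, 1, 0], [0, 0, -1]], [0, half, 0])]
cell = expand_to_cell({(1, 0, 1): 2.5}, (4, 4, 4), P21)
```

Because the operations themselves are applied, the same function serves
any space group and any choice of asymmetric unit.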

Unfortunately, cyclic electron density averaging still requires
several different programs.  When they are connected into
an automatic procedure, the simplicity of MAIN syntax is not retained.
This problem will be solved in the near future when fast Fourier
transformation routines are integrated into MAIN.  With further
development of computers enough core memory will become available, so
that it will be possible to hold whole maps and reflection
data in memory and so reduce the input/output operations to a minimum.


3.2.5. How to deal with electron density maps when using MAIN?

A map is a 3-dimensional array of grid points, each with a value.
According to the value, they are treated as empty, density or mask
points.  The grid points with values inside the density interval are
the density points, the ones with values below the density interval
are empty points, and the ones above are mask points.  In the case of
character*1 maps, 0 is an empty point and 255 is a masked point.  In
the case of real*4 maps, empty points have values below -9999.0 and
masked points have values above 9999.0.  The region in between
comprises the density points.
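
The classification of grid points can be written down in a few lines
of Python.  The limits are those quoted above; the function names
classify_char1 and classify_real4 are hypothetical and serve only as an
illustration.

```python
def classify_char1(value):
    """Classify a grid point of a character*1 (single byte) map."""
    if value == 0:
        return "empty"
    if value == 255:
        return "mask"
    return "density"


def classify_real4(value):
    """Classify a grid point of a real*4 map."""
    if value < -9999.0:
        return "empty"
    if value > 9999.0:
        return "mask"
    return "density"


labels = [classify_real4(v) for v in (-10000.0, 0.37, 10000.0)]
```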

Each map has a size, starting coordinates and cell constants (cell
constants are needed for transforming maps between different cells).
Grid points lying in different unit cells with the same fractional
coordinates are identical.  That means that, when a whole unit cell is
defined in a map, the program can expand the density through the whole
space.

There are six elementary types of operation that can be performed on maps:
- Creation of a map,
- Creation and extension of a mask,
- Rotations and translations of a map,
- Building the crystal unit cell from an asymmetric unit or its parts,
- Setting values to selected grid points,
- Scaling a single map with a constant and adding two maps.

The smallest map contains only a header, in which the cell constants,
the crystal symmetry operations and the number of grid points along
the cell axes are stored.  MAIN can read PROTEIN as well as its own native
formats of electron density maps.  The maps can be written in Lyn Ten
Eyck and MAIN native formats.  The native formats are ASCII files with
record length 80 so that they can be edited and changed with a text
editor.  In addition, a variety of smaller conversion programs were written
to enable conversion of maps between PROTEIN, X-PLOR, P1SF, FRODO and
native MAIN formats.

Operations on a map grid point can be applied when the point is
empty or masked.  (The exception is the SET command that can set a
value to a grid point in any specified range.)

A new map can be created from scratch or from an already existing
one, from which the map size, origin, number of grid points per cell
length and cell constants can be taken.  The grid points are
initialized to a specified value.  The map origin and size can also be
defined from an atom selection so that the selected atoms plus some
boundary grid points lie inside it.

Mask points can be defined in several ways:
- By setting all grid points with values in a specified range
to the mask value.
- By a distance criterion from a selection of atomic center positions.
- By conversion of grid points to real space points and then to atoms
and further to mask grid points.
- By converting unmasked grid points that lie between masked points
to masked points.

A map can be transformed (rotated, translated or copied) into the
mask points of another map by linear interpolation.  This is done
so that the position of a mask point is transformed into the space
of the map with density.  The masked point density value is then
obtained by interpolation from the eight surrounding grid points.
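
Interpolation from the eight surrounding grid points can be sketched in
Python as below.  The sketch assumes that the map is a nested list
indexed as grid[i][j][k], that positions are given in grid units and
that the map covers a whole unit cell, so that indices may be wrapped
periodically.  The function names are hypothetical.

```python
import math


def transform(R, t, point):
    """Apply rotation matrix R and translation t to a position."""
    return [sum(R[row][col] * point[col] for col in range(3)) + t[row]
            for row in range(3)]


def trilinear(grid, x, y, z):
    """Density at (x, y, z) from the eight surrounding grid points."""
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    i, j, k = math.floor(x), math.floor(y), math.floor(z)
    fx, fy, fz = x - i, y - j, z - k
    value = 0.0
    for di in (0, 1):
        wx = fx if di else 1.0 - fx
        for dj in (0, 1):
            wy = fy if dj else 1.0 - fy
            for dk in (0, 1):
                wz = fz if dk else 1.0 - fz
                value += (wx * wy * wz *
                          grid[(i + di) % nx][(j + dj) % ny][(k + dk) % nz])
    return value


# Toy 2x2x2 map whose value is a linear function of the indices.
grid = [[[i + 2 * j + 4 * k for k in range(2)] for j in range(2)]
        for i in range(2)]
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
position = transform(identity, [0.5, 0.5, 0.5], [0.0, 0.0, 0.0])
centre_value = trilinear(grid, *position)
```

For a mask point, its position would first be passed through transform
with the local symmetry operation and the result then handed to
trilinear.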

Empty points of a map can be filled with values of the density
points of another map by applying crystal symmetry operations.
This routine is independent of the particular crystal symmetry:
rotation matrices and translation vectors are applied to find the
position of the grid point into which the density value should be
copied.

3.3. Applications

The three applications most important for programming of electron
density averaging routines are described below.  A complete cyclic
averaging procedure was applied for the first time when averaging the
electron density of rat cathepsin-B.  The rat cathepsin-B data were
used later during program development for testing the source code.
Further program development was necessary because the monoclinic form
of the riboflavin synthase unit cell with an initial 1.0A grid size
consists of more than 7 million grid points and the "fast" procedure
applying integer*1 (character*1) maps did not allow any phase
extension.  When averaging carbamoylsarcosine hydrolase, routines for
mask generation were significantly enhanced and auxiliary programs
were improved to enable automatic phase extension.
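
The size of the riboflavin synthase maps can be checked with a short
calculation.  The Python sketch below uses the monoclinic cell lengths
quoted in section 3.3.2 and a 1.0A grid.  Rounding the number of grid
points up to the next integer is an assumption; a real program would
pick grid numbers that also suit its Fourier transformation routines.

```python
import math


def grid_points(cell_lengths, spacing):
    """Grid points along each axis and in the whole unit cell."""
    counts = [math.ceil(length / spacing) for length in cell_lengths]
    return counts, counts[0] * counts[1] * counts[2]


# Monoclinic riboflavin synthase cell: a, b, c in A (section 3.3.2).
cell = (235.5, 191.2, 165.4)
counts, total = grid_points(cell, 1.0)

bytes_char1 = total * 1     # one byte per grid point
bytes_real4 = total * 4     # four bytes per grid point
```

The total of about 7.5 million grid points agrees with the figure of
more than 7 million given above.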


3.3.1. Cathepsin-B

The lysosomal cysteine proteinases play an important role in
intracellular protein degradation (see Barrett et al., 1988). Of these
proteinases, cathepsin-B is the most abundant and the most thoroughly
studied. Besides its involvement in intracellular protein turnover, it
has been implicated in tumor metastasis and in other disease states.
Cathepsin-B exhibits optimal activity in slightly acidic media and is
irreversibly inactivated at alkaline pH-values.  It acts as an
endopeptidase with relatively broad specificity and has a slight
preference for basic residues or phenylalanine at P2 (using the
nomenclature of Schechter and Berger, 1967). Bulky side chains at P1
are disfavoured (see Shaw et al., 1990).  A remarkable feature of
cathepsin-B is its distinctive peptidyl dipeptidase activity (Aronson
and Barrett, 1978; Bond and Barrett, 1980; Takahashi et al., 1986;
Polgar and Csoma, 1987) at the carboxy terminus.  Cathepsin-B is
inhibited by typical cysteine proteinase protein inhibitors such as
cystatins and stefins (see Biol. Chem. Hoppe-Seyler 371, Suppl).

The complete amino acid sequences of rat (Takio et al., 1983), human
(Ritonja et al., 1985) and bovine (Meloun et al., 1988) cathepsin-B
and the partial sequence of the porcine (Takahashi et al., 1984)
cathepsin-B have been communicated.  According to the nucleotide
sequences (Chan et al., 1986; Fong et al., 1986; Ferrara et al.,
1990), cathepsin-B from human, rat or mouse is synthesized as a
polypeptide chain of 339 amino acid residues, which is processed to the
mature single-chain molecule of 254 amino acid residues. In mammalian
tissues, most of the active cathepsin-B is found as a two-chain
molecule consisting of 47 (or 49) and 205 (or 204) residue polypeptide
chains (light and heavy chain) covalently cross-linked by a disulfide
bridge.

The cathepsin-B sequence indicates a close structural homology with
the plant proteinase papain (Takio et al., 1983).  Comparisons of
the sequences of cathepsins -L and -H with those of papain and
actinidin resulted in alignment proposals for cathepsin-B (Takio et
al., 1983; Kamphuis et al., 1985).  Based on the 3-dimensional
structures of papain (Kamphuis et al., 1984) and actinidin (Baker,
1980), the common structural features as well as sites of insertions
and deletions were made more precise (Kamphuis et al., 1985; Baker &
Drenth, 1987). Cathepsin-B is considerably larger than papain or actinidin,
and the accommodation of some of the longer polypeptide insertions and
the arrangement of the active site residues remained unclear.
A clear understanding of its specificity and of its catalytic
properties requires the availability of an experimental structure as
provided by X-ray crystallography.  

Human and rat liver cathepsin-B are the first crystallographically
determined structures of lysosomal cysteine proteinases (Musil et al.,
1991; Zucic et al., 1992).  They are structurally related to the
cysteine proteinases of plant origin, papain and actinidin.  The monoclinic
crystals of both proteins had P21 symmetry, though quite different
cell constants (human cathepsin-B: a= 86.23A, b= 34.16A, c= 85.56A,
$\beta$= 102.9o; rat cathepsin-B: a= 59.98A, b= 128.01A, c= 59.12A,
$\beta$= 121.47o).  The human cathepsin-B crystallized with two
molecules per asymmetric unit in a quasi-tetragonal form and the rat
cathepsin-B with three molecules per asymmetric unit in an almost
perfect hexagonal form (FIGURES 3.3.1.1 and 2).

The structures were solved by a molecular replacement procedure, using
a molecular model based on the papain and actinidin structures.  It is
worth mentioning that the approximate position of the second molecule
in the human cathepsin-B crystals was deduced from heavy atom positions.

First we tried to refine the structure of human cathepsin-B.  The
electron density inside the mask of each separate molecule (FIGURE
3.3.1.3) was averaged applying the improper symmetry operations as
obtained by superposition of molecular models.  The cyclic averaging
procedure (including iterative Fourier transformations of the density
into structure factors and back) was not applied, since we assumed that
averaging of two molecules does not suffice to improve the phases.
Unfortunately the model could not be refined below an R-factor of
0.30.

However, the current model of human cathepsin-B was applied to solve
the structure of rat cathepsin-B.  After a successful rotational and
translational search, the models were crystallographically refined and
the residues adjusted to the rat cathepsin-B sequence.  The resulting
electron density was averaged over all three molecules within a cyclic
procedure.  The CHAR_LONG procedure was applied; it is
described in detail in APPENDIX C.  In the resulting electron
density map the loop 129 ... 140 could be immediately traced (FIGURES
3.3.1.4 and 5).  Afterwards the electron density averaging procedure
was applied until the molecular models started to diverge from each
other in the course of refinement.


3.3.2. Riboflavin synthase 

Riboflavin synthase is an enzyme active in the final steps of
riboflavin (vitamin B2) synthesis (review M\"uller et al., 1988).
See FIGURE 3.3.2.1.  Riboflavin is synthesized in microorganisms
and plants.  Heavy riboflavin synthase from \it Bacillus subtilis
\rm is a complex of two
enzymes quite different in their molecular weight.
The complex consists of three
$\alpha$-subunits and 60 $\beta$-subunits.  Actually the $\alpha$
subunit, and not the $\beta$ subunit, catalyzes the final
step in riboflavin synthesis.  Therefore the appropriate name
for the $\beta$ subunit should be lumazine synthase and not riboflavin synthase.
The complete sequences of the $\beta$-subunit (Ludwig et al., 1987) and
the $\alpha$-subunit (Schott et al., 1990a) have been communicated.  The
crystal structure of heavy riboflavin synthase (Ladenstein et al.,
1988) has shown that the enzyme forms an icosahedral capsid consisting
of 60 $\beta$ subunits.  The investigated hexagonal crystals belonged
to space group P6322 with cell constants a=b= 156.4A, c= 298.5A and
$\gamma$ =120o (Ladenstein et al., 1983) with 10 $\beta$-subunits in an
asymmetric unit.  That structure had an R-factor of 0.399 at 3.3A resolution.

Later, the lumazine synthase-riboflavin synthase complex was decomposed
into subunits, and its icosahedral capsid, consisting of $\beta$
subunits only, could be rebuilt, and three crystal forms of "riboflavin
synthase" were communicated (Schott et al., 1990b).  A monoclinic
modification belonged to space group C2 with cell constants of
a=235.5A, b= 191.2A, and c= 165.4A and $\beta$= 134.5o and 30
molecules per asymmetric unit.  The crystals diffracted to 2.8A
resolution.  A hexagonal form belonged to space group P6322 with cell
constants of a=b=157.2A, c= 300.8A and $\gamma$= 120o with 10 molecules
per asymmetric unit, similar to the heavy riboflavin synthase hexagonal form.
FIGURE 3.3.2.2 shows the spatial distribution of all 60 molecules.

This new hexagonal form of the $\beta$ subunit was refined further to
an R-factor of about 0.32 (Ladenstein, unpublished results).  This
model was used in the
electron density averaging and refinement of the monoclinic form, for
which data to 2.45A resolution were collected. The structure was
subjected to rigid body refinement applying X-PLOR including reflection
data to 3.5A resolution (Ritsert, unpublished results).  At this stage
I joined the project by adapting MAIN routines for electron
density manipulation to handle electron density maps of large crystal
cells.

At first only the array sizes were changed and the CHAR_FAST
procedure as described in APPENDIX C was applied.  However, the
R-factor of the averaged map did not converge.  Therefore, the procedure was
reexamined and reprogrammed with many modifications.  Finally, the
REAL_LONG procedure enabled us to start with a successful electron
density averaging procedure at 3.0A resolution gradually expanding the
phases to 3.0A.  Then the grid size was changed from 1.0A to 0.8A
and phase extension continued until reflections to 2.45A
resolution were included.  The procedure was essentially the same as the
cathepsin-B REAL_LONG procedure.  The only significant difference
introduced was that maps were added immediately after being
transformed.  They were not first stored on disk and afterwards
averaged in a separate procedure (AVER_MAPS.COM).  During electron
density averaging of "riboflavin synthase" proper local symmetry
operations were applied.
The model was adapted to the resulting averaged density and
crystallographically refined, and new electron density maps were
calculated by further extending the phases via an electron density
averaging procedure.  This procedure was repeated several times until
all reflections were included. The current R-factor of the model is
0.23 including data to 2.45A resolution.  FIGURE 3.3.2.3 shows an
averaged 2Fo-Fc electron density map.  The complete description of
the monoclinic form refinement will be presented by Ritsert et al.


3.3.3. Carbamoylsarcosine hydrolase


N-Carbamoylsarcosine amidohydrolase (CSHase, EC 3.5.1.59) catalyses
the hydrolysis of N-carbamoylsarcosine to sarcosine with liberation of
carbon dioxide and ammonia (see FIGURE 3.3.3.1).

The enzyme has been found as part of a novel metabolic pathway for the
degradation of creatinine to glycine via N-methylhydantoin,
N-carbamoylsarcosine and sarcosine (Deeg et al., 1982, EP 0112571;
Yamada et al., 1985; Kim et al., 1986, 1987; Shimizu et al., 1989;
Siedel et al., 1988).

It has been found in various microorganisms (Shimizu et al.,
1989) and was isolated and purified from \it Pseudomonas putida 77 \rm
(Kim et al., 1986) and \it Arthrobacter sp. \rm (Siedel et al., 1988).  This
enzyme is highly specific for the degradation of N-carbamoylsarcosine
to yield sarcosine.

Here only a brief review of the applied methods, without a structure
description, is presented in order to demonstrate the use of MAIN in this
particular case.  The complete work is described by Rom\~{a}o et al. (1992).

3.3.3.1. Introduction

The crystals of carbamoylsarcosine hydrolase were obtained from the cloned
gene.  The crystals diffract beyond 
3.0A resolution and belong to the monoclinic space group C2, with cell
dimensions  a= 136.22A, b= 122.29A, c= 70.87A, $\beta$= 91.82o.
The self-rotation function of the Patterson map was used to search for
local two-fold axes, employing PROTEIN search routines.  
The peak at
$\psi$=0o, $\phi$=0o corresponds to the crystallographic b axis.  The
other large peaks indicate local diads relating the subunits within
the tetramer.  Peaks show up for polar angles ($\kappa$=180o ) $\psi$=90o,
$\phi$=82o; $\psi$=90o, $\phi$=172o; $\psi$=45o, $\phi$=172o; $\psi$=46o,
$\phi$=352o, with correlation values of
0.583, 0.575, 0.521 and 0.503 respectively, relative to the origin peak
(see FIGURE 3.3.3.2).

From crystal density measurements and auto-correlation of the native
Patterson map, it became evident that there are four molecules of
carbamoylsarcosine hydrolase per asymmetric unit.
Since there was no molecular model of a related enzyme available,
heavy atom derivatives had to be prepared.  Combination of two
uranium-, rhodium- and mercury- and osmium-derivative phase sets served
in the calculation of the first 3.0A resolution map.  Phases were weighted
by the figure of merit.
The obtained map was noisy and no secondary structural elements or 
molecular boundaries could be recognized. The m.i.r.
phases were then modified by solvent-flattening at 3.0A resolution.
The density in the solvent regions was set to zero (Wang, 1985) using
programs of M.Schneider.  The unit cell was sampled at 130x120x70 grid
points.  The radius of the averaging sphere was 9A and the solvent
level was adjusted to 0.51.

The modified electron density was Fourier transformed, and the resulting
phases were combined with m.i.r.  phases by applying the phase combination
procedure from Hendrickson and Lattman (1970).  Seven cycles of such
calculations were performed until convergence (R=0.233).  The quality
of the solvent-flattened density map allowed us to
define the boundaries of the tetramer in one asymmetric unit.  However,
polypeptide chains could still not be identified.


3.3.3.2. Determination of the local symmetry of the tetramer

The presence of four crystallographically independent subunits in one
asymmetric unit allows averaging of the electron density and
improvement in the quality of the final map.  In order to perform this
calculation, we needed to know the exact orientation and position of
the local symmetry axes.  Their orientations were obtained from the
self-rotation function, although the intramolecular and crystal
symmetry-generated axes were still ambiguous. The correct
position of the rotational axes was found from the electron density
as follows:

In the first step, a molecular envelope of an asymmetric unit was
defined in the solvent-flattened density, using the program X-CONTOUR
(Buchberger, 1991).  The density inside the selected envelope was
placed in an oversize P1 cell (204x183x105A) in order to avoid
intermolecular contacts.  This cell was Fourier-transformed, and,
using the newly calculated structure factors, a Patterson synthesis
(now of a single asymmetric unit) was performed.  As before, a
self-rotation function of the Patterson map was calculated.  The
obtained solutions were consistent with the previously
determined orientation of the non-crystallographic axes.  The presence
of a peak corresponding to the crystallographic diad b indicated that
the selected envelope still included crystallographically equivalent
parts of another asymmetric unit.
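
The idea of isolating the masked density in an oversize P1 cell and
computing its Patterson synthesis can be sketched as follows.  The
function name and the array layout are assumptions made for the
example; the calculation described above was done with other
programs.

```python
import numpy as np

def patterson_in_p1_box(rho, envelope, box_shape):
    """Patterson map of the density inside `envelope`, isolated in a P1 box."""
    box = np.zeros(box_shape)
    nx, ny, nz = rho.shape
    # Density outside the envelope is discarded; the rest of the box is empty.
    box[:nx, :ny, :nz] = np.where(envelope, rho, 0.0)
    # Structure factors of the artificial P1 cell.
    f = np.fft.fftn(box)
    # Patterson synthesis: Fourier transform of |F|^2 (all phases zero).
    return np.real(np.fft.ifftn(np.abs(f) ** 2))
```

If the box is at least twice as large as the envelope in every
direction, vectors between the enclosed density and its periodic
images do not overlap the intramolecular vectors, which is the
purpose of the oversize cell.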

To position the local symmetry axes correctly in the asymmetric unit,
a translation function for each of the four possible local axes was
calculated using real-space routines of PROTEIN.

The peaks of electron density selected inside the mask were rotated
about each of the local axes and translated in small increments with
respect to the unchanged m.i.r.  density.  The calculated correlation
function indicated maxima for the best positioning of the three axes
inside the asymmetric unit.  This calculation showed the three genuine
intramolecular local axes, while for the fourth axis, defined by the
polar angles $\psi$=90o, $\phi$=172o, $\kappa$=180o, no maximum was found;
this axis is generated by the crystallographic b axis and the local
diad at $\psi$=90o, $\phi$=82o, $\kappa$=180o (see FIGURE 3.3.3.2).
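
The real-space search for the position of a local diad can be
illustrated with a deliberately simplified sketch.  It assumes a diad
parallel to the grid z axis and a periodic grid, so that the rotated
copy can be generated by index arithmetic alone; a general axis, as
in the case described here, would require interpolation.  The
function names are hypothetical and do not correspond to routines of
PROTEIN.

```python
import numpy as np

def correlation(a, b):
    """Linear correlation coefficient between two density arrays."""
    a = a - a.mean()
    b = b - b.mean()
    return float(np.sum(a * b) / np.sqrt(np.sum(a * a) * np.sum(b * b)))

def rotate_about_z_diad(rho, two_cx, two_cy):
    """Rotate rho by 180 degrees about a z-parallel axis through (cx, cy).

    The axis position is passed as twice its grid coordinates so that
    positions half-way between grid points can also be searched.
    """
    nx, ny, nz = rho.shape
    ix = (two_cx - np.arange(nx)) % nx   # x -> 2*cx - x
    iy = (two_cy - np.arange(ny)) % ny   # y -> 2*cy - y
    return rho[np.ix_(ix, iy, np.arange(nz))]

def locate_diad(rho):
    """Scan all axis positions; return (best correlation, (cx, cy))."""
    nx, ny, _ = rho.shape
    best_cc, best_pos = -2.0, None
    for two_cx in range(nx):
        for two_cy in range(ny):
            cc = correlation(rho, rotate_about_z_diad(rho, two_cx, two_cy))
            if cc > best_cc:
                best_cc, best_pos = cc, (two_cx / 2.0, two_cy / 2.0)
    return best_cc, best_pos
```

An axis that is a genuine symmetry element of the density gives a
clear maximum in this scan, while an axis that does not relate the
density to itself gives none.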

The orientations and positions of the three local diads were refined,
with the final results indicating three mutually perpendicular
two-fold axes of symmetry: axis (1) is 6o away from the c-axis of
the crystal, while the other two axes, (2) and (3), make angles of
45o with the crystal b-axis.
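
The relation between three mutually perpendicular diads can be
checked numerically.  The sketch below uses an idealized Cartesian
frame in which axis (1) lies exactly along z and axes (2) and (3) lie
at 45o to y; these directions are assumptions for the example (the 6o
tilt is ignored) and are not the refined values.

```python
import numpy as np

def twofold_matrix(axis):
    """Rotation matrix for a 180-degree rotation about `axis`."""
    n = np.asarray(axis, dtype=float)
    n = n / np.linalg.norm(n)
    # Rodrigues' formula at 180 degrees reduces to 2 n n^T - I.
    return 2.0 * np.outer(n, n) - np.eye(3)

def angle_deg(u, v):
    """Acute angle in degrees between two axis directions."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    c = abs(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(np.clip(c, -1.0, 1.0))))

# Idealized directions of the three local diads (assumed for illustration).
axis1 = [0.0, 0.0, 1.0]
axis2 = [1.0, 1.0, 0.0]
axis3 = [1.0, -1.0, 0.0]
r1, r2, r3 = (twofold_matrix(a) for a in (axis1, axis2, axis3))
```

For perpendicular diads the product of any two rotations equals the
third, so only two of the three axes are independent in ideal 222
symmetry.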

Since the center of rotation about each of the local symmetry axes was
in the lower half of the masked area, and since molecular boundaries
were not clearly recognized in regions where crystallographically
equivalent molecules came into contact, the molecular
envelope had to be improved.

The solvent-flattened density, placed inside the current envelope, was
put in the large P1 cell.  This density was averaged by applying the
local symmetry operations, expecting that density areas not
belonging to the same asymmetric unit should smear out.  With the
transformed averaged density,  a second, more clearly defined, envelope
was  produced.  The local symmetry operations were again determined as
described above.  The self-rotation function of the Patterson map,
calculated as before, confirmed the three local axes as major peaks,
now sharper than in the case of the first envelope.  The subsequent
rotational and translational search gave a more accurate orientation
and position for each of the non-crystallographic axes.  Averaging of
the electron density inside the chosen asymmetric unit was now
possible.


3.3.3.3. Initial averaging with ideal 222 symmetry

With the new envelope and new symmetry operations, the first averaged
map was calculated.  The solvent-flattened electron density was
averaged inside the mask by applying ideal 222 symmetry.  Afterwards,
the whole unit cell was generated and its density Fourier-transformed.
The Fourier transformations were carried out using programs in
PROTEIN.  The procedure was repeated until convergence of the electron
density R-value, which dropped from 0.44 to 0.29 after 7 cycles of
averaging.  When comparing the first averaged map with the one
resulting after 7 cycles of the averaging procedure, it was obvious that
cyclic averaging did not improve the map, suggesting that the
asymmetric unit does not obey ideal 222 symmetry.  The first
averaged map, however, was markedly improved in comparison to the
original m.i.r.  map, but still only a few secondary structural
elements (two $\alpha$-helical segments and some $\beta$-strands)
could be recognized.  Since model building could not proceed, the map
had to be further improved.
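
One cycle of such an averaging procedure can be sketched as below.
The sketch assumes three grid-aligned diads through the grid origin,
a periodic P1 grid and no envelope, and it takes the electron density
R-value to be the conventional amplitude R-value between observed
amplitudes and those calculated from the averaged map.  All of these
are simplifying assumptions, and the function names are hypothetical;
the calculations described here were carried out with PROTEIN.

```python
import numpy as np

def r_value(f_obs, f_calc):
    """Conventional amplitude R-value after a single overall scale factor."""
    f_obs, f_calc = np.abs(f_obs), np.abs(f_calc)
    scale = f_obs.sum() / f_calc.sum()
    return float(np.sum(np.abs(f_obs - scale * f_calc)) / f_obs.sum())

def average_222(rho):
    """Average over identity and three grid-aligned diads through the origin."""
    def negate(a, axes):
        # Map index i to -i (mod n) along each listed axis.
        for ax in axes:
            a = np.roll(np.flip(a, axis=ax), 1, axis=ax)
        return a
    diads = [(0, 1), (0, 2), (1, 2)]  # diad along z, y and x respectively
    return (rho + sum(negate(rho, ax) for ax in diads)) / 4.0

def averaging_cycle(rho, f_obs, average=average_222):
    """One cycle: average, back-transform, restore observed amplitudes."""
    f_calc = np.fft.fftn(average(rho))
    r = r_value(f_obs, f_calc)
    # New map: observed amplitudes with the phases of the averaged map.
    rho_new = np.real(np.fft.ifftn(np.abs(f_obs) * np.exp(1j * np.angle(f_calc))))
    return rho_new, r
```

Calling averaging_cycle repeatedly and monitoring the returned
R-value gives the kind of convergence curve discussed above.  A map
that already obeys the imposed symmetry is left unchanged, whereas a
map that does not obey it is forced towards a symmetry it does not
have.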


3.3.3.4. Proper-improper symmetry averaging

Using MAIN, a simplified representation of the solvent-flattened
density corresponding to one selected asymmetric unit was displayed
on a PS300 Evans & Sutherland graphic system.  We observed that the
original masked region could be split into two separate (upper and
lower) parts (see FIGURE 3.3.3.3), suggesting that improper
averaging could be used.  For each of the individual regions, a new
envelope was defined using MAIN.  A self-rotation function was
calculated for the density inside each envelope, confirming axis 3
(see FIGURE 3.3.3.2).  Axes 1 and 2, therefore, had to be located in
the plane separating the two identified halves of the asymmetric
unit (see FIGURE 3.3.3.3).  The positions of the local axes
were then optimized by an interactive translational search procedure
followed by combined rotational and translational gradient
optimization using MAIN.  First, the position and orientation of an
ideal two-fold axis (axis 3) were optimized for the upper and lower
halves.  The correlation values at the maxima were 0.167 and 0.176
for the upper and lower half, respectively; the autocorrelation
values were 1.0.  The
density inside each half was then averaged by applying the obtained
parameters for proper two-fold averaging.  Afterwards, both halves
with averaged densities were superimposed by rotations about axes 1
and 2 and the correlations maximized.  In these calculations, ideal
two-fold symmetry was no longer maintained.  Four additional
transformations were obtained: two for the superposition of the
averaged upper half density onto the averaged lower half rotated about
axes 1 and 2, and two for the reverse transformations.  The maximal
correlations obtained were in all four cases higher than 0.3.

These parameters were then applied in a cyclic averaging procedure
combining proper and improper averaging.  Upper and lower halves were
first averaged by applying proper symmetry (axis 3), and the averaged
halves were then averaged with improper symmetry relations (axes 1 and
2).  Improper averaging was done by transforming the averaged density
of the lower half about both axes 1 and 2 to the upper half region.
The averaged upper half and both transformed lower half density maps
were then added together and averaged.  The analogous procedure was
applied for the lower half.
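
Written as a formula, the combined step for the upper half may be
summarized as follows.  The symbols are introduced here only for
illustration: $\rho_U$ and $\rho_L$ are the densities of the upper
and lower halves, $P_U$ is the proper two-fold operation (axis 3)
within the upper half, and $T_1$, $T_2$ are the improper operations
(axes 1 and 2) that take a point of the upper half to its equivalent
in the lower half.  Equal weights for the three contributions are
assumed.

```latex
\rho'_U(\mathbf{x}) = \tfrac{1}{2}\left[ \rho_U(\mathbf{x}) + \rho_U(P_U\mathbf{x}) \right],
\qquad
\bar{\rho}_U(\mathbf{x}) = \tfrac{1}{3}\left[ \rho'_U(\mathbf{x})
    + \rho'_L(T_1\mathbf{x}) + \rho'_L(T_2\mathbf{x}) \right]
```

The expression for the lower half follows by exchanging the roles of
the two halves and using the reverse transformations.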

The density of the complete crystal cell was generated by applying
crystal symmetry operations to each averaged half separately.  The
resulting cell was Fourier-transformed.  This procedure was repeated
in cycles and converged after 8 cycles of proper-improper symmetry
averaging (R factor = 0.43-0.25, 20-3A).  The resulting electron
density map was markedly improved in comparison to the map obtained
after ideal 222 symmetry averaging.  Many segments of the main chain
could now be traced and were built as a polyalanine chain (using
the program system FRODO (Jones, 1978) on a PS300 Evans & Sutherland
graphic system).  About 60% of the total number of residues were
built in as unconnected segments and the tetramer was generated using
the previous local symmetry operations.


3.3.3.5. Improper symmetry averaging

With these partial models of the four subunits, masks for each of them
could be defined using MAIN.  Intermolecular contacts were taken into
account in the program in order to avoid overlap of the masks.
Optimization of rotational and translational parameters between the
four density areas was repeated.  At this stage, positioning of each
model was first optimized in the electron density.  Molecules A and D
remained in their positions, while molecules B and C were slightly
moved.  All four molecular models were then superimposed by minimizing
the r.m.s. distances between equivalent atoms.  Twelve new rotation
matrices and translation vectors were thus determined.  These new
local symmetry operations enabled us to apply the non-ideal, improper
symmetry averaging to the tetrameric asymmetric unit.
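
The superposition step can be sketched with the standard
least-squares (Kabsch) procedure, which is used here as a stand-in:
the text does not state which algorithm MAIN applies.  The function
names and the dictionary layout are hypothetical.  With four models
there are twelve ordered pairs, which gives the twelve operators
mentioned above.

```python
import numpy as np
from itertools import permutations

def superpose(mobile, target):
    """Rotation R and translation t minimising rms |R @ m_i + t - t_i|."""
    mobile = np.asarray(mobile, dtype=float)
    target = np.asarray(target, dtype=float)
    mc, tc = mobile.mean(axis=0), target.mean(axis=0)
    # Covariance of the centred coordinate sets.
    h = (mobile - mc).T @ (target - tc)
    u, _, vt = np.linalg.svd(h)
    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0.0 else -1.0
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return rot, tc - rot @ mc

def rms_distance(mobile, target, rot, trans):
    """R.m.s. distance between transformed mobile atoms and target atoms."""
    diff = np.asarray(mobile) @ rot.T + trans - np.asarray(target)
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))

def all_pair_operators(models):
    """models: dict label -> (N, 3) coordinates of equivalent atoms."""
    return {(a, b): superpose(models[a], models[b])
            for a, b in permutations(sorted(models), 2)}
```

Because the operators are determined independently for every ordered
pair, no ideal symmetry is imposed on the tetramer.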

In the first stages of improper symmetry averaging, averaging was
still performed without including phases from the partial model, but
the masks were expanded according to the `growth' of the molecular
model.  The solvent-flattened density was averaged four times,
independently for each subunit of the tetramer.  The cell was
reconstructed from the four averaged densities and Fourier
back-transformed.  The new density was again averaged and the
procedure was repeated.  It converged after 12 cycles of averaging,
with R=0.42-0.24.

With these new density maps, the model could be further improved, and
its calculated phases were then combined with the original m.i.r.
phases (Hendrickson & Lattman procedure).  Parts of the model which
were considered questionable were omitted from the phase calculation.
Model phases were weighted using Sim (1959) formulas.  The whole
averaging procedure was then carried out as follows:

The model was built in one of the subunits of the tetramer.  The other
three subunits were generated by applying the local symmetry
operations.  The whole tetramer was refined with the program X-PLOR
(Br\"unger et al., 1989).
With the refined model, four new masks were re-determined and new
local symmetry parameters re-calculated.  M.i.r.  phases were combined
with phases from the refined model and an electron density map was
calculated.  Cyclic averaging was then applied to this initial map
using the new masks and symmetry operations.  After convergence,
the whole model was re-evaluated and refitted to the electron
density on the graphics system.  The fit was also checked against the
m.i.r.  map.  Over several rounds of model building, crystallographic
refinement, phase combination and electron density averaging, the
electron density gradually improved.

FIGURE 3.3.3.4 shows the improvement of the electron density maps.
The final R-factor for 56641 reflections between 10.0 and 2.0A resolution 
and 8304 atoms of the tetramer is 0.186.

The procedure, stressing data manipulations done with MAIN, is
described in detail in APPENDIX D.