3. ELECTRON DENSITY MANIPULATION ROUTINES

3.1. Introduction

Sometimes initial phasing with heavy atom derivative(s), or a starting molecular model used for molecular replacement, yields a clear electron density map allowing unambiguous interpretation of a molecular structure. If this happens to be your case, congratulations, cheers to your luck dear MAIN manual reader, you can skip the rest of the chapter. If, however, your electron density is not unambiguous, I do encourage you to read the rest.

Electron density can be manipulated by any means that will yield a more interpretable electron density map, finally resulting in a refined structure of a macromolecule. The most widely used methods are electron density averaging and solvent flattening (Bricogne, Wang); histogram methods and Sayre's equations are also used. Averaging and solvent flattening improve phases by including extra information from two sources: the local symmetry within an asymmetric unit and the demarcation of areas occupied by protein and solvent. This information is used to alter the electron density map, from which the whole crystal cell is built and structure factors are calculated, beginning a new cycle. The procedure is repeated until the apparent R-factor of the electron density map converges (see flow chart in FIGURE 3.1.1.). The aim is to produce an interpretable electron density map.

It is assumed that the areas of the electron density map where protein lies have greater positive electron density than the solvent areas. The solvent occupied areas inside the crystal have no rigid structure, so they can be `flattened' (all grid points in the solvent region obtain a single density value). Solvent flattening, when successful, defines clear boundaries between the solvent and protein occupied volumes and improves the electron density.
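The flattening step described above can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not MAIN code: the function name flatten_solvent, the representation of a map as a flat list, and the use of the mean solvent density as the default flattening value are all assumptions of this sketch.

```python
def flatten_solvent(density, protein_mask, solvent_value=None):
    """Return a copy of `density` with every solvent grid point set to one value.

    density       -- flat list of grid-point density values
    protein_mask  -- flat list of booleans, True where protein is assumed to lie
    solvent_value -- value given to all solvent points; if None, the mean of the
                     original solvent densities is used (an assumption of this sketch)
    """
    if len(density) != len(protein_mask):
        raise ValueError("density and mask must have the same number of grid points")
    # Collect the densities of all grid points outside the protein envelope.
    solvent = [d for d, inside in zip(density, protein_mask) if not inside]
    if solvent_value is None:
        solvent_value = sum(solvent) / len(solvent) if solvent else 0.0
    # Protein points keep their density, solvent points all get the same value.
    return [d if inside else solvent_value
            for d, inside in zip(density, protein_mask)]


# Hypothetical six-point map: two protein points, four solvent points.
density = [0.9, 0.1, -0.2, 1.4, 0.3, -0.1]
mask = [True, False, False, True, False, False]
flattened = flatten_solvent(density, mask)
```

Passing an explicit value, for example flatten_solvent(density, mask, 0.0), sets the solvent region to zero instead of to its mean.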
Electron density averaging, however, requires clear boundaries between protein and solvent volumes, so in the case of multiple isomorphous replacement phase evaluation it should almost inevitably be preceded by solvent flattening.

There are two types of non-crystallographic or local symmetry: proper (also called spherical) and improper symmetry (FIGURE 3.1.2). Molecules of an asymmetric unit are related by proper symmetry when they can be superimposed upon each other by a single rotation about a local symmetry axis, while in the case of improper local symmetry, superposition of the molecules requires other operations (rotation usually combined with translation). Therefore, procedures for proper and improper symmetry averaging differ. For proper symmetry averaging, equivalent areas do not have to be separated, while for improper averaging it is necessary to distinguish between them.

Besides improving the phases (and the electron density maps) of the starting resolution range, it is possible to evaluate phases of higher resolution reflections by gradually increasing the resolution range. The procedure is called phase extension. The larger the number of molecules in an asymmetric unit, the better the results that can be obtained with phase extension, provided that the initial phases are sufficiently correct for the procedure to converge properly (Bricogne, 1974; Podjarny, 1990).

To perform real space electron density averaging, some initial set of phases, equivalent areas and geometric transformations (rotation and translation parameters) are required. In a molecular replacement procedure, the equivalent areas are defined from the initial model placement. Transformations between them can be easily constructed by superimposing the molecular models (as demonstrated in the cases of cathepsin B and riboflavin synthase).
In the case of a single or multiple isomorphous replacement procedure, it is possible to construct the transformation parameters from the heavy atom positions when they fulfill the local symmetry conditions. When they do not, it is necessary to construct the rotational parameters by an autocorrelation of a Patterson map and then to find the center of rotation and corresponding translational components by autocorrelating electron density (as demonstrated in the case of carbamoylsarcosine hydrolase).

To recognize the boundaries of equivalent areas, it is recommended first to average the initial electron density map, whereby the uncorrelated areas are supposed to smear out, and then to find the borders (envelope) of the equivalent areas in this averaged density. This can be done interactively at a graphical display in two ways: by going through all the layers of an asymmetric unit (2-dimensional construction using a program such as X-CONTOUR (Buchberger, 1990)), or by preparing a 3-dimensional map representation of a whole asymmetric unit and determining the equivalent parts from its 3-dimensional image (this can be done with MAIN). Both constructions were applied in the case of carbamoylsarcosine hydrolase.

The theoretical basis for electron density averaging or real space averaging was established in the late 1960s and early 1970s (Rossmann, 1972). Bricogne has written a review (Bricogne, 1974) of the method, its application and limitations. There it was shown that averaging in direct space is equivalent to the procedure in reciprocal space and that averaging in direct space has advantages over reciprocal space procedures. The reason seems to lie in the greater computational inaccuracies in reciprocal space calculations. In 1976 he published a description of his program (Bricogne, 1976). Besides Bricogne's program, other attempts have also been made, but none of them (Johnson in Rayment et al., 1978; Nordman, 1980) is used so widely and in so many variations as Bricogne's.
Recently, Lawrence (Lawrence, 1991) reviewed the method and its applications to {\it de novo} determined structures.

3.2. Programming concepts

The reason for tackling the programming of electron density averaging routines arose from the work of my colleagues when they were trying to average the electron density of human cathepsin B with Bricogne's (Bricogne, 1976) program package. The procedure was connected with many problems and errors that seemed to have no end. Since some routines dealing with electron density maps were already built into MAIN, it seemed natural to include additional routines for electron density averaging. Programming of electron density averaging routines was done gradually, by trying to improve and simplify already existing procedures. At this point it deserves mention that the development of computer technology has made possible the design of simpler, more easily understood and more generally applicable routines than those employed in Bricogne's approach, which was designed for the computers of the 1970s.

3.2.1. Addressing problem

Bricogne (1976) describes the difficulty of electron density averaging as 'an addressing problem in which, at first sight, an enormous file has to be made randomly accessible.' His solution of the problem was the double-sorting technique. The masked grid points are first transformed to the equivalent positions in the electron density by applying all local symmetry operations. Each masked grid point is stored in a file as a record, which includes the original grid point coordinate and the transformed coordinates. These records are then sorted according to their transformed coordinates. This makes it possible to access the electron density map layers sequentially, keeping only two neighbouring layers at a time in program memory. When the transformed coordinates exceed the layer boundaries, the next layer has to be read into program memory.
After all transformed points have obtained densities, the records have to be sorted again according to their original mask grid point positions. After the second sorting, all points belonging to the same original mask address appear one after another, so it is easy to average their density values.

This double-sorting technique avoids the use of large parts of computer memory on the one hand, but on the other occupies much more disk space than the electron density maps themselves, since each grid point is no longer represented as a single integer or real number in a map, but as a record which includes the original and transformed coordinates in addition to the density. The critical point concerning disk space requirements occurs while sorting the records. (Applying the double-sorting technique, we were not able to average electron density maps of the monoclinic form of riboflavin synthase including reflections to 2.5A resolution.)

In my opinion the double-sorting routines became obsolete when operating systems with virtual memory on magnetic disks became available, though it is still believed that double sorting should be used for larger structures (Podjarny, 1990). This belief is based on a lack of understanding of modern computers. As soon as an electron density map fits on a disk, it makes physically no difference whether it is available to a program as an external file or as an internal array stored in virtual memory. For programming and, astonishingly, also for the maximum disk storage requirements, the differences are large. Accessing data in an internal array simplifies programming, while the double-sorting technique requires much more space on the disk than several electron density maps. Another reason to abolish the double-sorting procedure is for the sake of programming and conceptual simplicity, efficiency and generality. However, abolishing the double-sorting procedure requires another strategy.
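For reference, the double-sorting scheme described above can be sketched in memory as follows. This is only an illustration in Python, not Bricogne's program or MAIN code. It assumes that each symmetry operation maps a grid point exactly onto another grid point, so no interpolation is needed, and it uses a dictionary in place of the layered map file. The function name double_sort_average and the toy data are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def double_sort_average(density, mask_points, operations):
    """Average density over symmetry-related points using two sorts.

    density     -- dict mapping (layer, row, col) -> value; stands in for the map file
    mask_points -- list of (layer, row, col) grid points inside the envelope
    operations  -- list of functions mapping a grid point to its equivalent grid point
    """
    # Step 1: one record per (mask point, operation), holding the original
    # address and the transformed address.
    records = [(p, op(p)) for p in mask_points for op in operations]
    # Step 2: first sort, by transformed address, so that the map could be
    # read sequentially, layer by layer.
    records.sort(key=itemgetter(1))
    # Step 3: sequential pass that attaches a density value to every record.
    with_density = [(orig, density[moved]) for orig, moved in records]
    # Step 4: second sort, by original address, which brings all equivalent
    # points of one mask point next to each other.
    with_density.sort(key=itemgetter(0))
    # Step 5: average each run of records that share the same original address.
    averaged = {}
    for orig, group in groupby(with_density, key=itemgetter(0)):
        values = [v for _, v in group]
        averaged[orig] = sum(values) / len(values)
    return averaged


# Toy map with four layers of one grid point each, and a two-fold "local
# symmetry" that shifts a point by two layers (purely hypothetical data).
toy_density = {(0, 0, 0): 1.0, (1, 0, 0): 2.0, (2, 0, 0): 3.0, (3, 0, 0): 4.0}
identity = lambda p: p
shift_two_layers = lambda p: ((p[0] + 2) % 4, p[1], p[2])
averaged = double_sort_average(toy_density, [(0, 0, 0), (1, 0, 0)],
                               [identity, shift_two_layers])
```

The record list is as long as the number of mask points times the number of operations, and every record carries two addresses, which is where the disk space cost discussed above comes from.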
The aim was therefore to break the electron density averaging procedure into elementary operations and to apply them sequentially to a row of maps, modifying only a single map at a time and never accessing more than two maps simultaneously. Therefore at least two maps should be kept entirely in program memory at run time. In this way the access of grid points is also not completely random: since molecular envelopes are relatively contiguous regions, at run time the program can access data almost as if they were sorted.

3.2.2. How to store a map?

A value can be stored in a computer in different ways (FORTRAN notation is used): as a single byte or character*1, 2 bytes (integer*2), 4 bytes (integer*4, real*4), 8 bytes (integer*8, real*8) ... The data should be stored accurately enough to perform the task. For accuracy alone, it seems that single byte precision suffices (see APPENDIX B). (Bricogne's program stores map values in bytes.) The problem with single byte storage is that it is not possible to add more than 2 maps together without danger of a value overflow. The double-sorting procedure solves the problem quite elegantly. Since equivalent points are already sorted so that they appear on the list one after another, it is possible to average them employing a single real*4, integer*2 or integer*4 number. For MAIN to apply the same procedure, there should in principle be enough space to store all local symmetry maps at once in order to access them simultaneously. Since my premise was to retain the simplicity of the expression A = A + B, other approaches had to be found. The first solution was that the character*1 maps used with MAIN are averaged by an external program which can add multiple character*1 maps into an integer*2 map for later averaging. For this, each single map has to be written to disk as a file. However, in a real*4 map it is possible to sum a nearly unlimited number of maps.
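The overflow argument can be made concrete with a small sketch. It is an illustration in Python, not MAIN code: the names add_into and fits_in_byte are hypothetical, Python floats stand in for real*4 values, and the sample map values were chosen so that the sum of two maps still fits into one byte while the sum of three does not.

```python
BYTE_MAX = 255  # largest value a one-byte (character*1) grid point can hold


def add_into(accumulator, other):
    """In-place A = A + B on two maps given as flat lists of equal length."""
    if len(accumulator) != len(other):
        raise ValueError("maps must have the same number of grid points")
    for i, value in enumerate(other):
        accumulator[i] += value
    return accumulator


def fits_in_byte(values):
    """True when every value could be stored in a single unsigned byte."""
    return all(0 <= v <= BYTE_MAX for v in values)


# Three hypothetical byte maps with three grid points each.
maps = [[100, 10, 90], [120, 20, 100], [110, 30, 110]]

# Wide accumulator (a stand-in for a real*4 map), summed one map at a time.
total = [0.0] * 3
for m in maps:
    add_into(total, m)

# Dividing by the number of maps brings the values back into the byte range.
average = [t / len(maps) for t in total]
```

Here total holds values above 255, so it could not be kept in a byte map, while average could; only the accumulator has to be wide.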
A simple calculation shows that four character*1 maps take the same amount of disk space as one real*4 map with the same number of grid points. When more than four maps need to be averaged, the character*1 maps together require more disk space than a single real*4 averaged map. Besides, real*4 maps have another advantage: an external program for adding them is no longer needed, which in addition reduces the input and output operations. In MAIN both options are now available: MAIN can deal with character*1 and with real*4 maps. Their use in averaging is demonstrated in the cathepsin B case (see APPENDIX C). For the averaging of the riboflavin synthase icosahedral structure only the real*4 maps were applied.

3.2.3. Molecular envelopes

In the case of proper averaging one molecular envelope (mask) suffices for all molecules, while for improper averaging the procedure should distinguish between different molecular areas. For this reason the concept of labeled masks was introduced by Bricogne. In MAIN this is solved by storing each molecule's mask in a separate map. Since on each map only a single operation can be performed at a time, there are no differences in principle between improper and proper averaging procedures.

3.2.4. Crystal cell generation

The usual way to generate a complete crystal cell from an asymmetric unit was to write a procedure that applies the building rules. Building rules tell which grid point in a cell is equivalent to which one from the asymmetric unit. This approach has several drawbacks which can turn averaging (and solvent flattening as well) into a complicated procedure full of errors. First, for each different space group, different rules should be applied, and second, in almost any space group it is possible to choose different definitions of an asymmetric unit. The third complication sometimes arises from differing numbers of grid points in a unit cell.
These routines are, however, not available for every possible case and have to be programmed when necessary. Even when they are available, their correctness should be verified for each single case. It happened quite often that whole layers were left empty. At this point real problems may begin, especially for an unskilled programmer. Therefore in MAIN another approach is applied: a crystal cell is generated from an asymmetric unit by applying crystal symmetry operations. The consequence of this generality is that there are no limitations on the placement of an asymmetric unit and the routines work for any space group. Besides, the asymmetric unit may consist of several independent parts, each one stored in a separate map. The MAIN approach has yet another advantage: since only empty points can be modified, there is no danger of having multiple density in certain regions.

Unfortunately, in cyclic electron density averaging different programs still have to be applied. When connecting them into an automatic procedure, the simplicity of MAIN syntax is not retained. This problem will be solved in the near future when fast Fourier transformation routines are integrated into MAIN. With further development of computers enough core memory will become available, so that it will be possible to hold whole maps and reflection data in memory and so reduce the input/output operations to a minimum.

3.2.5. How to deal with electron density maps when using MAIN?

A map is a 3-dimensional array of grid points, each with a value. According to the value, they are treated as empty, density or mask points. The grid points with values inside the density interval are the density points, the ones with values below the density interval are empty points, and the ones above are mask points. In the case of character*1 maps 0 is an empty point and 255 is a masked point. In the case of real*4 maps the empty points have values below -9999.0 and masked points above 9999.0.
The region in between comprises the density points. Each map has a size, starting coordinates and cell constants (cell constants are needed for transforming maps from different cells). Grid points lying in different unit cells with the same fractional coordinates are identical. That means that, when a whole unit cell is defined in a map, the program can expand the density through the whole space.

There are 6 types of elementary operations that can be done with maps:
- Creation of a map,
- Creation and extension of a mask,
- Rotations and translations of a map,
- Building the crystal unit cell from an asymmetric unit or its parts,
- Setting values to selected grid points,
- Scaling a single map with a constant and adding two maps.

The smallest map contains only a header where cell constants, crystal symmetry operations and the number of grid points along the cell axes are stored. MAIN can read PROTEIN as well as its own native formats of electron density maps. The maps can be written in Lyn Ten Eyck and MAIN native formats. The native formats are ASCII files with record length 80 so that they can be edited and changed with a text editor. Besides, a variety of smaller conversion programs were written to enable conversion of maps between PROTEIN, X-PLOR, P1SF, FRODO and native MAIN formats.

Operations on a map grid point can be applied when the point is empty or masked. (The exception is the SET command that can set a value to a grid point in any specified range.) A new map can be created from scratch or from an already existing one, from which the map size, origin, number of grid points per cell length and cell constants can be taken. The grid points are initialized to a specified value. The map origin and size can also be defined from an atom selection so that selected atoms plus some boundary grid points lie inside it.
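The classification of grid point values given in this section, and the periodicity of grid points across unit cells, can be sketched as follows. This is an illustration in Python, not MAIN code. The function names are hypothetical, the thresholds are the ones quoted in the text, and the treatment of values exactly equal to -9999.0 or 9999.0 as density points is an assumption of this sketch.

```python
EMPTY, DENSITY, MASK = "empty", "density", "mask"


def classify_real(value, low=-9999.0, high=9999.0):
    """Classify a real*4-style map value as an empty, density or mask point."""
    if value < low:
        return EMPTY
    if value > high:
        return MASK
    return DENSITY  # boundary values are treated as density (assumption)


def classify_byte(value):
    """Classify a character*1-style map value (0..255)."""
    if value == 0:
        return EMPTY
    if value == 255:
        return MASK
    return DENSITY


def wrap_index(index, points_per_cell):
    """Map a grid index from any unit cell onto the equivalent index in the
    reference cell, since points with equal fractional coordinates are identical."""
    return index % points_per_cell
```

For example, with 10 grid points along a cell axis, wrap_index(-1, 10) and wrap_index(9, 10) refer to the same grid point, which is what allows a map holding one complete unit cell to be expanded through the whole space.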
Mask points can be defined in several ways:
- By setting all grid points with their values in a specified range to the mask value.
- By a distance criterion from a selection of atomic center positions.
- By conversion of grid points to real space points and then to atoms and further to mask grid points.
- By converting unmasked grid points that lie between masked points to masked points.

A map can be transformed (rotated, translated or copied) into the mask points of another map by linear interpolation. This is done so that the position of a mask point is transformed into the space of the map with density. The masked point density value is then obtained by interpolation from the eight surrounding grid points.

Empty points of a map can be filled with values of the density points of another map by applying crystal symmetry operations. This routine is independent of the particular crystal symmetry: rotation matrices and translation vectors are applied in order to find the position of the grid point into which the density value should be copied.

3.3. Applications

The three applications most important for programming of electron density averaging routines are described below. A complete cyclic averaging procedure was applied for the first time when averaging the electron density of rat cathepsin-B. The rat cathepsin-B data were used later during program development for testing the source code. Further program development was necessary because the monoclinic form of the riboflavin synthase unit cell with an initial 1.0A grid size consists of more than 7 million grid points and the "fast" procedure applying integer*1 (character*1) maps did not allow any phase extension. When averaging carbamoylsarcosine hydrolase, routines for mask generation were significantly enhanced and auxiliary programs were improved to enable automatic phase extension.

3.3.1. Cathepsin-B

The lysosomal cysteine proteinases play an important role in intracellular protein degradation (see Barrett et al., 1988).
Of these proteinases, cathepsin-B is the most abundant and the most thoroughly studied. Besides its involvement in intracellular protein turnover, it has been implicated in tumor metastasis and in other disease states. Cathepsin-B exhibits optimal activity in slightly acidic media and is irreversibly inactivated at alkaline pH-values. It acts as an endopeptidase with relatively broad specificity and has a slight preference for basic residues or phenylalanine at P2 (using the nomenclature of Schechter and Berger, 1967). Bulky side chains at P1 are disfavoured (see Shaw et al., 1990). A remarkable feature of cathepsin-B is its distinctive peptidyl dipeptidase activity (Aronson and Barrett, 1978; Bond and Barrett, 1980; Takahashi et al., 1986; Polgar and Csoma, 1987) at the carboxy terminus. Cathepsin-B is inhibited by typical cysteine proteinase protein inhibitors such as cystatins and stefins (see Biol. Chem. Hoppe-Seyler 371, Suppl).

The complete amino acid sequences of rat (Takio et al., 1983), human (Ritonja et al., 1985) and bovine (Meloun et al., 1988) cathepsin-B and the partial sequence of porcine (Takahashi et al., 1984) cathepsin-B have been communicated. According to the nucleotide sequences (Chan et al., 1986; Fong et al., 1986; Ferrara et al., 1990), cathepsin-B from human, rat or mouse is synthesized as a polypeptide chain of 339 amino acid residues, which is processed to the mature single-chain molecule of 254 amino acid residues. In mammalian tissues, most of the active cathepsin-B is found as a two-chain molecule consisting of 47 (or 49) and 205 (or 204) residue polypeptide chains (light and heavy chain) covalently cross-linked by a disulfide bridge.

The cathepsin-B sequence indicates a close structural homology with the plant proteinase papain (Takio et al., 1983). Comparisons of the sequences of cathepsins -L and -H with those of papain and actinidin resulted in alignment proposals for cathepsin-B (Takio et al., 1983; Kamphuis et al., 1985).
Based on the 3-dimensional structures of papain (Kamphuis et al., 1984) and actinidin (Baker, 1980), the common structural features as well as sites of insertions and deletions were made more precise (Kamphuis et al., 1985; Baker & Drenth, 1987). Cathepsin-B is considerably larger than papain or actinidin, and the accommodation of some of the longer polypeptide insertions and the arrangement of the active site residues remained unclear. A clear understanding of its specificity and of its catalytic properties requires the availability of an experimental structure as provided by X-ray crystallography.

Human and rat liver cathepsin-B are the first crystallographically determined structures of lysosomal cysteine proteinases (Musil et al., 1991; Zucic et al., 1992). They are structurally related to the cysteine proteinases of plant origin, papain and actinidin. The monoclinic crystals of both proteins had P21 symmetry, though quite different cell constants (human cathepsin-B: a= 86.23A, b= 34.16A, c= 85.56A, $\beta$= 102.9$^\circ$; rat cathepsin-B: a= 59.98A, b= 128.01A, c= 59.12A, $\beta$= 121.47$^\circ$). The human cathepsin-B crystallized with two molecules per asymmetric unit in a quasi-tetragonal form and the rat cathepsin-B with three molecules per asymmetric unit in an almost perfect hexagonal form (FIGURES 3.3.1.1 and 2). The structures were solved by a molecular replacement procedure, using a molecular model based on papain and actinidin structures. It merits mention that the approximate position of the second molecule in the human cathepsin-B crystals was deduced from heavy atom positions.

First we tried to refine the structure of human cathepsin-B. The electron density inside the mask of each separate molecule (FIGURE 3.3.1.3) was averaged applying the improper symmetry operations as obtained by superposition of molecular models.
The cyclic averaging procedure (including iterative Fourier transformations of the density into structure factors and back) was not applied, since we assumed that averaging of two molecules does not suffice to improve the phases. Unfortunately the model could not be refined below an R-factor of 0.30. However, the current model of human cathepsin-B was applied to solve the structure of rat cathepsin-B. After a successful rotational and translational search, the models were crystallographically refined and the residues adjusted to the rat cathepsin-B sequence. The resulting electron density was averaged over all three molecules within a cyclic procedure. The CHAR_LONG procedure was applied; it is described in detail in APPENDIX C. In the resulting electron density map the loop 129 ... 140 could be immediately traced (FIGURE 3.3.1.4 and 5). Afterwards the electron density averaging procedure was applied for as long as the molecular models did not start to diverge from each other during the course of refinement.

3.3.2. Riboflavin synthase

Riboflavin synthase is an enzyme active in the final steps of riboflavin (vitamin B2) synthesis (review M\"uller et al., 1988). See FIGURE 3.3.2.1. Riboflavin is synthesized in microorganisms and plants. Heavy riboflavin synthase from \it Bacillus subtilis \rm is a complex of two enzymes quite different in their molecular weight. The complex consists of three $\alpha$-subunits and 60 $\beta$-subunits. Actually the $\alpha$ subunit, and not the $\beta$ subunit, catalyzes the final step in riboflavin synthesis. Therefore the appropriate name for the $\beta$ subunit should be lumazine synthase and not riboflavin synthase. The complete sequences of the $\beta$-subunit (Ludwig et al., 1987) and $\alpha$-subunit (Schott et al., 1990a) have been communicated. The crystal structure of heavy riboflavin synthase (Ladenstein et al., 1988) has shown that the enzyme forms an icosahedral capsid consisting of 60 $\beta$ subunits.
The investigated hexagonal crystals belonged to space group P6322 with cell constants a=b= 156.4A, c= 298.5A and $\gamma$= 120$^\circ$ (Ladenstein et al., 1983) with 10 $\beta$-subunits in an asymmetric unit. That structure had an R-factor of 0.399 at 3.3A resolution. Later, the lumazine synthase-riboflavin synthase complex was decomposed into subunits, and its icosahedral capsid, consisting of $\beta$ subunits only, could be rebuilt. Three crystal forms of "riboflavin synthase" were communicated (Schott et al., 1990b). A monoclinic modification belonged to space group C2 with cell constants of a= 235.5A, b= 191.2A, c= 165.4A and $\beta$= 134.5$^\circ$ and 30 molecules per asymmetric unit. The crystals diffracted to 2.8A resolution. A hexagonal form belonged to space group P6322 with cell constants of a=b= 157.2A, c= 300.8A and $\gamma$= 120$^\circ$ with 10 molecules per asymmetric unit, similar to the heavy riboflavin synthase hexagonal form. FIGURE 3.3.2.2 shows the spatial distribution of all 60 molecules. This new hexagonal form of the $\beta$ subunit was refined further to an R-factor of about 0.32 (Ladenstein, unpublished results). This model was used in electron density averaging and refinement of the monoclinic form, for which data to 2.45A resolution were collected. The structure was subjected to rigid body refinement applying X-PLOR including reflection data to 3.5A resolution (Ritsert, unpublished results).

At this stage I joined the project by adapting MAIN routines for electron density manipulation to handle electron density maps of large crystal cells. First only the array sizes were changed and the CHAR_FAST procedure as described in APPENDIX C was applied. However, the R-factor of the averaged map did not converge. Therefore, the procedure was reexamined and reprogrammed with many modifications. Finally, the REAL_LONG procedure enabled us to start with a successful electron density averaging procedure at 3.0A resolution gradually expanding the phases to 3.0A.
Then the grid size was changed from 1.0A to 0.8A and phase extension continued until reflections to 2.45A resolution were included. The procedure was essentially the same as the cathepsin-B REAL_LONG procedure. The only significant difference introduced was that maps were added immediately after being transformed, rather than first being stored on disk and afterwards averaged in a separate procedure (AVER_MAPS.COM). During electron density averaging of "riboflavin synthase" proper local symmetry operations were applied. The model was adapted to the resulting averaged density and crystallographically refined, and new electron density maps were calculated by further extending the phases via an electron density averaging procedure. This procedure was repeated several times until all reflections were included. The current R-factor of the model is 0.23 including data to 2.45A resolution. FIGURE 3.3.2.3 shows an averaged 2Fo-Fc electron density map. The complete description of the monoclinic form refinement will be presented by Ritsert et al.

3.3.3. Carbamoylsarcosine hydrolase

N-Carbamoylsarcosine amidohydrolase (CSHase, EC 3.5.1.59) catalyses the hydrolysis of N-carbamoylsarcosine to sarcosine with liberation of carbon dioxide and ammonia (see FIGURE 3.3.3.1). The enzyme has been found as part of a novel metabolic pathway for the degradation of creatinine to glycine via N-methylhydantoin, N-carbamoylsarcosine and sarcosine (Deeg et al., 1982, EP 0112571; Yamada et al., 1985; Kim et al., 1986, 1987; Shimizu et al., 1989; Siedel et al., 1988). It has been found in various microorganisms (Shimizu et al., 1989) and was isolated and purified from \it Pseudomonas putida 77 \rm (Kim et al., 1986) and \it Arthrobacter sp. \rm (Siedel et al., 1988). This enzyme is highly specific for the degradation of N-carbamoylsarcosine to yield sarcosine.
Here only a brief review of the applied methods, without a description of the structure, is presented in order to illustrate the use of MAIN in this particular case. The complete work is described by Rom\~{a}o et al. (1992).

3.3.3.1. Introduction

The crystals of carbamoylsarcosine hydrolase were obtained from the cloned gene. The crystals diffract beyond 3.0A resolution and belong to the monoclinic space group C2, with cell dimensions a= 136.22A, b= 122.29A, c= 70.87A, $\beta$= 91.82$^\circ$. The self-rotation function of the Patterson map was used to search for local two-fold axes, employing PROTEIN search routines. The peak at $\psi$=0$^\circ$, $\phi$=0$^\circ$ corresponds to the crystallographic b axis. The other large peaks indicate local diads relating the subunits within the tetramer. Peaks show up for the polar angles ($\kappa$=180$^\circ$) $\psi$=90$^\circ$, $\phi$=82$^\circ$; $\psi$=90$^\circ$, $\phi$=172$^\circ$; $\psi$=45$^\circ$, $\phi$=172$^\circ$; $\psi$=46$^\circ$, $\phi$=352$^\circ$, with correlation values of 0.583, 0.575, 0.521 and 0.503 respectively, relative to the origin peak (see FIGURE 3.3.3.2). From crystal density measurements and auto-correlation of the native Patterson map, it became evident that there are four molecules of carbamoylsarcosine hydrolase per asymmetric unit.

Since there was no molecular model of a related enzyme available, heavy atom derivatives had to be prepared. A combination of two uranium-, rhodium-, mercury- and osmium-derivative phase sets served in the calculation of the first 3.0A resolution map. Phases were weighted by the figure of merit. The obtained map was noisy and no secondary structural elements or molecular boundaries could be recognized. The m.i.r. phases were then modified by solvent flattening at 3.0A resolution. The density in the solvent regions was set to zero (Wang, 1985) using programs of M. Schneider. The unit cell was sampled at 130x120x70 grid points. The radius of the averaging sphere was 9A and the solvent level was adjusted to 0.51.
The modified electron density was Fourier transformed, and the resulting phases were combined with m.i.r. phases by applying the phase combination procedure from Hendrickson and Lattman (1970). Seven cycles of such calculations were performed until convergence (R=0.233). The quality of the solvent-flattened density map allowed us to define the boundaries of the tetramer in one asymmetric unit. However, polypeptide chains could still not be identified.

3.3.3.2. Determination of the local symmetry of the tetramer

The presence of four crystallographically independent subunits in one asymmetric unit allows averaging of the electron density and improvement in the quality of the final map. In order to perform this calculation, we needed to know the exact orientation and position of the local symmetry axes. Their orientations were obtained from the self-rotation function, although the intramolecular and crystal symmetry-generated axes were still ambiguous. The correct position of the rotational axes was found from the electron density as follows.

In the first step, a molecular envelope of an asymmetric unit was defined in the solvent-flattened density, using the program X-CONTOUR (Buchberger, 1991). The density inside the selected envelope was placed in an oversize P1 cell (204x183x105A) in order to avoid intermolecular contacts. This cell was Fourier-transformed, and, using the newly calculated structure factors, a Patterson synthesis (now of a single asymmetric unit) was performed. As before, a self-rotation function of the Patterson map was calculated. The obtained solutions were consistent with the previously determined orientation of the non-crystallographic axes. The presence of a peak corresponding to the crystallographic diad b indicated that the selected envelope still included crystallographically equivalent parts of another asymmetric unit.
To position the local symmetry axes correctly in the asymmetric unit, a translation function for each of the four possible local axes was calculated using the real-space routines of PROTEIN. The peaks of electron density selected inside the mask were rotated about each of the local axes and translated in small increments with respect to the unchanged m.i.r. density. The calculated correlation function showed maxima for the best positioning of three axes inside the asymmetric unit. This calculation thus revealed the three genuine intramolecular local axes, while for the fourth axis, defined by the polar angles $\psi$=90$^\circ$, $\phi$=172$^\circ$, $\kappa$=180$^\circ$, no maximum was found; this axis is generated by the crystallographic b axis and the local diad at $\psi$=90$^\circ$, $\phi$=82$^\circ$, $\kappa$=180$^\circ$ (see FIGURE 3.3.3.2). The orientations and positions of the three local diads were refined, with the final results indicating three mutually perpendicular two-fold axes of symmetry: axis (1) is 6$^\circ$ away from the c-axis of the crystal, while the other two axes, (2) and (3), make angles of 45$^\circ$ with the crystal b-axis.

Since the center of rotation about each of the local symmetry axes was in the lower half of the masked area, and since molecular boundaries were not clearly recognized in regions where crystallographically equivalent molecules came into contact, the molecular envelope had to be improved. The solvent-flattened density inside the current envelope was placed in the large P1 cell. This density was averaged by applying the local symmetry operations, with the expectation that density not belonging to the same asymmetric unit would smear out. With the transformed averaged density, a second, more clearly defined envelope was produced. The local symmetry operations were again determined as described above. The self-rotation function of the Patterson map, calculated as before, confirmed the three local axes as major peaks, now sharper than in the case of the first envelope.
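The statement that the axis at $\psi$=90, $\phi$=172 is generated by the crystallographic b axis and the local diad at $\psi$=90, $\phi$=82 can be checked numerically: the product of two perpendicular two-fold rotations is a two-fold rotation about the axis perpendicular to both. The sketch below is only an illustration. It assumes an orthogonal reference frame with b along y, $\psi$ measured from b and $\phi$ measured as an azimuth in the plane perpendicular to b; the actual PROTEIN convention may differ in the choice of origin and sign of $\phi$, which does not affect the result. All function names are hypothetical.

```python
import math


def axis_from_polar(psi_deg, phi_deg):
    """Unit vector of a rotation axis from polar angles (ASSUMED convention:
    psi is the angle from the y axis (crystal b), phi is an azimuth in the
    x-z plane)."""
    psi = math.radians(psi_deg)
    phi = math.radians(phi_deg)
    return (math.sin(psi) * math.cos(phi),
            math.cos(psi),
            -math.sin(psi) * math.sin(phi))


def twofold(axis):
    """Rotation matrix for kappa = 180 degrees about a unit axis n: 2 n n^T - I."""
    return [[2.0 * axis[i] * axis[j] - (1.0 if i == j else 0.0)
             for j in range(3)] for i in range(3)]


def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3))
             for j in range(3)] for i in range(3)]


def max_abs_diff(a, b):
    """Largest absolute element-wise difference between two 3x3 matrices."""
    return max(abs(a[i][j] - b[i][j]) for i in range(3) for j in range(3))


b_axis = axis_from_polar(0.0, 0.0)        # crystallographic diad
local_82 = axis_from_polar(90.0, 82.0)    # genuine local diad
local_172 = axis_from_polar(90.0, 172.0)  # axis without a translation maximum

product = matmul(twofold(b_axis), twofold(local_82))
# The difference is expected to be zero within rounding error.
print(max_abs_diff(product, twofold(local_172)))
```

Because the fourth axis is only a combination of a crystallographic and a local operation, it relates density in different asymmetric units, which is consistent with the absence of a translation-function maximum for it inside the mask.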
The subsequent rotational and translational search gave a more accurate orientation and position for each of the non-crystallographic axes. Averaging of the electron density inside the chosen asymmetric unit was now possible.

3.3.3.3. Initial averaging with ideal 222 symmetry

With the new envelope and new symmetry operations, the first averaged map was calculated. The solvent-flattened electron density was averaged inside the mask by applying ideal 222 symmetry. Afterwards, the whole unit cell was generated and its density Fourier transformed. The Fourier transformations were carried out using programs in PROTEIN. The procedure was repeated until convergence of the electron density R-value, which dropped from 0.44 to 0.29 after 7 cycles of averaging. A comparison of the first averaged map with the one obtained after 7 cycles showed that cyclic averaging did not improve the map, suggesting that the asymmetric unit does not obey ideal 222 symmetry. The first averaged map was nevertheless markedly better than the original m.i.r. map, but only a few secondary structural elements (two $\alpha$-helical segments and some $\beta$-strands) could be recognized. Since model building could not proceed, the map had to be improved further.

3.3.3.4. Proper-improper symmetry averaging

Using MAIN, a simplified representation of the solvent-flattened density corresponding to one selected asymmetric unit was displayed on a PS300 Evans & Sutherland graphic system. We observed that the original masked region could be split into two separate (upper and lower) parts (see FIGURE 3.3.3.3), suggesting that improper averaging could be used. For each of the individual regions, a new envelope was defined using MAIN. A self-rotation function was calculated for the density inside each envelope, confirming axis 3 (see FIGURE 3.3.3.2).
Axes 1 and 2 therefore have to be located in the plane separating the two identified halves of the asymmetric unit (see FIGURE 3.3.3.3). The positions of the local axes were then optimized by an interactive translational search procedure followed by combined rotational and translational gradient optimization using MAIN. First, the position and orientation of an ideal two-fold axis (axis 3) was optimized for the upper and lower halves. The correlations at the maxima were 0.167 and 0.176 for the upper and lower half, respectively; the autocorrelation values were 1.0. The density inside each half was then averaged by applying the obtained parameters for proper two-fold averaging. Afterwards, the two halves with averaged densities were superimposed by rotations about axes 1 and 2, and the correlations were maximized. In these calculations, ideal two-fold symmetry was no longer maintained. Four additional transformations were obtained: two for the superposition of the averaged upper half density onto the averaged lower half rotated about axes 1 and 2, and two for the reverse transformations. The maximal correlations obtained were higher than 0.3 in all four cases.

These parameters were then applied in a cyclic averaging procedure combining proper and improper averaging. The upper and lower halves were first averaged by applying proper symmetry (axis 3), and the averaged halves were then averaged with the improper symmetry relations (axes 1 and 2). Improper averaging was done by transforming the averaged density of the lower half about both axes 1 and 2 to the upper half region. The averaged upper half and both transformed lower half density maps were then added together and averaged. The analogous procedure was applied for the lower half. The density of the complete crystal cell was generated by applying crystal symmetry operations to each averaged half separately. The resulting cell was Fourier transformed.
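The correlation values quoted in this section serve as the target of the rotational and translational searches. As a minimal sketch of the idea, the example below uses the standard linear correlation coefficient between two sets of density values and a one-dimensional cyclic shift search. It is an assumption that this matches the exact correlation formula implemented in MAIN or PROTEIN, and the names `correlation` and `best_shift` are hypothetical; the real search runs over three rotational and three translational parameters with interpolated density.

```python
import math


def correlation(a, b):
    """Linear correlation coefficient between two equally long density lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def best_shift(reference, moving):
    """Try every cyclic shift of `moving` and return (shift, correlation)
    for the shift that correlates best with `reference`."""
    n = len(reference)
    scores = [correlation(reference, moving[s:] + moving[:s]) for s in range(n)]
    best = max(range(n), key=lambda s: scores[s])
    return best, scores[best]


# Made-up 1D "density": the same peak, displaced by three grid points.
reference = [0.0, 1.0, 4.0, 1.0, 0.0, 0.0, 0.0, 0.0]
moving = reference[-3:] + reference[:-3]

print(correlation(reference, reference))  # autocorrelation of a map with itself
print(best_shift(reference, moving))
```

A map correlated with itself gives a coefficient of one, which is the autocorrelation value mentioned above; values for two noisy, independently phased regions are much lower even at the correct superposition.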
This procedure was repeated in cycles and converged after 8 cycles of proper-improper symmetry averaging (the R-factor dropped from 0.43 to 0.25 for data between 20 and 3 \AA{}). The resulting electron density map was markedly improved in comparison to the map obtained after ideal 222 symmetry averaging. Many segments of the main chain could now be traced and were built as a polyalanine chain, using the program system FRODO (Jones, 1978) on a PS300 Evans & Sutherland graphic system. About 60% of the total number of residues were built as unconnected segments, and the tetramer was generated using the previous local symmetry operations.

3.3.3.5. Improper symmetry averaging

With these partial models of the four subunits, masks for each of them could be defined using MAIN. Intermolecular contacts were taken into account by the program in order to avoid overlap of the masks. Optimization of the rotational and translational parameters between the four density areas was repeated. At this stage, the position of each model was first optimized in the electron density. Molecules A and D remained in their positions, while molecules B and C were moved slightly. All four molecular models were then superimposed by minimizing the r.m.s. distances between equivalent atoms. In this way, 12 new rotational matrices and translational vectors were determined. These new local symmetry operations enabled us to apply non-ideal, improper symmetry averaging to the tetrameric asymmetric unit. In the first stages of improper symmetry averaging, averaging was still performed without including phases from the partial model, but the masks were expanded according to the `growth' of the molecular model. The solvent-flattened density was averaged fourfold, independently for each subunit of the tetramer. The cell was reconstructed from the four averaged densities and Fourier back-transformed. The new density was again averaged and the procedure was repeated. It converged after 12 cycles of averaging, with R dropping from 0.42 to 0.24.
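The number of transformations follows from simple counting: with four subunits, each subunit can be mapped onto each of the three others, which gives 4 x 3 = 12 ordered pairs. The short sketch below only enumerates these pairs and groups them by target subunit; the labels A to D are taken from the text, while the function names are hypothetical and no actual matrices are computed.

```python
from itertools import permutations


def ordered_pairs(subunits):
    """All (source, target) pairs of distinct subunits; one rotation matrix
    and one translation vector is needed for each pair."""
    return list(permutations(subunits, 2))


def operators_into(target, pairs):
    """The pairs whose transformations bring density INTO the mask of `target`."""
    return [pair for pair in pairs if pair[1] == target]


subunits = ["A", "B", "C", "D"]
pairs = ordered_pairs(subunits)
print(len(pairs))

for target in subunits:
    # Density in each mask is averaged with three transformed copies,
    # i.e. four contributions per subunit.
    print(target, operators_into(target, pairs))
```

Keeping a separate operator for each ordered pair is what distinguishes this improper averaging from the ideal 222 case, where three two-fold operations would describe the whole tetramer.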
With these new density maps, the model could be further improved, and its calculated phases were then combined with the original m.i.r. phases (Hendrickson & Lattman procedure). Parts of the model which were considered questionable were omitted from the phase calculation. Model phases were weighted using the formulas of Sim (1959). The whole averaging procedure was then carried out as follows. The model was built in one of the subunits of the tetramer. The other three subunits were generated by applying the local symmetry operations. The whole tetramer was refined with the program X-PLOR (Br\"unger et al., 1989). With the refined model, four new masks were determined and new local symmetry parameters were calculated. The m.i.r. phases were combined with phases from the refined model and an electron density map was calculated. Cyclic averaging was then applied to this initial map using the new masks and symmetry operations. After convergence, the whole model was re-evaluated and refitted to the electron density on the graphics system. The fit was also checked against the m.i.r. map. Over several rounds of model building, crystallographic refinement, phase combination and electron density averaging, the electron density slowly improved. FIGURE 3.3.3.4 shows the improvement of the electron density maps. The final R-factor for 56641 reflections between 10.0 and 2.0 \AA{} resolution and 8304 atoms of the tetramer is 0.186. The procedure, with emphasis on the data manipulations done with MAIN, is described in detail in APPENDIX D.
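The phase combination used throughout this example can be summarized in a few lines. In the Hendrickson and Lattman (1970) formulation, the phase probability of a reflection is written as P(phi) proportional to exp(A cos phi + B sin phi + C cos 2phi + D sin 2phi), so that independent phase sources are combined by adding their coefficients. The sketch below is an illustration of this principle only: the function names are hypothetical, the coefficient values are made up, and the weighting of the model phases (Sim, 1959) is not reproduced here.

```python
import math


def combine_hl(*coefficient_sets):
    """Combine independent phase sources by summing their (A, B, C, D) sets."""
    return tuple(sum(s[i] for s in coefficient_sets) for i in range(4))


def centroid_phase_and_fom(hl, steps=360):
    """Centroid phase (degrees) and figure of merit from HL coefficients,
    obtained by numerical integration over the phase circle."""
    a, b, c, d = hl
    phases = [2.0 * math.pi * k / steps for k in range(steps)]
    log_p = [a * math.cos(p) + b * math.sin(p)
             + c * math.cos(2.0 * p) + d * math.sin(2.0 * p) for p in phases]
    top = max(log_p)  # subtract the maximum to avoid overflow in exp
    weights = [math.exp(v - top) for v in log_p]
    total = sum(weights)
    x = sum(w * math.cos(p) for w, p in zip(weights, phases)) / total
    y = sum(w * math.sin(p) for w, p in zip(weights, phases)) / total
    return math.degrees(math.atan2(y, x)) % 360.0, math.hypot(x, y)


# Made-up coefficients for one reflection (illustration only).
mir = (1.0, 0.0, 0.0, 0.0)      # experimental source, peaked near 0 degrees
model = (0.0, 1.0, 0.0, 0.0)    # model source, peaked near 90 degrees

combined = combine_hl(mir, model)
print(combined)
print(centroid_phase_and_fom(mir))
print(centroid_phase_and_fom(combined))
```

In this toy case the combined distribution peaks between the two sources and is sharper than either of them, so the figure of merit increases.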