3.3.1. Cathepsin-B

The lysosomal cysteine proteinases play an important role in intracellular protein degradation (see Barrett et al., 1988). Of these proteinases, cathepsin-B is the most abundant and the most thoroughly studied. Besides its involvement in intracellular protein turnover, it has been implicated in tumor metastasis and in other disease states. Cathepsin-B exhibits optimal activity in slightly acidic media and is irreversibly inactivated at alkaline pH values. It acts as an endopeptidase with relatively broad specificity and has a slight preference for basic residues or phenylalanine at P2 (using the nomenclature of Schechter and Berger, 1967). Bulky side chains at P1 are disfavoured (see Shaw et al., 1990). A remarkable feature of cathepsin-B is its distinctive peptidyl dipeptidase activity at the carboxy terminus (Aronson and Barrett, 1978; Bond and Barrett, 1980; Takahashi et al., 1986; Polgar and Csoma, 1987). Cathepsin-B is inhibited by typical cysteine proteinase protein inhibitors such as cystatins and stefins (see Biol. Chem. Hoppe-Seyler 371, Suppl.). The complete amino acid sequences of rat (Takio et al., 1983), human (Ritonja et al., 1985) and bovine (Meloun et al., 1988) cathepsin-B and the partial sequence of porcine cathepsin-B (Takahashi et al., 1984) have been communicated. According to the nucleotide sequences (Chan et al., 1986; Fong et al., 1986; Ferrara et al., 1990), cathepsin-B from human, rat or mouse is synthesized as a polypeptide chain of 339 amino acid residues, which is processed to the mature single-chain molecule of 254 amino acid residues. In mammalian tissues, most of the active cathepsin-B is found as a two-chain molecule consisting of polypeptide chains of 47 (or 49) and 205 (or 204) residues (light and heavy chain), covalently cross-linked by a disulfide bridge. The cathepsin-B sequence indicates a close structural homology with the plant proteinase papain (Takio et al., 1983).
Comparisons of the sequences of cathepsins-L and -H with those of papain and actinidin resulted in alignment proposals for cathepsin-B (Takio et al., 1983; Kamphuis et al., 1985). Based on the 3-dimensional structures of papain (Kamphuis et al., 1984) and actinidin (Baker, 1980), the common structural features as well as the sites of insertions and deletions were made more precise (Kamphuis et al., 1985; Baker & Drenth, 1987). Cathepsin-B is considerably larger than papain or actinidin, and the accommodation of some of the longer polypeptide insertions and the arrangement of the active site residues remained unclear. A clear understanding of its specificity and of its catalytic properties requires an experimental structure as provided by X-ray crystallography. Human and rat liver cathepsin-B are the first crystallographically determined structures of lysosomal cysteine proteinases (Musil et al., 1991; Zucic et al., 1992). They are structurally related to the cysteine proteinases of plant origin, papain and actinidin. The monoclinic crystals of both proteins had P2$_1$ symmetry, though quite different cell constants (human cathepsin-B: a = 86.23 Å, b = 34.16 Å, c = 85.56 Å, $\beta$ = 102.9°; rat cathepsin-B: a = 59.98 Å, b = 128.01 Å, c = 59.12 Å, $\beta$ = 121.47°). The human cathepsin-B crystallized with two molecules per asymmetric unit in a quasi-tetragonal form, and the rat cathepsin-B with three molecules per asymmetric unit in an almost perfect hexagonal form (FIGURES 3.3.1.1 and 3.3.1.2). The structures were solved by a molecular replacement procedure, using a molecular model based on the papain and actinidin structures. It is worth mentioning that the approximate position of the second molecule in the human cathepsin-B crystals was deduced from heavy atom positions. First we tried to refine the structure of human cathepsin-B.
The electron density inside the mask of each separate molecule (FIGURE 3.3.1.3) was averaged by applying the improper symmetry operations obtained by superposition of the molecular models. The cyclic averaging procedure (including iterative Fourier transformations of the density into structure factors and back) was not applied, since we assumed that averaging of two molecules would not suffice to improve the phases. Unfortunately, the model could not be refined below an R-factor of 0.30. However, the current model of human cathepsin-B was applied to solve the structure of rat cathepsin-B. After a successful rotational and translational search, the models were crystallographically refined and the residues adjusted to the rat cathepsin-B sequence. The resulting electron density was averaged over all three molecules within a cyclic procedure. The CHAR_LONG procedure was applied; it is described in detail in APPENDIX C. In the resulting electron density map the loop 129-140 could be traced immediately (FIGURES 3.3.1.4 and 3.3.1.5). Afterwards, the electron density averaging procedure was applied until the molecular models began to diverge from each other during the course of refinement.

3.3.2. Riboflavin synthase

Riboflavin synthase is an enzyme active in the final steps of riboflavin (vitamin B2) synthesis (reviewed by M\"uller et al., 1988); see FIGURE 3.3.2.1. Riboflavin is synthesized in microorganisms and plants. Heavy riboflavin synthase from {\it Bacillus subtilis} is a complex of two enzymes that differ considerably in molecular weight. The complex consists of three $\alpha$-subunits and 60 $\beta$-subunits. It is actually the $\alpha$-subunit, and not the $\beta$-subunit, that catalyzes the final step in riboflavin synthesis. The appropriate name for the $\beta$-subunit should therefore be lumazine synthase and not riboflavin synthase. The complete sequences of the $\beta$-subunit (Ludwig et al., 1987) and the $\alpha$-subunit (Schott et al., 1990a) have been communicated.
The crystal structure of heavy riboflavin synthase (Ladenstein et al., 1988) has shown that the enzyme forms an icosahedral capsid consisting of 60 $\beta$-subunits. The investigated hexagonal crystals belonged to the P6$_3$22 symmetry group with cell constants a = b = 156.4 Å, c = 298.5 Å and $\gamma$ = 120° (Ladenstein et al., 1983), with 10 $\beta$-subunits in an asymmetric unit. That structure had an R-factor of 0.399 at 3.3 Å resolution. Later, the lumazine synthase-riboflavin synthase complex was decomposed into subunits, and its icosahedral capsid, consisting of $\beta$-subunits only, could be rebuilt. Three crystal forms of "riboflavin synthase" were communicated (Schott et al., 1990b). A monoclinic modification belonged to space group C2 with cell constants a = 235.5 Å, b = 191.2 Å, c = 165.4 Å and $\beta$ = 134.5°, and 30 molecules per asymmetric unit. The crystals diffracted to 2.8 Å resolution. A hexagonal form belonged to space group P6$_3$22 with cell constants a = b = 157.2 Å, c = 300.8 Å and $\gamma$ = 120°, with 10 molecules per asymmetric unit, similar to the hexagonal form of heavy riboflavin synthase. FIGURE 3.3.2.2 shows the spatial distribution of all 60 molecules. This new hexagonal form of the $\beta$-subunit was refined further to an R-factor of about 0.32 (Ladenstein, unpublished results). This model was used in electron density averaging and refinement of the monoclinic form, for which data to 2.45 Å resolution were collected. The structure was subjected to rigid body refinement with X-PLOR, including reflection data to 3.5 Å resolution (Ritsert, unpublished results). At this stage I joined the project by adapting the MAIN routines for electron density manipulation to handle electron density maps of large crystal cells. At first only the array sizes were changed, and the CHAR_FAST procedure as described in APPENDIX C was applied. However, the R-factor of the averaged map did not converge. Therefore, the procedure was reexamined and reprogrammed with many modifications.
Finally, the REAL_LONG procedure enabled us to start with a successful electron density averaging procedure at 3.0A resolution gradually expanding the phases to 3.0A. Then the grid size was changed from 1.0 Å to 0.8 Å and phase extension continued until reflections to 2.45 Å resolution were included. The procedure was essentially the same as the cathepsin-B REAL_LONG procedure. The only significant difference was that maps were added immediately after being transformed, rather than first being stored on disk and averaged afterwards in a separate procedure (AVER_MAPS.COM). During electron density averaging of "riboflavin synthase", proper local symmetry operations were applied. The model was adapted to the resulting averaged density and crystallographically refined, and new electron density maps were calculated by further extending the phases via the electron density averaging procedure. This cycle was repeated several times until all reflections were included. The current R-factor of the model is 0.23, including data to 2.45 Å resolution. FIGURE 3.3.2.3 shows an averaged 2Fo-Fc electron density map. The complete description of the refinement of the monoclinic form will be presented by Ritsert et al.

3.3.3. Carbamoylsarcosine hydrolase

Only a brief review of the applied methods, without a description of the structure, is presented here in order to demonstrate the use of MAIN in this particular case. The complete work is described by Rom\~{a}o et al. (1992).

3.3.3.1. Introduction

The crystals of carbamoylsarcosine hydrolase were obtained from the cloned gene. The crystals diffract beyond 3.0 Å resolution and belong to the monoclinic space group C2, with cell dimensions a = 136.22 Å, b = 122.29 Å, c = 70.87 Å, $\beta$ = 91.82°. The self-rotation function of the Patterson map was used to search for local two-fold axes, employing the PROTEIN search routines. The peak at $\psi$ = 0°, $\phi$ = 0° corresponds to the crystallographic b axis.
The other large peaks indicate local diads relating the subunits within the tetramer. Peaks show up for polar angles ($\kappa$ = 180°) $\psi$ = 90°, $\phi$ = 82°; $\psi$ = 90°, $\phi$ = 172°; $\psi$ = 45°, $\phi$ = 172°; $\psi$ = 46°, $\phi$ = 352°, with correlation values of 0.583, 0.575, 0.521 and 0.503 respectively, relative to the origin peak (see FIGURE 3.3.3.2). From crystal density measurements and auto-correlation of the native Patterson map, it became evident that there are four molecules of carbamoylsarcosine hydrolase per asymmetric unit. Since no molecular model of a related enzyme was available, heavy atom derivatives had to be prepared. A combination of two uranium-, rhodium-, mercury- and osmium-derivative phase sets served in the calculation of the first 3.0 Å resolution map. Phases were weighted by the figure of merit. The resulting map was noisy, and no secondary structural elements or molecular boundaries could be recognized. The m.i.r. phases were then modified by solvent flattening at 3.0 Å resolution. The density in the solvent regions was set to zero (Wang, 1985) using the programs of M. Schneider. The unit cell was sampled at 130 x 120 x 70 grid points. The radius of the averaging sphere was 9 Å and the solvent level was adjusted to 0.51. The modified electron density was Fourier transformed, and the resulting phases were combined with the m.i.r. phases by applying the phase combination procedure of Hendrickson and Lattman (1970). Seven cycles of such calculations were performed until convergence (R = 0.233). The quality of the solvent-flattened density map allowed us to define the boundaries of the tetramer in one asymmetric unit. However, polypeptide chains could still not be identified.

3.3.3.2. Determination of the local symmetry of the tetramer

The presence of four crystallographically independent subunits in one asymmetric unit allows averaging of the electron density and improvement in the quality of the final map.
In order to perform this calculation, we needed to know the exact orientation and position of the local symmetry axes. Their orientations were obtained from the self-rotation function, although the intramolecular axes and those generated by crystal symmetry were still ambiguous. The correct position of the rotational axes was found from the electron density as follows. In the first step, a molecular envelope of an asymmetric unit was defined in the solvent-flattened density, using the program X-CONTOUR (Buchberger, 1991). The density inside the selected envelope was placed in an oversize P1 cell (204 x 183 x 105 Å) in order to avoid intermolecular contacts. This cell was Fourier-transformed and, using the newly calculated structure factors, a Patterson synthesis (now of a single asymmetric unit) was performed. As before, a self-rotation function of the Patterson map was calculated. The solutions obtained were consistent with the previously determined orientation of the non-crystallographic axes. The presence of a peak corresponding to the crystallographic diad b indicated that the selected envelope still included crystallographically equivalent parts of another asymmetric unit. To position the local symmetry axes correctly in the asymmetric unit, a translation function for each of the four possible local axes was calculated using the real-space routines of PROTEIN. The peaks of electron density selected inside the mask were rotated about each of the local axes and translated in small increments with respect to the unchanged m.i.r. density. The calculated correlation function indicated maxima for the best positioning of three of the axes inside the asymmetric unit. This calculation thus revealed the three genuine intramolecular local axes, while for the fourth axis, defined by the polar angles $\psi$ = 90°, $\phi$ = 172°, $\kappa$ = 180°, no maximum was found; this axis is generated by the crystallographic b axis and the local diad at $\psi$ = 90°, $\phi$ = 82°, $\kappa$ = 180° (see FIGURE 3.3.3.2).
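The relation between the fourth axis and the other two symmetry elements can be checked numerically. The following is a minimal sketch in Python and is not part of PROTEIN or MAIN. It assumes a Cartesian frame with y along the crystallographic b axis, $\psi$ measured from b and $\phi$ measured in the plane perpendicular to b; this is consistent with the peak at $\psi$ = 0° lying on b, but it is otherwise an assumed convention. All function names are hypothetical.

```python
import math

def diad_matrix(psi_deg, phi_deg):
    """Matrix of a two-fold rotation (kappa = 180 deg) about the axis (psi, phi).

    Assumed convention: y is parallel to b, psi is the angle from b,
    phi is the azimuth in the plane perpendicular to b.
    """
    psi, phi = math.radians(psi_deg), math.radians(phi_deg)
    n = (math.sin(psi) * math.cos(phi), math.cos(psi), math.sin(psi) * math.sin(phi))
    # For kappa = 180 deg the rotation matrix reduces to R = 2 n n^T - I.
    return [[2.0 * n[i] * n[j] - (1.0 if i == j else 0.0) for j in range(3)]
            for i in range(3)]

def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def diad_axis(r):
    """Recover (psi, phi) in degrees from a two-fold rotation matrix."""
    # (R + I) / 2 equals n n^T; its largest diagonal element picks a stable column.
    p = [[(r[i][j] + (1.0 if i == j else 0.0)) / 2.0 for j in range(3)]
         for i in range(3)]
    k = max(range(3), key=lambda i: p[i][i])
    norm = math.sqrt(p[k][k])
    n = [p[i][k] / norm for i in range(3)]
    # An axis has no sign; choose the representative with psi <= 90 deg
    # and, for psi = 90 deg, with phi in [0, 180).
    if n[1] < -1e-9 or (abs(n[1]) <= 1e-9 and n[2] < 0.0):
        n = [-c for c in n]
    psi = math.degrees(math.acos(max(-1.0, min(1.0, n[1]))))
    phi = math.degrees(math.atan2(n[2], n[0])) % 360.0
    return psi, phi

crystal_b = diad_matrix(0.0, 0.0)     # crystallographic diad along b
local_1 = diad_matrix(90.0, 82.0)     # local diad from the self-rotation function
psi, phi = diad_axis(matmul(crystal_b, local_1))
print(round(psi, 3), round(phi, 3))
```

Under these assumptions the product of the two diads is again a diad, at $\psi$ = 90°, $\phi$ = 172°, because two two-fold rotations about perpendicular axes combine to a two-fold rotation about the axis perpendicular to both.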
The orientations and positions of the three local diads were refined, with the final results indicating three mutually perpendicular two-fold axes of symmetry: axis (1) is 6° away from the c-axis of the crystal, while the other two axes, (2) and (3), make angles of 45° with the crystal b-axis. Since the center of rotation about each of the local symmetry axes was in the lower half of the masked area, and since molecular boundaries were not clearly recognized in regions where crystallographically equivalent molecules came into contact, the molecular envelope had to be improved. The solvent-flattened density inside the current envelope was placed in the large P1 cell. This density was averaged by applying the local symmetry operations, in the expectation that density not belonging to the same asymmetric unit would smear out. With the transformed averaged density, a second, more clearly defined envelope was produced. The local symmetry operations were again determined as described above. The self-rotation function of the Patterson map, calculated as before, confirmed the three local axes as major peaks, now sharper than in the case of the first envelope. The subsequent rotational and translational search gave a more accurate orientation and position for each of the non-crystallographic axes. Averaging of the electron density inside the chosen asymmetric unit was now possible.

3.3.3.3. Initial averaging with ideal 222 symmetry

With the new envelope and new symmetry operations, the first averaged map was calculated. The solvent-flattened electron density was averaged inside the mask by applying ideal 222 symmetry. Afterwards, the whole unit cell was generated and its density Fourier-transformed. The Fourier transformations were carried out using programs in PROTEIN. The procedure was repeated until convergence of the electron density R-value, which dropped from 0.44 to 0.29 after 7 cycles of averaging.
Comparison of the first averaged map with the one obtained after 7 cycles of the averaging procedure made it obvious that cyclic averaging did not improve the map, suggesting that the asymmetric unit does not fulfill ideal 222 symmetry. The first averaged map was nevertheless markedly improved in comparison to the original m.i.r. map, but still only a few secondary structural elements (two $\alpha$-helical segments and some $\beta$-strands) could be recognized. Since model building could not proceed, the map had to be improved further.

3.3.3.4. Proper-improper symmetry averaging

Using MAIN, a simplified representation of the solvent-flattened density corresponding to one selected asymmetric unit was displayed on a PS300 Evans & Sutherland graphic system. We observed that the original masked region could be split into two separate (upper and lower) parts (see FIGURE 3.3.3.3), suggesting the possibility of using improper averaging. For each of the individual regions, a new envelope was defined using MAIN. A self-rotation function was calculated for the density inside each envelope, confirming axis 3 (see FIGURE 3.3.3.2). Axes 1 and 2 therefore have to be located in the plane separating the two identified halves of the asymmetric unit (see FIGURE 3.3.3.3). The positions of the local axes were then optimized by an interactive translational search procedure followed by combined rotational and translational gradient optimization using MAIN. First, the position and orientation of an ideal two-fold axis (axis 3) was optimized for the upper and lower halves. The correlations of the maxima were 0.167 and 0.176 for the upper and lower half respectively; the autocorrelation values were 1.0. The density inside each half was then averaged by applying the obtained parameters for proper two-fold averaging. Afterwards, both halves with averaged densities were superimposed by rotations about axes 1 and 2 and the correlations maximized.
In these calculations, ideal two-fold symmetry was no longer maintained. Four additional transformations were obtained: two for the superposition of the averaged upper half density onto the averaged lower half rotated about axes 1 and 2, and two for the reverse transformations. The maximal correlations obtained were higher than 0.3 in all four cases. These parameters were then applied in a cyclic averaging procedure combining proper and improper averaging. The upper and lower halves were first averaged by applying proper symmetry (axis 3), and the averaged halves were then averaged with improper symmetry relations (axes 1 and 2). Improper averaging was done by transforming the averaged density of the lower half about both axes 1 and 2 to the upper half region. The averaged upper half and both transformed lower half density maps were then added together and averaged. The analogous procedure was applied for the lower half. The density of the complete crystal cell was generated by applying crystal symmetry operations to each averaged half separately. The resulting cell was Fourier-transformed. This procedure was repeated in cycles and converged after 8 cycles of proper-improper symmetry averaging (R-factor from 0.43 to 0.25, for data between 20 and 3 Å). The resulting electron density map was markedly improved in comparison to the map obtained after ideal 222 symmetry averaging. Many segments of the main chain could now be traced and were built as a polyalanine chain, using the program system FRODO (Jones, 1978) on a PS300 Evans & Sutherland graphic system. About 60% of the total number of residues were built in as unconnected segments, and the tetramer was generated using the previous local symmetry operations.

3.3.3.5. Improper symmetry averaging

With these partial models of the four subunits, masks for each of them could be defined using MAIN. Intermolecular contacts were taken into account in the program in order to avoid overlap of the masks.
Optimization of the rotational and translational parameters between the four density areas was repeated. At this stage, the positioning of each model was first optimized in the electron density. Molecules A and D remained in their positions, while molecules B and C were moved slightly. All four molecular models were then superimposed by minimizing the r.m.s. distances between equivalent atoms. Twelve new rotation matrices and translation vectors were thus determined. These new local symmetry operations enabled us to apply non-ideal, improper symmetry averaging to the tetrameric asymmetric unit. In the first stages of improper symmetry averaging, averaging was still performed without including phases from the partial model, but the masks were expanded according to the `growth' of the molecular model. The solvent-flattened density was averaged four times, independently for each subunit of the tetramer. The cell was reconstructed from the four averaged densities and Fourier back-transformed. The new density was again averaged and the procedure was repeated. It converged after 12 cycles of averaging, with R dropping from 0.42 to 0.24. With these new density maps, the model could be further improved, and its calculated phases were then combined with the original m.i.r. phases (Hendrickson & Lattman procedure). Parts of the model which were considered questionable were omitted from the phase calculation. Model phases were weighted using the Sim (1959) formulas. The whole averaging procedure was then carried out as follows. The model was built in one of the subunits of the tetramer. The other three subunits were generated by applying the local symmetry operations. The whole tetramer was refined with the program X-PLOR (Br\"unger et al., 1989). With the refined model, four new masks were determined and new local symmetry parameters calculated. M.i.r. phases were combined with phases from the refined model and an electron density map was calculated.
Cyclic averaging was then applied to this initial map using the new masks and symmetry operations. After convergence the whole model was re-evaluated and refitted to the electron density on the graphics system. The fit was also checked against the m.i.r. map. Over several rounds of model building, crystallographic refinement, phase combination and electron density averaging, the electron density slowly improved. FIGURE 3.3.3.4 shows the improvement of the electron density maps. The final R-factor for 56641 reflections between 10.0 and 2.0 Å resolution and 8304 atoms of the tetramer is 0.186. The procedure, with emphasis on the data manipulations done with MAIN, is described in detail in APPENDIX D.

3.4. Maps

A map is a 3-dimensional array of grid points, each of which has a value. According to their values, grid points are treated as empty, density or mask points. Grid points with values inside the density interval are density points, those with values below the density interval are empty points, and those with values above it are mask points. Each map has a size, starting coordinates and cell constants. Grid points lying in different unit cells with the same fractional coordinates are identical. That means that, when a whole unit cell is defined in a map, the program can expand the density through the whole space. There are 6 elementary operations that can be done with map data:

- Creation of a map,
- Rotations and translations of a map,
- Building the crystal unit cell from an asymmetric unit or its parts,
- Setting values of selected grid points,
- Scaling a single map by a constant,
- Adding two maps.

The smallest map contains only a header, in which the cell constants, the crystal symmetry operations and the number of grid points along the cell axes are stored. The program can read electron density maps in PROTEIN and in its own native formats. The maps can be written in the Lyn Ten Eyck and MAIN native formats.
The native formats are ASCII files with a record length of 80, so that they can be inspected and changed with an editor. In addition, a variety of smaller conversion programs was written that allow conversion of maps between the PROTEIN, X-PLOR, P1SF, FRODO and native MAIN formats. Operations on the grid points of a map can be applied when the points are empty or masked. (The exception is the SET command, which can set the value of grid points in any specified range.) A new map can be created from scratch or from an already existing one by taking its cell constants and size. The size, origin, number of grid points per cell length and cell constants of a map can be taken from an already existing map or modified. The grid points are initialized to a specified value. The origin and size of a map can also be defined from an atom selection, so that the selection plus some boundary grid points lie inside it. A map can be transformed (rotated, translated or copied) into the mask points of another map by linear interpolation. This is done by transforming the position of a mask point into the space of the density map. Its value is then obtained by interpolation from the eight surrounding grid points. Empty points of a map can be filled with the values of the density points of another map by applying crystal symmetry operations. This routine is independent of the crystal symmetry group: in order to find the position of the grid point into which the density value should be copied, it does not apply space-group-specific rules, but uses the rotation matrices and translation vectors. Mask points can be defined in several ways:

- By setting all grid points with values in a specified range to the mask value.
- By a distance criterion from a selection of atomic center positions.
- By conversion of grid points to real space points, then to atoms, and further to mask grid points.
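The transformation of a density map into the mask points of another map, as described above, can be sketched as follows. This is a minimal Python illustration and not the MAIN implementation: the function names, the nested-list map layout, the use of orthogonal coordinates measured in grid units, and the periodic wrapping of grid indices (valid only when the density map covers a whole unit cell) are all assumptions made for the example.

```python
import math

def trilinear(grid, x, y, z):
    """Interpolate grid[i][j][k] at the fractional grid position (x, y, z)
    from the eight surrounding grid points."""
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    i0, j0, k0 = math.floor(x), math.floor(y), math.floor(z)
    fx, fy, fz = x - i0, y - j0, z - k0
    value = 0.0
    for di in (0, 1):
        for dj in (0, 1):
            for dk in (0, 1):
                # Weight of each corner is the product of the 1-D linear weights.
                weight = ((fx if di else 1.0 - fx) *
                          (fy if dj else 1.0 - fy) *
                          (fz if dk else 1.0 - fz))
                # Assumption: the map spans a whole unit cell, so indices wrap.
                value += weight * grid[(i0 + di) % nx][(j0 + dj) % ny][(k0 + dk) % nz]
    return value

def transform_into_mask(density, mask_points, rot, trans):
    """For every mask point, map its position into the density map with
    q = rot * p + trans and interpolate the density there."""
    result = {}
    for p in mask_points:
        q = [sum(rot[r][c] * p[c] for c in range(3)) + trans[r] for r in range(3)]
        result[p] = trilinear(density, q[0], q[1], q[2])
    return result

# Small test map whose value is a linear function of the grid indices.
density = [[[i + 2 * j + 3 * k for k in range(4)] for j in range(4)] for i in range(4)]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
values = transform_into_mask(density, [(1, 1, 1)], identity, (0.5, 0.25, 0.75))
print(values[(1, 1, 1)])
```

Note that the loop runs over the mask points of the target map and looks up the source density, not the other way round, so every mask point receives exactly one interpolated value.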
Real space points can be created from a density by

- defining a range of grid point values in a box,
- searching for density peaks via
  - a square function,
  - the sum of the density of the surrounding grid points (solvent flattening algorithm),
  - the weight center along the axes.