9. Topology and parameter files

Keyword: library, topology, residue, compatibility, standard

In this chapter you can learn how to use the available topology and force field parameter files and how to create your own. The files are stored in the "MAIN:top_par/" directory. Topology files are based on residue entries, which are composed of records based on atom names. Parameter files contain force field terms, which are based on atom type (CLASS) entries.

Only MAIN topology files enable you to build a molecular model from scratch and/or modify the atomic composition of your model and later perform ENERGY calculations. With the MAIN 97 release it became possible to geometrically optimize any molecular model. The missing parameters for unknown residues and atoms are guessed from the current model. For further details see the section "Modifying and creating new topology library entries" at the end of the chapter.

X-PLOR topology and parameter files can be used only for ENERGY calculations. You cannot use them to build molecular models from scratch or to modify their topology. You can, however, convert X-PLOR files into MAIN files at any time. See the section "Converting XPLOR topology files to MAIN form".

The section "Reading MAIN topology and parameter files" explains how to read (load) the already available libraries. The section "Inspecting and saving topology library records - write" explains how to save whole or partial entries to a file and what the records mean. The section "Inspecting topology library records - show" explains how to use the SHOW command to inspect the contents of the loaded entries. The section "Inspecting and saving force field parameters - write and show" describes how to inspect and save the parameter records and which information is included in these records. The section "Using available MAIN topology files" explains how to extract and use the necessary information for model building and energy calculations.
The section "Frequent topology and parameter assignment problems" explains how to adjust your residue and atom name descriptions to a loaded topology library. The section "Using available X-PLOR topology files" explains how to use X-PLOR topology files. The section "Converting XPLOR topology files to MAIN form" explains how to create MAIN topology files out of X-PLOR files so that they are applicable for model building. The section "Modifying and creating new topology library entries" explains how to create new topology library entries (residues) by adapting already existing residues or by generating completely new ones.

9.1. Reading MAIN topology and parameter files

The simplest way is to call a "get_top_par....com" command macro (with "@" or "<"). The macro reads the appropriate topology and parameter files and specifies the name of the macro (the character variable DEF_ALL) that is invoked by clicking DEFINE or DEFINE_A to redefine atom types (CLASSES), atomic CHARGES, and the BOND, ANGLE, DIHEDRAL and IMPROPER angle lists. If everything is fine with your case and you do not want to know how topology assignment works, you can stop here.

The "MAIN_UTILS:get_top_par_19_csd.com" macro reads the Engh & Huber topology and parameter files for amino acid residues, including the necessary C- and N-termini definitions plus a solvent (H2O) molecule definition, and defines the value of the variable DEF_ALL.

> read file ">top/top_19_csd.main" top main
> read file ">top/par_19_csd.main" par init
> set vari DEF_ALL = ">utils/def_top_par_19.com"
> return

9.2. Inspecting and saving topology library records - write

Now you can see what the program has read in by typing

> write topology

WRITE TOPOLOGY writes the whole topology library. You can limit the written information to a single residue by specifying it:

> write topology residue ALA

The output is now shorter: it lists all known CLASSes and then only the information related to the alanine residue topology.
Each residue record starts with the word "residue" followed by the residue abbreviation code "ALA" (up to 4 characters long). Currently each residue can consist of only a single "group".

residue ALA
group
atom N  clas=NH1  charge=-.350 coor  .000   .000   .000
atom CA clas=CH1E charge=.100  coor  .539 -1.353  -.060
atom H  clas=H    charge=.250  coor  .617   .761   .021
atom C  clas=C    charge=.550  coor 2.063 -1.335  -.081
atom O  clas=O    charge=-.550 coor 2.682  -.272  -.051
atom CB clas=CH3E charge=.000  coor  .003 -2.077 -1.287

The "atom" records include the ATOM NAME (up to 4 characters are read). Each atom name must be unique within a residue so that atoms can be distinguished unambiguously. The number of atom records per residue is not limited; however, the total number of atoms present in the topology library is limited. The "clas" identifier is the atom type assigned to the atom. It is used to assign the correct force field parameters to its bonding and nonbonding terms. The atomic partial "charge" can be any real number (take care when inventing your own). The "coor" (coordinates) data are used only when MAIN starts to build a new segment. These numbers are the actual coordinates of the atoms at the time this topology library was created, specified relative to the first atom in the residue.

bond N -C
bond CA N
bond H N
bond C CA
bond O C
bond CB CA

The "bond" records list all the covalent bonds within the residue. In contrast to the X-PLOR convention, a MAIN topology file also allows bonds to be assigned to atoms of the previous residue, based on the convention that the '-' sign refers to the previous residue. The bond records are usually used only for model building, while the energy bond lists are created automatically from the connectivity table.

dihe -C N CA C
dihe -N -CA -C N
dihe -CA -C N CA
impr CA N C CB
impr -C -CA N -O
impr N -C CA H

The "dihedral" and "improper" angle records list the DIHEDRAL and IMPROPER angle definitions needed for energy calculations.
Again, as with bonds, the '-' sign means that the related atoms are to be found in the previous residue. This makes it possible to avoid the use of XPLOR "patches" for the peptide chain when generating all bonding lists.

inte N -C  1.329 -CA 116.214 -O  -180.000
inte CA N  1.458 -C  121.609 -CA  180.000
inte H  N   .980 -C  119.139 -CA     .000
inte C  CA 1.525 N   111.063 -C   180.000
inte O  C  1.231 CA  120.839 N      -.059
inte CB CA 1.522 N   110.147 -C    57.525

The "internal" coordinate records include the distance of the record atom to its neighbor, the angle to the neighbor of the neighbor and a dihedral angle to one more atom behind. These records are used to attach residues to a previously defined residue.

end

Each residue definition ends with a record consisting of "end".

9.3. Inspecting topology library records - show

A different way of inspecting a topology library is to use the SHOW TOPO commands. The rule that whatever can be written can also be READ does not apply to the SHOW commands.

> show topo resi *

SHOW TOPO alone gives you the number of entries in the topology library; together with RESIDUE * it lists the short names of all available residues. The '*' means that any residue name will match, 'H*' means that any residue name starting with 'H' will match and 'ALA' means that only the residue name ALA matches.

Complete information about a topology library residue can be obtained with the following command

> show topology resi ALA atom * all bond * dihe * impr * inte *

where the '*' character stands for any atom. The selection of matching atoms can be reduced by supplying characters from atom names. Any combination of items (BOND, DIHEDRAL, IMPROPER, ATOM, INTERNAL) is acceptable. When you only want to inspect a topology residue description you may prefer the SHOW TOPOLOGY commands, since their output is formatted.
The following output is the result of the above command:

SHOW> TOTAL NUMBER OF TOPOLOGY RESIDUES KEPT 28
1 RESIDUE ALA ATOMS: 6 KIND: r
ATOMS
N    NH1   -.3500  .000E+00  .000E+00  .000E+00
CA   CH1E   .1000  .539E+00 -.135E+01 -.600E-01
H    H      .2500  .617E+00  .761E+00  .210E-01
C    C      .5500  .206E+01 -.134E+01 -.810E-01
O    O     -.5500  .268E+01 -.272E+00 -.510E-01
CB   CH3E   .0000  .300E-02 -.208E+01 -.129E+01
SUM OF CHARGES .0
BONDS: 6
N -C   CA N   H N   C CA   O C   CB CA
DIHEDRALS: 3
-C N CA C   -N -CA -C N   -CA -C N CA
IMPROPERS: 3
CA N C CB   -C -CA N -O   N -C CA H
INTERNALS
N   -C  -CA -O   1.3290  116.21 -180.00
CA  N   -C  -CA  1.4580  121.61  180.00
H   N   -C  -CA   .9800  119.14     .00
C   CA  N   -C   1.5250  111.06  180.00
O   C   CA  N    1.2310  120.84    -.06
CB  CA  N   -C   1.5220  110.15   57.53

The command SHOW TOPOLOGY CLASS lists all known atom types.

9.4. Inspecting and saving force field parameters - write and show

The complete current force field parameters can be saved to a file using a WRITE command:

> write parameters

bond C O 1480.000 1.231
angle CH2E C5W CR1E 863.744 126.900
dihedral CH1E C N CH1E 1250.00 2 180.00
improper C X X C 75.00 0 .00
nonbonded OH1 .159 2.851 .159 2.851

This is only a selection of the written parameters. Each record starts with an identifier (BOND, ANGLE, DIHEDRAL, IMPROPER or NONBONDED) that is followed by the required number of atom CLASSES forming the term, the force constant and the equilibrium geometry. In DIHEDRAL and IMPROPER records the periodicity of the angle is inserted after the force constant. The atom class 'X' means any atom class; this applies only to IMPROPER and DIHEDRAL records.

The NONBONDED terms specify van der Waals interaction energies. The first number is the interaction energy at the optimal distance (the second number) for two equal atoms. From these two numbers the interaction constants for the VdW interaction energies are calculated for each possible pair of the specified atom CLASSES.
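How a pair of NONBONDED numbers can turn into pairwise interaction constants may be easier to see in a small sketch outside MAIN. The text above does not state which functional form or combination rule MAIN uses, so both are assumptions here: a 12-6 Lennard-Jones potential whose minimum lies at the tabulated distance, and the Lorentz-Berthelot rule for mixed pairs. The function names are illustrative only; the numbers are the OH1 values from the listing above.

```python
import math

def lj_coefficients(eps, rmin):
    """Return (A, B) for E(r) = A/r**12 - B/r**6.

    ASSUMPTION: 12-6 Lennard-Jones form with well depth `eps` at the
    optimal distance `rmin`; MAIN's actual functional form may differ.
    """
    a = eps * rmin ** 12
    b = 2.0 * eps * rmin ** 6
    return a, b

def lj_energy(r, eps, rmin):
    """Interaction energy of one atom pair at distance r."""
    a, b = lj_coefficients(eps, rmin)
    return a / r ** 12 - b / r ** 6

def combine(eps_i, rmin_i, eps_j, rmin_j):
    """Mixed-pair parameters.

    ASSUMPTION: Lorentz-Berthelot rule (geometric mean of the energies,
    arithmetic mean of the distances); not confirmed for MAIN.
    """
    return math.sqrt(eps_i * eps_j), 0.5 * (rmin_i + rmin_j)

# OH1 values taken from the "nonbonded OH1 .159 2.851 ..." record.
eps_oh1, rmin_oh1 = 0.159, 2.851

# At the optimal distance the energy equals minus the well depth.
print(lj_energy(rmin_oh1, eps_oh1, rmin_oh1))

# Two equal classes combine back to their own parameters.
print(combine(eps_oh1, rmin_oh1, eps_oh1, rmin_oh1))
```

Under these assumptions the energy is most favourable exactly at the tabulated distance and rises steeply at shorter separations.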
NONBONDED interactions differentiate between 1-4 interactions and more distant ones, so two pairs of numbers have to be specified.

> show param
SHOW> BONDS 71 ANGLES 139 DIHEDRALS 43 IMPROPERS 64

gives you the number of force field records. With

> show param bond C O
bond C O 1480.000 1.231

you can retrieve a single force field term. No wildcard characters are accepted for atom CLASS matches.

9.5. Using available MAIN topology files

After the topology library and parameter files have been read, their information needs to be merged into the description of a molecular model. Once this is done, you can start ENERGY calculations, including POSITIONAL MINIMIZATION of your model.

The simplest way is to call the DEF_ALL command file (defined in "MAIN_UTILS:get_top_par_19_csd.com") by clicking the menu item DEFINE or DEFINE_A. No error messages should appear besides the ones related to the N- and C-terminal OT and HT atoms. If this is not the case, you must find out the reasons and act. You cannot expect the system to behave nicely under MINIMIZATION if it has some undefined force constants or even a couple of undefined atom CLASSES.

First the ">utils/def_top_par_19.com" macro will be described, and then we shall look at some quite frequent problems.

define class charge by topo sele .not t_anchor .or atom name X* end

The first DEFINE command assigns each atom (besides anchors and dummy atoms) its CLASS from the topology library entries. For each selected atom the program tries to find a match by its RESIDUE and ATOM NAME. Only residue entries that are not found are reported.

set class sele atom name S* .and. by bond atom name S* end S

SET CLASS selects only those sulphur atoms that are part of a disulfide bond arrangement and SETS their CLASS explicitly. Disulfide bridges can thus be formed implicitly (by a distance criterion) and no explicit knowledge about the exact disulfide bond arrangement is required.
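The idea of forming disulfide bridges implicitly by a distance criterion can be sketched outside MAIN as follows. This is not MAIN's implementation: the cutoff value, the residue labels and the coordinates are all made up for illustration, and the function name is hypothetical.

```python
import math
from itertools import combinations

# Hypothetical cutoff in angstroms (an S-S bond is about 2.05 A long);
# this is an illustrative value, not a MAIN default.
SS_CUTOFF = 2.5

def find_disulfides(sg_atoms, cutoff=SS_CUTOFF):
    """Return label pairs of sulphur atoms closer than `cutoff`.

    sg_atoms is a list of (label, (x, y, z)) tuples.
    """
    pairs = []
    for (label_a, xyz_a), (label_b, xyz_b) in combinations(sg_atoms, 2):
        if math.dist(xyz_a, xyz_b) <= cutoff:
            pairs.append((label_a, label_b))
    return pairs

# Made-up coordinates: two sulphurs at bonding distance, one far away.
sg = [
    ("CYS 22", (0.0, 0.0, 0.0)),
    ("CYS 65", (2.04, 0.0, 0.0)),
    ("CYS 90", (8.0, 1.0, 0.0)),
]
print(find_disulfides(sg))  # [('CYS 22', 'CYS 65')]

# If the two sulphurs drift apart, the pair is no longer detected.
sg[1] = ("CYS 65", (3.4, 0.0, 0.0))
print(find_disulfides(sg))  # []
```

The second call shows the weak point of any purely distance-based detection: once the atoms have moved beyond the cutoff, the bridge is no longer found.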
WARNING: This is the more convenient way, but if you forget to calculate the disulfide bonds, VdW forces may pull the SG atoms apart, and the next time you try to calculate the disulfides they will not be created.

defin class charge sele by resi atom name HT* end by topo resi NTER
defin class charge sele by resi atom name OT* end by topo resi CTER

It is assumed that the hydrogen and oxygen atoms at both chain termini have 'T' as the second character of their atom names. This makes it possible to assign the NTER-minal and CTER-minal atom definitions implicitly, with no need to create different segments or chains or even to read chains separately as in XPLOR.

After the atom CLASS assignment has been completed successfully, you may proceed to create the bonding energy lists. If there are still problems see the section "Frequent topology and parameter assignment problems" below.

define bond angle init select .not t_anchor end by auto

The BOND and ANGLE lists are created BY an AUTOMATIC procedure which builds the lists from the covalent bond connections present in the connectivity table (CTABLE). (What you see is what you get: if you break a covalent bond and redefine the lists, it will enter the energy calculation neither as a BOND nor as part of an ANGLE term.)

define dihe by topo init select .not t_anchor end
define impr by topo init select .not t_anchor end check

DIHEDRAL and IMPROPER angle definitions are usually part of the topology library residue entries. Mathematically there is no difference between the two terms. DIHEDRALS are usually conformational angles of 4 atoms (1-2-3-4) describing rotation about the middle covalent bond (2-3). IMPROPERS are defined by 4 atoms as well, with the difference that the 4 atoms are not necessarily covalently attached in the pattern 1-2-3-4. IMPROPERS usually define the planarity or chirality of a group. The command word CHECK affects only chiral impropers, in that it preserves the current chirality.
This allows building L or D amino acids with no need to apply a DtoL patch (or vice versa) as in XPLOR. If you want to enforce the chirality of L amino acids, remove the CHECK.

Additional improper or dihedral angle definitions that are not included in the residue entry, for example those of the C- and N-termini, need to be defined explicitly as follows:

define impr dihe by topo select by resi atom name HT* end resi NTER
define impr dihe by topo select by resi atom name OT* end resi CTER
return

The difference between these commands and the general topology library calls given earlier is that here the RESIDUE NAMES are not taken from the actually selected atoms, but are explicitly enforced: each selected residue is checked against the specified NTER and CTER DIHEDRALS and IMPROPERS lists, and for each match in all four atom names an additional DIHEDRAL or IMPROPER is created. When the N-terminal residue is a proline use NTPR instead of NTER.

XPLOR uses the word "nil" to patch a single residue; in MAIN you simply do not need to specify the SYMBOL or NEIGHBOR one-character identifier(s). You select a residue and explicitly call a particular topology library entry, as shown above for the N- and C-terminal improper and dihedral angle definitions.

Any additional entry in any of the bonding energy lists can also be created or deleted explicitly. See the section "Modifying and creating new topology lists".

9.6. Frequent topology and parameter assignment problems

Most of them are taken care of by the item xpl2MAIN in the menu block UTILS. Warning and error messages issued after clicking DEFINE should be considered, and the corresponding places in the structure inspected, one by one. Often they are the result of wrongly assigned covalent bonds.

9.6.1.
Wrong residue or atom names - rename

When a residue name from your structure is not found in the topology library, try to find out whether your residue only needs a different name by comparing the atoms of the residue with the appropriate topology library residue entry. If the two describe the same structure, you only have to RENAME your residue. The MAIN Engh & Huber topology library has, for example, no HIS residue, but a HISH:

> rename residue HISH select resi name HIS end

If you are not that lucky, continue with the section "Modifying and creating new topology library entries".

When an atom does not find a match in the equivalent topology library residue you should RENAME it. A usual case is the CD1 atom of isoleucine:

> rename atom CD select residue name ILE .and. atom name CD1 end

9.6.2. Missing atoms

The most common problems concern the N- and C-termini; these are covered in the chapter "Model building". Here you only need to know that you have to create the other missing atoms and finally complete the model with its terminal atoms.

The macro "MAIN_UTILS:fill_atoms.com" takes care of the missing atoms. You can invoke it by clicking the item "FILL_ATO" of the menu block UTILS on menu page 7 ("MAIN_MENU:utils.txt"). This procedure finds all the missing atoms and builds them into the model while keeping the known atoms where they were.

WARNING: Atoms that are in your model but do not match the atom list of your topology library residue are deleted. You are therefore advised to do other checks first. This happens to the N- and C-terminal 'HT*' and 'OT*' atoms as well.

9.7. Using available X-PLOR topology files

Keyword: X-PLOR

XPLOR topology files are not in a form that allows MAIN to build or modify models with them. You can apply them for energy calculations only. An XPLOR topology library residue has no links or list definitions that refer to some other residue. Therefore XPLOR uses patches (PRESIDUES) to define the connectivities.
The patches have in front of each atom name a symbol ('-', '+', '1', '2') which refers to a selection. You can use them in MAIN too; for example, the peptide bond patch can be invoked through the "NEIGHBOR - +" command, which applies the - and + to each pair of consecutive residues through the whole selection.

> define dihed impr select sequ 1 99 end residue PEPT neighbor - +

A more general case is when you want to apply patches to any pair of residues:

> define dihedral residue DISU symbol 1 select sequence 1 end \
> symbol 2 select sequence 122 end

Once you can do energy calculations with an XPLOR residue description, it is straightforward to create MAIN topology library residue(s). See the section "Converting XPLOR topology files to MAIN form".

9.8. Converting XPLOR topology files to MAIN form

Keyword: conversion

As you may have already noticed above, there are several differences between the contents of MAIN and XPLOR topology library residue entries, namely: the MAIN ATOM description includes COORDINATES, the MAIN INTERNAL coordinate description has defined reference distances and angles, and referenced atoms in MAIN can be located in a previous residue (using the '-' atom name prefix). If you only READ XPLOR topology libraries and WRITE them in the MAIN format, all these data, which are important for MAIN, will be missing or undefined.

To allow MAIN to complete the libraries you must provide the missing data. So read in a coordinate file of the residue, CALCULATE its covalent bonds, MAKE the internal coordinate table (Z-matrix table - ZTABLE), CALCULATE the INTERNAL coordinates and DEFINE all necessary energy parameters for it (atomic CLASS, CHARGE and all DIHEDRALS and IMPROPERS). When creating new residues MAIN accesses these data fields and transforms them into a topology library description.

Now let us define a new alanine residue (ALA) based on the XPLOR topology library description. First read a coordinate file with the ALA residue.
The ALA residue should not be the first one in the chain if you want to be able to attach it to other residues. Then create the tables and CALCULATE the internal coordinates:

> read file ala.pdb coordinates pdb
> calculate bond
> make ztable from ctable
> calculate internal

Then read the XPLOR files and DEFINE atom CLASSES, CHARGES and energy lists. (No explicit SELECTION is specified, so the default SELECT ALL END is taken.)

> read file xplor/toppar/toph19x.pro topo xpl init
> read file xplor/toppar/param19x.pro para xpl init
> defin class charge by topo
> defin bond angle init by auto
> defin dihe improper init by topo
> defin dihe impr by topo resi PEPT sele all end neigh - +

Now perform at least a minimal check of the correctness of the above definitions by an energy calculation:

> energy select all end

If the energy term values appear to be OK, create the topology library residue and write it to a file. The topology library is initialized for two reasons. First, because we usually want to write only the newly created residues, and second, because a residue which is on the list but is not the last one cannot be created twice. (A way around this is to change the RESIDUE NAME with a RENAME command.)

> topo init
> topo append sele resi name ALA end
> write file top_ala.main topology

By following the procedure just described, not only a single residue but also whole XPLOR libraries can be converted.

9.9. Modifying and creating new topology library entries

Keyword: residue, nonstandard

This is an advanced topic, so do not start here. It is assumed that you have basic knowledge of ENERGY calculations, that you can perform them and, of course, that you know how topology library entries and force field parameters are organized. You should also be able to perform interactive model manipulation and building. For interactive editing of the topology and energy terms you can use the "DEFINE" menu block ("MAIN_MENU:define.txt").
In general it is easier, and also more consistent with the already existing definitions, to create new entries from topologically similar residues. (Try to invent as few new atom types - CLASSES - as possible.) The general outline is:

- read in or build a 3-dimensional representation of a molecular model,
- if atoms are still missing, insert them and also create a complete list of covalent bonds,
- delete the superfluous atoms,
- if your molecule is too large, split it into smaller residues (this will very probably save you time later on),
- if necessary, write the molecule in the new residue organization to a file, change the order of the atomic records in the file so that the ZTABLE built later will have no holes, and re-read the atomic records (from here on it is assumed that you will complete each residue separately),
- create the ZTABLE and calculate the internal coordinates of the residue,
- after your molecule is complete, RENAME the atoms and define the atom types (CLASSES),
- assign the parameter list to the molecule (undefined terms are guessed),
- create the topology residue and assign a correct CLASS to unrecognized atoms (guessed as XXXX),
- replace the guessed force field parameters with more suitable ones, add new dihedral and improper angles or modify the guessed ones,
- if guessed terms remain, define the DIHEDRAL and IMPROPER angles for energy calculation and redefine the topology library entry of the residue, and
- finally save the new creations to a file.

You are strongly discouraged from creating a topology library entry while working with your macromolecule. Start MAIN and read in only the molecule with the as yet undefined topology residue, together with the topology and parameter libraries you want to expand.

9.9.1. Modifying an existing residue: norleucine

Let us first create a norleucine, a residue similar in topology to methionine, in which the sulphur atom is replaced with a carbon.
So load the menus and the topology and parameter files, initialize the display and enter the dial mode:

> <>cmds/load_depp_page
> <>utils/get_top_par_19_csd
> image init dial

Set the START flag, choose a name for the new segment and a sequence root (click START, SEGMENT and SEQUENCE in any order - page 9) and then create a two-residue segment by clicking RESIDUE?? and responding with ALA MET. It is important to start with a residue before MET, because only then will it be possible to build chains with the newly created topology library residue entry (as already explained above). Define all energy lists and atom CLASSES and CHARGES by clicking DEFINE. Display all atom NAMES and CLASSES for visual inspection:

> image set atom name atom class
> image dial

Go to page 1, click the SD atom, rename the atom to CD (click REN_ATOM and answer with CD), rename the residue from MET to NOR (click REN_RESI and respond with NOR), go to page 2 to RE_DRAW the image and then to page 4 to modify the atom class to CH2E (click INQ_CLASS and respond with CH2E). If you have invented a new atom CLASS you should read it in too, as described below for the invented force field parameter(s), and finally insert these new CLASS records at the top of the newly created topology file with a text editor.

Now the data for the new residue are ready, so click two norleucine atoms, select this residue into the key 'active' (go to page 9 and click ACT_2RES) and finally create the new entry (go to page 4 and click MAKE_TOP). Check it with

> write topo resi NOR

Pay attention to these records, which show that the bond lengths CG-CD and CD-CE are unusually long for bonds between sp3 carbons.

inte CG CD 1.803 CB 112.956 CA -160.979
inte CE CD 1.791 CD 100.936 CB -99.220

The topology is now ready to be used, so you can redefine the energy term lists and optimize the geometry (go to page 9 and click DEFINE).
The error message

DEF_ANGL> WARNING: UNDEFINED ANGL 13*CH2E* 14*CH2E* 15*CH3E*

tells you that a parameter for the angle CH2E-CH2E-CH3E is missing. Before taking any further steps the missing parameters need to be invented. So look into the force field parameter lists and find the most similar one:

> wr param

I have chosen the angle CH1E-CH2E-CH3E. This is not entirely correct, but I suppose it will work well enough. Now read it into MAIN:

> read param
angle CH2E CH2E CH3E 440.686 113.800
end_of_file

WARNING: If you try to exit reading with ctrl-D, MAIN gets lost. Type in the string end_of_file to get the MAIN prompt back.

Now you can press DEFINE and no error messages will appear, so the norleucine geometry can be optimized. At this point it makes sense to use only the bonding terms (BOND, ANGLES, DIHEDRALS and IMPROPERS), so first turn the VDW and ELECTROSTATIC terms off and then MINIMIZE (click ENE_VDW and ENE_ELEC and then MINIMIZE).

Correct the internal geometry of the residue by recalculating the interatomic distances, angles and dihedrals from the current coordinates and then either click MAKE_TOP in the dialog mode again or type the commands explicitly:

> calc internal
> topology append select active end

What remains is to save the new residue entry to a file.

> write file top_19_nor.main topology resid NOR

Do not forget that an angle parameter term was also missing, so create a file for the missing force field parameter term(s) too, and read it next time after the standard parameter files. (The results of this procedure are the files '>top/top_19_nor.main' and '>top/par_19_nor.main'.)

9.9.2. Create a new entry from atom coordinates

Keyword: residue, entry, new topology

A quick and dirty way is to read it, run a DEFINE and do energy calculations.
You should be aware though that

- the assigned atom class XXXX means a carbon atom for nonbonding interactions,
- covalent bond and angle target values are assigned from the current interatomic distances and angles, and are held together with moderate harmonic forces,
- around each ring in a residue, dihedral angles preserve its planarity (aromatic or sp2 character),
- each atom with 3 neighbors is forced, by an assigned improper angle, either to remain planar (sp2) or to form a chiral center (sp3), depending on the currently found geometry.

At this point you should define a topology library entry and then generate the final version from it in steps:

- select the residue into the key active,
- click MAKE_TOP on page 4,
- write a file with the "new" residue entry.

> write file xx topo resi new

If you want to complete the job more properly, define the atom classes. You have two possibilities: either write the topology entry to a file, replace each XXXX with an appropriate letter code and read it back, or click one atom at a time, then click INQ_CLASS (menu page 4), respond with an appropriate answer and finally save the topology file.

After this is completed click DEFINE once more. A list of warning messages will probably show up. Each missing energy term (BONDS, ANGLES, DIHEDRALS and IMPROPERS) requires an explicit entry in the parameter file. So use a text editor and insert each missing term in the form shown above. A good guess can be found in the already available parameter files by looking for the most similar terms.

This is it. The next time (to use the new topology entry and parameters) read the created files immediately after reading the default libraries.
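The first of the two class-assignment routes described above (editing the written topology file) can also be scripted instead of done by hand in a text editor. The following is a minimal sketch, not part of MAIN. It assumes that the written file has one record per line and that "atom" records look like the ones shown in section 9.2 (atom name as the second field, "clas=..." as the third). The residue NEW, its atom names and the chosen classes are hypothetical.

```python
def assign_classes(topology_text, class_by_atom):
    """Replace clas=XXXX in 'atom' records using an atom name -> class map.

    ASSUMPTION: one record per line, laid out as
    'atom <name> clas=<class> charge=<q> coor <x> <y> <z>'.
    Atoms missing from the map keep clas=XXXX so they stay easy to spot.
    """
    out = []
    for line in topology_text.splitlines():
        fields = line.split()
        is_guessed_atom = (
            len(fields) >= 3
            and fields[0] == "atom"
            and fields[2] == "clas=XXXX"
        )
        if is_guessed_atom and fields[1] in class_by_atom:
            new_class = "clas=" + class_by_atom[fields[1]]
            line = line.replace("clas=XXXX", new_class, 1)
        out.append(line)
    return "\n".join(out)

# Hypothetical entry as it might look after MAKE_TOP and WRITE.
entry = """residue NEW
group
atom C1 clas=XXXX charge=.000 coor .000 .000 .000
atom O1 clas=XXXX charge=.000 coor 1.231 .000 .000
atom X9 clas=XXXX charge=.000 coor 2.000 1.000 .000
end"""

# Classes C and O exist in the library shown earlier; X9 is left unmapped.
print(assign_classes(entry, {"C1": "C", "O1": "O"}))
```

Leaving unmapped atoms as XXXX is deliberate: a later DEFINE will still flag them, so a forgotten atom cannot silently receive a wrong class.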