Topology and parameter files

In this chapter you can learn how to use the available topology and force field parameter files and how to create your own. The files are stored in the MAIN:top_par/ directory. Topology files consist of residue entries, which are composed of records based on atom names. Parameter files contain force field terms, which are based on atom type (CLASS) entries.

Only MAIN topology files enable you to build a molecular model from scratch and (or) modify the atomic composition of your model and later perform ENERGY calculations.

With the MAIN 97 release it became possible to geometrically optimize any molecular model. The missing parameters for unknown residues and atoms are guessed from the current model. For further details see the section "Modifying and creating new topology library entries" at the end of the chapter.

X-PLOR topology and parameter files can be used only for ENERGY calculations. You cannot use them to build molecular models from scratch or to modify their topology. You can, however, convert X-PLOR files into MAIN files at any time. See the section "Converting XPLOR topology files to MAIN form".

The section "Reading MAIN topology and parameter files" shows how to read (load) the already available libraries. The section "Inspecting and saving topology library records - write" explains how to save whole or partial entries to a file and what the records mean. The section "Inspecting topology library records - show" explains how to use the SHOW command to inspect the contents of the loaded entries. The section "Inspecting and saving force field parameters - write and show" describes how to inspect and save the parameter records and which information is included in these records. The section "Using available MAIN topology files" explains how to extract and use the necessary information for model building and energy calculations. The section "Frequent topology and parameter assignment problems" shows how to adjust your residue and atom names to a loaded topology library. The section "Using available X-PLOR topology files" explains how to use X-PLOR topology files. The section "Converting XPLOR topology files to MAIN form" explains how to create MAIN topology files from X-PLOR files so that they can be used for model building. The section "Modifying and creating new topology library entries" explains how to create new topology library entries (residues) by adapting already existing residues or by generating completely new ones.

Reading MAIN topology and parameter files

The simplest way is to call a "get_top_par....com" command macro (with "@" or "<"). The macro reads the appropriate topology and parameter files and specifies the name of the macro (the character variable DEF_ALL) that is invoked by clicking DEFINE or DEFINE_A to redefine atom types (CLASSES), atomic CHARGES and the BOND, ANGLE, DIHEDRAL and IMPROPER angle lists. If everything works for your case and you do not want to know how topology assignment works, you can stop here.

The MAIN_UTILS:get_top_par_19_csd.com macro reads the Engh & Huber topology and parameter files for amino acid residues, including the necessary C- and N-termini definitions plus a solvent (H2O) molecule definition, and defines the value of the variable DEF_ALL.


> read file >top/top_19_csd.main top main
> read file >top/par_19_csd.main par init
> set vari DEF_ALL = >utils/def_top_par_19.com
> return

Inspecting and saving topology library records - write

Now you can see what the program has read in by typing


> write topology

WRITE TOPOLOGY writes the whole topology library. You can limit the written information to a residue by specifying it:


> write topology residue ALA

The output is now shorter: it lists all known CLASSes and then only the information related to the alanine residue topology. Each residue record starts with the word "residue" followed by the residue abbreviation code, here "ALA" (up to 4 characters long). Currently each residue can consist of only a single "group".


 residue ALA
 group
 atom N clas=NH1 charge=-.350 coor .000 .000 .000
 atom CA clas=CH1E charge=.100 coor .539 -1.353 -.060
 atom H clas=H charge=.250 coor .617 .761 .021
 atom C clas=C charge=.550 coor 2.063 -1.335 -.081
 atom O clas=O charge=-.550 coor 2.682 -.272 -.051
 atom CB clas=CH3E charge=.000 coor .003 -2.077 -1.287

The "atom" records include the ATOM NAME (up to 4 characters are read). Each atom name must be unique within a residue so that the atoms can be told apart. The number of atom records per residue is not limited; however, the total number of atoms present in the topology library is limited.

The "clas" identifier is the atom type assigned to the atom. It is used to assign the correct force field parameters to its bonding and nonbonding terms.

The atomic partial "charge" can be any real number (take care when inventing your own).

The "coor" (coordinates) data are used only when MAIN starts to build a new segment. These numbers are the actual coordinates of the atoms at the time this topology library was created, specified relative to the first atom in the residue.


 bond N -C bond CA N bond H N bond C CA bond O C
 bond CB CA

The "bond" records list all the covalent bonds within the residue. In contrast to the X-PLOR convention, a MAIN topology file also allows bonds to atoms of the previous residue, based on the convention that the '-' sign refers to the previous residue. The bond records are usually used only for model building, while the energy bond lists are created automatically from the connectivity table.


 dihe -C N CA C dihe -N -CA -C N
 dihe -CA -C N CA


 impr CA N C CB impr -C -CA N -O impr N -C CA H

The "dihedral" and "improper" angle records list the DIHEDRAL and IMPROPER angle definitions needed for energy calculations. As with bonds, the '-' sign means that the atom is to be found in the previous residue. It is thus possible to avoid the use of XPLOR "patches" for the peptide chain when generating all bonding lists.
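The '-' convention can be illustrated with a short Python sketch. This is not MAIN code: the function name, the zero-based residue index and the choice to return None for a record that needs a nonexistent previous residue are assumptions made only for this example.

```python
def resolve(names, residue_index):
    """Map record atom names to (residue index, atom name) pairs.

    A leading '-' points to the residue before `residue_index`.
    Returns None when the record refers to a previous residue that
    does not exist (assumed behaviour for the first residue of a chain).
    """
    resolved = []
    for name in names:
        if name.startswith("-"):
            if residue_index == 0:
                return None
            resolved.append((residue_index - 1, name[1:]))
        else:
            resolved.append((residue_index, name))
    return resolved


# The record 'dihe -C N CA C' applied to the third residue (index 2)
print(resolve(["-C", "N", "CA", "C"], 2))
```

Read this way, a single record per residue is enough to describe a term that spans the peptide bond.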


 inte N -C 1.329 -CA 116.214 -O -180.000
 inte CA N 1.458 -C 121.609 -CA 180.000
 inte H N .980 -C 119.139 -CA .000
 inte C CA 1.525 N 111.063 -C 180.000
 inte O C 1.231 CA 120.839 N -.059
 inte CB CA 1.522 N 110.147 -C 57.525

The "internal" coordinate records include the distance of the record atom to its neighbor, the angle to the neighbor of the neighbor and a dihedral angle to one more atom behind. These records are used to attach residues to a previously defined residue.
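How an atom can be positioned from one such record is sketched below in Python. This is a generic geometric construction, not MAIN's own implementation; the function names, the sign convention of the dihedral angle (IUPAC style) and the position used for the previous residue's C atom are assumptions for the sake of the example.

```python
import math


def sub(u, v):
    return [u[i] - v[i] for i in range(3)]


def dot(u, v):
    return sum(u[i] * v[i] for i in range(3))


def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]


def unit(u):
    length = math.sqrt(dot(u, u))
    return [x / length for x in u]


def dihedral(a, b, c, d):
    """Signed torsion angle a-b-c-d in degrees."""
    axis = unit(sub(c, b))
    v = sub(a, b)
    w = sub(d, c)
    # remove the components along the central bond
    v = [v[i] - dot(v, axis) * axis[i] for i in range(3)]
    w = [w[i] - dot(w, axis) * axis[i] for i in range(3)]
    return math.degrees(math.atan2(dot(cross(axis, v), w), dot(v, w)))


def place_atom(a, b, c, dist, angle_deg, torsion_deg):
    """Return d with |d-c| = dist, angle b-c-d and torsion a-b-c-d as given."""
    theta = math.radians(angle_deg)
    phi = math.radians(torsion_deg)
    bc = unit(sub(c, b))
    normal = unit(cross(sub(b, a), bc))
    in_plane = cross(normal, bc)
    local = (-dist * math.cos(theta),
             dist * math.sin(theta) * math.cos(phi),
             dist * math.sin(theta) * math.sin(phi))
    return [c[i] + local[0] * bc[i] + local[1] * in_plane[i] + local[2] * normal[i]
            for i in range(3)]


# Record: inte CB CA 1.522 N 110.147 -C 57.525
prev_c = [-1.2, -0.5, 0.3]        # hypothetical position of -C
n_atom = [0.0, 0.0, 0.0]
ca = [0.539, -1.353, -0.060]
cb = place_atom(prev_c, n_atom, ca, 1.522, 110.147, 57.525)
print([round(x, 3) for x in cb])
```

The three reference atoms must not be collinear, otherwise the dihedral angle is undefined.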


 end

Each residue definition ends with a record consisting of "end".
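If you want to process such a record outside MAIN, a small parser is enough. The Python sketch below handles only the record types shown above; the dictionary layout and the function name are assumptions of this example, and it has not been checked against every keyword MAIN may write.

```python
def parse_residue(text):
    """Parse one 'residue ... end' block into a dictionary."""
    width = {"bond": 2, "dihe": 4, "impr": 4}   # atom names per record
    residue = {"name": None, "atoms": [], "bond": [], "dihe": [],
               "impr": [], "inte": []}
    for line in text.splitlines():
        tokens = line.split()
        if not tokens or tokens[0] in ("group", "end"):
            continue
        if tokens[0] == "residue":
            residue["name"] = tokens[1]
        elif tokens[0] == "atom":
            fields = dict(t.split("=") for t in tokens[2:4])
            residue["atoms"].append({
                "name": tokens[1],
                "class": fields["clas"],
                "charge": float(fields["charge"]),
                "coor": tuple(float(x) for x in tokens[5:8]),
            })
        elif tokens[0] == "inte":
            residue["inte"].append({
                "atom": tokens[1],
                "distance": (tokens[2], float(tokens[3])),
                "angle": (tokens[4], float(tokens[5])),
                "dihedral": (tokens[6], float(tokens[7])),
            })
        else:
            # several bond/dihe/impr records may share one line
            i = 0
            while i < len(tokens):
                n = width[tokens[i]]
                residue[tokens[i]].append(tuple(tokens[i + 1:i + 1 + n]))
                i += n + 1
    return residue


record = """
 residue ALA
 group
 atom N clas=NH1 charge=-.350 coor .000 .000 .000
 atom CA clas=CH1E charge=.100 coor .539 -1.353 -.060
 bond N -C bond CA N
 impr CA N C CB
 inte N -C 1.329 -CA 116.214 -O -180.000
 end
"""
ala = parse_residue(record)
print(ala["name"], [atom["name"] for atom in ala["atoms"]], ala["bond"])
```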

Inspecting topology library records - show

A different way of inspecting a topology library is to use the SHOW TOPO commands. The rule that what can be written can also be READ does not apply to the SHOW commands.


> show topo resi *

SHOW TOPO alone gives you the number of entries in the topology library; together with RESIDUE * it lists the short names of all available residues. The '*' means that any residue name will match, 'H*' means that any residue name starting with 'H' will match, and 'ALA' means that only the residue name ALA matches.
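The matching rules just described behave like ordinary shell-style wildcards. A minimal Python sketch follows; the residue list is made up for the example, and Python's fnmatch also accepts '?' and '[...]', which MAIN may or may not support.

```python
from fnmatch import fnmatchcase


def match_residues(pattern, names):
    """Return the residue names that match a '*' wildcard pattern."""
    return [name for name in names if fnmatchcase(name, pattern)]


library = ["ALA", "ARG", "HISH", "HOH", "ILE"]   # illustrative names only
print(match_residues("*", library))
print(match_residues("H*", library))
print(match_residues("ALA", library))
```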

Complete information about a topology library residue can be obtained with the following command


> show topology resi ALA atom * all bond * dihe * impr * inte *

where the '*' character stands for any atom. The selection of matching atoms can be reduced by supplying characters from atom names. Any combination of items (BOND, DIHEDRAL, IMPROPER, ATOM, INTERNAL) is acceptable. When you only inspect a topology residue description you may prefer the SHOW TOPOLOGY commands, since the output is formatted. The following output is the result of the above command:


SHOW> TOTAL NUMBER OF TOPOLOGY RESIDUES KEPT   28
     1 RESIDUE ALA  ATOMS:   6 KIND: r
 ATOMS
 N    NH1    -.3500    .000E+00    .000E+00    .000E+00
 CA   CH1E    .1000    .539E+00   -.135E+01   -.600E-01
 H    H       .2500    .617E+00    .761E+00    .210E-01
 C    C       .5500    .206E+01   -.134E+01   -.810E-01
 O    O      -.5500    .268E+01   -.272E+00   -.510E-01
 CB   CH3E    .0000    .300E-02   -.208E+01   -.129E+01
  SUM OF CHARGES .0
 BONDS: 6
 N    -C
 CA   N
 H    N
 C    CA
 O    C
 CB   CA
  DIHEDRALS: 3
 -C   N    CA   C
 -N   -CA  -C   N
 -CA  -C   N    CA
  IMPROPERS: 3
 CA   N    C    CB
 -C   -CA  N    -O
 N    -C   CA   H
 INTERNALS
  N    -C   -CA  -O    1.3290  116.21 -180.00
  CA   N    -C   -CA   1.4580  121.61  180.00
  H    N    -C   -CA    .9800  119.14     .00
  C    CA   N    -C    1.5250  111.06  180.00
  O    C    CA   N     1.2310  120.84    -.06
  CB   CA   N    -C    1.5220  110.15   57.53

The command SHOW TOPOLOGY CLASS lists all known atom types.

Inspecting and saving force field parameters - write and show

Complete current force field parameters can be saved to a file using a WRITE command:


> write parameters


 bond    C       O    1480.000    1.231
 angle    CH2E    C5W     CR1E  863.744  126.900
 dihedral    CH1E    C       N       CH1E  1250.00    2   180.00
 improper    C       X       X       C       75.00    0      .00
 nonbonded    OH1      .159    2.851     .159    2.851

This is only a selection of the written parameters. Each record starts with an identifier (BOND, ANGLE, DIHEDRAL, IMPROPER or NONBONDED) followed by the required number of atom CLASSES forming the term, the force constant and the equilibrium geometry. DIHEDRAL and IMPROPER records have one additional field: the periodicity of the angle is specified after the force constant. The atom class 'X' matches any atom class; this applies only to IMPROPER and DIHEDRAL records.
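The role of the 'X' class can be sketched in Python as a lookup over four-class records. The two records are taken from the listing above; the function name, the preference for exact matches over wildcard matches and the acceptance of the reversed atom order are assumptions of this example, not documented MAIN behaviour.

```python
def find_term(terms, classes):
    """Return (force constant, periodicity, angle) for four atom classes."""
    def matches(pattern, key):
        return all(p == "X" or p == k for p, k in zip(pattern, key))

    # first pass: records without 'X'; second pass: all records
    for allow_wildcard in (False, True):
        for pattern, values in terms:
            if not allow_wildcard and "X" in pattern:
                continue
            if matches(pattern, classes) or matches(pattern, classes[::-1]):
                return values
    return None


terms = [
    (("CH1E", "C", "N", "CH1E"), (1250.0, 2, 180.0)),   # dihedral record
    (("C", "X", "X", "C"), (75.0, 0, 0.0)),             # improper record
]
print(find_term(terms, ("C", "CH1E", "N", "C")))
print(find_term(terms, ("N", "C", "C", "N")))
```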

The NONBONDED terms specify van der Waals interaction energies. The first number is the interaction energy at the optimal distance (the second number) for two equal atoms. From these two numbers the interaction constants for the VdW energies are calculated for each possible pair of the specified atom CLASSES. NONBONDED interactions differentiate between 1-4 interactions and the more distant ones; therefore two pairs of numbers have to be specified.
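How pair constants could be derived from the two numbers is sketched below in Python. The text above does not state the functional form or the combination rule, so both are assumptions here: a 12-6 potential written in terms of the optimal distance, the geometric mean of the energies and the arithmetic mean of the distances. MAIN may use a different rule.

```python
import math


def pair_constants(eps_i, r_i, eps_j, r_j):
    """Return the 12-6 constants (A, B) for a pair of atom classes.

    Assumed rule: geometric mean of the well depths and arithmetic
    mean of the optimal distances.
    """
    eps = math.sqrt(eps_i * eps_j)
    r_opt = 0.5 * (r_i + r_j)
    return eps * r_opt ** 12, 2.0 * eps * r_opt ** 6


def vdw_energy(a, b, r):
    """Energy A/r^12 - B/r^6 at distance r."""
    return a / r ** 12 - b / r ** 6


# Record: nonbonded OH1 .159 2.851 .159 2.851 (first pair of numbers)
a, b = pair_constants(0.159, 2.851, 0.159, 2.851)
print(vdw_energy(a, b, 2.851))
```

With this form the energy at the optimal distance equals minus the first number of the record, and it rises on either side of that distance.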


> show param
SHOW> BONDS  71   ANGLES 139   DIHEDRALS  43   IMPROPERS  64

gives you the number of force field records. With


> show param bond C O
 bond C O 1480.000 1.231

you can retrieve a single force field term. No wildcard characters are accepted for atom CLASS matches.

Using available MAIN topology files

After the topology library and parameter files have been read, their information needs to be merged into the description of a molecular model. Once this is done, you can start ENERGY calculations, including POSITIONAL MINIMIZATION of your model.

The simplest way is to call the DEF_ALL command file (defined in MAIN_UTILS:get_top_par_19_csd.com) by clicking the menu item DEFINE or DEFINE_A. No error messages should appear besides the ones related to the N- and C-terminal OT and HT atoms. If other messages appear, you must find out the reasons and act on them. You cannot expect the system to behave nicely under MINIMIZATION if it has undefined force constants or even a couple of undefined atom CLASSES.

First the >utils/def_top_par_19.com macro will be described, and then we shall look at some quite frequent problems.


 define class charge by topo sele .not t_anchor .or atom name X* end

The first DEFINE command assigns each atom (besides anchors and dummy atoms) its CLASS and CHARGE from the topology library entries. For each selected atom the program tries to find a match by its RESIDUE and ATOM NAME. Only residue entries that are not found are reported.


 set class sele atom name S* .and. by bond atom name S* end S

SET CLASS selects only those sulphur atoms that are part of a disulfide bond and SETS their CLASS explicitly. Disulfide bridges can thus be formed implicitly (by a distance criterion) and no explicit knowledge of the exact disulfide bond arrangement is required. WARNING: This is the more comfortable way, but if you forget to calculate the disulfide bonds, VdW forces may pull the SG atoms apart, and the next time you try to calculate the disulfides they will not be created.
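The idea of forming disulfide bridges by a distance criterion can be sketched in Python as follows. The cutoff of 2.5 Å, the atom labels and the coordinates are hypothetical values chosen for the example; they are not taken from MAIN.

```python
import math
from itertools import combinations


def disulfide_pairs(sulphurs, cutoff=2.5):
    """Return the pairs of sulphur atoms closer than `cutoff` (in Å).

    `sulphurs` is a list of (label, (x, y, z)) tuples; the default
    cutoff is an assumed value.
    """
    return [(label_a, label_b)
            for (label_a, pos_a), (label_b, pos_b) in combinations(sulphurs, 2)
            if math.dist(pos_a, pos_b) <= cutoff]


# Hypothetical SG atoms: the first two are 2.03 Å apart
sulphurs = [
    ("CYS 22 SG", (0.0, 0.0, 0.0)),
    ("CYS 65 SG", (2.03, 0.0, 0.0)),
    ("CYS 95 SG", (10.0, 0.0, 0.0)),
]
print(disulfide_pairs(sulphurs))
```

The sketch also shows why the warning above matters: once two SG atoms have drifted beyond the cutoff, the pair is no longer detected.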


 defin class charge sele by resi atom name HT* end by topo resi NTER
 defin class charge sele by resi atom name OT* end by topo resi CTER

It is assumed that the hydrogen and oxygen atoms at both chain termini have 'T' as the second character of their atom names. This makes it possible to assign the NTER-minal and CTER-minal atom definitions implicitly, with no need to create different segments or chains or even to read chains separately as in XPLOR.

After the atom CLASS assignment has been completed successfully, you may proceed to create the bonding energy lists. If there are still problems, see the section "Frequent topology and parameter assignment problems" below.


 define bond angle init select .not t_anchor end by auto

The BOND and ANGLE lists are created BY the AUTOMATIC procedure, which builds the lists from the covalent bond connections present in the connectivity table (CTABLE). (What you see is what you get: if you break a covalent bond and redefine the lists, it will not enter the energy calculation either as a BOND or as part of an ANGLE term.)
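The way an angle list follows from a bond list can be sketched in Python. This illustrates the general principle only, with names invented for the example; it uses the bonds within the ALA residue shown earlier.

```python
from collections import defaultdict
from itertools import combinations


def angles_from_bonds(bonds):
    """Enumerate all angles (first, centre, last) implied by a bond list."""
    neighbours = defaultdict(set)
    for a, b in bonds:
        neighbours[a].add(b)
        neighbours[b].add(a)
    angles = []
    for centre in sorted(neighbours):
        # every pair of atoms bonded to the same centre forms an angle
        for first, last in combinations(sorted(neighbours[centre]), 2):
            angles.append((first, centre, last))
    return angles


ala_bonds = [("CA", "N"), ("H", "N"), ("C", "CA"), ("O", "C"), ("CB", "CA")]
print(angles_from_bonds(ala_bonds))

# Breaking the C-CA bond also removes every angle that contained it
broken = [bond for bond in ala_bonds if bond != ("C", "CA")]
print(angles_from_bonds(broken))
```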


 define dihe by topo init select .not t_anchor end
 define impr by topo init select .not t_anchor end check

DIHEDRAL and IMPROPER angle definitions are usually part of the topology library residue entries. Mathematically there is no difference between the two terms. DIHEDRALS are usually conformational angles of 4 atoms (1-2-3-4) describing the rotation about the middle covalent bond (2-3). IMPROPERS are defined by 4 atoms as well, with the difference that the 4 atoms are not necessarily covalently attached in the pattern 1-2-3-4.

IMPROPERS usually define the planarity or chirality of a group. The command word CHECK affects only chiral impropers: it preserves the current chirality. This allows building L or D amino acids with no need to apply a DtoL patch or vice versa, as in XPLOR. If you want to enforce the chirality of L amino acids, remove the CHECK.
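The connection between an improper angle and chirality can be checked numerically. The Python sketch below evaluates the improper CA N C CB from the ALA coordinates listed earlier and from their mirror image. The function target_with_check is only one possible reading of what CHECK does (keep the magnitude of the reference angle, take the sign from the current geometry); it is an assumption, not MAIN's documented algorithm.

```python
import math


def sub(u, v):
    return [u[i] - v[i] for i in range(3)]


def dot(u, v):
    return sum(u[i] * v[i] for i in range(3))


def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]


def torsion(a, b, c, d):
    """Signed angle defined by four atoms, in degrees."""
    length = math.sqrt(dot(sub(c, b), sub(c, b)))
    axis = [x / length for x in sub(c, b)]
    v = sub(a, b)
    w = sub(d, c)
    v = [v[i] - dot(v, axis) * axis[i] for i in range(3)]
    w = [w[i] - dot(w, axis) * axis[i] for i in range(3)]
    return math.degrees(math.atan2(dot(cross(axis, v), w), dot(v, w)))


def target_with_check(reference, current):
    """Assumed CHECK behaviour: reference magnitude, current sign."""
    return math.copysign(abs(reference), current)


ala = {"N": (0.0, 0.0, 0.0), "CA": (0.539, -1.353, -0.060),
       "C": (2.063, -1.335, -0.081), "CB": (0.003, -2.077, -1.287)}
order = ("CA", "N", "C", "CB")
l_value = torsion(*(ala[name] for name in order))

# Mirroring the coordinates gives the other enantiomer
mirror = {name: (x, y, -z) for name, (x, y, z) in ala.items()}
d_value = torsion(*(mirror[name] for name in order))
print(round(l_value, 1), round(d_value, 1))
```

The two values have equal magnitude and opposite sign, which is why a sign-preserving target keeps a D residue in the D form.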

Additional improper or dihedral angle definitions that are not included in the residue entry itself, as for example those of the C- and N-termini, need to be defined explicitly as follows:


 define impr dihe by topo select by resi atom name HT* end resi NTER
 define impr dihe by topo select by resi atom name OT* end resi CTER
 return

The difference between these commands and the general topology library calls above is that here the RESIDUE NAMES are not taken from the selected atoms but are explicitly enforced: each selected residue is checked against the specified NTER and CTER DIHEDRAL and IMPROPER lists, and for each match in all four atom names an additional DIHEDRAL or IMPROPER is created. When the N-terminal residue is a proline, use NTPR instead of NTER.

XPLOR uses the word "nil" to patch a single residue; in MAIN you simply do not need to specify the one-character SYMBOL or NEIGHBOR identifier(s). You select a residue and explicitly call a particular topology library entry, as shown above for the N- and C-terminal improper and dihedral angle definitions.

Any additional entry in any of the bonding energy lists can also be explicitly created or deleted. See the section "Modifying and creating new topology lists".

Frequent topology and parameter assignment problems

Most of them are taken care of by the item xpl2MAIN in the menu block UTILS. Warning and error messages issued after clicking DEFINE should be taken seriously, and the corresponding places in the structure inspected one by one. Often they are the result of wrongly assigned covalent bonds.

Wrong residue or atom names - rename

When a residue name from your structure is not found in the topology library, find out whether your residue only needs a different name by comparing the atoms of the residue with the appropriate topology library residue entry. If the two describe the same structure, you only have to RENAME your residue. The MAIN Engh & Huber topology library has, for example, no HIS residue, but a HISH:


> rename residue HISH select resi name HIS end

If you are not that lucky, continue with the section "Modifying and creating new topology library entries".

When an atom has no match in the equivalent topology library residue, you should RENAME it. A usual case is the CD1 atom of isoleucine:


> rename atom CD select residue name ILE .and. atom name CD1 end

Missing atoms

The most common problems are those with the N- and C-termini; these, however, are presented in the chapter "Model building". Here you only need to know that you have to create the other missing atoms first and finally complete the model with its terminal atoms.

The macro MAIN_UTILS:fill_atoms.com takes care of the missing atoms. You can invoke it by clicking the item "FILL_ATO" of the menu block UTILS on the menu page 7 (MAIN_MENU:utils.html).

This procedure finds all the missing atoms and builds them into the model while keeping the known atoms where they were. WARNING: The atoms that are in your model but do not match the atom list of your topology library residue are deleted. Therefore you are advised to do the other checks first. This happens to the N- and C-terminal 'HT*' and 'OT*' atoms as well.
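The bookkeeping behind this warning amounts to two set differences. The Python sketch below is an illustration only, not the fill_atoms.com macro; the function name is invented and the topology atom list for ILE is an assumed example.

```python
def compare_with_topology(model_atoms, topology_atoms):
    """Split atom names into those to build, to delete and to keep."""
    model, topology = set(model_atoms), set(topology_atoms)
    return {
        "to_build": sorted(topology - model),    # missing in the model
        "to_delete": sorted(model - topology),   # unknown to the library
        "kept": sorted(model & topology),
    }


# An isoleucine whose CD1 atom has not yet been renamed to CD
model_ile = ["N", "CA", "C", "O", "CB", "CG1", "CG2", "CD1"]
topology_ile = ["N", "H", "CA", "C", "O", "CB", "CG1", "CG2", "CD"]   # assumed list
result = compare_with_topology(model_ile, topology_ile)
print(result["to_build"], result["to_delete"])
```

In this example the well-placed CD1 would be deleted and a new CD built from the library geometry, which is why renaming should come before filling.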

Using available X-PLOR topology files

XPLOR topology files are not in a form that allows MAIN to build and modify molecular models; you can apply them for energy calculations only. An XPLOR topology library residue has no links or list definitions that refer to another residue. XPLOR therefore uses patches (PRESIDUES) to define the connectivities. In a patch each atom name is preceded by a symbol ('-', '+', '1', '2') that refers to a selection. You can use patches in MAIN too; for example, the peptide bond patch can be invoked through the "NEIGHBOR - +" command, which applies the - and + to each pair of consecutive residues throughout the selection.


> define dihed impr select sequ 1 99 end residue PEPT neighbor - +

A more general case is when you want to apply patches to any pair of residues:


> define dihedral residue DISU symbol 1 select sequence 1 end \
> symbol 2 select sequence 122 end

Once you can do energy calculations with an XPLOR residue description, it is straightforward to create MAIN topology library residue(s). See the section "Converting XPLOR topology files to MAIN form".

Converting XPLOR topology files to MAIN form

As you may have already noticed above, there are several differences between the contents of MAIN and XPLOR topology library residue entries: the MAIN ATOM description includes COORDINATES, the MAIN INTERNAL coordinate description has defined reference distances and angles, and the referenced atoms in MAIN can be located in a previous residue (using the '-' atom name prefix).

If you only READ XPLOR topology libraries and WRITE them in the MAIN format, all these data, which are important for MAIN, will be missing or undefined. To allow MAIN to complete the libraries you must provide the missing data.

So read in a coordinate file of the residue, CALCULATE its covalent bonds, MAKE the internal coordinate table (Z-matrix table, ZTABLE), CALCULATE the INTERNAL coordinates and DEFINE all necessary energy parameters for it (atomic CLASS, CHARGE and all DIHEDRALS and IMPROPERS). When creating new residues MAIN accesses these data fields and transforms them into a topology library description.
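What calculating internal coordinates means can be verified by hand on the ALA entry shown earlier. The Python sketch below recomputes distances and an angle from the "coor" fields and compares them with the "inte" records; the function names are invented for this example and the code is independent of MAIN.

```python
import math


def distance(p, q):
    """Distance between two points."""
    return math.dist(p, q)


def angle(p, q, r):
    """Angle p-q-r in degrees, with q at the apex."""
    u = [p[i] - q[i] for i in range(3)]
    v = [r[i] - q[i] for i in range(3)]
    cosine = sum(u[i] * v[i] for i in range(3)) / (math.hypot(*u) * math.hypot(*v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine))))


# "coor" fields of the ALA entry
coor = {
    "N": (0.000, 0.000, 0.000),
    "CA": (0.539, -1.353, -0.060),
    "C": (2.063, -1.335, -0.081),
    "O": (2.682, -0.272, -0.051),
    "CB": (0.003, -2.077, -1.287),
}

# Compare with: inte CB CA 1.522 N 110.147 ... and inte O C 1.231 ...
print(round(distance(coor["CB"], coor["CA"]), 3))
print(round(angle(coor["CB"], coor["CA"], coor["N"]), 2))
print(round(distance(coor["O"], coor["C"]), 3))
```

The dihedral angles of this entry refer to atoms of the previous residue (for example -C), whose coordinates are not part of the record, so they cannot be recomputed from the "coor" fields alone.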

Now let us define a new alanine residue (ALA) based on XPLOR topology library description:

First read a coordinate file with the ALA residue. The ALA residue should not be the first one in the chain if you want to be able to attach it to other residues. Then create the tables and CALCULATE the internal coordinates:


> read file ala.pdb coordinates pdb
> calculate bond
> make ztable from ctable
> calculate internal

Then read the XPLOR files and DEFINE atom CLASSES, CHARGES and energy lists. (No explicit SELECTION is specified so the default SELECT ALL END is taken.)


> read file xplor/toppar/toph19x.pro topo xpl init
> read file xplor/toppar/param19x.pro para xpl init
> defin class charge by topo
> defin bond angle init by auto
> defin dihe improper init by topo
> defin dihe impr by topo resi PEPT sele all end neigh - +

Now perform at least a minimal check of the correctness of the above definitions by an energy calculation:


> energy select all end

If the energy term values appear OK, create the topology library residue and write it to a file. The topology library is initialized for two reasons. First, we usually want to write only the newly created residues. Second, a residue that is on the list but is not the last one cannot be created twice. (A way around this is to change the RESIDUE NAME with a RENAME command.)


> topo init
> topo append sele resi name ALA end
> write file top_ala.main topology

By following the procedure just described, not only a single residue but also whole XPLOR libraries can be converted.

Modifying and creating new topology library entries

This is an advanced topic, so do not start here. It is assumed that you have basic knowledge of ENERGY calculations, that you can perform them and, of course, that you know how topology library entries and force field parameters are organized. You should also be able to perform interactive model manipulation and building.

For interactive editing of the topology and energy terms you can use the "DEFINE" menu block (MAIN_MENU:define.html).

In general it is easier, and also more consistent with the already existing definitions, to create new entries from topologically similar residues. (Try to invent as few new atom types, CLASSES, as possible.) The general outline is:

You are strongly discouraged from creating a topology library entry while working with your macromolecule. Start MAIN and read in only the molecule with the yet undefined topology residue, together with the topology and parameter libraries you want to expand.

Modifying an existing residue: norleucine

Let us first create a norleucine, a residue similar in topology to methionine, with the sulphur atom replaced by a carbon. Load the menus and the topology and parameter files, initialize the display and enter the dial mode:


> <>cmds/load_depp_page
> <>utils/get_top_par_19_csd
> image init dial

Set the START flag, choose a name for the new segment and a sequence root (click START, SEGMENT and SEQUENCE in any order - page 9) and then create a two-residue segment by clicking RESIDUE?? and responding with ALA MET. It is important to start with a residue before MET, because only then will it be possible to build chains with the newly created topology library residue entry (as already explained above). Define all energy lists, atom CLASSES and CHARGES by clicking DEFINE. Display all atom NAMES and CLASSES so that you can check them visually:


> image set atom name atom class
> image dial

Go to page 1, click the SD atom and rename it to CD (click REN_ATOM and answer with CD), rename the residue from MET to NOR (click REN_RESI and respond with NOR), go to page 2 to RE_DRAW the image and then to page 4 to modify the atom class to CH2E (click INQ_CLASS and respond with CH2E). If you have invented a new atom CLASS, you should read it in too, as described below for the invented force field parameter(s), and finally insert the new CLASS records at the top of the newly created topology file with a text editor.

Now the data for the new residue are ready, so click two norleucine atoms, select this residue into the key 'active' (go to page 9 and click ACT_2RES) and finally create the new entry (go to page 4 and click MAKE_TOP). Check it with


> write topo resi NOR

Pay attention to these records, which show that the CG-CD and CD-CE bond lengths are unusually long for bonds between sp3 carbons.


 inte CD CG 1.803 CB 112.956 CA -160.979
 inte CE CD 1.791 CG 100.936 CB -99.220

But now the topology is ready to be used, so you can redefine the energy term lists and optimize the geometry (go to page 9 and click DEFINE). The warning message


DEF_ANGL> WARNING: UNDEFINED ANGL   13*CH2E*   14*CH2E*   15*CH3E*

will tell you that a parameter for the angle CH2E-CH2E-CH3E is missing. Before taking any further steps the missing parameters need to be invented. So look into the force field parameter lists and find the most similar one:


> wr param

I have chosen the angle CH1E-CH2E-CH3E. This is not entirely correct, but I suppose it will work well enough. Now read it into MAIN:


> read param
 angle    CH2E    CH2E    CH3E  440.686  113.800
 end_of_file

WARNING: If you try to exit reading with ctrl-D, MAIN gets lost. Type in the string end_of_file to get the MAIN prompt back. Now you can press DEFINE and no error messages will appear, so the norleucine geometry can be optimized. At this point it makes sense to use only the bonding terms (BONDS, ANGLES, DIHEDRALS and IMPROPERS), so first turn the VDW and ELECTROSTATIC terms off and then MINIMIZE (click ENE_VDW and ENE_ELEC and then MINIMIZE).

Correct the internal geometry of the residue by recalculating the interatomic distances, angles and dihedrals from the current coordinates, and then either click MAKE_TOP in the dialog mode again or type the command explicitly:


> calc internal
> topology append select active end

What remains is to save the new residue entry to a file.


> write file top_19_nor.main topology resid NOR

Do not forget that an angle parameter term was missing as well, so create a file for the missing force field parameter term(s) too, and read it next time after the standard parameter files. (The results of this procedure are the files '>top/top_19_nor.main' and '>top/par_19_nor.main'.)

Create a new entry from atom coordinates

A quick and dirty way is to read it, run a DEFINE and do energy calculations.

You should be aware though that

At this point you should define a topology library entry and then generate the final version from it in steps:


> write file xx topo resi new

If you want to complete the job more properly, define the atom classes. You have two possibilities: either write the topology entry to a file, replace each XXXX with an appropriate letter code and read it back, or click one atom at a time, then click INQ_CLASS (menu page 4) and respond with an appropriate answer, and finally save the topology file. After this is completed click DEFINE once more.

A list of warning messages will probably show up. Each missing energy term (BONDS, ANGLES, DIHEDRALS and IMPROPERS) requires an explicit entry in the parameter file. So use a text editor and insert each missing term in the form shown above. A good guess can be found in the already available parameter files by looking for the most similar terms.
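The search for a similar term can be partly automated. The Python sketch below ranks known angle terms by the number of class positions that agree with the missing term, read in either direction. The scoring is a crude heuristic invented for this example and is no substitute for chemical judgment. The CH1E-CH2E-CH3E values are those adopted in the norleucine example above.

```python
def rank_similar(target, known):
    """Sort known terms so that the most similar class triples come first."""
    def score(classes):
        forward = sum(a == b for a, b in zip(classes, target))
        backward = sum(a == b for a, b in zip(classes[::-1], target))
        return max(forward, backward)

    return sorted(known, key=lambda term: score(term[0]), reverse=True)


known_angles = [
    (("CH2E", "C5W", "CR1E"), (863.744, 126.900)),
    (("CH1E", "CH2E", "CH3E"), (440.686, 113.800)),
]
missing = ("CH2E", "CH2E", "CH3E")
best_classes, best_values = rank_similar(missing, known_angles)[0]
print(best_classes, best_values)
```

The best candidate only supplies starting values; the new record still has to be written with the classes of the missing term.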

This is it.

The next time, to use the new topology entry and parameters, read the created files immediately after reading the default libraries.