About MAIN, its structure and syntax rules

MAIN is a computational environment used to solve 3-dimensional structures of proteins by X-ray crystallography. After the phase problem has been solved by means of molecular replacement or heavy atom derivative techniques, a user can enter MAIN and complete a structure.

MAIN philosophy

Interactivity principle

MAIN is primarily an interactive program, and a program becomes interactive when the answers can be obtained before the questions are forgotten.

Three levels of user interface

The first level expects a user to be able to use the "main_config" scripts to create macros and run the program (see MAIN_DOC:1mol/1mol.html and MAIN_DOC:nmol/nmol.txt) and to click the right menu items in the deep pages (see MAIN_MENU:strikes.txt). This level enables a user to deal with the usual cases.

The next level is to use a text editor to change parameters (variable values) in existing macros. This covers most crystallographic tasks.

The third level expects a user to use the command syntax to write their own commands and macros for the cases which are not covered by the configuration tools.

Interfaces to other programs

MAIN is able to read and write files from many other programs (not least because MAIN reads and writes almost exclusively ASCII files).

The PDB format is the default for atomic coordinate files. The "use_mtz.pl" script extracts the crystal data (space group, cell constants, fobs, phase information) from an mtz file.

See also the descriptions of the "read" (MAIN_COM:read.html) and "write" (MAIN_COM:write.html) commands.

Data overview

An individual human being can only keep an overview of a limited set of data. A column of 20 numbers can be read; a column of 100 or more numbers is already too much. There are two basic approaches to presenting large amounts of data: either to use numerical analysis and reduce the data to some reasonable set of numbers, or to use graphical presentation. Graphical presentation by itself also has some limitations, since we are able to distinguish only between a limited set of geometric elements and colors. MAIN provides tools that can handle large quantities of data and present the extracted information in a variety of different forms.

Error recovery and safety

Every program and computer crashes from time to time, and users' mistakes cannot be prevented. Nothing is more annoying than repeating a session that took considerable time and labor.

In order to keep you 'cool', MAIN keeps a complete record of the whole working session in the 'input.cop' file while it is running. You can edit this file, extract a session or part(s) of it, and use it as a command file.

You should follow some safety rules in order to allow MAIN to give you full error recovery:

Rules of command syntax

The program interprets command sentences that are typed in from the keyboard, read from a file (command file) or written by the program itself when running in the DIALOG mode.

Each command sentence must begin with one of the command words.

The exclamation mark "!" is used as a comment sign. The program will not try to interpret characters following an exclamation mark.

A hyphen "-" (VMS like) or backslash "\" (UNIX like) continue the command sentence onto the next line.

Typing '?' gives you the list of possible command words.

Typing 'HELP' following or preceding a command word retrieves information from the command reference manual. This is now almost fully operational throughout the whole program.

The dimension of the command sentence is limited to 2000 characters and 400 command sentence constituents.
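The comment, continuation and length rules above can be illustrated with a small Python sketch. It is not MAIN's parser: the function name `join_command_lines` is made up for this example, and the sketch assumes that an exclamation mark always starts a comment (it ignores quoted strings) and that the continuation character is the last non-blank character before any comment.

```python
def join_command_lines(lines, max_chars=2000):
    """Strip '!' comments and join continued lines into command sentences."""
    sentences = []
    pending = ""
    for raw in lines:
        # Everything after '!' is treated as a comment (quoting is ignored here).
        text = raw.split("!", 1)[0].rstrip()
        # A trailing '-' (VMS like) or '\' (UNIX like) continues the sentence.
        if text.endswith("-") or text.endswith("\\"):
            pending += text[:-1].rstrip() + " "
            continue
        sentence = (pending + text).strip()
        pending = ""
        if not sentence:
            continue
        if len(sentence) > max_chars:
            raise ValueError("command sentence exceeds %d characters" % max_chars)
        sentences.append(sentence)
    # A continuation on the very last line is kept as it stands.
    if pending.strip():
        sentences.append(pending.strip())
    return sentences
```

For example, the two physical lines `show vari -` and `RESULT*` are joined into the single sentence `show vari RESULT*`, while a line holding only a comment produces no sentence at all.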

All command sentences read on the top input level are stored in the file "INPUT.COP", which occupies FORTRAN unit 80. When a command is read from a file (command file) on a disk, the program does not copy its content to the file "INPUT.COP" but only the call. Up to 10 levels of calls to command files are allowed by the program.

"FORTRAN" units higher than 80 should not be "OPENED" since they may be used by the program, however, you can "REWIND" and "CLOSE" them.

Command sentence

A MAIN command sentence is a combination of command groups and subgroups. Each command (sub)group starts with one or more command words and can include arguments (integer and real numbers, atom indices and strings). Command words and arguments must be separated by spaces. Each command sentence begins with a command word. A question mark '?' typed at the place where a command word is expected gives a list of possible command words.

Command word

Command words can be written in small or capital letters. When a command word is expected, MAIN compares the found string character by character with the list of possible command words. At most 8 characters are checked.

Command words can be abbreviated, but must still be unambiguously identifiable. "QUIT" for example can be abbreviated to a single letter "Q", but not to "QQ".

When a command word is uniquely identified, the remaining up to 8 characters are checked as well to confirm the correctness.

In the manual command words are written in capital letters.
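The abbreviation rule can be sketched in Python as a case-insensitive prefix match limited to 8 characters. This is only an illustration: `resolve_command_word` is a hypothetical helper, and the command list passed to it is a small sample of words that appear in this document, not MAIN's full vocabulary.

```python
def resolve_command_word(typed, command_words, checked=8):
    """Return the unique command word that `typed` abbreviates, or None."""
    # Only the first `checked` characters take part in the comparison.
    key = typed.upper()[:checked]
    matches = [w for w in command_words if w.upper()[:checked].startswith(key)]
    # Zero matches (unknown word) and several matches (ambiguous) both fail.
    if len(matches) != 1:
        return None
    return matches[0]
```

With the sample list `["QUIT", "READ", "REWIND", "RETURN"]`, "q" resolves to QUIT, "QQ" matches nothing, and "RE" is rejected because it is ambiguous.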

Integer number

An integer can be any integer number (1, -56, 3765, 0). An explicit plus sign '+' is not understood. When an integer value makes no sense or may result in an error, MAIN also rejects correctly typed integer numbers. An integer number can also be the result of an arithmetic expression in which a list of integers is combined with the operators "+", "-", "*" and "/".

In the manual integer numbers are denoted as "inte", "inte1", "int2", ...

Atom number

Atoms can be specified by their absolute or relative indices (NUMBER). The absolute index is the atom position in the current atom list. The relative index refers to the position in the history list. The history list is created by picking atoms from the image (IMAGE HIST PICK atom-num). The last PICKED atom is always the first in the HISTORY list. Absolute atom NUMBERS are integers. Relative numbers consist of a dollar sign "$" preceding an integer index.

In the manual atom numbers are denoted as "atom-num" or "int".
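The difference between absolute and relative atom numbers can be sketched as follows. Both function names are invented for this example, and the sketch assumes that relative indices count from 1, so that "$1" denotes the most recently picked atom; the text above does not state the starting index.

```python
def pick_atom(history, atom_index):
    """Record a picked atom; the last picked atom becomes first in the list."""
    history.insert(0, atom_index)

def resolve_atom_number(token, history):
    """Translate an atom-num token into an absolute atom index."""
    if token.startswith("$"):
        # Relative number: position in the history list (assumed 1-based).
        position = int(token[1:])
        return history[position - 1]
    # Absolute number: the position in the current atom list.
    return int(token)
```

After picking atom 5 and then atom 42, "$1" resolves to 42 and "$2" to 5, while a plain "7" is simply atom 7.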

Real number

Any real number is accepted. The use of exponents is not allowed. A real number can also be the result of an arithmetic expression in which a list of reals is combined with the operators '+', '-', '*' and '/'.

String

A string can be any combination of up to 80 characters. When spaces are supposed to be included in the string, the string should be delimited with two double quotes ("), one at the beginning and the other at the end.

Strings can be merged together with a '+' sign.

Variables

Each argument (integer or real number and string) can be passed to the MAIN command sentence interpreter via a variable. Each variable has a name and a value. The variable name can be any string up to 10 characters long. The names are case sensitive. The variable value can be of the following types: INTEGER or REAL number or a CHARACTER string. MAIN recognizes the variable type from the argument. When the argument in a SET VARIABLE sentence is an integer number, the variable is an integer variable; when it is a real number, the variable becomes a real variable; and when it is neither an integer nor a real number, the variable becomes a character variable. Once a variable name has been used, its type (real, integer or string) can no longer be redefined. The type can also be set explicitly. See MAIN_COM:set.html.
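The type recognition described above can be mimicked with two regular expressions. This Python sketch is an approximation and not MAIN's own logic: the function name is hypothetical, and it assumes that arguments with an explicit plus sign or an exponent, which MAIN does not accept as numbers, fall through to the character type.

```python
import re

def infer_variable_type(argument):
    """Guess the variable type from the textual form of its argument."""
    # Integers: optional minus sign followed by digits only.
    if re.fullmatch(r"-?\d+", argument):
        return "INTEGER"
    # Reals: digits with a decimal point, no exponent part.
    if re.fullmatch(r"-?(\d+\.\d*|\.\d+)", argument):
        return "REAL"
    # Anything else is taken as a character string.
    return "CHARACTER"
```

So "12" and "-56" give INTEGER, "1.5" and ".5" give REAL, and "ALA" gives CHARACTER.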

Variables can be GLOBAL as well as LOCAL. GLOBAL variables are visible at any input level, whereas LOCAL variables exist only within a certain macro and are forgotten the moment the command parser hits the RETURN command.

Variables can be inspected using the "SHOW VARIABLE" commands. You can inspect selected variables by providing their names. The wildcard "*" is accepted and is also the default, so omitting a variable name from a SHOW VARIABLE command lists them all.

Some variable names are used and modified by the program automatically (natom, nsegm, ...), expressing some program values or states, whereas the others are user or macro controlled. Quite useful are the RESULT* and IRESULT* variables, which are created by a SHOW command and allow a user to access various data counters or values:


> show image center
SHOW> CENTER POINT:        .0000     .0000     .0000
> show vari RESULT*
 REAL      VARIABLE  6 GLOBAL {RESULT_0  }   .0000000E+00
 REAL      VARIABLE  7 GLOBAL {RESULT_1  }   .0000000E+00
 REAL      VARIABLE  8 GLOBAL {RESULT_2  }   .0000000E+00

For more you are referred to the "Reference manual", chapter "SET" section "VARIABLE".

Selections and keys

Each atom has in each selection a flag that tells if that atom is selected or not. The command sentence does something with the selected atoms. Selections are defined with "SELECT ... END" command subsentence, which on the basis of logical operations merges various atomic properties into a single selection. The "SELECT" subsentence can be included in almost any command sentence that is related to atoms.
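A selection can be pictured as one flag per atom, with logical operations combining flag lists. The Python sketch below only illustrates this idea; the atom records, the property keys and the helper names are all invented for the example and do not correspond to the SELECT syntax.

```python
# Hypothetical atom records with a few properties.
atoms = [
    {"name": "N", "residue": "ALA", "chain": "A"},
    {"name": "CA", "residue": "ALA", "chain": "A"},
    {"name": "CA", "residue": "GLY", "chain": "B"},
]

def select(atom_list, predicate):
    """Return one True/False flag per atom."""
    return [bool(predicate(atom)) for atom in atom_list]

def sel_and(a, b):
    """Atoms selected in both selections."""
    return [x and y for x, y in zip(a, b)]

def sel_or(a, b):
    """Atoms selected in at least one selection."""
    return [x or y for x, y in zip(a, b)]

def sel_not(a):
    """Atoms not selected."""
    return [not x for x in a]
```

Combining "name is CA" with "chain is A" through `sel_and` leaves only the second atom flagged.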

Macros

A macro is a file, usually stored on a disk, which is called from the top MAIN input level or from another macro. Macros can be nested; up to 10 levels of depth are supported. Each macro can have local variables defined and can accept parameters from a previous input level and treat them as local. The passed parameters are interpreted via a "SUBROUTINE" command.


 subroutine int I1 real C1 char String

The parameters are passed using the "by value" mechanism, and parameter types are converted as needed. "1" can be understood as a real, integer or character value, and so can "1.2"; however, it is truncated to "1" when transformed into an integer. "A" can only be a character type variable.

Passing by value means that when a parameter value is modified within a macro, the modification will not affect the variable on a higher level.
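The conversion of passed parameters can be sketched in Python. The helper name is hypothetical, the type keywords are the ones used in the SUBROUTINE line above, and the handling of negative reals (truncation toward zero) is an assumption of this sketch.

```python
def convert_parameter(value, declared_type):
    """Convert a passed parameter string to the type declared by the macro."""
    if declared_type == "char":
        return value
    if declared_type == "real":
        return float(value)
    if declared_type == "int":
        # "1.2" is accepted but truncated to 1.
        return int(float(value))
    raise ValueError("unknown parameter type: " + declared_type)
```

A value such as "A" raises a ValueError for "int" and "real" here, mirroring the rule that it can only be a character value. Because the macro works on the converted copy, changing it never touches the caller's variable.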

A "RETURN" command returns control to the previous level.

A macro is called using the UNIX redirect "<" or the VMS "@" character. A "RETURN" also closes the file.

When a macro is invoked through combination of "OPEN UNIT" and "STREAM" commands then a return does not close the file and a next STREAM command continues parsing the macro after the line of a previous "RETURN". (Used for demos and for browsing through AMoRe molecular replacement solutions file.)

The ".com" is the default MAIN macro extension and need not be specified. The following commands mean the same:


> <read
> <read.com

Building loops

Loops can be programmed with the help of a "REWIND" command, which returns the command line parsing to the first line of a macro. Loops need a RETURN, which should be conditioned by a loop counter or some other condition. A simple loop (loop.com) looks like this:


<loop 0 10


 subroutine int I int LIMIT
 set vari I = I + 1
 if ( I .ge. LIMIT ) return
 show vari I
 rewind file

This macro increases the value of the variable "I" 10 times and then returns command parsing to the previous level.
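The control flow of loop.com can be traced with an equivalent Python loop. This is a sketch that assumes the SUBROUTINE line does not reset "I" when the file is rewound; the function name is made up for the example.

```python
def run_loop_macro(i, limit):
    """Emulate loop.com: returns the final counter and the values shown."""
    shown = []
    while True:              # each pass is one parse of the macro body
        i = i + 1            # set vari I = I + 1
        if i >= limit:       # if ( I .ge. LIMIT ) return
            return i, shown
        shown.append(i)      # show vari I
        # rewind file: parsing continues from the top of the macro
```

Called as `run_loop_macro(0, 10)`, the counter is incremented ten times and ends at 10, while the values 1 to 9 are shown, because the RETURN on the tenth pass comes before the SHOW.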

Short description of MAIN utilities

MAIN is a distinctive program for macromolecular crystallography that enables a user to interactively build molecular models into electron density maps and, within the same interactive session, immediately refine the built models with crystallographic restraints and recalculate electron density maps for further inspection.

The program is a tool for numerical and visual analysis and modification of molecular data. With MAIN it is possible to construct, modify and analyse 3-D molecular models of large and small molecular systems up to several thousands of atoms. It can be used in model building and analysis studies, in crystallographic refinement (including positional refinement, R-factor calculation, and placing solvent molecules at the final stages of refinement) and as an electron density map editor (rotations, averaging, solvent flattening in real and reciprocal space, Fourier transformations ...). MAIN includes a unit cell generating routine that builds the cell from an asymmetric unit. The asymmetric unit can be placed anywhere in space. The routine is general for any space group, since the equivalent points are calculated by applying crystal symmetry operations, and are not generated by applying rules that are different for each symmetry group and position of an asymmetric unit.

MAIN is designed to allow rapid and easy access to any molecular model data using a single program, thus avoiding the confusion and incompatibilities which arise when using a variety of different programs. Each action undertaken by the user that changes the molecular model data is stored in a file (input.cop), allowing easy restoration after a computer or program crash or a user's mistake.

MAIN is driven by command sentences and from a MENU. It has its own language resembling VMS syntax. All files that are read or written by the program are ASCII files, and therefore transferable to any computer. MAIN can be used together with molecular mechanics programs used for energy calculations and X-ray refinement (CHARMM, X-PLOR, GROMOS, EREF), and those for quantum mechanical calculations (GAUSSIANnn, AMPAC). MAIN utilizes X-PLOR force fields for energy minimization. X-PLOR, PROTEIN and CCP4 (ASCII format) electron density maps can be displayed, analysed, modified and transformed. Electron density from any number of local symmetry elements and any number of different crystal forms can be manipulated (averaged, combined, ...).

Molecular models can be built from scratch using the available topology libraries. Topology libraries can be created from coordinates of molecular models. The geometry of molecular models can be modified using command sentences, which apply translations and rotations, including deorthogonalization, scaling and crystal symmetry operations. Molecular models can be superimposed and compared on the display, and some numerical analysis can also be performed.

The interactively driven geometry changes of a molecular model distinguish MAIN from most other programs. Translations and rotations can be combined with bond rotations (torsions). There can be up to 30 different transformations active, which can be combined in any hierarchical order. Several chains of atoms can be rotated about their bonds simultaneously. Interatomic distances, angles and dihedrals can be interactively displayed while changing the geometry of a model. Manual intervention with a molecular model can be combined with energy minimization procedures, which include, besides the usual bond, angle, dihedral and improper angle, Van der Waals and electrostatic energy terms, also crystallographic restraints, electron density map correlation terms, and distance and dihedral angle constraints. Besides the usual ball and stick models, Connolly surfaces and electron density maps can be displayed. Electron density maps are defined in the whole space (based on a P1 unit cell).

A Ramachandran plot of a protein can be displayed and used to monitor the phi and psi angles of a residue as its conformation is changed. The correlation of a molecular model with electron density maps, electron density histograms, temperature factor diagrams, molecular surface distributions, the consistency of force field parametrisation regarding discrepancies between ideal and actual geometry, hydrogen bonds etc. can be presented numerically and graphically (excluding force field analyses and map histograms).

To be able to use MAIN, a brief overview of its data structures and the possibilities to change and convert them is necessary.

MAIN data structures

What I call a data structure is a group of data fields within the program related by an index to a term that describes an object. An object might be either a synonym for an atom, residue, electron density map, covalent bond ... or a numerical term like a number or a matrix. Each data structure may consist of different elements; for example, an atom has a position (coordinates), name, color etc.

Atom definition

Each atom has its position, name, atomic number, color, temperature factor, crystallographic weight, class, partial electric charge and index.

The position is stored either by X, Y and Z coordinates, by default in the Cartesian coordinate system, or by internal coordinates, which relate the position of each atom to others via interatomic distances, angles and dihedral angles. Cartesian coordinates and distances are in Å, and angles in degrees. (In the MAIN syntax the keyword for coordinates is COORDINATES; when a relative positional criterion is chosen in a select sentence, other descriptors such as AROUND, CENTER, DISTANCE, PLANE ... are used.)

An atom name (called ATOM NAME) is a string up to 4 characters long. The atomic number is derived from the name according to the element's place in the periodic table. (The atomic number is not used in MAIN syntax.) An atom name starting with one of the element symbols H, C, N, O, P or S followed by any character represents a hydrogen, carbon, nitrogen, oxygen, phosphorus or sulphur atom with the atomic number 1, 6, 7, 8, 15 or 16, respectively. Since there are many nomenclatures in use, MAIN has some exceptions to the "periodic table" rule. CA is the C-alpha atom of an amino acid residue, and not a calcium atom. A calcium atom must be declared with CAL. Dummy atoms start with X and have atomic number 0.

The temperature factor (called TEMPERATURE) is the crystallographic B-value of an atom in Å².
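The naming rule can be expressed as a short Python lookup. This is a sketch covering only the cases named above: the function name is hypothetical, calcium is matched on the exact name CAL (an assumption about how the exception is applied), and any name outside the listed rules returns None.

```python
# Atomic numbers for the one-letter element symbols named in the text.
_ELEMENTS = {"H": 1, "C": 6, "N": 7, "O": 8, "P": 15, "S": 16}

def atomic_number(atom_name):
    """Derive the atomic number from an atom name, or None if not covered."""
    name = atom_name.strip().upper()
    if name == "CAL":
        return 20              # calcium must be declared as CAL
    if name.startswith("X"):
        return 0               # dummy atom
    # CA, CB, OG1, ... are decided by their first letter alone.
    return _ELEMENTS.get(name[:1])
```

Under these rules "CA" yields 6 (a C-alpha carbon), "CAL" yields 20 and "X1" yields 0.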

The weight (called WEIGHT) is the crystallographic weight factor.

The charge (called CHARGE) is the partial atomic charge in electrons. It is used by energy and electrostatic potential calculations.

Class (called CLASS) is the synonym for atom type in X-PLOR language. According to the atom class the parameters (bond length, angles and force constants ...) from the force field are found.

The index (called ATOM NUMBER) is the consecutive atomic position in the program. Atoms can be accessed directly in image and select sentences.

Each atom has a color (called COLOR) as an integer value. There are default values.

Residue definition

One or more consecutive atoms form a residue. Each residue has a name up to 4 characters long (ALA) (called RESIDUE NAME); the residue ID can be up to 5 characters long (A189) (called SEQUENCE).

Residues can be selected also by their index. A residue or segment index is defined similarly as for atoms, namely as a consecutive residue

Chain definition

Each residue belongs to a CHAIN (CHAIN NAME is a single character). A chain is a group of consecutive residues attached covalently.

CHAINS have NUMBERS too.

Segment definition

One or more chains form a SEGMENT (called SEGMENT NAME). The segment is a kind of synonym for a molecule. Usually a group of residues have the same segment name and can thereby easily be recognized and selected. A segment name can be a string up to 4 characters long.

Segments have NUMBERS too.

Connectivity tables

Keywords: connectivity, table, bond

There are 4 connectivity lists available. In the table of covalent bonds (called CTABLE) each atom has a record in which all attached neighbors are stored. In the internal coordinates table (called ZTABLE) each atom has a record in which the atoms related to it by distance, angle and dihedral angle are stored. The hydrogen bonds table (called HBOND) is a list of atom pairs that form hydrogen bonds as donors and acceptors. The pair table (called PAIR) is a list of atoms forming a pair.

Points

Each point has an index that relates the point to an atom, position in Cartesian coordinates (X, Y, Z), an index that specifies the point kind (called SURFACE, VOLUME, DENSITY, POTENTIAL) and the magnitude of the property (called SURFACE, VOLUME, DENSITY, POTENTIAL). There are 2 kinds of SURFACE points: ACCESSIBLE and REENTRANT. The ACCESSIBLE points are the exposed areas of atoms that can be directly in contact with solvent atoms and the REENTRANT SURFACE points are the ones that can still be accessed by solvent, but are no longer on the surface of atoms. The REENTRANT area is the area where solvent enters the protein surface.

Topology library

The TOPOLOGY library consists of residues. Each topology RESIDUE has a name. Each residue includes a list of ATOMS, covalent BONDS and DIHEDRAL and IMPROPER angles. Each atom has its NAME, CHARGE, CLASS and usually also Cartesian and internal coordinates, which serve to create and modify the residues. The improper and dihedral angle lists are used to prepare the lists of improper and dihedral energy terms for energy calculations.

Force field parameters

These are lists of BOND, ANGLE, DIHEDRAL and IMPROPER angle parameters and Van der Waals parameters. According to the atom class, the force field constants are assigned to each particular term. Besides these, distance (PAIR) and DIHEDRAL angle constraints plus an electron DENSITY energy term are also available.

Maps

A map is a 3-dimensional array of grid points, each of which has a value. According to the value, they are treated as empty, density or mask points. Grid points with their values inside the density interval are density points, points with values below the density interval are empty points, and ones above are mask points. Each map has a size, its starting coordinates and its cell constants. Translational symmetry (P1) of the lattice is always correctly generated. That means that when a whole unit cell is defined in a map, the program can expand the density through the whole space. A complete unit cell map can be Fast Fourier transformed into a reciprocal space map, which can afterwards be modified and back transformed.
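The classification of grid points and the P1 expansion of a complete unit cell can be sketched in Python. Both function names are invented for the example, the map is held as a nested list, and the sketch assumes that the bounds of the density interval themselves count as density.

```python
def classify_grid_point(value, density_min, density_max):
    """Classify a grid value as 'empty', 'density' or 'mask'."""
    if value < density_min:
        return "empty"
    if value > density_max:
        return "mask"
    return "density"

def periodic_value(grid, i, j, k):
    """Look up a grid point anywhere in space from one complete unit cell."""
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    # Translational (P1) symmetry: indices wrap around the cell edges.
    return grid[i % nx][j % ny][k % nz]
```

With a 2x2x2 cell, the point (2, -1, 5) maps back onto the stored point (0, 1, 1).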

Reflections

Reflections are defined within resolution limits and according to reciprocal space zero plane limits set by a READ REFLECTION INIT sentence.

Besides its HKL values, each reflection has fields for a real Fobserved value and complex Fcalculate and Fworkset values. Values between these fields can be added, scaled and multiplied in all possible combinations. This makes it possible to calculate various kinds of electron density maps, from 2Fo-Fc and Patterson maps to various correlation maps that can be utilised in the electron density modification procedures (fast solvent flattening, histogram matching, molecular model packing functions ...).

Reflections are treated as defined when their Fobserved value is greater than zero. Fcalculate is automatically filled after a Fourier transformation of a map has been performed. Fworkset is used to fill a map for its subsequent back Fourier transformation.
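As an illustration of combining the reflection fields, the following Python sketch computes a 2Fo-Fc map coefficient in the usual way, taking the amplitude 2|Fo| - |Fc| with the phase of Fcalculate. The function name is hypothetical and the sketch does not reproduce MAIN's own commands; the result is what would be placed in the Fworkset field before the back transformation.

```python
import cmath

def two_fo_minus_fc(f_obs, f_calc):
    """Return the complex 2Fo-Fc coefficient, or None for an undefined reflection."""
    # Reflections count as defined only when Fobserved is greater than zero.
    if f_obs <= 0.0:
        return None
    phase = cmath.phase(f_calc)
    amplitude = 2.0 * f_obs - abs(f_calc)
    return cmath.rect(amplitude, phase)
```

For Fobserved = 4 and an Fcalculate of amplitude 3 and phase 0.5 rad, the coefficient has amplitude 5 and the same phase.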

Miscellaneous

Symmetry operations and cell constants

Only one list of symmetry operations is present at a time in the program. A symmetry operation consists of a rotational (3x3) matrix and a translational vector. Both are in fractional coordinates. The cell constants can be different for each map; however, for operations on atoms, only the last one read is valid.
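Applying one symmetry operation to a point in fractional coordinates amounts to x' = R x + t. The Python sketch below shows this with plain lists; the function name is invented for the example, and the operation used in the usage note is the twofold screw axis along b (-x, y+1/2, -z).

```python
def apply_symmetry(rotation, translation, frac):
    """Apply x' = R x + t to a point given in fractional coordinates."""
    return tuple(
        sum(rotation[r][c] * frac[c] for c in range(3)) + translation[r]
        for r in range(3)
    )
```

With rotation `[[-1, 0, 0], [0, 1, 0], [0, 0, -1]]` and translation `(0.0, 0.5, 0.0)`, the point (0.1, 0.2, 0.3) is mapped to approximately (-0.1, 0.7, -0.3).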

MAIN variables and constants

During run time MAIN creates and updates a variety of variables like natom, nsegm (number of atoms and segments), pi etc. The user can create additional ones and change some of the existing ones. Their purpose is to enable noninteractive transfer of data to command sentences and files. MAIN variables are particularly useful in connection with the IF command sentence. There are eight 3x3 matrices available to the user. Some of them are used by the program as well (one for image processing, six for the intermediate results, eight for the stereo pair of matrix 1 and by RMS fit calculations).

Selections and Keys

When a selection is created, each atom is assigned a flag which indicates whether the atom is included in the selection or not. Selections can be also stored permanently as KEYS. In almost any command sentence in which atoms are involved, a selection may be used. When a selection is not explicitly required all atoms are selected. In cases like (DELETE ATOM), however, no default selection is allowed, since it can result in unwanted data loss.

Files

All the files MAIN can read or write are ASCII, so that they can be edited. This makes the connections between MAIN and other programs easier. The exceptions are the PROTEIN and Lyn Ten Eyck (FFT) formats of a density map. Maps in PROTEIN format can be read and maps in FFT format can be written.

Operations

The basic input and output operations are READ and WRITE, where files of atomic, topology, map, points, symmetry operations ... data can be read or written. Values in existing data arrays can be modified with a SET command sentence and examined with a SHOW command sentence. IMAGE commands define the image parameters and send images to the graphic display. CALCULATE is used to calculate connectivity lists like BOND, HBONDS and PAIRS by comparing the atomic positions, or the electrostatic POTENTIAL in points from the atomic charges. MAKE converts data between different forms, e.g. from atoms to maps, from atoms to points, from points to atoms, from maps to atoms, from connectivity table to segments, from map to map ... POINT generates points. RMS and ANALYSE are used to compare various sets of the same property (like COORDINATES, CHARGES ...). Commands like ROTATE, TRANSLATE, SET CHAIN, SET DISTANCE, SET ANGLE, SET DIHEDRAL, OBJECT and MINIMIZE modify the coordinates of atoms or points. MINIMIZE does so by an energy minimization procedure, while the others change the coordinates by rotation and translation. An OBJECT statement is designed to be almost exclusively activated from the menu. IMAGE commands control visualisation of molecules and maps.

Interactive access to the command reference manual through the MAIN syntax will gradually cover complete code. MENU commands (strikes) documentation is already fully linked.

INPUT OUTPUT with open auxiliary programs

Commands related to INPUT and OUTPUT are READ, WRITE, SAVE and SHOW. READ and WRITE relate to the working data forms and formats.

Output from SHOW can be directed into a file, which, however, cannot be used as MAIN input any more. It is more or less a formatted presentation of MAIN status variables and some data arrays.

SAVE is used to save some MAIN data into a form of MAIN command language. The resulting files can be used as MAIN macros.

Map INPUT OUTPUT programs and routines

Several programs have been written for conversion of density maps (PROMAP, PROMAP_R, XPLMAP, XPLMAP_R, MAPFFT, MAPFFT_R, MAPPRO, MAPPRO_R, DN6MAP). This is more or less an obsolete topic.

The coding of program names is based on a 3 letter code. PRO is an abbreviation for PROTEIN, XPL for X-PLOR, FFT is used for the format required by the program P1SF that does the fast Fourier transform of the electron density, and DN6 stands for the FRODO DN6 files. MAP stands for the 'native' MAIN format of the map. There are two MAIN types of maps with corresponding formats: CHARACTER (character*1) and NUMBERED (real*4). MAIN density maps are ASCII files with a maximum of 80 characters per record. The header of the map with cell constants is the same for CHARACTER*1 and REAL*4 maps. The "_R" denotes that the file conversion is done from or to the REAL*4 type MAIN map; without "_R" it is assumed that the map is a CHARACTER*1 type.

MAIN can read maps directly in its 3 native formats, called CHARACTER (character*1), NUMBERED (real*4) and ALPHA (hexadecimal character*1 map), as well as in PROTEIN, XPLOR and CCP4 (ASCII) formats. The CHARACTER format is the default one, so it doesn't have to be specified. A user should be aware that it is possible to read REAL*4 maps with CHARACTER format. No error occurs and warning is written, but the map is going to be interpreted wrongly. PROTEIN, XPLOR and CCP4 maps are converted to real*4 maps.

MAIN can write maps in FFT, XPLOR, AMoRe and its three native formats. Each map can be written in any format (CHARACTER, NUMBER, ALPHA, XPLOR and FFT). When no format is specified, defaults are taken: the default format for a character map is CHARACTER and for a REAL*4 map is NUMBER. When writing, the map identifier (an integer number) must be specified. The current MAIN version can store up to 10 maps at once.

There are differences in file structure between UNIX and VMS as regards electron density map transferability. CHARACTER and NUMBER forms are not available on every UNIX machine. CONVEX FORTRAN up to now only supports FIXED record length ASCII files; the SGI, HP and ESV do not allow usage of CHARACTER and NUMBER formats. The ALPHA format was designed to transfer files between VMS and UNIX for their visual inspection. The XPLOR format may be used as well, though it produces larger files.

PROMAP, PROMAP_R, MAPOUT, MAPOUT_R and DN6MAP only work under VMS. The PROMAPs can read only PROTEIN release 2 electron density maps. The MAPOUTs can write only PROTEIN release 2 maps, which are not readable by PROTEIN any more; however, MAPAGE is capable of generating DN6 files for FRODO users.

The syntax looks as follows:

Character format reading:


MAIN> read file x.map map
MAIN> read file x.map map char

Real*4 format reading:


MAIN> read file x.rmap map numb

PROTEIN format reading:


MAIN> read file x.diz map protein

Write file statements:


MAIN> write file x.map map 1
MAIN> write file x.map map 1 number
MAIN> write file x.map map 1 character
MAIN> write file x.map map 1 fft

Lwplot

Lwplot reads a MAIN plot file (form used for ORTEP, the same as FRODO). The program was written and modified by a series of people. The advantages of the lwplot distributed with MAIN are that a specified window size also cuts the image at its borders and that each time when it is run, a backup of each user answer is written to the FTN79 that can be later on moved/renamed and edited for further tries.

Lately lwplot works also with colors.

Technical data

The source code contains more than 150 000 lines: about 100 000 lines form the graphics-independent part and 40 000 lines the graphics part. The computer language used is FORTRAN, except for the X-windows and OpenGL interfaces, which are written in C. MAIN is written in almost standard FORTRAN77. The two exceptions are listed below:

The non-graphic part is easily transferable to any computer of appropriate capabilities, although MAIN basic environment is now UNIX. VMS is actually not supported anymore, NT-windows have not been tackled so far. MAIN with a graphic interface runs on: