EMBOSS: contacts


Program contacts

Function

Reads coordinate files and writes contact files

Description

contacts parses embl-like clean coordinate files generated by the coorde application (not currently in EMBOSS; email Jon Ison, jison@hgmp.mrc.ac.uk) or the domainer application. For each coordinate file in a given directory, it writes a file of residue-residue contact data in embl-like format. Each contact file contains residue contact data for each chain of every model in the coordinate file (or for the single scop domain where domainer output is read).

Two residues are defined as being in contact when the van der Waals surface of any atom of the first residue comes within the threshold contact distance of the van der Waals surface of any atom of the second residue. The threshold contact distance is user-defined, with a default value of 1 Angstrom.

The following van der Waals radii are used:

C:1.8 Angstrom 
O:1.4 Angstrom 
N:1.7 Angstrom 
S:2.0 Angstrom
H:1.0 Angstrom (default for other or unknown atom types)
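
The definition above can be illustrated with a minimal Python sketch. It is not the code used by contacts: the (element, x, y, z) atom tuples and the function names are hypothetical, and treating a surface gap exactly equal to the threshold as a contact is an assumption.

```python
import math

# Radii from the list above, in Angstroms; the H value is also the
# fallback for other or unknown atom types.
VDW_RADII = {"C": 1.8, "O": 1.4, "N": 1.7, "S": 2.0, "H": 1.0}
DEFAULT_RADIUS = 1.0


def surface_gap(atom_a, atom_b):
    """Distance between the van der Waals surfaces of two atoms.

    Each atom is a hypothetical (element, x, y, z) tuple.
    """
    elem_a, *xyz_a = atom_a
    elem_b, *xyz_b = atom_b
    centre_distance = math.dist(xyz_a, xyz_b)
    return (centre_distance
            - VDW_RADII.get(elem_a, DEFAULT_RADIUS)
            - VDW_RADII.get(elem_b, DEFAULT_RADIUS))


def residues_in_contact(residue_a, residue_b, thresh=1.0):
    """True if any atom pair has surfaces within thresh Angstroms.

    Assumption: a gap equal to thresh counts as a contact.
    """
    return any(surface_gap(a, b) <= thresh
               for a in residue_a for b in residue_b)
```

For example, an N atom and an O atom whose centres are 4.0 Angstroms apart have surfaces about 0.9 Angstroms apart (4.0 - 1.7 - 1.4), so their residues would be in contact at the default threshold.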

Usage

Here is a sample session with contacts:

% contacts
Reads coordinate files and writes contact files
Location of coordinate files for input (embl-like format) [./]: 
Extension of coordinate files (embl-like format) [.pxyz]: 
Location of contact files for output [./]: 
Extension of contact files [.con]: 
Threshold contact distance [1.0]: 
Name of data file with van der Waals radii [Evdw.dat]: 
Name of log file for the build [contacts.log]: 
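
The answers to these prompts can also be given as qualifiers (see "Command line arguments"). The following Python sketch assembles such a command line; the helper name build_contacts_command is hypothetical, the defaults are those documented on this page, and it is assumed that the contacts executable is on the PATH.

```python
import shlex


def build_contacts_command(cpdb="./", cpdbextn=".pxyz", con="./",
                           conextn=".con", thresh=1.0, ignore=20.0,
                           vdwf="Evdw.dat", conerrf="contacts.log"):
    """Return an argument list for a non-interactive contacts run."""
    return [
        "contacts",
        "-cpdb", cpdb,
        "-cpdbextn", cpdbextn,
        "-con", con,
        "-conextn", conextn,
        "-thresh", str(thresh),
        "-ignore", str(ignore),
        "-vdwf", vdwf,
        "-conerrf", conerrf,
    ]


# Show the command as it would be typed at the shell prompt.
print(shlex.join(build_contacts_command(thresh=1.5)))
```

The returned list can be passed to subprocess.run() on a system where EMBOSS is installed.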

Command line arguments

   Mandatory qualifiers:
  [-cpdb]              string     Location of coordinate files for input
                                  (embl-like format)
  [-cpdbextn]          string     Extension of coordinate files (embl-like
                                  format)
  [-con]               string     Location of contact files for output
  [-conextn]           string     Extension of contact files
  [-thresh]            float      Threshold contact distance
  [-ignore]            float      If any two atoms from two different residues
                                  are at least this distance apart then no
                                  further inter-atomic contacts will be checked
                                  for that residue pair. This speeds the
                                  calculation up considerably.
  [-vdwf]              string     Name of data file with van der Waals radii
  [-conerrf]           outfile    Name of log file for the build

   Optional qualifiers: (none)
   Advanced qualifiers: (none)
   General qualifiers:
  -help                bool       report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose


Mandatory qualifiers                              Allowed values           Default
[-cpdb] (Parameter 1)                             Any string is accepted   ./
    Location of coordinate files for input
    (embl-like format)
[-cpdbextn] (Parameter 2)                         Any string is accepted   .pxyz
    Extension of coordinate files (embl-like
    format)
[-con] (Parameter 3)                              Any string is accepted   ./
    Location of contact files for output
[-conextn] (Parameter 4)                          Any string is accepted   .con
    Extension of contact files
[-thresh] (Parameter 5)                           Any numeric value        1.0
    Threshold contact distance
[-ignore] (Parameter 6)                           Any numeric value        20.0
    If any two atoms from two different
    residues are at least this distance apart
    then no further inter-atomic contacts will
    be checked for that residue pair. This
    speeds the calculation up considerably.
[-vdwf] (Parameter 7)                             Any string is accepted   Evdw.dat
    Name of data file with van der Waals radii
[-conerrf] (Parameter 8)                          Output file              contacts.log
    Name of log file for the build
Optional qualifiers                               Allowed values           Default
(none)
Advanced qualifiers                               Allowed values           Default
(none)
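
The effect of the -ignore qualifier can be sketched in Python as follows. This is an illustration of the described short cut, not the program's own code: the atom tuples are hypothetical, and it is assumed that the -ignore distance is measured between atom centres.

```python
import math

# Radii as listed in the Description, in Angstroms.
VDW_RADII = {"C": 1.8, "O": 1.4, "N": 1.7, "S": 2.0, "H": 1.0}


def contact_with_cutoff(residue_a, residue_b, thresh=1.0, ignore=20.0):
    """Contact test that abandons a residue pair early.

    Atoms are hypothetical (element, x, y, z) tuples. Returns an
    (in_contact, atom_pairs_examined) tuple so the saving is visible.
    """
    examined = 0
    for elem_a, *xyz_a in residue_a:
        for elem_b, *xyz_b in residue_b:
            examined += 1
            distance = math.dist(xyz_a, xyz_b)
            if distance >= ignore:
                # Residues this far apart cannot be in contact, so the
                # remaining atom pairs of this residue pair are skipped.
                return False, examined
            gap = (distance
                   - VDW_RADII.get(elem_a, 1.0)
                   - VDW_RADII.get(elem_b, 1.0))
            if gap <= thresh:
                return True, examined
    return False, examined
```

For two distant residues of three atoms each, one distance calculation is made instead of nine, which is where the speed-up comes from.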

Input file format

contacts reads embl-like clean coordinate files generated by the coorde application or the domainer application.

For example:


ID   D1HBBA_
XX
DE   Co-ordinates for SCOP domain D1HBBA_
XX
OS   See Escop.dat for domain classification
XX
EX   METHOD xray; RESO 1.90; NMOD 1; NCHA 1;
XX
CN   [1]
XX
IN   ID A; NR 141; NH 0; NW 0;
XX
SQ   SEQUENCE   141 AA;  15127 MW;  5EC7DB1E CRC32;
     VLSPADKTNV KAAWGKVGAH AGEYGAEALE RMFLSFPTTK TYFPHFDLSH GSAQVKGHGK
     KVADALTNAV AHVDDMPNAL SALSDLHAHK LRVDPVNFKL LSHCLLVTLA AHLPAEFTPA
     VHASLDKFLA SVSTVLTSKY R
XX
CO   1    1    P    1     1     V    VAL    N      7.155   17.725 4.424     1.00    37.82
CO   1    1    P    1     1     V    VAL    CA     7.854   18.800 3.718     1.00    35.10
CO   1    1    P    1     1     V    VAL    C      9.366   18.565 3.754     1.00    31.92
CO   1    1    P    1     1     V    VAL    O      9.861   17.961 4.721     1.00    35.01
CO   1    1    P    1     1     V    VAL    CB     7.529   20.168 4.360     1.00    47.63
CO   1    1    P    1     1     V    VAL    CG1    7.806   21.300 3.369     1.00    62.84
CO   1    1    P    1     1     V    VAL    CG2    6.136   20.244 4.936     1.00    54.85
CO   1    1    P    2     2     L    LEU    N     10.032   19.062 2.731     1.00    27.38
CO   1    1    P    2     2     L    LEU    CA    11.496   18.967 2.657     1.00    23.24
CO   1    1    P    2     2     L    LEU    C     12.077   20.110 3.496     1.00    22.99
CO   1    1    P    2     2     L    LEU    O     11.672   21.259 3.289     1.00    25.22
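
As a rough guide to reading the CO records, here is a minimal Python sketch. The meaning of each column is not defined on this page, so the field positions used (residue index in the fifth column, three-letter code, atom name, then x, y, z) are assumptions inferred from the example; the Atom type and the function name are hypothetical. A value wrapped onto an indented line directly after a CO record is treated as part of that record.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    residue_index: int  # assumed: 5th column, index into the SQ sequence
    residue_name: str   # three-letter code, e.g. "VAL"
    atom_name: str      # e.g. "CA"
    x: float
    y: float
    z: float


def parse_co_records(lines):
    """Collect Atom tuples from the CO records of a coordinate file."""
    records = []
    previous_was_co = False
    for line in lines:
        if line.startswith("CO "):
            records.append(line.split())
            previous_was_co = True
        elif previous_was_co and line[:1].isspace() and line.strip():
            # Indented continuation of a wrapped CO record.
            records[-1].extend(line.split())
            previous_was_co = False
        else:
            previous_was_co = False
    # Assumed layout: f[4] residue index, f[7] three-letter code,
    # f[8] atom name, f[9]..f[11] x, y, z coordinates.
    return [Atom(int(f[4]), f[7], f[8],
                 float(f[9]), float(f[10]), float(f[11]))
            for f in records]


sample = [
    "CO   1    1    P    1     1     V    VAL    N      7.155   17.725 4.424     1.00",
    "    37.82",
    "CO   1    1    P    2     2     L    LEU    CA    11.496   18.967 2.657     1.00    23.24",
]
for atom in parse_co_records(sample):
    print(atom)
```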

Output file format

The embl-like format used for the contact files uses the following records:

  1. ID - either the 4-character PDB identifier code (where clean protein coordinate files are used as input) or the 7-character domain identifier code taken from scop (where domain coordinate files are used as input; see the documentation for the EMBOSS application scope for further information).
  2. DE - bibliographic information. The text "Residue-residue contact data" is always given.
  3. EX - experimental information. The value of the threshold contact distance is given as a floating point number after 'THRESH'. The number of models and number of polypeptide chains are given after 'NMOD' and 'NCHA' respectively. For domain coordinate files a 1 is always given. Following the EX record, the file has a section containing CN, IN and SM records (see below) for each chain. The sections for each chain of a model are given after the MO record.
  4. MO - model number. The number given in brackets after this record indicates the start of a section of model-specific data.
  5. CN - chain number. The number given in brackets after this record indicates the start of a section of chain-specific data.
  6. IN - chain-specific data. The character given after ID is the PDB chain identifier taken from the input file (a '.' is given where a chain identifier was not specified in the original pdb file or, for domain coordinate files, where the domain is comprised of more than one chain). The number of amino acid residues comprising the chain (or the chains from which a domain is comprised) is given after NR. The number of residue-residue contacts is given after NSMCON.
  7. SM - Line of residue contact data. Pairs of amino acid identifiers and residue numbers are delimited by a ';'. Residue numbers are taken from the clean coordinate file and give a correct index into the sequence (i.e. they are not necessarily the same as the original pdb file).
  8. XX - used for spacing.
  9. // - given on the last line of the file only.

Note - SM records are used for contacts between either side-chain or main-chain atoms as defined above. In a future implementation, SS will be used for side-chain only contacts, MM will be used for main-chain only contacts, and there will probably be several other forms of contact too.

Example contacts output file:


ID   D1HBBB_
XX
DE   Residue-residue side-chain contact data
XX
EX   THRESH 10.0; NMOD 1; NCHA 1;
XX
MO   [1]
XX
CN   [1]
XX
IN   ID B; NR 146; NSMCON 2514;
XX
SM   VAL 1 ; HIS 2
SM   VAL 1 ; LEU 3
SM   VAL 1 ; THR 4
SM   VAL 1 ; PRO 5
SM   VAL 1 ; GLU 6
SM   VAL 1 ; GLU 7
SM   VAL 1 ; LYS 8
SM   VAL 1 ; VAL 11
SM   VAL 1 ; PHE 71
//
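
A contact file of this form can be read back with a few lines of Python. The sketch below is an assumption-laden illustration, not part of EMBOSS: the function name and the returned dictionary layout are hypothetical, and it relies only on the record tags and the 'NAME number ; NAME number' layout of SM lines shown above.

```python
import re


def parse_contact_file(lines):
    """Group SM contacts by (model, chain) number.

    Returns {(model, chain): {"nsmcon": int or None, "contacts": [...]}}
    where each contact is ((name_a, number_a), (name_b, number_b)).
    """
    sections = {}
    model = chain = None

    def current():
        return sections.setdefault((model, chain),
                                   {"nsmcon": None, "contacts": []})

    for line in lines:
        tag, rest = line[:2], line[2:].strip()
        if tag == "MO":
            model = int(rest.strip("[]"))
        elif tag == "CN":
            chain = int(rest.strip("[]"))
            current()
        elif tag == "IN":
            match = re.search(r"NSMCON\s+(\d+)", rest)
            if match:
                current()["nsmcon"] = int(match.group(1))
        elif tag == "SM":
            left, right = rest.split(";")
            name_a, num_a = left.split()
            name_b, num_b = right.split()
            current()["contacts"].append(
                ((name_a, int(num_a)), (name_b, int(num_b))))
    return sections


sample = [
    "MO   [1]",
    "CN   [1]",
    "IN   ID B; NR 146; NSMCON 2514;",
    "SM   VAL 1 ; HIS 2",
    "SM   VAL 1 ; PHE 71",
    "//",
]
print(parse_contact_file(sample))
```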

contacts generates a log file, an excerpt of which is shown below. If there is a problem in processing a coordinate file, three lines are written, containing respectively the record '//', the scop domain or pdb identifier code, and an error message. The text 'WARN file open error filename', 'ERROR file read error filename' or 'ERROR file write error filename' is reported when an error is encountered during a file open, read or write respectively. Various other error messages may also be given (in case of difficulty email Jon Ison, jison@hgmp.mrc.ac.uk).

Example log file


//
DS002__
WARN  Could not open for reading cpdb file s002.pxyz
//
DS003__
WARN  Could not open for reading cpdb file s003.pxyz
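
The three-line entries of the log can be collected with a short Python sketch such as the one below. The function name is hypothetical, and splitting the message into a leading level word (WARN or ERROR) and the remaining text is an assumption based on the excerpt above.

```python
def parse_contacts_log(lines):
    """Return (identifier, level, message) tuples from a contacts log."""
    stripped = [line.strip() for line in lines if line.strip()]
    entries = []
    for i, line in enumerate(stripped):
        # Each entry is '//', then the identifier, then the message.
        if line == "//" and i + 2 < len(stripped):
            identifier = stripped[i + 1]
            level, _, text = stripped[i + 2].partition(" ")
            entries.append((identifier, level, text.strip()))
    return entries


sample = [
    "//",
    "DS002__",
    "WARN  Could not open for reading cpdb file s002.pxyz",
    "//",
    "DS003__",
    "WARN  Could not open for reading cpdb file s003.pxyz",
]
for entry in parse_contacts_log(sample):
    print(entry)
```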

Data files

contacts reads in data on van der Waals radii for atoms in proteins from the data file Evdw.dat (by default).

EMBOSS data files are distributed with the application and stored in the standard EMBOSS data directory, which is defined by the EMBOSS environment variable EMBOSS_DATA.

To see the available EMBOSS data files, run:

% embossdata -showall

To fetch one of the data files (for example 'Exxx.dat') into your current directory for you to inspect or modify, run:


% embossdata -fetch -file Exxx.dat

Users can provide their own data files in their own directories. Project specific files can be put in the current directory, or for tidier directory listings in a subdirectory called ".embossdata". Files for all EMBOSS runs can be put in the user's home directory, or again in a subdirectory called ".embossdata".

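The directories are searched in the following order:

  1. . (your current directory)
  2. .embossdata (under your current directory)
  3. ~/ (your home directory)
  4. ~/.embossdata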
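
A lookup over the user directories mentioned above might be sketched in Python as follows. The function name is hypothetical, and the order used (current directory, its .embossdata subdirectory, home directory, then ~/.embossdata) is an assumption taken from the standard EMBOSS documentation; the standard EMBOSS data directory is not consulted here.

```python
from pathlib import Path


def find_data_file(name, cwd=None, home=None):
    """Return the first matching data file in the user directories.

    Assumed search order: cwd, cwd/.embossdata, home, home/.embossdata.
    Returns None when the file is found in none of them.
    """
    cwd = Path(cwd) if cwd is not None else Path.cwd()
    home = Path(home) if home is not None else Path.home()
    for directory in (cwd, cwd / ".embossdata", home, home / ".embossdata"):
        candidate = directory / name
        if candidate.is_file():
            return candidate
    return None


print(find_data_file("Evdw.dat"))
```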

Notes

None.

References

None.

Warnings

None.

Diagnostic Error Messages

None.

Exit status

It always exits with status 0.

Known bugs

None.

See also

Program name   Description
dichet         Parse dictionary of heterogen groups
psiblasts      Runs PSI-BLAST given scopalign alignments
scopalign      Generate alignments for SCOP families
siggen         Generates a sparse protein signature
sigscan        Scans a sparse protein signature against swissprot

Author(s)

This application was written by Jon Ison (jison@hgmp.mrc.ac.uk)

History

Written (June 2001) - Jon Ison

Target users

This program is intended to be used by everyone and everything, from naive users to embedded scripts.

Comments