EMBOSS: antigenic


Program antigenic

Function

Finds antigenic sites in proteins

Description

Antigenic predicts potentially antigenic regions of a protein sequence, using the method of Kolaskar and Tongaonkar.

Analysis of data from experimentally determined antigenic sites on proteins has revealed that the hydrophobic residues Cys, Leu and Val, if they occur on the surface of a protein, are more likely to be part of antigenic sites. Kolaskar and Tongaonkar developed a semi-empirical method to predict antigenic determinants on proteins; it uses the physicochemical properties of amino acid residues and their frequencies of occurrence in experimentally known segmental epitopes. Applied to a large number of proteins, the method predicts antigenic determinants with about 75% accuracy, which is better than most known methods. Because it is based on a single parameter, it is also very simple to use.

Usage

Here is a sample session with antigenic.

% antigenic
Finds antigenic sites in proteins
Input sequence: sw:act1_fugru
Minimum length [6]: 
Output file [act1_fugru.antigenic]: 

Command line arguments

   Mandatory qualifiers:
  [-sequence]          seqall     Sequence database USA
   -minlen             integer    Minimum length
  [-outfile]           report     (no help text) report value

   Optional qualifiers: (none)
   Advanced qualifiers: (none)
   General qualifiers:
  -help                bool       report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose


Mandatory qualifiers                                     Allowed values         Default
[-sequence] (Parameter 1)  Sequence database USA         Readable sequence(s)   Required
-minlen                    Minimum length                Integer from 1 to 50   6
[-outfile] (Parameter 2)   (no help text) report value   Report file

Optional qualifiers                                      Allowed values         Default
(none)

Advanced qualifiers                                      Allowed values         Default
(none)
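
The qualifiers listed above can also be supplied on the command line instead of being answered at the prompts. The following Python sketch builds such a command line; the helper name build_antigenic_command is hypothetical, and the sketch only assembles the arguments, it does not run the program.

```python
import shlex


def build_antigenic_command(sequence, outfile, minlen=6):
    """Return an argv list for antigenic (hypothetical helper, not part of EMBOSS)."""
    # The qualifier table gives the allowed range for -minlen as 1 to 50.
    if not 1 <= minlen <= 50:
        raise ValueError("minlen must be between 1 and 50")
    return [
        "antigenic",
        "-sequence", sequence,
        "-minlen", str(minlen),
        "-outfile", outfile,
    ]


cmd = build_antigenic_command("sw:act1_fugru", "act1_fugru.antigenic")
print(shlex.join(cmd))
```

On a machine where EMBOSS is installed, the resulting list could be passed to subprocess.run(cmd, check=True).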

Input file format

The input is one or more protein sequences.

Output file format

Here is a sample output:


ANTIGENIC of ACT1_FUGRU  from: 1  to: 375

Length 375 residues, score calc from 4 to 372
Reporting all peptides over 6 residues

Found 18 hits scoring over 1.00 (true average 1.02)
Maximum length 24 at residues 160->183

 Sequence:  THTVPIYEGYALPHAILRLDLAGR
            |                      |
          160                      183

Entries in score order, max score at "*"


[1] Score 1.207 length 9 at residues 214->222
                *    
 Sequence:  EKLCYVALD
            |       |
          214       222

[2] Score 1.187 length 15 at residues 131->145
                  *        
 Sequence:  AMYVAIQAVLSLYAS
            |             |
          131             145

[3] Score 1.166 length 8 at residues 5->12
               *    
 Sequence:  IAALVVDN
            |      |
            5      12
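
The header lines of the report hint at how the scores are formed. The Python sketch below is an illustration under stated assumptions, not the program's actual code: it assumes a 7-residue sliding window whose mean propensity is assigned to the central residue (inferred from "score calc from 4 to 372" on a 375-residue sequence), a threshold of 1.0 that drops to the sequence average when that average is lower, and regions defined as runs of at least minlen positions above the threshold. The A(p) values are copied from the default Eantigenic.dat file; the function names are hypothetical.

```python
# A(p) values copied from the default Eantigenic.dat data file.
PROPENSITY = {
    "A": 1.064, "C": 1.412, "D": 0.866, "E": 0.851, "F": 1.091,
    "G": 0.874, "H": 1.105, "I": 1.152, "K": 0.930, "L": 1.250,
    "M": 0.826, "N": 0.776, "P": 1.064, "Q": 1.015, "R": 0.873,
    "S": 1.012, "T": 0.909, "V": 1.383, "W": 0.893, "Y": 1.161,
}

WINDOW = 7  # assumption, inferred from the sample report header


def window_scores(seq):
    """Map each 1-based centre position to the mean A(p) of its window.

    Raises KeyError for residues outside the 20 standard amino acids.
    """
    half = WINDOW // 2
    scores = {}
    for start in range(len(seq) - WINDOW + 1):
        window = seq[start:start + WINDOW]
        scores[start + half + 1] = sum(PROPENSITY[aa] for aa in window) / WINDOW
    return scores


def antigenic_regions(seq, minlen=6):
    """Return (start, end, max_score) for runs of positions above the threshold."""
    scores = window_scores(seq)
    if not scores:
        return []
    average = sum(scores.values()) / len(scores)
    # Assumption: the threshold is 1.0 unless the sequence average is lower.
    threshold = min(1.0, average)
    regions, run = [], []
    for pos in sorted(scores):
        if scores[pos] > threshold:
            run.append(pos)
            continue
        if len(run) >= minlen:
            regions.append((run[0], run[-1], max(scores[p] for p in run)))
        run = []
    if len(run) >= minlen:
        regions.append((run[0], run[-1], max(scores[p] for p in run)))
    return regions
```

Because the windowing, averaging and threshold rules are assumptions, the sketch is not expected to reproduce the program's output exactly; it is meant only to make the header lines easier to read.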


By using the '-featout' qualifier, a GFF file of the predicted regions can be produced. For example:

% antigenic -featout eg.gff
Finds antigenic sites in proteins
Input sequence(s): sw:act1_fugru
Minimum length [6]: 
Output file [act1_fugru.antigenic]: 

% more eg.gff
##gff-version 0.0
##date 2001-02-13
##sequence-region ACT1_FUGRU 0 0
ACT1_FUGRU      antigenic       misc_feature    5       12      1.166   +	.       note "max score at 7"
ACT1_FUGRU      antigenic       misc_feature    27      38      1.164   +	.       note "max score at 31"
ACT1_FUGRU      antigenic       misc_feature    40      46      1.066   +	.       note "max score at 42"
ACT1_FUGRU      antigenic       misc_feature    51      57      1.034   +	.       note "max score at 51"
ACT1_FUGRU      antigenic       misc_feature    62      76      1.102   +	.       note "max score at 67"
ACT1_FUGRU      antigenic       misc_feature    93      108     1.116   +	.       note "max score at 102 ,103"
ACT1_FUGRU      antigenic       misc_feature    131     145     1.187   +	.       note "max score at 136"
ACT1_FUGRU      antigenic       misc_feature    160     183     1.136   +	.       note "max score at 172"
ACT1_FUGRU      antigenic       misc_feature    186     192     1.068   +	.       note "max score at 190"
ACT1_FUGRU      antigenic       misc_feature    214     222     1.207   +	.       note "max score at 217"
ACT1_FUGRU      antigenic       misc_feature    232     250     1.086   +	.       note "max score at 244"
ACT1_FUGRU      antigenic       misc_feature    256     266     1.110   +	.       note "max score at 263"
ACT1_FUGRU      antigenic       misc_feature    269     275     1.045   +	.       note "max score at 268"
ACT1_FUGRU      antigenic       misc_feature    295     301     1.113   +	.       note "max score at 295"
ACT1_FUGRU      antigenic       misc_feature    317     323     1.074   +	.       note "max score at 319"
ACT1_FUGRU      antigenic       misc_feature    327     332     1.083   +	.       note "max score at 329"
ACT1_FUGRU      antigenic       misc_feature    336     352     1.107   +	.       note "max score at 346"
ACT1_FUGRU      antigenic       misc_feature    367     372     1.135   +	.       note "max score at 371"
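
For downstream processing, the feature lines above can be read into simple records. The Python sketch below assumes the nine-column layout visible in the sample (sequence name, source, feature type, start, end, score, strand, frame, note) and splits on any run of whitespace, since the sample mixes spaces and tabs. The AntigenicFeature type and parse_feature_line function are hypothetical names.

```python
from typing import NamedTuple


class AntigenicFeature(NamedTuple):  # hypothetical record type
    seqname: str
    start: int
    end: int
    score: float
    note: str


def parse_feature_line(line):
    """Parse one feature line; return None for '##' header lines and blanks."""
    if not line.strip() or line.startswith("#"):
        return None
    # Split on any whitespace, keeping the trailing 'note "..."' text together.
    fields = line.split(None, 8)
    seqname, _source, _feature, start, end, score, _strand, _frame, attrs = fields
    return AntigenicFeature(seqname, int(start), int(end), float(score), attrs.strip())


def read_features(path):
    """Return all features in a file, highest score first."""
    with open(path) as handle:
        parsed = (parse_feature_line(line) for line in handle)
        features = [feature for feature in parsed if feature is not None]
    return sorted(features, key=lambda feature: feature.score, reverse=True)
```

Calling read_features("eg.gff") on the file shown above would list the region at 214 to 222 first, since it has the highest score (1.207).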

Data files

Antigenic uses a data file called Eantigenic.dat.

EMBOSS data files are distributed with the application and stored in the standard EMBOSS data directory, which is defined by the EMBOSS environment variable EMBOSS_DATA.

Users can provide their own data files in their own directories. Project-specific files can be put in the current directory or, for tidier directory listings, in a subdirectory called ".embossdata". Files for all EMBOSS runs can be put in the user's home directory or, again, in a subdirectory called ".embossdata".

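The directories are searched in the following order:

  1. . (the current directory)
  2. .embossdata (under the current directory)
  3. ~/ (the user's home directory)
  4. ~/.embossdata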
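
A script that needs to know which copy of a data file would be picked up can mimic this lookup. The Python sketch below is a minimal illustration, assuming the user locations are tried in the order they are described above (current directory, its .embossdata subdirectory, home directory, its .embossdata subdirectory) and assuming the standard directory named by EMBOSS_DATA is tried last. The function names are hypothetical.

```python
import os
from pathlib import Path


def candidate_directories(cwd=None, home=None, emboss_data=None):
    """List the directories to try, in the assumed search order."""
    cwd = Path(cwd) if cwd is not None else Path.cwd()
    home = Path(home) if home is not None else Path.home()
    if emboss_data is None:
        emboss_data = os.environ.get("EMBOSS_DATA")
    dirs = [cwd, cwd / ".embossdata", home, home / ".embossdata"]
    if emboss_data:
        # Assumption: the standard data directory is consulted last.
        dirs.append(Path(emboss_data))
    return dirs


def find_data_file(name="Eantigenic.dat", **kwargs):
    """Return the first existing path for `name`, or None if it is not found."""
    for directory in candidate_directories(**kwargs):
        path = directory / name
        if path.is_file():
            return path
    return None
```

Calling find_data_file() with no arguments uses the real current directory, home directory and EMBOSS_DATA setting; the keyword arguments exist only so the lookup can be tried against other locations.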

Here is the default Eantigenic.dat file:


#                                               Antigenic  Surface  Antigenic
# Amino     -- Occurrence of amino acids in --   frequency frequency propensity
# Acid       Epitopes      Surface     Protein   f(Ag)    f(s)      A(p)
  A             135          328         524     0.065    0.061     1.064
  C              53           97         186     0.026    0.018     1.412
  D             118          352         414     0.057    0.066     0.866
  E             132          401         499     0.064    0.075     0.851
  F              76          180         365     0.037    0.034     1.091
  G             116          343         487     0.056    0.064     0.874
  H              59          138         191     0.029    0.026     1.105
  I              86          193         437     0.042    0.036     1.152
  K             158          439         523     0.076    0.082     0.930
  L             149          308         684     0.072    0.058     1.250
  M              23           72         152     0.011    0.013     0.826
  N              94          313         407     0.045    0.058     0.776
  P             135          328         411     0.065    0.061     1.064
  Q              99          252         332     0.048    0.047     1.015
  R             106          314         394     0.051    0.058     0.873
  S             168          429         553     0.081    0.080     1.012
  T             141          401         522     0.068    0.075     0.909
  V             128          239         515     0.062    0.045     1.383
  W              19           55         103     0.009    0.010     0.893
  Y              71          158         245     0.034    0.029     1.161
Total          2066         5340        7944
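
The columns of this table appear to be related in a simple way: each frequency is a count divided by its column total, and the antigenic propensity A(p) is the ratio f(Ag) / f(s). That relationship is an assumption here, not something the data file states, but the Python sketch below shows that it agrees with the tabulated A(p) values to within rounding. The counts are copied from the table; the function name is hypothetical.

```python
# (epitope count, surface count) per residue, copied from the table above.
COUNTS = {
    "A": (135, 328), "C": (53, 97), "D": (118, 352), "E": (132, 401),
    "F": (76, 180), "G": (116, 343), "H": (59, 138), "I": (86, 193),
    "K": (158, 439), "L": (149, 308), "M": (23, 72), "N": (94, 313),
    "P": (135, 328), "Q": (99, 252), "R": (106, 314), "S": (168, 429),
    "T": (141, 401), "V": (128, 239), "W": (19, 55), "Y": (71, 158),
}

# Column totals, matching the "Total" row of the table.
EPITOPE_TOTAL = sum(epitopes for epitopes, _ in COUNTS.values())
SURFACE_TOTAL = sum(surface for _, surface in COUNTS.values())


def propensity(residue):
    """Return f(Ag) / f(s) for one residue (assumed definition of A(p))."""
    epitopes, surface = COUNTS[residue]
    f_ag = epitopes / EPITOPE_TOTAL  # antigenic frequency
    f_s = surface / SURFACE_TOTAL    # surface frequency
    return f_ag / f_s


for residue in "CLV":
    print(residue, round(propensity(residue), 3))
```

The three residues printed are the ones singled out in the Description (Cys, Leu and Val); they have the highest propensities in the table.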

Notes

References

  1. Kolaskar,AS and Tongaonkar,PC (1990). A semi-empirical method for prediction of antigenic determinants on protein antigens. FEBS Letters 276: 172-174.
  2. Parker,JMR, Guo,D and Hodges,RS (1986). Biochemistry 25: 5425-5432.

Warnings

The program will warn you if the sequence is not a protein or has ambiguity codes.

Diagnostic Error Messages

Exit status

It exits with status 0, unless a region is badly constructed.

Known bugs

None.

See also

Program name     Description
digest           Protein proteolytic enzyme or reagent cleavage digest
fuzzpro          Protein pattern search
fuzztran         Protein pattern search after translation
helixturnhelix   Report nucleic acid binding motifs
oddcomp          Finds protein sequence regions with a biased composition
patmatdb         Search a protein sequence with a motif
patmatmotifs     Search a PROSITE motif database with a protein sequence
pepcoil          Predicts coiled coil regions
preg             Regular expression search of a protein sequence
pscan            Scans proteins using PRINTS
sigcleave        Reports protein signal cleavage sites

Author(s)

This application was written by Alan Bleasby (ableasby@hgmp.mrc.ac.uk)

Original program "ANTIGENIC" by Peter Rice (EGCG 1991)

History

Completed 9th March 1999

Target users

This program is intended to be used by everyone and everything, from naive users to embedded scripts.

Comments