EMBOSS: garnier


Program garnier

Function

Predicts protein secondary structure

Description

This is an implementation of the original Garnier Osguthorpe Robson algorithm (GOR I) for predicting protein secondary structure.

Usage

Here is a sample session with garnier.

% garnier
Input sequence: sw:amic_pseae
Output file [amic_pseae.garnier]: 
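
The same run can be scripted without the interactive prompts. The following is a minimal Python sketch that builds the command from the qualifiers listed under "Command line arguments" (-sequencea, -outfile, -idc). The helper names build_garnier_command and run_garnier are illustrative assumptions and are not part of EMBOSS; actually running the command assumes garnier is installed and on the PATH.

```python
import subprocess


def build_garnier_command(sequence, outfile, idc=0):
    """Return the argument list for a non-interactive garnier run.

    Hypothetical helper: it only uses qualifiers named in this document.
    """
    if not 0 <= idc <= 6:
        raise ValueError("idc must be an integer from 0 to 6")
    return [
        "garnier",
        "-sequencea", sequence,
        "-outfile", outfile,
        "-idc", str(idc),
    ]


def run_garnier(sequence, outfile, idc=0):
    """Run garnier; assumes an EMBOSS installation is available on the PATH."""
    subprocess.run(build_garnier_command(sequence, outfile, idc), check=True)
```

For the sample session above, run_garnier("sw:amic_pseae", "amic_pseae.garnier") would be the equivalent call.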

Command line arguments

   Mandatory qualifiers:
  [-sequencea]         seqall     Sequence database USA
  [-outfile]           outfile    Output file name

   Optional qualifiers: (none)
   Advanced qualifiers:
   -idc                integer    idc param


Qualifier        Description            Allowed values         Default
------------------------------------------------------------------------------
Mandatory qualifiers
[-sequencea]     Sequence database USA  Readable sequence(s)   Required
(Parameter 1)
[-outfile]       Output file name       Output file            <sequence>.garnier
(Parameter 2)

Optional qualifiers
(none)

Advanced qualifiers
-idc             idc param              Integer from 0 to 6    0

Input file format

Any protein sequence.

Output file format

Here is the output from the example run.

 GARNIER plot of AMIC_PSEAE, 384 aa; DCH = 0, DCS = 0

 Please cite:
 Garnier, Osguthorpe and Robson (1978) J. Mol. Biol. 120:97-120

           .   10    .   20    .   30    .   40    .   50    .   60
       GSHQERPLIGLLFSETGVTADIERSHAYGALLAVEQLNREGGVGGRPIETLSQDPGGDPD
 helix                   HHHHHHHHHHHHHHHHHHH                       
 sheet      EE EEEEE                                 EEEEE         
 turns        T                              TTTT         TT TT   T
 coil  CCCCC        CCCCC                   C    CCCC       C  CCC 

           .   70    .   80    .   90    .  100    .  110    .  120
       RYRLCAEDFIRNRGVRFLVGCYMSHTRKAVMPVVERADALLCYPTPYEGFEYSPNIVYGG
 helix     HHHHHH            HHHH H     HHHHHH                     
 sheet EEEE           EEEE          EEEE      EEEE    E       EE   
 turns           TTTTT    TTT    T T                 T TTT  TT  T  
 coil                                             CCC     CC     CC

           .  130    .  140    .  150    .  160    .  170    .  180
       PAPNQNSAPLAAYLIRHYGERVVFIGSDYIYPRESNHVMRHLYRQHGGTVLEEIYIPLYP
 helix          HHH                        HHHH                    
 sheet         E   EEEE    EEEEE               EEE       EEEEEEE   
 turns    TT           TT T     TTTT   TTT        TTT             T
 coil  CCC  CCC          C          CCC   C          CCCC       CC 

           .  190    .  200    .  210    .  220    .  230    .  240
       SDDDLQRAVERIYQARADVVFSTVVGTGTAELYRAIARRYGDGRRPPIASLTTSEAEVAK
 helix    HHHHHHHHHHHHH             HHHHHHH                HHHHHHHH
 sheet                 EEEEEEEE            EE         EEE          
 turns TTT                                   TTTTTT                
 coil                          CCCCC               CCC   CC        

           .  250    .  260    .  270    .  280    .  290    .  300
       MESDVAEGQVVVAPYFSSIDTPASRAFVQACHGFFPENATITAWAEAAYWQTLLLGRAAQ
 helix HHHHHHHHH               HHHH           HHHHHHHHHHHHH    HHHH
 sheet          EEEEE   E          EE                      E       
 turns               TTT T   T       TTT   TT                      
 coil                     CCC C         CCC  C              CCC    

           .  310    .  320    .  330    .  340    .  350    .  360
       AAGNWRVEDVQRHLYDIDIDAPQGPVRVERQNNHSRLSSRIAEIDARGVFQVRWQSPEPI
 helix       HHHHHHH                             HHH               
 sheet              E  EEEE     EEEEE         EEE      EEEE        
 turns               TT     T        TT   T         TTT    TT    TT
 coil  CCCCCC              C CCC       CCC CCC               CCCC  

           .  370    .  380
       RPDPYVVVHNLDDWSASMGGGPLP
 helix                         
 sheet    EEEEEEE     E        
 turns            TTT  TTT     
 coil  CCC       C   C    CCCCC

 Residue totals: H:  0   E:  0   T:  0   C:  1
        percent: H:  0.0 E:  0.0 T:  0.0 C:  0.3
--------------------------------------------------------------------
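
For downstream processing it can be convenient to collapse the four state lines into one state string per sequence. The following Python sketch does this for the layout shown above. It assumes that every state line starts with a 7-character label column (" helix ", " sheet ", " turns ", " coil  ") and that sequence lines are indented by the same 7 characters; this is inferred from the example, not from a format specification. The function name parse_garnier_plot is hypothetical.

```python
STATE_LABELS = ("helix", "sheet", "turns", "coil")
PREFIX_WIDTH = 7  # assumed width of the label column, taken from the example


def parse_garnier_plot(text):
    """Return (sequence, states) as two strings of equal length.

    Positions with no symbol on any of the four state lines are marked '-'.
    """
    sequence, states = [], []
    current = None  # [residues of this block, per-position state symbols]

    def flush():
        if current is not None:
            sequence.append(current[0])
            states.append("".join(current[1]))

    for line in text.splitlines():
        label = line[:PREFIX_WIDTH].strip()
        body = line[PREFIX_WIDTH:].rstrip()
        if not label and body.isalpha() and body.isupper():
            # A new block of residues starts here.
            flush()
            current = [body, ["-"] * len(body)]
        elif label in STATE_LABELS and current is not None:
            # Copy every non-blank symbol into the matching column.
            for i, symbol in enumerate(body[:len(current[0])]):
                if symbol != " ":
                    current[1][i] = symbol
    flush()
    return "".join(sequence), "".join(states)
```

Counting the symbols in the returned state string gives per-state totals that can be checked against the "Residue totals" line. Trailing blanks on the state lines may be stripped without affecting the result.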

Data files

Notes

The Garnier method is not regarded as the most accurate prediction method, but it is simple to calculate on most workstations.

The Web servers for PHD, DSC, and others are generally preferred.

The 3D structure of the example sequence is known, although the secondary structure elements were not in the SwissProt feature table for release 38 when the test data were extracted.

DSSP shows:

 From     To   Structure
    9     13   E beta sheet
   21     39   H alpha helix
   50     54   E beta sheet
   60     72   H alpha helix
   78     81   E beta sheet
   85     97   H alpha helix
  101    104   E beta sheet
  117    119   E beta sheet
  128    136   H alpha helix
  142    148   E beta sheet
  151    166   H alpha helix
  170    177   E beta sheet
  183    196   H alpha helix
  200    204   E beta sheet
  208    221   H alpha helix
  229    231   E beta sheet
  236    239   H alpha helix
  244    247   H alpha helix
  251    254   E beta sheet
  263    273   H alpha helix
  284    303   H alpha helix
  308    315   H alpha helix
  320    322   E beta sheet
  325    329   E beta sheet
  336    337   E beta sheet
  341    345   E beta sheet
  351    356   E beta sheet
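
A table like this can be compared with a prediction residue by residue. The following Python sketch expands (from, to, structure) rows into a state string and computes a simple three-state agreement (helix, sheet, other). It assumes the DSSP residue numbers index the same sequence as the prediction, which should be checked before the score is trusted. The function names and the choice of score are illustrative assumptions, not part of garnier.

```python
# First three rows of the DSSP table above, as (from, to, state) triples.
DSSP_EXCERPT = [(9, 13, "E"), (21, 39, "H"), (50, 54, "E")]


def expand_segments(segments, length, fill="-"):
    """Expand 1-based, inclusive (start, end, state) triples into a string."""
    states = [fill] * length
    for start, end, state in segments:
        if not 1 <= start <= end <= length:
            raise ValueError(f"segment {start}-{end} is outside 1..{length}")
        for pos in range(start, end + 1):
            states[pos - 1] = state
    return "".join(states)


def three_state(states):
    """Keep H and E; map every other symbol (T, C, '-') to '-'."""
    return "".join(s if s in "HE" else "-" for s in states)


def per_residue_agreement(predicted, observed):
    """Fraction of positions whose three-state labels agree."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = zip(three_state(predicted), three_state(observed))
    return sum(p == o for p, o in pairs) / len(predicted)
```

Turns and coil are merged into a single "other" class here because the DSSP table above lists only helix and sheet segments.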

The meaning and use of the parameter 'idc' is currently being investigated. The original author, Bill Pearson, writes:

"In their paper, GOR mention that if you know something about the secondary structure content of the protein you are analyzing, you can do better in prediction. "idc" is an index into a set of arrays, dharr[] and dsarr[], which provide "decision constants" (dch, dcs), which are offsets that are applied to the weights for the helix and sheet (extend) terms. So, idc=0 says don't use the decision constant offsets, and idc=1 to 6 indicates that various combinations of dch,dcs offsets should be used. I don't remember what they are, but I must have gotten the values from their paper."

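The mechanism described in the quote can be sketched in Python as follows. This is not the garnier source code: the function names are hypothetical, the two tables are passed in by the caller because their values are not given in this document, and the sign convention (the offset is subtracted from the helix and sheet scores) is an assumption.

```python
def decision_constants(idc, dharr, dsarr):
    """Look up the (dch, dcs) pair selected by idc (0 to 6)."""
    if not 0 <= idc <= 6:
        raise ValueError("idc must be an integer from 0 to 6")
    return dharr[idc], dsarr[idc]


def choose_state(scores, idc, dharr, dsarr):
    """Pick the best state after offsetting the helix and sheet scores.

    scores maps 'H', 'E', 'T' and 'C' to the raw score of one residue.
    Assumption: the decision constants are subtracted from the scores.
    """
    dch, dcs = decision_constants(idc, dharr, dsarr)
    adjusted = dict(scores)
    adjusted["H"] -= dch
    adjusted["E"] -= dcs
    # Ties are resolved in the order H, E, T, C (an arbitrary choice).
    return max("HETC", key=lambda state: adjusted[state])
```

With idc=0 both offsets are expected to be zero, which matches the "DCH = 0, DCS = 0" header in the example output.
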
References

Garnier J, Osguthorpe DJ, Robson B. Analysis of the accuracy and implications of simple methods for predicting the secondary structure of globular proteins. J Mol Biol 1978 Mar 25;120(1):97-120.

Warnings

The accuracy of any secondary structure prediction program is, at best, not much better than 70% to 80%. This is an early algorithm and will probably not achieve much better than about 65% accuracy.

You are advised to use several of the latest Web-based prediction sites and combine them to make a consensus prediction.
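
One simple way to combine several predictions is a per-residue majority vote. The following Python sketch assumes that each predictor's output has already been converted to a state string of the same length over the same alphabet (for example H, E, T, C). The function name consensus and the tie handling are illustrative choices, not a recommended standard.

```python
from collections import Counter


def consensus(predictions, tie="-"):
    """Majority vote per position; positions without a clear winner get `tie`."""
    if not predictions:
        raise ValueError("at least one prediction is required")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have the same length")
    result = []
    for column in zip(*predictions):
        ranked = Counter(column).most_common()
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            result.append(tie)  # two or more states share the top count
        else:
            result.append(ranked[0][0])
    return "".join(result)
```

An odd number of predictors reduces the number of tied positions.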

Diagnostic Error Messages

Exit status

Known bugs

See also

Program name     Description
antigenic        Finds antigenic sites in proteins
diffseq          Find differences (SNPs) between nearly identical sequences
dotmatcher       Displays a thresholded dotplot of two sequences
dotpath          Displays a non-overlapping wordmatch dotplot of two sequences
dottup           Displays a wordmatch dotplot of two sequences
helixturnhelix   Report nucleic acid binding motifs
oddcomp          Finds protein sequence regions with a biased composition
pepcoil          Predicts coiled coil regions
pepnet           Displays proteins as a helical net
pepwheel         Shows protein sequences as helices
polydot          Displays all-against-all dotplots of a set of sequences
pscan            Scans proteins using PRINTS
showseq          Display a sequence with features, translation etc
sigcleave        Reports protein signal cleavage sites
tmap             Displays membrane spanning regions

Author(s)

This program ('GARNIER') was originally written by William Pearson (wrp@virginia.edu) and released as part of his FASTA package.

This application was modified for inclusion in EMBOSS by Rodrigo Lopez (rls@ebi.ac.uk) European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.

History

Target users

This program is intended to be used by everyone and everything, from naive users to embedded scripts.

Comments