EMBOSS: pepstats


Program pepstats

Function

Protein statistics

Description

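pepstats outputs a report of simple protein sequence information including:

   Molecular weight
   Number of residues
   Average residue weight
   Charge
   Isoelectric point
   For each type of amino acid: number, molar percent, DayhoffStat
   For each physico-chemical class of amino acid: number, molar percent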

DayhoffStat is the amino acid's molar percent divided by its Dayhoff statistic. The Dayhoff statistic is the amino acid's relative occurrence per 1000 aa normalised to 100 by rls@ebi.ac.uk (original work from 1993).
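
A minimal sketch of this calculation in Python may help. It assumes that DayhoffStat is Mole% divided by the Dayhoff statistic, which is the ratio that reproduces the figures in the sample report for LACI_ECOLI. The three Dayhoff statistics used here are assumed values chosen for illustration; the authoritative figures are those in the EMBOSS data file 'Edayhoff.freq'. The function names are hypothetical and are not part of EMBOSS.

```python
# Assumed Dayhoff statistics (relative occurrence normalised to 100) for three
# residues. These are illustrative; the real values live in Edayhoff.freq.
ASSUMED_DAYHOFF = {"A": 8.6, "C": 2.9, "L": 7.4}


def mole_percent(count, total_residues):
    """Percentage of the sequence made up by one residue type."""
    return 100.0 * count / total_residues


def dayhoff_stat(count, total_residues, dayhoff):
    """Mole% divided by the Dayhoff statistic (0.0 when no statistic is defined)."""
    if dayhoff == 0:
        return 0.0
    return mole_percent(count, total_residues) / dayhoff


if __name__ == "__main__":
    # Residue counts taken from the LACI_ECOLI sample report (360 residues).
    sample_counts = {"A": 44, "C": 3, "L": 40}
    for residue, count in sample_counts.items():
        mole = mole_percent(count, 360)
        stat = dayhoff_stat(count, 360, ASSUMED_DAYHOFF[residue])
        print(f"{residue}  {count:3d}  {mole:.3f}  {stat:.3f}")
```

A DayhoffStat above 1 means the residue is more frequent in the sequence than the Dayhoff reference frequency; a value below 1 means it is less frequent.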

Usage

Here is a sample session with pepstats.

% pepstats
Protein statistics
Input sequence: sw:laci_ecoli
Output file [laci_ecoli.pepstats]:

Command line arguments

   Mandatory qualifiers:
  [-sequencea]         sequence   Sequence USA
   -outfile            outfile    Output file name

   Optional qualifiers: (none)
   Advanced qualifiers:
   -[no]termini        bool       Include charge at N and C terminus
   -aadata             string     Molecular weight data for amino acids

   General qualifiers:
   -help               bool       report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose


Mandatory qualifiers                                        Allowed values          Default
   [-sequencea]     Sequence USA                            Readable sequence       Required
   (Parameter 1)
   -outfile         Output file name                        Output file             <sequence>.pepstats

Optional qualifiers                                         Allowed values          Default
   (none)

Advanced qualifiers                                         Allowed values          Default
   -[no]termini     Include charge at N and C terminus      Yes/No                  Yes
   -aadata          Molecular weight data for amino acids   Any string is accepted  Eamino.dat
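
For scripted use, the qualifiers above can be assembled into a non-interactive command line. The sketch below is one possible way to do this from Python, not an official wrapper. It passes the sequence as the first positional parameter and uses -outfile and -notermini as listed above. The -auto flag, which turns off prompting, is an EMBOSS general qualifier that is not listed in this document, and the helper names are hypothetical.

```python
import subprocess


def build_pepstats_command(usa, outfile, termini=True):
    """Return the argument list for a non-interactive pepstats run."""
    # The sequence USA is parameter 1, so it can be given positionally.
    cmd = ["pepstats", usa, "-outfile", outfile]
    if not termini:
        # -[no]termini defaults to Yes; -notermini switches it off.
        cmd.append("-notermini")
    # Assumption: -auto is the EMBOSS general qualifier that disables prompts.
    cmd.append("-auto")
    return cmd


def run_pepstats(usa, outfile, termini=True):
    """Run pepstats; requires an EMBOSS installation on the PATH."""
    subprocess.run(build_pepstats_command(usa, outfile, termini), check=True)


if __name__ == "__main__":
    # Only prints the command, so it works without EMBOSS installed.
    print(" ".join(build_pepstats_command("sw:laci_ecoli", "laci_ecoli.pepstats")))
```

Building the argument list separately from running it keeps the command easy to inspect or log before pepstats is actually called.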

Input file format

Any protein sequence specified by a normal USA (Uniform Sequence Address).

Output file format

Here is the output from the example run:


PEPSTATS of LACI_ECOLI from 1 to 360

Molecular weight = 38563.98             Residues = 360   
Average Residue Weight  = 107.122       Charge   = 1.5   
Isoelectric Point = 6.8820

Residue         Number          Mole%           DayhoffStat
A = Ala         44              12.222          1.421  
B = Asx         0               0.000           0.000  
C = Cys         3               0.833           0.287  
D = Asp         17              4.722           0.859  
E = Glu         15              4.167           0.694  
F = Phe         4               1.111           0.309  
G = Gly         22              6.111           0.728  
H = His         7               1.944           0.972  
I = Ile         18              5.000           1.111  
K = Lys         11              3.056           0.463  
L = Leu         40              11.111          1.502  
M = Met         10              2.778           1.634  
N = Asn         12              3.333           0.775  
P = Pro         14              3.889           0.748  
Q = Gln         28              7.778           1.994  
R = Arg         19              5.278           1.077  
S = Ser         33              9.167           1.310  
T = Thr         19              5.278           0.865  
V = Val         34              9.444           1.431  
W = Trp         2               0.556           0.427  
X = Xxx         0               0.000           0.000  
Y = Tyr         8               2.222           0.654  
Z = Glx         0               0.000           0.000  

Property        Residues                Number          Mole%
Tiny            (A+C+G+S+T)             121             33.611
Small           (A+B+C+D+G+N+P+S+T+V)   198             55.000
Aliphatic       (I+L+V)                 92              25.556
Aromatic        (F+H+W+Y)               21               5.833
Non-polar       (A+C+F+G+I+L+M+P+V+W+Y) 199             55.278
Polar           (D+E+H+K+N+Q+R+S+T+Z)   161             44.722
Charged         (B+D+E+H+K+R+Z)         69              19.167
Basic           (H+K+R)                 37              10.278
Acidic          (B+D+E+Z)               32               8.889
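
The property table can be recomputed from the residue table by summing the counts of the residues in each class. The Python sketch below does this with the counts and class definitions copied from the report above; the function name is hypothetical and the code is an illustration, not the pepstats implementation.

```python
# Residue counts copied from the LACI_ECOLI report.
COUNTS = {
    "A": 44, "B": 0, "C": 3, "D": 17, "E": 15, "F": 4, "G": 22, "H": 7,
    "I": 18, "K": 11, "L": 40, "M": 10, "N": 12, "P": 14, "Q": 28, "R": 19,
    "S": 33, "T": 19, "V": 34, "W": 2, "X": 0, "Y": 8, "Z": 0,
}

# Class membership as printed in the "Residues" column of the report.
CLASSES = {
    "Tiny": "ACGST",
    "Small": "ABCDGNPSTV",
    "Aliphatic": "ILV",
    "Aromatic": "FHWY",
    "Non-polar": "ACFGILMPVWY",
    "Polar": "DEHKNQRSTZ",
    "Charged": "BDEHKRZ",
    "Basic": "HKR",
    "Acidic": "BDEZ",
}


def class_summary(counts, classes):
    """Return {class name: (number of residues, mole percent)}."""
    total = sum(counts.values())
    summary = {}
    for name, members in classes.items():
        number = sum(counts.get(residue, 0) for residue in members)
        summary[name] = (number, 100.0 * number / total)
    return summary


if __name__ == "__main__":
    for name, (number, mole) in class_summary(COUNTS, CLASSES).items():
        print(f"{name:<12}{number:>5}{mole:>10.3f}")
```

Because the classes overlap (for example, every Basic residue is also Charged), the Number column does not sum to the sequence length.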

Data files

The Dayhoff statistic is read from the EMBOSS data file 'Edayhoff.freq'. You can inspect and modify this file by copying it into your current directory with the command: 'embossdata -fetch'.

EMBOSS data files are distributed with the application and stored in the standard EMBOSS data directory, which is defined by the EMBOSS environment variable EMBOSS_DATA.

Users can provide their own data files in their own directories. Project-specific files can be put in the current directory or, for tidier directory listings, in a subdirectory called ".embossdata". Files for all EMBOSS runs can be put in the user's home directory or, again, in a subdirectory called ".embossdata".

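The directories are searched in the following order:

   . (your current directory)
   .embossdata (under your current directory)
   ~/ (your home directory)
   ~/.embossdata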
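
The lookup can be pictured with a short Python sketch. It assumes the order implied by the description above: the current directory, its ".embossdata" subdirectory, the home directory, and its ".embossdata" subdirectory. Falling back to the directory named by EMBOSS_DATA last is also an assumption. The function names are hypothetical, and the code only illustrates the idea; it is not how EMBOSS itself is implemented.

```python
import os
from pathlib import Path


def candidate_dirs(cwd, home, emboss_data=None):
    """List the directories to search, most specific first (assumed order)."""
    dirs = [
        Path(cwd),
        Path(cwd) / ".embossdata",
        Path(home),
        Path(home) / ".embossdata",
    ]
    if emboss_data:
        # Assumption: the standard data directory is consulted last.
        dirs.append(Path(emboss_data))
    return dirs


def find_data_file(name, cwd, home, emboss_data=None):
    """Return the first existing copy of a data file, or None if not found."""
    for directory in candidate_dirs(cwd, home, emboss_data):
        path = directory / name
        if path.is_file():
            return path
    return None


if __name__ == "__main__":
    found = find_data_file(
        "Edayhoff.freq", Path.cwd(), Path.home(), os.environ.get("EMBOSS_DATA")
    )
    print(found if found else "Edayhoff.freq not found in any searched directory")
```

Under this scheme a modified copy of 'Edayhoff.freq' in the current directory would take precedence over a copy in the home directory or in the standard data directory.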

Notes

None.

References

None.

Warnings

None.

Diagnostic Error Messages

None.

Exit status

It always exits with a status of 0.

Known bugs

None.

See also

Program name   Description
backtranseq    Back translate a protein sequence
charge         Protein charge plot
checktrans     Reports STOP codons and ORF statistics of a protein sequence
compseq        Counts the composition of dimer/trimer/etc words in a sequence
emowse         Protein identification by mass spectrometry
freak          Residue/base frequency table or plot
iep            Calculates the isoelectric point of a protein
mwfilter       Filter noisy molwts from mass spec output
octanol        Displays protein hydropathy
pepinfo        Plots simple amino acid properties in parallel
pepwindow      Displays protein hydropathy
pepwindowall   Displays protein hydropathy of a set of sequences

Author(s)

This application was written by Alan Bleasby (ableasby@hgmp.mrc.ac.uk)

History

Written (1999) - Alan Bleasby

Target users

This program is intended to be used by everyone and everything, from naive users to embedded scripts.

Comments