EMBOSS: pepstats


Program pepstats

Function

Protein statistics

Description

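This outputs a report of simple protein sequence information, including:

   Molecular weight
   Number of residues
   Average residue weight
   Charge
   Isoelectric point
   For each type of amino acid: number, molar percent, DayhoffStat
   For each physico-chemical class of amino acid: number, molar percent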

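DayhoffStat is the amino acid's molar percent divided by its Dayhoff statistic. The Dayhoff statistic is the amino acid's relative occurrence per 1000 amino acids, normalised to 100 (by rls@ebi.ac.uk; original work from 1993). For example, in the sample output for LACI_ECOLI, Ala has a molar percent of 12.222 and a DayhoffStat of 1.421, which corresponds to a Dayhoff statistic of 8.6.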
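As a rough illustration of the calculation, the sketch below recomputes Mole% and DayhoffStat for three residues of LACI_ECOLI as molar percent divided by the Dayhoff value, which is the ratio that reproduces the sample report. The values in DAYHOFF are an assumption: they are back-calculated from that report, not read from Edayhoff.freq, and the function names are illustrative only.

```python
# Dayhoff statistics for three residues. ASSUMPTION: these values are
# back-calculated from the sample report (Mole% / DayhoffStat); the
# program itself reads them from the data file Edayhoff.freq.
DAYHOFF = {"A": 8.6, "C": 2.9, "L": 7.4}


def mole_percent(count, total_residues):
    """Percentage of residues in the sequence that are this amino acid."""
    return 100.0 * count / total_residues


def dayhoff_stat(count, total_residues, dayhoff):
    """Molar percent divided by the Dayhoff statistic (0.0 if undefined)."""
    if dayhoff == 0:
        return 0.0
    return mole_percent(count, total_residues) / dayhoff


if __name__ == "__main__":
    counts = {"A": 44, "C": 3, "L": 40}  # counts from the LACI_ECOLI report
    for aa, n in counts.items():
        mole = mole_percent(n, 360)
        stat = dayhoff_stat(n, 360, DAYHOFF[aa])
        print(f"{aa}  {n:3d}  {mole:.3f}  {stat:.3f}")
```

Rounded to three decimal places, the results should agree with the Mole% and DayhoffStat columns of the sample report for Ala, Cys and Leu.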

Usage

Here is a sample session with pepstats.

% pepstats
Input sequence: sw:laci_ecoli
Output file [laci_ecoli.pepstats]: 

Command line arguments

   Mandatory qualifiers:
  [-sequencea]         sequence   Sequence USA
   -outfile            outfile    Output file name

   Optional qualifiers: (none)
   Advanced qualifiers:
   -[no]termini        bool       Include charge at N and C terminus


Mandatory qualifiers   Description                          Allowed values      Default
[-sequencea]           Sequence USA                         Readable sequence   Required
(Parameter 1)
-outfile               Output file name                     Output file         <sequence>.pepstats

Optional qualifiers    Description                          Allowed values      Default
(none)

Advanced qualifiers    Description                          Allowed values      Default
-[no]termini           Include charge at N and C terminus   Yes/No              Yes
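For use from a script, the qualifiers listed above can be given on the command line instead of at the prompts. The following is a minimal sketch, assuming pepstats is installed and on the PATH; the helper names build_pepstats_command and run_pepstats are hypothetical and only the qualifiers documented here are used.

```python
import subprocess


def build_pepstats_command(sequence_usa, outfile, termini=True):
    """Assemble the argument list for a pepstats run without prompts."""
    # The sequence USA is passed as parameter 1, the output file by name.
    command = ["pepstats", sequence_usa, "-outfile", outfile]
    if not termini:
        # -termini is the default, so only the negated form is added.
        command.append("-notermini")
    return command


def run_pepstats(sequence_usa, outfile, termini=True):
    """Run pepstats; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(build_pepstats_command(sequence_usa, outfile, termini), check=True)


if __name__ == "__main__":
    # Show the command for the sample session without executing it.
    print(" ".join(build_pepstats_command("sw:laci_ecoli", "laci_ecoli.pepstats")))
```

Keeping the command construction separate from the call to subprocess.run makes it possible to inspect or log the command before anything is executed.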

Input file format

Output file format

Here is the output from the example run:

PEPSTATS of LACI_ECOLI from 1 to 360

Molecular weight = 38563.98             Residues = 360   
Average Residue Weight  = 107.122       Charge   = 1.5   
Isoelectric Point = 6.8820

Residue         Number          Mole%           DayhoffStat
A = Ala         44              12.222          1.421  
B = Asx         0               0.000           0.000  
C = Cys         3               0.833           0.287  
D = Asp         17              4.722           0.859  
E = Glu         15              4.167           0.694  
F = Phe         4               1.111           0.309  
G = Gly         22              6.111           0.728  
H = His         7               1.944           0.972  
I = Ile         18              5.000           1.111  
K = Lys         11              3.056           0.463  
L = Leu         40              11.111          1.502  
M = Met         10              2.778           1.634  
N = Asn         12              3.333           0.775  
P = Pro         14              3.889           0.748  
Q = Gln         28              7.778           1.994  
R = Arg         19              5.278           1.077  
S = Ser         33              9.167           1.310  
T = Thr         19              5.278           0.865  
V = Val         34              9.444           1.431  
W = Trp         2               0.556           0.427  
X = Xxx         0               0.000           0.000  
Y = Tyr         8               2.222           0.654  
Z = Glx         0               0.000           0.000  

Property        Residues                Number          Mole%
Tiny            (A+C+G+S+T)             121             33.611
Small           (A+B+C+D+G+N+P+S+T+V)   198             55.000
Aliphatic       (I+L+V)                 92              25.556
Aromatic        (F+H+W+Y)               21               5.833
Non-polar       (A+C+F+G+I+L+M+P+V+W+Y) 199             55.278
Polar           (D+E+H+K+N+Q+R+S+T+Z)   161             44.722
Charged         (B+D+E+H+K+R+Z)         69              19.167
Basic           (H+K+R)                 37              10.278
Acidic          (B+D+E+Z)               32               8.889
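
The report is plain text with whitespace-separated columns, so it can be read back by a script. The sketch below is one possible way to do that, assuming the residue rows keep the layout shown above; the regular expression and the function names are illustrative and not part of EMBOSS. It parses the per-residue table and recomputes the count for one property class.

```python
import re

# Matches residue rows such as "A = Ala         44    12.222    1.421".
RESIDUE_ROW = re.compile(r"^([A-Z]) = \w{3}\s+(\d+)\s+([\d.]+)\s+([\d.]+)\s*$")


def parse_residue_table(report_text):
    """Return {one-letter code: (number, mole_percent, dayhoff_stat)}."""
    table = {}
    for line in report_text.splitlines():
        match = RESIDUE_ROW.match(line)
        if match:
            code, number, mole, dayhoff = match.groups()
            table[code] = (int(number), float(mole), float(dayhoff))
    return table


def class_count(table, members):
    """Sum the residue counts for a property class written like 'A+C+G+S+T'."""
    return sum(table[code][0] for code in members.split("+") if code in table)


# A few rows copied from the sample report, enough for the Tiny class.
SAMPLE = """\
Residue         Number          Mole%           DayhoffStat
A = Ala         44              12.222          1.421
C = Cys         3               0.833           0.287
G = Gly         22              6.111           0.728
S = Ser         33              9.167           1.310
T = Thr         19              5.278           0.865
"""

if __name__ == "__main__":
    rows = parse_residue_table(SAMPLE)
    print("Tiny:", class_count(rows, "A+C+G+S+T"))
```

For these five rows the Tiny count comes to 121, matching the Property table in the report.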

Data files

The Dayhoff statistic is read from the EMBOSS data file 'Edayhoff.freq'. You can inspect and modify this file by copying it into your current directory with the command: 'embossdata -fetch'.

EMBOSS data files are distributed with the application and stored in the standard EMBOSS data directory, which is defined by the EMBOSS environment variable EMBOSS_DATA.

Users can provide their own data files in their own directories. Project-specific files can be put in the current directory or, for tidier directory listings, in a subdirectory called ".embossdata". Files for all EMBOSS runs can be put in the user's home directory or, again, in a subdirectory called ".embossdata".

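The directories are searched in the following order:

   . (your current directory)
   .embossdata (under your current directory)
   ~/ (your home directory)
   ~/.embossdata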
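
The lookup can be sketched as follows. This is an illustration only, not the EMBOSS implementation: it assumes the user directories are tried in the order in which they are introduced above (current directory, its ".embossdata" subdirectory, home directory, its ".embossdata" subdirectory), and it further assumes that the standard data directory is consulted last. The function names are hypothetical.

```python
import os


def candidate_directories(cwd, home, emboss_data=None):
    """List directories in the order a data file would be looked up."""
    dirs = [
        cwd,
        os.path.join(cwd, ".embossdata"),
        home,
        os.path.join(home, ".embossdata"),
    ]
    if emboss_data:
        # ASSUMPTION: the standard data directory is consulted last.
        dirs.append(emboss_data)
    return dirs


def find_data_file(name, cwd, home, emboss_data=None):
    """Return the first existing path for `name`, or None if not found."""
    for directory in candidate_directories(cwd, home, emboss_data):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            return path
    return None


if __name__ == "__main__":
    found = find_data_file(
        "Edayhoff.freq",
        os.getcwd(),
        os.path.expanduser("~"),
        os.environ.get("EMBOSS_DATA"),
    )
    print(found or "Edayhoff.freq not found in any candidate directory")
```

Under these assumptions, a copy of Edayhoff.freq in the current directory takes precedence over one in the home directory or in the standard data directory.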

Notes

References

Warnings

Diagnostic Error Messages

Exit status

Known bugs

See also

Program name   Description
checktrans     Reports STOP codons and ORF statistics of a protein sequence
digest         Protein proteolytic enzyme or reagent cleavage digest
iep            Calculates the isoelectric point of a protein
octanol        Displays protein hydropathy
pepinfo        Plots simple amino acid properties in parallel
pepnet         Displays proteins as a helical net
pepwheel       Shows protein sequences as helices
pepwindow      Displays protein hydropathy
pepwindowall   Displays protein hydropathy of a set of sequences

Author(s)

This application was written by Alan Bleasby (ableasby@hgmp.mrc.ac.uk)

History

Target users

This program is intended to be used by everyone and everything, from naive users to embedded scripts.

Comments