EMBOSS: fuzzpro


Program fuzzpro

Function

Protein pattern search

Description

fuzzpro uses PROSITE style patterns to search protein sequences.

Patterns are specifications of a (typically short) length of sequence to be found. They can specify a search for an exact sequence or they can allow various ambiguities, matches to variable lengths of sequence and repeated subsections of the sequence.

fuzzpro intelligently selects the optimum searching algorithm to use, depending on the complexity of the search pattern specified.

Usage

Here is a sample session with fuzzpro.

% fuzzpro
Input sequence: sw:*
Search pattern: [FY]-[LIV]-G-[DE]-E-A-Q-x-[RKQ](2)-G
Number of mismatches [0]: 
Output file [5h1d_fugru.fuzzpro]:

Command line arguments

   Mandatory qualifiers:
  [-sequence]          seqall     Sequence database USA
   -pattern            string     Search pattern
   -mismatch           integer    Number of mismatches
  [-outf]              outfile    Output file name

   Optional qualifiers: (none)
   Advanced qualifiers:
   -mmshow             bool       Show mismatches
   -accshow            bool       Show accession numbers
   -descshow           bool       Show descriptions


Mandatory qualifiers                     Allowed values            Default
[-sequence]      Sequence database USA   Readable sequence(s)      Required
(Parameter 1)
-pattern         Search pattern          Any string is accepted    An empty string is accepted
-mismatch        Number of mismatches    Integer 0 or more         0
[-outf]          Output file name        Output file               <sequence>.fuzzpro
(Parameter 2)

Optional qualifiers                      Allowed values            Default
(none)

Advanced qualifiers                      Allowed values            Default
-mmshow          Show mismatches         Yes/No                    No
-accshow         Show accession numbers  Yes/No                    No
-descshow        Show descriptions       Yes/No                    No
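
For scripted use, the qualifiers listed above can be assembled into an argument list instead of being answered at the prompts. The following is a minimal sketch only: the helper name build_fuzzpro_args is hypothetical, and it assumes the usual EMBOSS convention that each qualifier is followed by its value while boolean qualifiers are given as bare switches (as in "fuzzpro -mmshow"). It does not run fuzzpro itself.

```python
def build_fuzzpro_args(sequence, pattern, outfile, mismatch=0,
                       mmshow=False, accshow=False, descshow=False):
    """Assemble an argument list for a non-interactive fuzzpro run.

    Hypothetical helper: only the qualifier names come from the table above.
    """
    if mismatch < 0:
        raise ValueError("-mismatch must be an integer 0 or more")
    args = [
        "fuzzpro",
        "-sequence", sequence,        # sequence database USA (parameter 1)
        "-pattern", pattern,          # PROSITE-style search pattern
        "-mismatch", str(mismatch),   # number of mismatches allowed
        "-outf", outfile,             # output file name (parameter 2)
    ]
    # Advanced boolean qualifiers are appended only when switched on.
    for flag, enabled in (("-mmshow", mmshow),
                          ("-accshow", accshow),
                          ("-descshow", descshow)):
        if enabled:
            args.append(flag)
    return args


args = build_fuzzpro_args("sw:100k_rat", "RARLX(3)R", "stdout",
                          mismatch=1, mmshow=True)
print(args)
```

A list of this form could be handed to subprocess.run() on a machine where EMBOSS is installed. Passing a list rather than a single command string avoids having to quote the brackets and parentheses of the pattern for the shell.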

Input file format

Patterns for fuzzpro are based on the format of pattern used in the PROSITE database, with the difference that the terminating dot '.' and the hyphens, '-', between the characters are optional.

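The PROSITE pattern conventions, as given in the PROSITE documentation, are:

   The standard IUPAC one-letter codes for the amino acids are used.
   The symbol 'x' is used for a position where any amino acid is accepted.
   Ambiguities are indicated by listing the acceptable amino acids for a position between square brackets '[ ]'. For example, [ALT] stands for Ala or Leu or Thr.
   Ambiguities are also indicated by listing between curly brackets '{ }' the amino acids that are not accepted at a position. For example, {AM} stands for any amino acid except Ala and Met.
   Each element in a pattern is separated from its neighbour by a '-' (optional in fuzzpro).
   Repetition of an element is indicated by following it with a number or a numerical range in parentheses. For example, x(3) corresponds to x-x-x, and x(2,4) corresponds to x-x, x-x-x or x-x-x-x.
   A pattern restricted to the N-terminal of a sequence starts with '<'; one restricted to the C-terminal ends with '>'.
   A period ends the pattern (optional in fuzzpro).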

For example, in SWISSPROT entry 100K_RAT you can look for the pattern:

[DE](2)HS{P}X(2)PX(2,4)C

This means: two Asps or Glus in any order, followed by His, Ser, any residue other than Pro, then two of any residue, followed by Pro, followed by two to four of any residue, followed by Cys.

The search is case-independent, so 'AAA' matches 'aaa'.
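
To make the pattern notation concrete, the sketch below translates a PROSITE-style pattern into a Python regular expression and searches a sequence with it. It is an illustration only, not the algorithm fuzzpro uses: the function names prosite_to_regex and find_matches are hypothetical, mismatches are not supported, and only the elements used in the examples on this page are handled (residue letters, 'x', '[ ]', '{ }', repeat counts, optional hyphens and an optional terminating dot). The test sequence is made up for the demonstration.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a regular expression string."""
    # Hyphens and the terminating dot are optional, so drop them first.
    p = pattern.strip().rstrip(".").replace("-", "")
    out = []
    i = 0
    while i < len(p):
        ch = p[i]
        if ch == "[":                      # residues accepted at this position
            j = p.index("]", i)
            out.append("[" + p[i + 1:j].upper() + "]")
            i = j + 1
        elif ch == "{":                    # residues not accepted at this position
            j = p.index("}", i)
            out.append("[^" + p[i + 1:j].upper() + "]")
            i = j + 1
        elif ch == "(":                    # repeat count: (n) or (n,m)
            j = p.index(")", i)
            out.append("{" + p[i + 1:j] + "}")
            i = j + 1
        elif ch in "xX":                   # any residue
            out.append(".")
            i += 1
        elif ch.isalpha():                 # a single literal residue
            out.append(ch.upper())
            i += 1
        else:
            raise ValueError("unsupported pattern character: %r" % ch)
    return "".join(out)


def find_matches(pattern, sequence):
    """Return (1-based start, matched text) for each non-overlapping match."""
    regex = re.compile(prosite_to_regex(pattern), re.IGNORECASE)
    return [(m.start() + 1, m.group()) for m in regex.finditer(sequence)]


print(prosite_to_regex("[DE](2)HS{P}X(2)PX(2,4)C"))
print(find_matches("[FY]-[LIV]-G-[DE]-E-A-Q-x-[RKQ](2)-G", "MAAYVGDEAQSKRGIL"))
```

The search is compiled with re.IGNORECASE to mirror the case-independent matching described above.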

Output file format

Here is the output from the example search:


     ACT1_FUGRU    53 YVGDEAQSKRG
     ACT2_FUGRU    53 YVGDEAQSKRG
     ACT3_FUGRU    53 YVGDEAQSKRG
     ACTC_FUGRU    55 YVGDEAQSKRG
     ACTS_FUGRU    55 YVGDEAQSKRG
     ACTT_FUGRU    55 YVGDEAQSKRG

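It is composed of three columns of data: the name of the sequence, the position at which the match starts, and the matched stretch of sequence.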

If the option '-mmshow' is used, then an extra column of data is output indicating how many mismatches there are:


% fuzzpro -mmshow
Protein pattern search
Input sequence(s): sw:100k_rat
Search pattern: RARLX(3)R
Number of mismatches [0]: 1
Output file [100k_rat.fuzzpro]: stdout

       100K_RAT   613     1 EARLNCFRN
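
The two output layouts shown above can be read back into a script by splitting each line on whitespace. This is a minimal sketch with a hypothetical helper name (parse_hit). It assumes the columns are sequence name, match position, optional mismatch count, and matched sequence, as the sample output suggests, and it does not cover the extra columns that '-accshow' or '-descshow' may add.

```python
def parse_hit(line, with_mismatches=False):
    """Parse one fuzzpro output line into a dictionary.

    Set with_mismatches=True for output produced with -mmshow,
    which carries an extra mismatch-count column.
    """
    fields = line.split()
    if with_mismatches:
        name, position, mismatches, matched = fields
        return {"name": name, "position": int(position),
                "mismatches": int(mismatches), "match": matched}
    name, position, matched = fields
    return {"name": name, "position": int(position), "match": matched}


# Lines copied from the sample output above.
print(parse_hit("     ACT1_FUGRU    53 YVGDEAQSKRG"))
print(parse_hit("       100K_RAT   613     1 EARLNCFRN", with_mismatches=True))
```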

Data files

Notes

References

Warnings

Diagnostic Error Messages

Exit status

Known bugs

See also

Program name    Description
dreg            Regular expression search of a nucleotide sequence
fuzznuc         Nucleic acid pattern search
fuzztran        Protein pattern search after translation
patmatdb        Search a protein sequence with a motif
patmatmotifs    Search a PROSITE motif database with a protein sequence
preg            Regular expression search of a protein sequence
pscan           Scans proteins using PRINTS
tfscan          Scans DNA sequences for transcription factors

Other EMBOSS programs, such as dreg and preg, allow you to search for regular expression patterns, but they may be less easy for users who have never used regular expressions before.

Author(s)

This application was written by Alan Bleasby (ableasby@hgmp.mrc.ac.uk)

History

Target users

This program is intended to be used by everyone and everything, from naive users to embedded scripts.

Comments