EMBOSS: fuzzpro
Patterns are specifications of a (typically short) length of sequence to be found. They can specify a search for an exact sequence or they can allow various ambiguities, matches to variable lengths of sequence and repeated subsections of the sequence.
fuzzpro selects the search algorithm to use according to the complexity of the specified pattern.
    % fuzzpro
    Input sequence: sw:*
    Search pattern: [FY]-[LIV]-G-[DE]-E-A-Q-x-[RKQ](2)-G
    Number of mismatches [0]:
    Output file [5h1d_fugru.fuzzpro]:
    Mandatory qualifiers:
      [-sequence]          seqall     Sequence database USA
       -pattern            string     Search pattern
       -mismatch           integer    Number of mismatches
      [-outf]              outfile    Output file name

    Optional qualifiers: (none)

    Advanced qualifiers:
       -mmshow             bool       Show mismatches
       -accshow            bool       Show accession numbers
       -usashow            bool       Showing the USA (Uniform Sequence Address)
                                      of the matching sequences will turn your
                                      output file into a 'list' file that can
                                      then be read in by many other EMBOSS
                                      programs by specifying it with a '@' in
                                      front of the filename.
       -descshow           bool       Show descriptions

    General qualifiers:
       -help               bool       report command line options. More
                                      information on associated and general
                                      qualifiers can be found with -help -verbose
Mandatory qualifiers | Description | Allowed values | Default |
---|---|---|---|
[-sequence] (Parameter 1) | Sequence database USA | Readable sequence(s) | Required |
-pattern | Search pattern | Any string is accepted | An empty string is accepted |
-mismatch | Number of mismatches | Integer 0 or more | 0 |
[-outf] (Parameter 2) | Output file name | Output file | <sequence>.fuzzpro |

Optional qualifiers | Description | Allowed values | Default |
---|---|---|---|
(none) | | | |

Advanced qualifiers | Description | Allowed values | Default |
---|---|---|---|
-mmshow | Show mismatches | Yes/No | No |
-accshow | Show accession numbers | Yes/No | No |
-usashow | Showing the USA (Uniform Sequence Address) of the matching sequences will turn your output file into a 'list' file that can then be read in by many other EMBOSS programs by specifying it with a '@' in front of the filename. | Yes/No | No |
-descshow | Show descriptions | Yes/No | No |
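When all four mandatory qualifiers from the table are given on the command line, fuzzpro should have nothing left to prompt for, which makes it easy to drive from a script. The sketch below only builds the argument list; the function name `fuzzpro_command` is hypothetical, and the qualifier names are exactly those in the table.

```python
def fuzzpro_command(sequence: str, pattern: str, mismatch: int, outfile: str, *,
                    mmshow: bool = False, accshow: bool = False,
                    usashow: bool = False, descshow: bool = False) -> list[str]:
    """Build a fuzzpro argument list from the qualifiers documented above.

    Hypothetical helper: every mandatory qualifier is supplied so that no
    interactive prompt should be needed.
    """
    cmd = ["fuzzpro",
           "-sequence", sequence,
           "-pattern", pattern,
           "-mismatch", str(mismatch),
           "-outf", outfile]
    # Advanced boolean qualifiers are only added when switched on.
    for flag, enabled in (("-mmshow", mmshow), ("-accshow", accshow),
                          ("-usashow", usashow), ("-descshow", descshow)):
        if enabled:
            cmd.append(flag)
    return cmd
```

Passing the resulting list to `subprocess.run(cmd, check=True)` (assuming fuzzpro is on the PATH) runs it without a shell, so a USA such as `sw:*` reaches fuzzpro unexpanded and needs none of the quoting used in the shell examples.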
Patterns for fuzzpro are based on the format of pattern used in the PROSITE database, with the difference that the terminating dot '.' and the hyphens, '-', between the characters are optional.
The PROSITE pattern definition from the PROSITE documentation follows.
For example, in SWISSPROT entry 100K_RAT you can look for the pattern:
[DE](2)HS{P}X(2)PX(2,4)C
This means: two Asps or Glus in any order, followed by His, Ser, any residue other than Pro, any two residues, Pro, two to four of any residue, and finally Cys.
The search is case-independent, so 'AAA' matches 'aaa'.
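The pattern rules above map fairly directly onto regular expressions. The following is a minimal sketch, not fuzzpro's own implementation: it translates residues, `x`, `[..]`, `{..}`, `(n)` and `(n,m)` into Python `re` syntax, treats hyphens and a terminating dot as optional, and upper-cases the sequence so the search is case-independent. The function names are hypothetical, the PROSITE `<`/`>` terminus anchors are not handled, and start positions are reported 1-based on the assumption that this matches fuzzpro's position column.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    p = pattern.upper().rstrip(".").replace("-", "")  # '-' and final '.' optional
    out = []
    i = 0
    while i < len(p):
        c = p[i]
        if c == "[":                      # [DE] -> any one of D, E
            j = p.index("]", i)
            out.append("[" + p[i + 1:j] + "]")
        elif c == "{":                    # {P} -> any residue except P
            j = p.index("}", i)
            out.append("[^" + p[i + 1:j] + "]")
        elif c == "(":                    # (2) or (2,4) -> {2} or {2,4}
            j = p.index(")", i)
            out.append("{" + p[i + 1:j] + "}")
        elif c == "X":                    # x -> any single residue
            j = i
            out.append(".")
        elif c.isalpha():                 # plain one-letter residue code
            j = i
            out.append(c)
        else:
            raise ValueError(f"unsupported pattern character {c!r}")
        i = j + 1
    return "".join(out)


def prosite_search(pattern: str, seq: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched residues) for each non-overlapping match."""
    regex = re.compile(prosite_to_regex(pattern))
    # Upper-casing the sequence makes the search case-independent.
    return [(m.start() + 1, m.group()) for m in regex.finditer(seq.upper())]
```

For example, `prosite_to_regex("[DE](2)HS{P}X(2)PX(2,4)C")` gives `[DE]{2}HS[^P].{2}P.{2,4}C`, and searching the made-up sequence `"mmyvgdeaqskrgmm"` for `[FY]-[LIV]-G-[DE]-E-A-Q-x-[RKQ](2)-G` returns `[(3, 'YVGDEAQSKRG')]`. Because `finditer` does not report overlapping matches, this sketch may miss hits that a dedicated pattern searcher would find.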
    ACT1_FUGRU 53 YVGDEAQSKRG
    ACT2_FUGRU 53 YVGDEAQSKRG
    ACT3_FUGRU 53 YVGDEAQSKRG
    ACTC_FUGRU 55 YVGDEAQSKRG
    ACTS_FUGRU 55 YVGDEAQSKRG
    ACTT_FUGRU 55 YVGDEAQSKRG
The output has three columns: the sequence name, the start position of the match, and the matching sequence.
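If the results are to be processed further, the whitespace-separated columns can be read back with a few lines of code. This is a sketch that assumes the layout shown in the examples on this page; the `Hit` type and `parse_hits` function are hypothetical. It also accepts the four-column layout produced by `-mmshow` (described below), and skips any other line, such as the description, accession and USA lines added by `-desc`, `-acc` and `-usa`.

```python
from typing import NamedTuple, Optional


class Hit(NamedTuple):
    name: str                         # sequence name, e.g. ACT1_FUGRU
    start: int                        # start position of the match
    match: str                        # the matching residues
    mismatches: Optional[int] = None  # only present in -mmshow output


def parse_hits(text: str) -> list[Hit]:
    """Parse fuzzpro match lines in the three- or four-column layout."""
    hits = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[1].isdigit():
            # name, start, match
            hits.append(Hit(fields[0], int(fields[1]), fields[2]))
        elif len(fields) == 4 and fields[1].isdigit() and fields[2].isdigit():
            # name, start, mismatches, match (-mmshow)
            hits.append(Hit(fields[0], int(fields[1]), fields[3], int(fields[2])))
        # Anything else (blank, description, accession or USA lines) is skipped.
    return hits
```

Applied to the three-column output above, `parse_hits` yields one `Hit` per line, for example `Hit(name='ACT1_FUGRU', start=53, match='YVGDEAQSKRG', mismatches=None)`.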
If the option '-mmshow' is used, then an extra column of data is output indicating how many mismatches there are:
    % fuzzpro -mmshow
    Protein pattern search
    Input sequence(s): sw:100k_rat
    Search pattern: RARLX(3)R
    Number of mismatches [0]: 1
    Output file [100k_rat.fuzzpro]: stdout
    100K_RAT 613 1 EARLNCFRN
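To illustrate what a mismatch allowance means, here is a sketch for fixed-length patterns. It assumes that each mismatch is one position in a window whose residue fails the pattern element at that position; fuzzpro's own counting rules, particularly for variable-length elements, may differ, so treat this as an illustration rather than a re-implementation. The function names are hypothetical and the test sequence is made up.

```python
import re

# One pattern element: a residue, x, [..] or {..}, optionally followed by (n).
TOKEN = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Z])(?:\((\d+)\))?")


def expand(pattern: str) -> list:
    """Expand a fixed-length PROSITE-style pattern into one test per position.

    Each test is None (any residue) or (include, residues). Variable-length
    repeats such as x(2,4) are rejected with ValueError.
    """
    p = pattern.upper().rstrip(".").replace("-", "")
    tests, pos = [], 0
    while pos < len(p):
        m = TOKEN.match(p, pos)
        if m is None:
            raise ValueError(f"unsupported pattern syntax at {p[pos:]!r}")
        elem, count = m.group(1), int(m.group(2) or 1)
        if elem == "X":
            test = None                          # any residue
        elif elem[0] == "[":
            test = (True, set(elem[1:-1]))       # one of these residues
        elif elem[0] == "{":
            test = (False, set(elem[1:-1]))      # none of these residues
        else:
            test = (True, {elem})                # exactly this residue
        tests.extend([test] * count)
        pos = m.end()
    return tests


def residue_ok(test, residue: str) -> bool:
    """True if the residue satisfies one expanded pattern element."""
    if test is None:
        return True
    include, residues = test
    return (residue in residues) == include


def mismatch_search(pattern: str, seq: str,
                    max_mismatch: int) -> list[tuple[int, int, str]]:
    """Return (1-based start, mismatches, window) for windows within the limit."""
    tests = expand(pattern)
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - len(tests) + 1):
        window = seq[start:start + len(tests)]
        mismatches = sum(not residue_ok(t, r) for t, r in zip(tests, window))
        if mismatches <= max_mismatch:
            hits.append((start + 1, mismatches, window))
    return hits
```

With the pattern from the example above, `mismatch_search("RARLX(3)R", "GGEARLNCFRGG", 1)` returns `[(3, 1, 'EARLNCFR')]`: the leading `E` fails the first `R`, and every other position is satisfied. With `max_mismatch=0` the same call returns an empty list.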
If the option '-desc' is used then the description of the sequence is displayed before each line showing the match details. For example:
    % fuzzpro 'sw:*_HUMAN' -desc
    Protein pattern search
    Search pattern: [FY]-[LIV]-G-[DE]-E-A-Q-x-[RKQ](2)-G
    Number of mismatches [0]:
    Output file [143b_human.fuzzpro]: stdout
    ACTIN, AORTIC SMOOTH MUSCLE (ALPHA-ACTIN 2).
    ACTA_HUMAN 55 YVGDEAQSKRG
    ACTIN, CYTOPLASMIC 1 (BETA-ACTIN).
    ACTB_HUMAN 53 YVGDEAQSKRG
    ACTIN, ALPHA CARDIAC.
    ACTC_HUMAN 55 YVGDEAQSKRG
    ACTIN, CYTOPLASMIC 2 (GAMMA-ACTIN).
    ACTG_HUMAN 53 YVGDEAQSKRG
    ACTIN, GAMMA-ENTERIC SMOOTH MUSCLE (ALPHA-ACTIN 3).
    ACTH_HUMAN 54 YVGDEAQSKRG
    ACTIN, ALPHA SKELETAL MUSCLE (ALPHA-ACTIN 1).
    ACTS_HUMAN 55 YVGDEAQSKRG
If the '-acc' option is also used, the accession number of each sequence is displayed at the start of its description line, before the line showing the match details. For example:
    % fuzzpro 'sw:*_HUMAN' -desc -acc
    Protein pattern search
    Search pattern: [FY]-[LIV]-G-[DE]-E-A-Q-x-[RKQ](2)-G
    Number of mismatches [0]:
    Output file [143b_human.fuzzpro]: stdout
    P03996 ACTIN, AORTIC SMOOTH MUSCLE (ALPHA-ACTIN 2).
    ACTA_HUMAN 55 YVGDEAQSKRG
    P02570 ACTIN, CYTOPLASMIC 1 (BETA-ACTIN).
    ACTB_HUMAN 53 YVGDEAQSKRG
    P04270 ACTIN, ALPHA CARDIAC.
    ACTC_HUMAN 55 YVGDEAQSKRG
    P02571 ACTIN, CYTOPLASMIC 2 (GAMMA-ACTIN).
    ACTG_HUMAN 53 YVGDEAQSKRG
    P12718 ACTIN, GAMMA-ENTERIC SMOOTH MUSCLE (ALPHA-ACTIN 3).
    ACTH_HUMAN 54 YVGDEAQSKRG
    P02568 ACTIN, ALPHA SKELETAL MUSCLE (ALPHA-ACTIN 1).
    ACTS_HUMAN 55 YVGDEAQSKRG
If the option '-usa' is used then the Uniform Sequence Address is output at the start of each line of match details. For example:
    % fuzzpro 'sw:*_HUMAN' -usa
    Protein pattern search
    Search pattern: [FY]-[LIV]-G-[DE]-E-A-Q-x-[RKQ](2)-G
    Number of mismatches [0]:
    Output file [143b_human.fuzzpro]: stdout
    sw-id:ACTA_HUMAN ACTA_HUMAN 55 YVGDEAQSKRG
    sw-id:ACTB_HUMAN ACTB_HUMAN 53 YVGDEAQSKRG
    sw-id:ACTC_HUMAN ACTC_HUMAN 55 YVGDEAQSKRG
    sw-id:ACTG_HUMAN ACTG_HUMAN 53 YVGDEAQSKRG
    sw-id:ACTH_HUMAN ACTH_HUMAN 54 YVGDEAQSKRG
    sw-id:ACTS_HUMAN ACTS_HUMAN 55 YVGDEAQSKRG
This is useful because it turns the output into a 'list' file of sequence names that other EMBOSS programs can read when the file name is given with a '@' in front of it.
Program name | Description |
---|---|
antigenic | Finds antigenic sites in proteins |
digest | Protein proteolytic enzyme or reagent cleavage digest |
fuzztran | Protein pattern search after translation |
helixturnhelix | Report nucleic acid binding motifs |
oddcomp | Finds protein sequence regions with a biased composition |
patmatdb | Search a protein sequence with a motif |
patmatmotifs | Search a PROSITE motif database with a protein sequence |
pepcoil | Predicts coiled coil regions |
preg | Regular expression search of a protein sequence |
pscan | Scans proteins using PRINTS |
sigcleave | Reports protein signal cleavage sites |
Other EMBOSS programs allow you to search for regular expression patterns, but they may be harder to use if you have never used regular expressions before: