EMBOSS: fuzznuc
Patterns are specifications of a (typically short) length of sequence to be found. They can specify a search for an exact sequence or they can allow various ambiguities, matches to variable lengths of sequence and repeated subsections of the sequence.
fuzznuc intelligently selects the optimum searching algorithm to use, depending on the complexity of the search pattern specified.
% fuzznuc
Input sequence: embl:hhtetra
Search pattern: AAGCTT
Number of mismatches [0]:
Output file [hhtetra.fuzznuc]:
Mandatory qualifiers:
  [-sequence]   seqall    Sequence database USA
  -pattern      string    Search pattern
  -mismatch     integer   Number of mismatches
  [-outf]       outfile   Output file name

Optional qualifiers: (none)

Advanced qualifiers:
  -mmshow       bool      Show mismatches
  -accshow      bool      Show accession numbers
  -descshow     bool      Show descriptions
  -complement   bool      Search complementary strand
Mandatory qualifiers | Description | Allowed values | Default |
---|---|---|---|
[-sequence] (Parameter 1) | Sequence database USA | Readable sequence(s) | Required |
-pattern | Search pattern | Any string is accepted | An empty string is accepted |
-mismatch | Number of mismatches | Integer 0 or more | 0 |
[-outf] (Parameter 2) | Output file name | Output file | <sequence>.fuzznuc |
Optional qualifiers | Description | Allowed values | Default |
(none) | | | |
Advanced qualifiers | Description | Allowed values | Default |
-mmshow | Show mismatches | Yes/No | No |
-accshow | Show accession numbers | Yes/No | No |
-descshow | Show descriptions | Yes/No | No |
-complement | Search complementary strand | Yes/No | No |
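When fuzznuc is driven from a script, the qualifiers listed above can be assembled into an argument list. The sketch below is only an illustration: `build_fuzznuc_command` is a hypothetical helper, and it uses only qualifier names that appear in the tables above. It builds the command without running it.

```python
from typing import List


def build_fuzznuc_command(sequence: str, pattern: str, mismatch: int,
                          outfile: str, mmshow: bool = False) -> List[str]:
    """Assemble a fuzznuc argument list (hypothetical helper, not part of EMBOSS)."""
    if mismatch < 0:
        raise ValueError("mismatch must be an integer 0 or more")
    cmd = [
        "fuzznuc",
        "-sequence", sequence,      # sequence database USA
        "-pattern", pattern,        # search pattern
        "-mismatch", str(mismatch), # number of mismatches allowed
        "-outf", outfile,           # output file name
    ]
    if mmshow:
        cmd.append("-mmshow")       # advanced qualifier: show mismatches
    return cmd


# Running the command requires an EMBOSS installation, for example:
#   import subprocess
#   subprocess.run(build_fuzznuc_command("embl:hhtetra", "AAGCTT", 0,
#                                        "hhtetra.fuzznuc"), check=True)
```

Passing the arguments as a list avoids shell quoting problems with pattern characters such as `[`, `{` and `(`.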
The PROSITE pattern definition from the PROSITE documentation (amended to refer to nucleic acid sequences, not proteins) follows.
For example, in the EMBL entry ECLAC you can look for the pattern:
[CG](5)TG{A}N(1,5)C
This searches for "C or G" 5 times, followed by T and G, then anything except A, then any base (1 to 5 times) before a C.
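To make the reading of this pattern concrete, the following sketch translates the pattern elements used above into a Python regular expression. It is not how fuzznuc itself searches; `prosite_to_regex` is a hypothetical helper, and treating N as exactly `[ACGT]` is an assumption made for the illustration.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style nucleotide pattern into a regex string."""
    pattern = pattern.upper().replace("-", "")  # '-' separators are optional
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "[":                       # [CG]  -> one of these bases
            j = pattern.index("]", i)
            out.append("[" + pattern[i + 1:j] + "]")
            i = j + 1
        elif ch == "{":                     # {A}   -> anything except these
            j = pattern.index("}", i)
            out.append("[^" + pattern[i + 1:j] + "]")
            i = j + 1
        elif ch == "(":                     # (5) or (1,5) -> repeat count
            j = pattern.index(")", i)
            out.append("{" + pattern[i + 1:j] + "}")
            i = j + 1
        elif ch == "N":                     # N -> any base (assumed ACGT)
            out.append("[ACGT]")
            i += 1
        else:                               # literal base
            out.append(ch)
            i += 1
    return "".join(out)


def find_pattern(sequence: str, pattern: str):
    """Return (1-based start, matched text) for each non-overlapping match."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in regex.finditer(sequence.upper())]
```

For the example pattern, `prosite_to_regex("[CG](5)TG{A}N(1,5)C")` gives `[CG]{5}TG[^A][ACGT]{1,5}C`.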
You can use ambiguity codes in nucleic acid searches, but not within [] or {}, because each code expands to its bracketed counterpart. For example, "s" is expanded to "[GC]", so [S] would be expanded to [[GC]], which is illegal.
Note that X is reserved for proteins. For nucleic acids, use N to refer to any base.
The search is case-independent, so 'AAA' matches 'aaa'.
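The expansion rule can be sketched as follows, using the standard IUPAC nucleotide ambiguity codes. This is an illustration of the rule described above, not fuzznuc's own code; `expand_ambiguity` is a hypothetical name, and raising an error for a code inside brackets is one way to model "illegal".

```python
# Standard IUPAC nucleotide ambiguity codes and the bases they stand for.
IUPAC = {
    "R": "[AG]", "Y": "[CT]", "S": "[GC]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def expand_ambiguity(pattern: str) -> str:
    """Expand ambiguity codes to bracketed sets; reject them inside [] or {}."""
    out = []
    inside = False                      # True while between [ ] or { }
    for ch in pattern.upper():          # upper-casing makes the input case-independent
        if ch in "[{":
            inside = True
            out.append(ch)
        elif ch in "]}":
            inside = False
            out.append(ch)
        elif ch in IUPAC:
            if inside:
                # e.g. [S] would become [[GC]], which is not a legal pattern
                raise ValueError(f"ambiguity code {ch!r} not allowed inside [] or {{}}")
            out.append(IUPAC[ch])
        else:
            out.append(ch)
    return "".join(out)
```

With this sketch, `expand_ambiguity("s")` returns `[GC]`, while `expand_ambiguity("[S]")` raises a `ValueError`.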
HHTETRA 1 AAGCTT
HHTETRA 1267 AAGCTT
The output is composed of three columns of data: the name of the sequence, the start position of the match, and the sequence that matched.
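Reading this output back into a program could look like the sketch below. It assumes the columns are whitespace-separated and hold the sequence name, the start position, and the matched sequence, which is an interpretation of the sample output rather than a documented file format. `Hit` and `parse_hits` are hypothetical names.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    name: str    # sequence name, e.g. HHTETRA
    start: int   # start position of the match
    match: str   # the sequence that matched


def parse_hits(text: str) -> List[Hit]:
    """Parse three-column fuzznuc-style output lines into Hit records."""
    hits = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:        # skip blank or unexpected lines
            continue
        name, start, match = fields
        hits.append(Hit(name, int(start), match))
    return hits


# The two hits shown in the example output above.
sample = "HHTETRA 1 AAGCTT\nHHTETRA 1267 AAGCTT\n"
hits = parse_hits(sample)
```

Output produced with -mmshow has an extra mismatch-count column and would need a four-field variant of this parser.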
% fuzznuc embl:hhtetra -mmshow
Nucleic acid pattern search
Search pattern: AAGCTT
Number of mismatches [0]: 1
Output file [hhtetra.fuzznuc]: stdout
HHTETRA 53 1 AAGCTG
HHTETRA 140 1 AAGCAT
HHTETRA 314 1 AACCTT
HHTETRA 350 1 AAGCCT
HHTETRA 374 1 AAGTTT
HHTETRA 1009 1 AAGTTT
HHTETRA 1259 1 AAGTTT
HHTETRA 1267 0 AAGCTT
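The idea of a search that tolerates mismatches can be illustrated with a simple sliding-window comparison. This is a minimal sketch for literal patterns only, not the algorithm fuzznuc actually selects; `find_with_mismatches` is a hypothetical name and the test sequence is made up for the example.

```python
from typing import List, Tuple


def find_with_mismatches(sequence: str, pattern: str,
                         max_mismatch: int = 0) -> List[Tuple[int, int, str]]:
    """Return (1-based start, mismatch count, matched text) for each hit."""
    sequence = sequence.upper()     # the comparison is case-independent
    pattern = pattern.upper()
    hits = []
    for start in range(len(sequence) - len(pattern) + 1):
        window = sequence[start:start + len(pattern)]
        # Count positions where the window differs from the pattern.
        mismatches = sum(a != b for a, b in zip(window, pattern))
        if mismatches <= max_mismatch:
            hits.append((start + 1, mismatches, window))
    return hits


# Made-up sequence containing one exact copy and one copy with a mismatch.
toy = "ccAAGCTTggAAGTTTcc"
exact = find_with_mismatches(toy, "AAGCTT", 0)
fuzzy = find_with_mismatches(toy, "AAGCTT", 1)
```

Here `exact` holds only the hit at position 3, while `fuzzy` also includes AAGTTT at position 11 with one mismatch, mirroring the name, position, mismatch count and match columns of the -mmshow output.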
Program name | Description |
---|---|
dreg | Regular expression search of a nucleotide sequence |
fuzzpro | Protein pattern search |
fuzztran | Protein pattern search after translation |
patmatdb | Search a protein sequence with a motif |
patmatmotifs | Search a PROSITE motif database with a protein sequence |
preg | Regular expression search of a protein sequence |
pscan | Scans proteins using PRINTS |
tfscan | Scans DNA sequences for transcription factors |
Other EMBOSS programs allow you to search for regular expression patterns, but they may be harder to use for someone who has never used regular expressions before: