EMBOSS: fuzznuc
Patterns are specifications of a (typically short) length of sequence to be found. They can specify a search for an exact sequence or they can allow various ambiguities, matches to variable lengths of sequence and repeated subsections of the sequence.
fuzznuc intelligently selects the optimum searching algorithm to use, depending on the complexity of the search pattern specified.
    % fuzznuc
    Input sequence: embl:hhtetra
    Search pattern: AAGCTT
    Number of mismatches [0]:
    Output file [hhtetra.fuzznuc]:
    Mandatory qualifiers:
      [-sequence]   seqall    Sequence database USA
       -pattern     string    Search pattern
       -mismatch    integer   Number of mismatches
      [-outf]       outfile   Output file name

    Optional qualifiers: (none)

    Advanced qualifiers:
       -mmshow      bool      Show mismatches
       -accshow     bool      Show accession numbers
       -descshow    bool      Show descriptions
       -usashow     bool      Showing the USA (Uniform Sequence Address) of the matching sequences will turn your output file into a 'list' file that can then be read in by many other EMBOSS programs by specifying it with a '@' in front of the filename.
       -complement  bool      Search complementary strand

    General qualifiers:
       -help        bool      report command line options. More information on associated and general qualifiers can be found with -help -verbose
Qualifier | Description | Allowed values | Default |
---|---|---|---|
Mandatory qualifiers | | | |
[-sequence] (Parameter 1) | Sequence database USA | Readable sequence(s) | Required |
-pattern | Search pattern | Any string is accepted | An empty string is accepted |
-mismatch | Number of mismatches | Integer 0 or more | 0 |
[-outf] (Parameter 2) | Output file name | Output file | <sequence>.fuzznuc |
Optional qualifiers | | | |
(none) | | | |
Advanced qualifiers | | | |
-mmshow | Show mismatches | Yes/No | No |
-accshow | Show accession numbers | Yes/No | No |
-descshow | Show descriptions | Yes/No | No |
-usashow | Showing the USA (Uniform Sequence Address) of the matching sequences will turn your output file into a 'list' file that can then be read in by many other EMBOSS programs by specifying it with a '@' in front of the filename. | Yes/No | No |
-complement | Search complementary strand | Yes/No | No |
The PROSITE pattern definition from the PROSITE documentation (amended to refer to nucleic acid sequences, not proteins) follows.
For example, in the EMBL entry ECLAC you can look for the pattern:
[CG](5)TG{A}N(1,5)C
This searches for "C or G" 5 times, followed by T and G, then anything except A, then any base (1 to 5 times) before a C.
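The reading of the pattern given above can be made concrete by translating it into an ordinary regular expression. The following is a minimal Python sketch, not fuzznuc's own implementation: the function names `pattern_to_regex` and `find_matches` are hypothetical, and it handles only the elements used in the example (bracketed sets, braces for exclusions, parenthesised repeat counts, N, and the four plain bases).

```python
import re


def pattern_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style nucleotide pattern into a Python regex string.

    Hypothetical helper for illustration only; it covers just the
    constructs shown in the example pattern.
    """
    pattern = pattern.upper()
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "[":                       # allowed bases, e.g. [CG]
            j = pattern.index("]", i)
            out.append("[" + pattern[i + 1:j] + "]")
            i = j + 1
        elif ch == "{":                     # excluded bases, e.g. {A}
            j = pattern.index("}", i)
            out.append("[^" + pattern[i + 1:j] + "]")
            i = j + 1
        elif ch == "(":                     # repeat count, e.g. (5) or (1,5)
            j = pattern.index(")", i)
            out.append("{" + pattern[i + 1:j] + "}")
            i = j + 1
        elif ch == "N":                     # any base
            out.append("[ACGT]")
            i += 1
        elif ch in "ACGT":                  # a literal base
            out.append(ch)
            i += 1
        else:
            raise ValueError(f"unsupported pattern character: {ch!r}")
    return "".join(out)


def find_matches(pattern: str, sequence: str):
    """Return (1-based start position, matched text) for each match."""
    regex = re.compile(pattern_to_regex(pattern), re.IGNORECASE)
    return [(m.start() + 1, m.group()) for m in regex.finditer(sequence)]


print(pattern_to_regex("[CG](5)TG{A}N(1,5)C"))
print(find_matches("[CG](5)TG{A}N(1,5)C", "ttCCGGCTGTAACtt"))
```

The sketch reports non-overlapping matches only, because that is how `re.finditer` behaves; whether fuzznuc reports overlapping matches is not stated here.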
You can use ambiguity codes for nucleic acid searches, but not within [] or {}, because they expand to bracketed counterparts. For example, "s" is expanded to "[GC]", so [S] would be expanded to [[GC]], which is illegal.
Note that X is reserved for proteins. For nucleic acids, use N to refer to any base.
The search is case-independent, so 'AAA' matches 'aaa'.
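The expansion rule and the case-independence described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not fuzznuc's code: the name `expand_ambiguity` is hypothetical, the table holds the standard IUPAC nucleotide ambiguity codes, and the sketch assumes N is treated like the other codes (so it is also rejected inside brackets).

```python
# Standard IUPAC nucleotide ambiguity codes and the bases they stand for.
IUPAC = {
    "R": "[AG]", "Y": "[CT]", "S": "[GC]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def expand_ambiguity(pattern: str) -> str:
    """Expand ambiguity codes outside [] and {}; reject them inside.

    Hypothetical helper: upper-casing the pattern first makes the
    expansion case-independent, so 's' and 'S' behave identically.
    """
    out = []
    closer = None  # closing bracket we are waiting for, if inside [] or {}
    for ch in pattern.upper():
        if closer is not None:
            if ch in IUPAC:
                # Expanding here would nest brackets, e.g. [S] -> [[GC]].
                raise ValueError(
                    f"ambiguity code {ch!r} is not allowed inside brackets"
                )
            if ch == closer:
                closer = None
            out.append(ch)
        elif ch in "[{":
            closer = "]" if ch == "[" else "}"
            out.append(ch)
        else:
            out.append(IUPAC.get(ch, ch))
    return "".join(out)


print(expand_ambiguity("aasTT"))
```

Calling `expand_ambiguity("[S]")` raises `ValueError`, which mirrors the illegal `[[GC]]` case described above.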
    HHTETRA 1 AAGCTT
    HHTETRA 1267 AAGCTT
The output is composed of three columns of data: the sequence name, the start position of the match, and the matching sequence.
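For downstream processing, output of this shape is straightforward to read back in. The following Python sketch assumes whitespace-separated columns holding the sequence name, the position, and the matched sequence, as in the example output above; the function name `parse_hits` is hypothetical and the sketch does not handle the extra columns or lines produced by the advanced qualifiers.

```python
def parse_hits(text: str):
    """Parse three-column match lines into (name, position, match) tuples.

    Hypothetical helper; blank lines are skipped and any line that does
    not have exactly three fields raises ValueError.
    """
    hits = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if len(fields) != 3:
            raise ValueError(f"unexpected line: {line!r}")
        name, position, match = fields
        hits.append((name, int(position), match))
    return hits


example = """HHTETRA 1 AAGCTT
HHTETRA 1267 AAGCTT"""
print(parse_hits(example))
```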
    % fuzznuc embl:hhtetra -mmshow
    Nucleic acid pattern search
    Search pattern: AAGCTT
    Number of mismatches [0]: 1
    Output file [hhtetra.fuzznuc]: stdout

    HHTETRA 53 1 AAGCTG
    HHTETRA 140 1 AAGCAT
    HHTETRA 314 1 AACCTT
    HHTETRA 350 1 AAGCCT
    HHTETRA 374 1 AAGTTT
    HHTETRA 1009 1 AAGTTT
    HHTETRA 1259 1 AAGTTT
    HHTETRA 1267 0 AAGCTT
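The idea of a search that tolerates mismatches can be illustrated with a simple sliding-window comparison. This Python sketch is a naive approach for plain patterns only and is not the algorithm fuzznuc uses (fuzznuc chooses its own algorithm according to the pattern); the function name `find_with_mismatches` is hypothetical.

```python
def find_with_mismatches(pattern: str, sequence: str, max_mismatches: int = 0):
    """Return (1-based position, mismatch count, matched text) for each window
    of the sequence that differs from the pattern in at most max_mismatches
    positions. Plain patterns only: no brackets, repeats or ambiguity codes.
    """
    pattern = pattern.upper()
    sequence = sequence.upper()
    width = len(pattern)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Count positions where the pattern and the window disagree.
        mismatches = sum(1 for p, s in zip(pattern, window) if p != s)
        if mismatches <= max_mismatches:
            hits.append((start + 1, mismatches, window))
    return hits


print(find_with_mismatches("AAGCTT", "AAGCTTccAAGCTG", max_mismatches=1))
```

Each tuple carries the same three values that follow the sequence name in the listing above: position, number of mismatches, and the matching sequence.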
If the option '-desc' is used then the description of the sequence is displayed before each line showing the match details. For example:
    % fuzznuc embl:hhtetra -desc
    Nucleic acid pattern search
    Search pattern: AAGCTT
    Number of mismatches [0]:
    Output file [hhtetra.fuzznuc]: stdout

    Human herpesvirus 7 (clone ED132'1.2) telomeric repeat region.
    HHTETRA 1 AAGCTT
    Human herpesvirus 7 (clone ED132'1.2) telomeric repeat region.
    HHTETRA 1267 AAGCTT
If the option '-acc' is also used then the accession number of the sequence is displayed before each line showing the match details. For example:
    % fuzznuc embl:hhtetra -desc -acc
    Nucleic acid pattern search
    Search pattern: AAGCTT
    Number of mismatches [0]:
    Output file [hhtetra.fuzznuc]: stdout

    L46634
    Human herpesvirus 7 (clone ED132'1.2) telomeric repeat region.
    HHTETRA 1 AAGCTT
    L46634
    Human herpesvirus 7 (clone ED132'1.2) telomeric repeat region.
    HHTETRA 1267 AAGCTT
If the option '-usa' is used then the Uniform Sequence Address is output at the start of each line of match details. For example:
    % fuzznuc embl:hhtetra -usa
    Nucleic acid pattern search
    Search pattern: AAGCTT
    Number of mismatches [0]:
    Output file [hhtetra.fuzznuc]: stdout

    embl-id:HHTETRA HHTETRA 1 AAGCTT
    embl-id:HHTETRA HHTETRA 1267 AAGCTT
This is useful because it turns the output into a 'list' file of sequence names that other EMBOSS programs can read when the filename is given with a '@' in front of it.
If the option '-comp' is used, then the search is also done on the reverse-sense strand. Matches on that strand are reported using forward-sense positions, so the position given is actually that of the end of the match. The matching sequence is shown in square brackets to distinguish it from a forward-sense match. For example:
    % fuzznuc embl:hhtetra -comp
    Nucleic acid pattern search
    Search pattern: AAGCTT
    Number of mismatches [0]:
    Output file [hhtetra.fuzznuc]: stdout

    HHTETRA 1 AAGCTT
    HHTETRA 1267 AAGCTT
    HHTETRA 1 [AAGCTT]
    HHTETRA 1267 [AAGCTT]
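The coordinate convention for reverse-strand matches is easier to see in code. The Python sketch below searches the reverse complement of the sequence and converts each hit back to forward-sense coordinates, which yields the position where the match ends on the reverse strand. It is an illustration for plain patterns only, not fuzznuc's implementation, and the names `reverse_complement` and `search_both_strands` are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence: str) -> str:
    """Return the reverse complement of a plain ACGT sequence."""
    return sequence.upper().translate(COMPLEMENT)[::-1]


def _find_all(pattern: str, sequence: str):
    """Yield the 0-based start of every (possibly overlapping) exact match."""
    start = sequence.find(pattern)
    while start != -1:
        yield start
        start = sequence.find(pattern, start + 1)


def search_both_strands(pattern: str, sequence: str):
    """Return forward hits, then reverse-strand hits shown in brackets.

    Every position is 1-based and given in forward-sense coordinates.
    """
    pattern = pattern.upper()
    sequence = sequence.upper()
    n, m = len(sequence), len(pattern)
    hits = [(start + 1, pattern) for start in _find_all(pattern, sequence)]
    reverse_hits = []
    for start in _find_all(pattern, reverse_complement(sequence)):
        # A match covering start..start+m-1 on the reverse strand ends at
        # forward-sense position n - (start + m) + 1.
        reverse_hits.append((n - (start + m) + 1, "[" + pattern + "]"))
    hits.extend(sorted(reverse_hits))
    return hits


print(search_both_strands("AAGCTT", "AAGCTTGGGG"))
print(search_both_strands("CCCC", "AAGCTTGGGG"))
```

Because AAGCTT is its own reverse complement, every forward hit also appears as a bracketed reverse-strand hit at the same position, which is the behaviour visible in the example output above.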
Program name | Description |
---|---|
dreg | Regular expression search of a nucleotide sequence |
fuzztran | Protein pattern search after translation |
marscan | Finds MAR/SAR sites in nucleic sequences |
Other EMBOSS programs, such as dreg, allow you to search for regular expression patterns, but they may be harder to use for anyone who has never used regular expressions before.