EMBOSS: wordmatch
This program takes two sequences and finds regions where they are identical. These regions are reported in the output file and, optionally, in GFF (General Feature Format) files.
It will not find identical regions shorter than the specified word size.
    % wordmatch sw:hba_human sw:hbb_human
    Output file [hba_human.wordmatch]:
    Word size [4]:
       Mandatory qualifiers:
      [-asequence]   sequence   Sequence USA
      [-bsequence]   sequence   Sequence USA
       -wordsize     integer    Word size
      [-outfile]     align      (no help text) align value

       Optional qualifiers: (none)
       Advanced qualifiers:
       -afeatout     featout    File for output of normal tab delimited GFF features
       -bfeatout     featout    File for output of normal tab delimited GFF features

       General qualifiers:
       -help         bool       report command line options. More
                                information on associated and general
                                qualifiers can be found with -help -verbose
Qualifier | Description | Allowed values | Default |
---|---|---|---|
Mandatory qualifiers | | | |
[-asequence] (Parameter 1) | Sequence USA | Readable sequence | Required |
[-bsequence] (Parameter 2) | Sequence USA | Readable sequence | Required |
-wordsize | Word size | Integer 2 or more | 4 |
[-outfile] (Parameter 3) | (no help text) align value | Alignment file | |
Optional qualifiers | | | |
(none) | | | |
Advanced qualifiers | | | |
-afeatout | File for output of normal tab delimited GFF features | Writeable feature table | unknown.gff |
-bfeatout | File for output of normal tab delimited GFF features | Writeable feature table | unknown.gff |
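When wordmatch is driven from a script, the qualifiers in the table can be given explicitly to cut down on interactive prompting; qualifiers that are left out may still be prompted for, as in the sample session. The Python sketch below only assembles an argument list from the documented qualifiers (`-asequence`, `-bsequence`, `-wordsize`, `-outfile`, `-afeatout`, `-bfeatout`) and enforces the "Integer 2 or more" constraint on `-wordsize`. The helper name `wordmatch_command` is hypothetical, and actually running the list assumes an EMBOSS installation with `wordmatch` on the `PATH`.

```python
import shlex
from typing import List, Optional


def wordmatch_command(asequence: str, bsequence: str, wordsize: int = 4,
                      outfile: Optional[str] = None,
                      afeatout: Optional[str] = None,
                      bfeatout: Optional[str] = None) -> List[str]:
    """Hypothetical helper: build a wordmatch argument list."""
    # The qualifier table allows "Integer 2 or more" for -wordsize.
    if wordsize < 2:
        raise ValueError("-wordsize must be an integer of 2 or more")
    cmd = ["wordmatch",
           "-asequence", asequence,
           "-bsequence", bsequence,
           "-wordsize", str(wordsize)]
    # File qualifiers are only added when a value is supplied; omitted ones
    # are left to wordmatch's own defaults or prompts.
    for flag, value in (("-outfile", outfile),
                        ("-afeatout", afeatout),
                        ("-bfeatout", bfeatout)):
        if value is not None:
            cmd += [flag, value]
    return cmd


if __name__ == "__main__":
    cmd = wordmatch_command("sw:hba_human", "sw:hbb_human",
                            outfile="hba_human.wordmatch")
    # Print a shell-quoted version of the command for inspection.
    print(shlex.join(cmd))
```

Returning a list rather than a single string lets the caller pass it straight to `subprocess.run(cmd, check=True)` without shell quoting issues.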
    FINALLY length = 3
    HBA_HUMAN HBB_HUMAN Length
     58   63   5
     14   15   4
    116  121   4
The first line ('FINALLY...') gives the number of regions found.
The next line gives the headers for the subsequent columns of data. It consists of the names of the two sequences and the word 'Length'.
Subsequent lines consist of three columns of numbers separated by spaces or TAB characters. Each line describes one identical region. The first number is the start position of the identical region in the first sequence, the second number is its start position in the second sequence, and the third number is the length of the region.
If no regions are found, the output file is blank.
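A small parser for the output format described above might look like the following. It is a sketch that assumes exactly the layout of the example: a `FINALLY length = N` line, a header line holding the two sequence names, then N lines of three whitespace-separated integers. The names `parse_wordmatch` and `Region`, and the `end_a`/`end_b` properties (which assume 1-based, inclusive coordinates), are hypothetical and not part of EMBOSS.

```python
import re
from typing import List, NamedTuple, Tuple


class Region(NamedTuple):
    start_a: int  # start of the identical region in the first sequence
    start_b: int  # start of the identical region in the second sequence
    length: int   # length of the identical region

    @property
    def end_a(self) -> int:
        # Assumes 1-based, inclusive coordinates.
        return self.start_a + self.length - 1

    @property
    def end_b(self) -> int:
        return self.start_b + self.length - 1


def parse_wordmatch(text: str) -> Tuple[str, str, List[Region]]:
    """Parse wordmatch output into (name_a, name_b, regions)."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        # A blank output file means no identical regions were found.
        return "", "", []
    match = re.match(r"FINALLY\s+length\s*=\s*(\d+)", lines[0].strip())
    if match is None:
        raise ValueError("expected a 'FINALLY length = N' first line")
    expected = int(match.group(1))
    if len(lines) < 2:
        raise ValueError("missing header line")
    # Header: first sequence name, second sequence name, the word 'Length'.
    name_a, name_b = lines[1].split()[:2]
    # Columns may be separated by spaces or TABs; str.split() handles both.
    regions = [Region(*(int(field) for field in line.split()))
               for line in lines[2:]]
    if len(regions) != expected:
        raise ValueError(f"header announced {expected} regions, found {len(regions)}")
    return name_a, name_b, regions
```

Applied to the sample output above, it yields three regions, the first being `Region(start_a=58, start_b=63, length=5)`, which under the 1-based assumption ends at position 62 of HBA_HUMAN.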
Program name | Description |
---|---|
matcher | Finds the best local alignments between two sequences |
seqmatchall | Does an all-against-all comparison of a set of sequences |
supermatcher | Finds a match of a large sequence against one or more sequences |
water | Smith-Waterman local alignment |