EMBOSS: dotmatcher


Program dotmatcher

Function

Displays a thresholded dotplot of two sequences

Description

A dotplot is a graphical representation of the regions of similarity between two sequences.

The two sequences are placed on the axes of a rectangular image and (subject to threshold conditions) wherever there is a similarity between the sequences a dot is placed on the image.

Where the two sequences have substantial regions of similarity, many dots align to form diagonal lines. It is therefore possible to see at a glance where there are local regions of similarity as these will have long diagonal lines. It is also easy to see other features such as repeats (which form parallel diagonal lines), and insertions or deletions (which form breaks or discontinuities in the diagonal lines).

dotmatcher uses a threshold, applied to scores calculated from the substitution matrix, to decide whether a match is plotted. A window of the specified length is moved along every possible diagonal, and a score is calculated for the window at each position along the diagonal. The score is the sum of the substitution matrix values for the residue pairs aligned within the window. If the score is above the threshold, a line is plotted on the image over the position of the window.
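
The window scoring described above can be sketched as follows. This is a minimal Python illustration, not dotmatcher's own source: the function names are hypothetical, and toy_score (+5 for identical residues, -4 otherwise) is an assumed stand-in for a real substitution matrix such as EBLOSUM62.

```python
def toy_score(a, b):
    """Hypothetical scoring function standing in for a substitution matrix."""
    return 5 if a == b else -4


def window_scores(seq_a, seq_b, score, window=10):
    """Yield (i, j, total) for every window position on every diagonal.

    i and j are the 0-based window start positions in seq_a and seq_b.
    total is the sum of the pairwise scores along the window.
    """
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            total = sum(score(seq_a[i + k], seq_b[j + k]) for k in range(window))
            yield i, j, total


def dotplot_segments(seq_a, seq_b, score, window=10, threshold=17):
    """Return the window starts whose summed score is above the threshold.

    Each returned (i, j) corresponds to a diagonal line segment covering
    seq_a[i:i + window] and seq_b[j:j + window] on the plot.
    """
    return [
        (i, j)
        for i, j, total in window_scores(seq_a, seq_b, score, window)
        if total > threshold
    ]
```

Comparing a sequence containing a tandem repeat with itself, for example dotplot_segments("ACGTACGT", "ACGTACGT", toy_score, window=4), returns hits on the main diagonal and on two parallel off-diagonals, which is the pattern repeats produce in a dotplot.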

Usage

Here is a sample session with dotmatcher.

% dotmatcher sw:hba_human sw:hbb_human

Command line arguments

   Mandatory qualifiers (* if not always prompted):
  [-sequencea]         sequence   Sequence USA
  [-sequenceb]         sequence   Sequence USA
*  -graph              graph      Graph type
*  -outfile            outfile    Display as data

   Optional qualifiers:
   -windowsize         integer    window size over which to test threshold
   -threshold          integer    threshold
   -matrixfile         matrix     Matrix file

   Advanced qualifiers:
   -data               bool       Output the match data to a file instead of
                                  plotting it

   General qualifiers:
  -help                bool       report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose


Qualifier        Description                Allowed values                    Default

Mandatory qualifiers
[-sequencea]     Sequence USA               Readable sequence                 Required
(Parameter 1)
[-sequenceb]     Sequence USA               Readable sequence                 Required
(Parameter 2)
-graph           Graph type                 EMBOSS has a list of known        EMBOSS_GRAPHICS value,
                                            devices, including postscript,    or x11
                                            ps, hpgl, hp7470, hp7580, meta,
                                            colourps, cps, xwindows, x11,
                                            tektronics, tekt, tek4107t, tek,
                                            none, null, text, data, xterm,
                                            png
-outfile         Display as data            Output file                       <sequence>.dotmatcher

Optional qualifiers
-windowsize      window size over which     Integer 3 or more                 10
                 to test threshold
-threshold       threshold                  Integer 0 or more                 17
-matrixfile      Matrix file                Comparison matrix file in         EBLOSUM62 for protein
                                            EMBOSS data path                  EDNAFULL for DNA

Advanced qualifiers
-data            Output the match data      Yes/No                            No
                 to a file instead of
                 plotting it
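
For scripted use, the qualifiers listed above can be assembled into a command line programmatically. The sketch below is a hedged Python illustration: dotmatcher_command is a hypothetical helper, not part of EMBOSS, and it only builds the argument list using the qualifier names, defaults and allowed ranges documented in this section.

```python
import shlex


def dotmatcher_command(seq_a, seq_b, windowsize=10, threshold=17,
                       graph=None, matrixfile=None, data_outfile=None):
    """Build an argument list for dotmatcher (hypothetical helper).

    seq_a and seq_b are sequence USAs. If data_outfile is given, -data and
    -outfile are used and the match data goes to that file instead of a plot.
    Otherwise graph, if given, selects the graphics device.
    """
    # Ranges taken from the "Allowed values" column of the qualifier table.
    if windowsize < 3:
        raise ValueError("windowsize must be an integer of 3 or more")
    if threshold < 0:
        raise ValueError("threshold must be an integer of 0 or more")

    cmd = ["dotmatcher", seq_a, seq_b,
           "-windowsize", str(windowsize),
           "-threshold", str(threshold)]
    if matrixfile is not None:
        cmd += ["-matrixfile", matrixfile]
    if data_outfile is not None:
        cmd += ["-data", "-outfile", data_outfile]
    elif graph is not None:
        cmd += ["-graph", graph]
    return cmd


def as_shell_line(cmd):
    """Render the argument list as a single shell-quoted line for logging."""
    return shlex.join(cmd)
```

For example, dotmatcher_command("sw:hba_human", "sw:hbb_human", graph="png") reproduces the sample session with an explicit graphics device. Passing the resulting list to subprocess.run would execute it, provided dotmatcher is installed and on the PATH.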

Input file format

Any two sequence USAs (Uniform Sequence Addresses) of the same type (both DNA or both protein).

Output file format

An image is output to the requested graphics device. If -data is set, the match data is written to the output file instead of being plotted.

Data files

dotmatcher uses the specified substitution matrix file to compare the two sequences.

By default, EBLOSUM62 is used as the substitution matrix for protein sequences and EDNAFULL for nucleotide sequences. Other matrices can be specified with -matrixfile.

EMBOSS data files are distributed with the application and stored in the standard EMBOSS data directory, which is defined by the EMBOSS environment variable EMBOSS_DATA.

Users can provide their own data files in their own directories. Project-specific files can be put in the current directory or, for tidier directory listings, in a subdirectory called ".embossdata". Files for all EMBOSS runs can be put in the user's home directory, or again in a subdirectory called ".embossdata".

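The directories are searched in the following order:

    . (your current directory)
    .embossdata (under your current directory)
    ~/ (your home directory)
    ~/.embossdata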
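
A lookup along these lines can be sketched in Python. The helper names are hypothetical, and the order used here (current directory, its .embossdata subdirectory, home directory, its .embossdata subdirectory, then the EMBOSS_DATA directory as a fallback) is an assumption based on the description above, not taken from the EMBOSS source.

```python
import os
from pathlib import Path


def candidate_dirs(cwd=None, home=None, env=None):
    """Return the directories to try, in the assumed search order."""
    cwd = Path(cwd) if cwd else Path.cwd()
    home = Path(home) if home else Path.home()
    env = os.environ if env is None else env

    dirs = [cwd, cwd / ".embossdata", home, home / ".embossdata"]
    # Assumption: the standard data directory is tried last, so that
    # user-supplied files take precedence over the distributed ones.
    if env.get("EMBOSS_DATA"):
        dirs.append(Path(env["EMBOSS_DATA"]))
    return dirs


def find_data_file(name, cwd=None, home=None, env=None):
    """Return the first existing path for a data file name, or None."""
    for directory in candidate_dirs(cwd=cwd, home=home, env=env):
        path = directory / name
        if path.is_file():
            return path
    return None
```

Calling find_data_file("EBLOSUM62") would then return a project-local copy of the matrix if one exists, and otherwise fall through to the later directories.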

Notes

None.

References

None.

Warnings

None.

Diagnostic Error Messages

None.

Exit status

0 upon successful completion.

Known bugs

None.

See also

Program name   Description
dotpath        Displays a non-overlapping wordmatch dotplot of two sequences
dottup         Displays a wordmatch dotplot of two sequences
polydot        Displays all-against-all dotplots of a set of sequences

By comparison, dottup uses a wordmatch-style method and has no score threshold. It is less sensitive than dotmatcher, but substantially faster.

Author(s)

This application was written by Ian Longden (il@sanger.ac.uk) Informatics Division, The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

History

 Completed 1st June 1999. 
 Last modified 16th June 1999.

Target users

This program is intended to be used by everyone and everything, from naive users to embedded scripts.

Comments