EMBOSS: dotmatcher


Program dotmatcher

Function

Displays a thresholded dotplot of two sequences

Description

A dotplot is a graphical representation of the regions of similarity between two sequences.

The two sequences are placed on the axes of a rectangular image and (subject to threshold conditions) wherever there is a similarity between the sequences a dot is placed on the image.

Where the two sequences have substantial regions of similarity, many dots align to form diagonal lines. It is therefore possible to see at a glance where there are local regions of similarity.

dotmatcher uses a threshold to decide whether a match is plotted. A window of specified length is moved along every possible diagonal, and a score is calculated for the window at each position on each diagonal. The score is the sum, over the window, of the pairwise comparison scores of the two sequences taken from the given similarity (substitution) matrix. If the score is above the threshold, a line is plotted on the image over the position of the window.
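The windowed scoring described above can be illustrated with a minimal Python sketch. This is not dotmatcher's source code: the function names, the brute-force loop order, and the toy identity scoring function (standing in for a real substitution matrix such as EBLOSUM62) are all assumptions made for illustration.

```python
def windowed_matches(seq_a, seq_b, score, window=10, threshold=17.0):
    """Return (i, j) window start positions whose summed score exceeds threshold.

    Hypothetical helper: i indexes seq_a, j indexes seq_b, and each (i, j)
    pair is the start of a window lying on the diagonal i - j.
    """
    hits = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            # Sum the pairwise scores along the diagonal inside the window.
            total = sum(score(seq_a[i + k], seq_b[j + k]) for k in range(window))
            if total > threshold:
                hits.append((i, j))
    return hits


def identity_score(x, y):
    # Toy stand-in for a substitution matrix: 1 for identity, 0 otherwise.
    return 1.0 if x == y else 0.0


hits = windowed_matches("GATTACA", "TTAC", identity_score, window=3, threshold=2.0)
print(hits)  # [(2, 0), (3, 1)]
```

Both hits lie on the same diagonal (i - j = 2), so in a plot they would join into a single diagonal line segment.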

Usage

Here is a sample session with dotmatcher.

% dotmatcher sw:hba_human sw:hbb_human

Command line arguments

   Mandatory qualifiers (* if not always prompted):
  [-sequencea]         sequence   Sequence USA
  [-sequenceb]         sequence   Sequence USA
*  -graph              graph      Graph type
*  -outfile            outfile    Display as data

   Optional qualifiers:
   -windowsize         integer    window size over which to test threshold
   -threshold          float      threshold
   -matrixfile         matrix     Matrix file

   Advanced qualifiers:
   -data               bool       Output the match data to a file instead of
                                  plotting it


Qualifier      Description              Allowed values                         Default

Mandatory qualifiers
[-sequencea]   Sequence USA             Readable sequence                      Required
(Parameter 1)
[-sequenceb]   Sequence USA             Readable sequence                      Required
(Parameter 2)
-graph         Graph type               EMBOSS has a list of known devices,    EMBOSS_GRAPHICS value,
                                        including postscript, ps, hpgl,        or x11
                                        hp7470, hp7580, meta, colourps, cps,
                                        xwindows, x11, tektronics, tekt,
                                        tek4107t, tek, none, null, text,
                                        data, xterm
-outfile       Display as data          Output file                            <sequence>.dotmatcher

Optional qualifiers
-windowsize    Window size over which   Integer 3 or more                      10
               to test threshold
-threshold     Threshold                Number 0.000 or more                   17.0
-matrixfile    Matrix file              Comparison matrix file in EMBOSS       EBLOSUM62 for protein,
                                        data path                              EDNAMAT for DNA

Advanced qualifiers
-data          Output the match data    Yes/No                                 No
               to a file instead of
               plotting it
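
For scripted use, the qualifiers and limits listed above can be checked before the program is called. The following Python sketch only builds an argument list. The helper name dotmatcher_args is hypothetical and not part of EMBOSS; the qualifier names, defaults, and minimum values are the ones documented on this page.

```python
def dotmatcher_args(seq_a, seq_b, windowsize=10, threshold=17.0,
                    graph=None, matrixfile=None):
    """Build a dotmatcher argument list (hypothetical helper, not part of EMBOSS)."""
    # Limits taken from the qualifier table: window 3 or more, threshold 0.0 or more.
    if windowsize < 3:
        raise ValueError("windowsize must be 3 or more")
    if threshold < 0:
        raise ValueError("threshold must be 0.0 or more")
    args = ["dotmatcher", seq_a, seq_b,
            "-windowsize", str(windowsize),
            "-threshold", str(threshold)]
    if graph is not None:
        args += ["-graph", graph]
    if matrixfile is not None:
        args += ["-matrixfile", matrixfile]
    return args


args = dotmatcher_args("sw:hba_human", "sw:hbb_human",
                       windowsize=15, threshold=20.0, graph="ps")
print(" ".join(args))
```

The list could then be passed to subprocess.run, assuming an EMBOSS installation with dotmatcher on the PATH.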

Input file format

Any 2 sequence USAs of the same type (DNA or protein).

Output file format

A .ps file is produced if postscript output is requested.

Data files

A substitution matrix file (see the -matrixfile qualifier).

Notes

References

Warnings

Diagnostic Error Messages

Exit status

0 upon successful completion.

Known bugs

See also

Program name    Description
antigenic       Finds antigenic sites in proteins
chaos           Create a chaos game representation plot for a sequence
cpgplot         Plot CpG rich areas
cpgreport       Reports all CpG rich regions
diffseq         Find differences (SNPs) between nearly identical sequences
dotpath         Displays a non-overlapping wordmatch dotplot of two sequences
dottup          Displays a wordmatch dotplot of two sequences
einverted       Finds DNA inverted repeats
equicktandem    Finds tandem repeats
etandem         Looks for tandem repeats in a nucleotide sequence
garnier         Predicts protein secondary structure
helixturnhelix  Report nucleic acid binding motifs
isochore        Plots isochores in large DNA sequences
newcpgreport    Report CpG rich areas
newcpgseek      Reports CpG rich regions
oddcomp         Finds protein sequence regions with a biased composition
palindrome      Looks for inverted repeats in a nucleotide sequence
pepcoil         Predicts coiled coil regions
polydot         Displays all-against-all dotplots of a set of sequences
pscan           Scans proteins using PRINTS
redata          Search REBASE for enzyme name, references, suppliers etc
restrict        Finds restriction enzyme cleavage sites
seqmatchall     Does an all-against-all comparison of a set of sequences
showseq         Display a sequence with features, translation etc
sigcleave       Reports protein signal cleavage sites
silent          Silent mutation restriction enzyme scan
stssearch       Searches a DNA database for matches with a set of STS primers
supermatcher    Finds a match of a large sequence against one or more sequences
tfscan          Scans DNA sequences for transcription factors
tmap            Displays membrane spanning regions
wordmatch       Finds all exact matches of a given size between 2 sequences

dottup, by comparison, has no threshold. It is really just the wordmatch method with graphical output.

Author(s)

This application was written by Ian Longden (il@sanger.ac.uk), Informatics Division, The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

History

 Completed 1st June 1999. 
 Last modified 16th June 1999.

Target users

This program is intended to be used by everyone and everything, from naive users to embedded scripts.

Comments