EMBOSS: etandem


Program etandem

Function

Looks for tandem repeats in a nucleotide sequence

Description

etandem looks for tandem repeats in a sequence. It is normally used after equicktandem has been run to identify potential repeat sizes.

Usage

Here is a sample session with etandem. The input sequence is the human herpesvirus tandem repeat.
% etandem
Input sequence: embl:hhtetra
Output file [hhtetra.tan]: 
Minimum repeat size [10]: 6
Maximum repeat size [6]: 
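
The same run can be prepared without prompts by supplying every mandatory qualifier on the command line. The sketch below only assembles the argument list from the values used in the session above. The helper name `build_etandem_command` is hypothetical and is not part of EMBOSS, and actually running the command assumes an `etandem` executable is available on the PATH.

```python
import shlex


def build_etandem_command(sequence, outfile, minrepeat, maxrepeat):
    """Assemble an etandem argument list (hypothetical helper, not part of EMBOSS)."""
    return [
        "etandem",
        "-sequence", sequence,
        "-outfile", outfile,
        "-minrepeat", str(minrepeat),
        "-maxrepeat", str(maxrepeat),
    ]


# Values taken from the sample session above.
cmd = build_etandem_command("embl:hhtetra", "hhtetra.tan", 6, 6)
print(shlex.join(cmd))

# To run it for real (assumes etandem is installed):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Keeping the command as a list rather than a single string avoids shell quoting problems if a sequence USA or file name contains unusual characters.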

Command line arguments

   Mandatory qualifiers:
  [-sequence]          sequence   Sequence USA
  [-outfile]           outfile    Output file name
   -minrepeat          integer    Minimum repeat size
   -maxrepeat          integer    Maximum repeat size

   Optional qualifiers: (none)
   Advanced qualifiers:
   -threshold          integer    Threshold score
   -mismatch           bool       Allow N as a mismatch
   -uniform            bool       Allow uniform consensus


Mandatory qualifiers        Description              Allowed values                         Default
[-sequence] (Parameter 1)   Sequence USA             Readable sequence                      Required
[-outfile] (Parameter 2)    Output file name         Output file                            <sequence>.etandem
-minrepeat                  Minimum repeat size      Integer, 2 or higher                   10
-maxrepeat                  Maximum repeat size      Integer, same as -minrepeat or higher  Same as -minrepeat

Optional qualifiers         Description              Allowed values                         Default
(none)

Advanced qualifiers         Description              Allowed values                         Default
-threshold                  Threshold score          Any integer value                      20
-mismatch                   Allow N as a mismatch    Yes/No                                 No
-uniform                    Allow uniform consensus  Yes/No                                 No
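
The limits and defaults listed for -minrepeat and -maxrepeat can be checked before a run is launched. The following is a minimal sketch of that check, assuming only the rules stated in the table. The function name `resolve_repeat_sizes` is hypothetical, and the code does not claim to reproduce how etandem validates its own arguments.

```python
def resolve_repeat_sizes(minrepeat=10, maxrepeat=None):
    """Apply the documented defaults and limits for -minrepeat / -maxrepeat."""
    if minrepeat < 2:
        raise ValueError("-minrepeat must be 2 or higher")
    if maxrepeat is None:
        # Documented default: same as -minrepeat.
        maxrepeat = minrepeat
    if maxrepeat < minrepeat:
        raise ValueError("-maxrepeat must be the same as -minrepeat or higher")
    return minrepeat, maxrepeat


print(resolve_repeat_sizes())      # both qualifiers left at their defaults
print(resolve_repeat_sizes(6))     # -minrepeat 6, -maxrepeat defaulted
print(resolve_repeat_sizes(6, 12)) # an explicit range of repeat sizes
```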

Input file format

The input for etandem is a nucleotide sequence.

Output file format

The output from etandem is an uncommented list of identified repeats, one per line. A future version is expected to report these as annotated sequence features instead.

The columns of the report show:

  1. Score
  2. Start base position
  3. End base position
  4. Repeat size
  5. Repeat count
  6. Percent identity
  7. Consensus sequence

   120        793        936  6  24  93.8 acccta
    90        283        420  6  23  84.8 taaccc
    38        432        485  6   9  90.7 ccctaa
    26        494        529  6   6  94.4 ccctaa
    24        568        597  6   5 100.0 aaccct
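
Because the report has no header or comment lines, downstream scripts have to rely on column order. The sketch below reads rows in the seven-column layout listed above, assuming the columns are separated by whitespace. The class `TandemRepeat` and the function `parse_etandem_line` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class TandemRepeat:
    score: int
    start: int
    end: int
    size: int
    count: int
    identity: float
    consensus: str


def parse_etandem_line(line):
    """Parse one report row; assumes seven whitespace-separated columns."""
    fields = line.split()
    if len(fields) != 7:
        raise ValueError(f"expected 7 columns, got {len(fields)}: {line!r}")
    score, start, end, size, count = (int(f) for f in fields[:5])
    return TandemRepeat(score, start, end, size, count, float(fields[5]), fields[6])


# Three rows copied from the sample report above.
sample = """\
   120        793        936  6  24  93.8 acccta
    90        283        420  6  23  84.8 taaccc
    24        568        597  6   5 100.0 aaccct
"""

repeats = [parse_etandem_line(row) for row in sample.splitlines() if row.strip()]
for r in repeats:
    # In these sample rows the span length equals repeat size * repeat count.
    span = r.end - r.start + 1
    print(r.consensus, r.score, span, r.size * r.count)
```

In the sample rows the distance from start to end matches repeat size times repeat count, which gives a simple sanity check on parsed values.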

Data files

Notes

References

None.

Warnings

None.

Diagnostics

None.

Exit status

None.

Known bugs

None.

See also

Program name   Description
chaos          Create a chaos game representation plot for a sequence
cpgplot        Plot CpG rich areas
cpgreport      Reports all CpG rich regions
diffseq        Find differences (SNPs) between nearly identical sequences
dotmatcher     Displays a thresholded dotplot of two sequences
dotpath        Displays a non-overlapping wordmatch dotplot of two sequences
dottup         Displays a wordmatch dotplot of two sequences
einverted      Finds DNA inverted repeats
equicktandem   Finds tandem repeats
isochore       Plots isochores in large DNA sequences
newcpgreport   Report CpG rich areas
newcpgseek     Reports CpG rich regions
palindrome     Looks for inverted repeats in a nucleotide sequence
polydot        Displays all-against-all dotplots of a set of sequences
redata         Search REBASE for enzyme name, references, suppliers etc
restrict       Finds restriction enzyme cleavage sites
showseq        Display a sequence with features, translation etc
silent         Silent mutation restriction enzyme scan
tfscan         Scans DNA sequences for transcription factors

Authors

This program was originally written by Richard Durbin and Jean Thierry-Mieg as part of the ACEDB genome database package.

This application was modified for inclusion in EMBOSS by Peter Rice (pmr@sanger.ac.uk) Informatics Division, The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

Priority

Completed 25 May 1999

Target

etandem is aimed at automated repeat identification in genomic sequence but can also be used by general users.

Comments