EMBOSS: helixturnhelix


Program helixturnhelix

Function

Report nucleic acid binding motifs

Description

helixturnhelix finds helix-turn-helix nucleic acid binding motifs in proteins, using the method of Dodd and Egan.

Usage

Here is a sample session with helixturnhelix.

% helixturnhelix
Input sequence: sw:laci_ecoli
Output file [laci_ecoli.hth]: 

Command line arguments

   Mandatory qualifiers:
  [-sequence]          seqall     Sequence database USA
  [-outfile]           outfile    Output file name

   Optional qualifiers:
   -mean               float      Mean value
   -sd                 float      Standard Deviation value
   -minsd              float      Minimum SD
   -eightyseven        bool       Use the old (1987) weight data

   Advanced qualifiers: (none)

Qualifier                  Description                      Allowed values                   Default

Mandatory qualifiers:
[-sequence] (Parameter 1)  Sequence database USA            Readable sequence(s)             Required
[-outfile] (Parameter 2)   Output file name                 Output file                      seqname.hth

Optional qualifiers:
-mean                      Mean value                       Number from 1.000 to 10000.000   238.71
-sd                        Standard Deviation value         Number from 1.000 to 10000.000   293.61
-minsd                     Minimum SD                       Number from 0.000 to 100.000     2.5
-eightyseven               Use the old (1987) weight data   Yes/No                           No

Advanced qualifiers: (none)
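
For scripted use, the qualifiers listed above can be given on the command line instead of being answered at the prompts. The following Python sketch only assembles such a command line; it does not run the program. The helper name build_command is hypothetical, and passing the boolean qualifier as a bare -eightyseven flag is an assumption based on the usual EMBOSS convention.

```python
import shlex


def build_command(sequence, outfile, mean=None, sd=None, minsd=None,
                  eightyseven=False):
    """Return an argument list for a helixturnhelix run (hypothetical helper)."""
    argv = ["helixturnhelix", "-sequence", sequence, "-outfile", outfile]
    # Optional numeric qualifiers are added only when a value is supplied,
    # so the program's own defaults apply otherwise.
    for name, value in (("-mean", mean), ("-sd", sd), ("-minsd", minsd)):
        if value is not None:
            argv += [name, str(value)]
    # Assumption: a boolean qualifier is switched on by naming it.
    if eightyseven:
        argv.append("-eightyseven")
    return argv


cmd = build_command("sw:laci_ecoli", "laci_ecoli.hth", minsd=3.0)
print(shlex.join(cmd))
```

The resulting list could be handed to subprocess.run on a system where EMBOSS is installed.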

Input file format

The input sequence can be one or more protein sequences.

Output file format

The output from helixturnhelix is a simple text file. It reports the highest-scoring hit, followed by all hits above the minimum standard deviation. Here is a sample output:

HELIXTURNHELIX: Nucleic Acid Binding Domain search


Hits above +2.50 SD (972.73)

Score 2178 (+6.60 SD) in GALR_ECOLI at residue 2

 Sequence:  ATIKDVARLAGVSVATVSRVIN
            |                    |
            2                    23
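
The figures in the sample output are consistent with the default -mean (238.71) and -sd (293.61) values: the threshold is the mean plus the minimum SD times the standard deviation, and the SD value of a hit is its score minus the mean, divided by the standard deviation. The Python sketch below reads a hit line and reproduces both figures. The function names and the regular expression are illustrative assumptions derived from the sample output above, not part of EMBOSS.

```python
import re

# Pattern derived from the sample hit line; an assumption, not a format spec.
HIT_RE = re.compile(
    r"Score\s+(?P<score>\d+)\s+\((?P<sd>[+-]?\d+\.\d+)\s+SD\)"
    r"\s+in\s+(?P<name>\S+)\s+at\s+residue\s+(?P<start>\d+)"
)


def parse_hits(report):
    """Return one dict per 'Score ...' line found in the report text."""
    hits = []
    for match in HIT_RE.finditer(report):
        hits.append({
            "score": int(match.group("score")),
            "sd": float(match.group("sd")),
            "name": match.group("name"),
            "start": int(match.group("start")),
        })
    return hits


def score_to_sd(score, mean=238.71, sd=293.61):
    """Express a raw score as a number of standard deviations above the mean."""
    return (score - mean) / sd


def threshold(minsd=2.5, mean=238.71, sd=293.61):
    """Raw score corresponding to the minimum SD cutoff."""
    return mean + minsd * sd


sample = "Score 2178 (+6.60 SD) in GALR_ECOLI at residue 2"
hit = parse_hits(sample)[0]
print(hit["name"], hit["start"], round(score_to_sd(hit["score"]), 2))
print(threshold())
```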

Data files

The data files are stored in the standard EMBOSS data directory. With care, these can be replaced to suit your data sets. Replacement files placed in the appropriate user data directories are used in preference to the files in the EMBOSS distribution data directory.

Here is the default file:

# Amino acid counts for 91 Helix-turn-helix (presumed) protein motifs
# from Dodd IB and Egan JB (1990) Nucl. Acids. Res. 18:5019-5026.
#
Sample: 91 aligned sequences
#
# R  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 Total Exp
# - -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- ----- ---
  A  2  1  3 14 10 12 75  6 15  9  1  1  4  3  8 15  4  4  4 11  0 10   212 995
  C  0  0  1  1  0  0  0  0  0  3  3  1  1  0  0  0  0  0  0  1  0  3    14 106
  D  0  1  0  1 14  0  0 14  1  0  5  0  1  2  0  0  0  0  1  1  0  2    43 556
  E  4  5  0 11 26  0  0 16  9  3  3  0  3 12 13  0  0  2  0  1 13  6   127 669
  F  4  0  4  0  0  4  0  1  0 10  0  0  0  0  1  0  0  1  1  1 22  0    49 358
  G  9  7  1  4  0  0  8  0  0  0 50  0  6  0  7  1  0  3  1  1  0  4   102 761
  H  4  3  1  1  2  0  0  3  2  0  5  0  3  3  0  2  0  2  4  5  0  2    42 225
  I 10  0 13  3  2 15  0  4  9  4  0 17  0  2  0  1 31  1  4  8 16  1   141 583
  K  4  4  6 11 12  1  1 14 11  0  5  2  2  7  2  1  0  5  8  4  5 15   120 516
  L 16  1 17  0  1 35  0  3 12 31  0 22  0  2  1  1 22  1  1 12 20  0   198 954
  M  7  0  2  1  1  1  0  0  5  7  1 10  0  0  2  0  2  0  0  2  0  1    42 275
  N  0  8  0  1  0  0  0  2  1  1 14  0  8  1  4  2  0  4  9  0  0 11    66 383
  P  1  6  0  1  0  0  0  0  0  0  0  0  3 13  7  0  0  0  0  0  0  3    34 403
  Q  2  1 21  9 11  0  0  9  8  0  0  2  1 17  7 12  0  3 12  5  3  9   132 437
  R  9 10 14  9  5  0  1 16 10  0  1  0  1 17  8  7  0 17 28  3  0 16   172 609
  S  2 17  0  8  4  1  6  1  2  2  3  0 37  1 25  5  0 29  3  0  1  5   152 552
  T  6 24  3 12  1  5  0  2  2  4  0  5 20  4  3 39  0  4  1  0  4  3   142 512
  V  7  3  1  1  2 16  0  0  2 12  0 29  0  5  3  3 32  0  7  8  7  0   138 724
  W  2  0  0  0  0  0  0  0  0  1  0  1  0  0  0  0  0  0  2 21  0  0    27 105
  Y  2  0  4  3  0  1  0  0  2  4  0  1  1  2  0  2  0 15  5  7  0  0    49 267
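
A replacement data file needs to keep the same layout and consistent totals. The following Python sketch parses a table in this layout and lists any residue whose 22 position counts do not add up to its Total column. It assumes that '#' lines are comments and that the last two numbers on each row are Total and Exp, as in the file above; the function names are hypothetical, and the sketch does not claim to reproduce the program's own consistency check.

```python
def parse_counts(text):
    """Parse a count table; return (sample_size, {residue: (counts, total, exp)})."""
    sample = None
    rows = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        if line.startswith("Sample:"):
            sample = int(line.split()[1])
            continue
        fields = line.split()
        numbers = [int(value) for value in fields[1:]]
        # Assumed layout: position counts, then Total, then Exp.
        rows[fields[0]] = (numbers[:-2], numbers[-2], numbers[-1])
    return sample, rows


def bad_row_totals(rows):
    """Return the residues whose counts do not sum to the Total column."""
    return [res for res, (counts, total, _) in rows.items()
            if sum(counts) != total]


# Two rows copied from the default file, for illustration only.
EXAMPLE = """\
Sample: 91 aligned sequences
# R  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 Total Exp
  A  2  1  3 14 10 12 75  6 15  9  1  1  4  3  8 15  4  4  4 11  0 10   212 995
  C  0  0  1  1  0  0  0  0  0  3  3  1  1  0  0  0  0  0  0  1  0  3    14 106
"""

sample, rows = parse_counts(EXAMPLE)
print(sample, sorted(rows), bad_row_totals(rows))
```

With the complete file, the counts at each position could be summed over all residues and compared with the sample size in the same way; position 1 of the default file, for example, sums to 91.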

Notes

References

  1. Dodd I.B., Egan J.B. (1987) "Systematic method for the detection of potential lambda cro-like DNA-binding regions in proteins." J. Mol. Biol. 194: 557-564.
  2. Dodd I.B., Egan J.B. (1990) "Improved detection of helix-turn-helix DNA-binding motifs in protein sequences." Nucleic Acids Res. 18: 5019-5026.

Warnings

The program will warn you if a nucleic acid sequence is given or if the data file is not mathematically accurate.

Diagnostic Error Messages

Exit status

It exits with status 0 unless an error is reported.

Known bugs

None.

See also

Program name    Description
antigenic       Finds antigenic sites in proteins
diffseq         Find differences (SNPs) between nearly identical sequences
dotmatcher      Displays a thresholded dotplot of two sequences
dotpath         Displays a non-overlapping wordmatch dotplot of two sequences
dottup          Displays a wordmatch dotplot of two sequences
garnier         Predicts protein secondary structure
oddcomp         Finds protein sequence regions with a biased composition
patmatdb        Search a protein sequence with a motif
patmatmotifs    Search a PROSITE motif database with a protein sequence
pepcoil         Predicts coiled coil regions
pepnet          Displays proteins as a helical net
pepwheel        Shows protein sequences as helices
polydot         Displays all-against-all dotplots of a set of sequences
printsextract   Extract data from PRINTS
prosextract     Builds the PROSITE motif database for patmatmotifs to search
pscan           Scans proteins using PRINTS
showseq         Display a sequence with features, translation etc
sigcleave       Reports protein signal cleavage sites
tfscan          Scans DNA sequences for transcription factors
tmap            Displays membrane spanning regions

Author(s)

This application was written by Alan Bleasby (ableasby@hgmp.mrc.ac.uk)

Original program "HELIXTURNHELIX" by Peter Rice (EGCG 1990)

History

Completed 11th March 1999

Target users

This program is intended to be used by everyone and everything, from naive users to embedded scripts.

Comments