EMBOSS: patmatdb


Program patmatdb

Function

Search a protein sequence with a motif

Description

Takes a protein motif and compares it to a set of protein sequences.

For each matched sequence, it reports the number of matches to the motif, the length of each match, and its start and end positions, and writes out an alignment.

Usage

Here is a sample session with patmatdb.

% patmatdb
Search a protein sequence with a motif
Input sequence(s): sw:*
Protein motif to search for: st[ty]s
Output file [5h1d_fugru.patmatdb]:
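
The same search can probably be run without prompts by passing the three mandatory qualifiers on the command line. The following is a minimal Python sketch, assuming that patmatdb is installed and on the PATH, and that supplying all three qualifiers suppresses the interactive prompts. The helper names build_patmatdb_command and run_patmatdb are hypothetical and are not part of EMBOSS.

```python
import shutil
import subprocess


def build_patmatdb_command(sequence, motif, outfile):
    """Return the argument list for one patmatdb run.

    Only the three mandatory qualifiers documented on this page are used.
    """
    return [
        "patmatdb",
        "-sequence", sequence,
        "-motif", motif,
        "-outfile", outfile,
    ]


def run_patmatdb(sequence, motif, outfile):
    """Run patmatdb; assumes an EMBOSS installation is available on PATH."""
    if shutil.which("patmatdb") is None:
        raise FileNotFoundError("patmatdb was not found on PATH")
    # Passing a list (no shell) avoids quoting problems with '[' and '*'.
    subprocess.run(build_patmatdb_command(sequence, motif, outfile), check=True)


if __name__ == "__main__":
    print(" ".join(build_patmatdb_command("sw:*", "st[ty]s", "result.patmatdb")))
```

Passing the arguments as a list rather than a shell string means characters such as '[' and '*' in the motif and the USA are not interpreted by the shell.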

Command line arguments

   Mandatory qualifiers:
  [-sequence]          seqall     Sequence database USA
  [-motif]             string     Patterns for patmatdb are based on the
                                  format of pattern used in the PROSITE
                                  database. For example:
                                  '[DE](2)HS{P}X(2)PX(2,4)C' means two Asps or
                                  Glus in any order followed by His, Ser, any
                                  residue other then Pro, then two of any
                                  residue followed by Pro followed by two to
                                  four of any residue followed by Cys. The
                                  search is case-independent, so 'AAA' matches
                                  'aaa'.
  [-outfile]           outfile    Output file name

   Optional qualifiers: (none)
   Advanced qualifiers: (none)
   General qualifiers:
  -help                bool       report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose


Mandatory qualifiers:

  [-sequence] (Parameter 1)
      Description:     Sequence database USA
      Allowed values:  Readable sequence(s)
      Default:         Required

  [-motif] (Parameter 2)
      Description:     Patterns for patmatdb are based on the format of
                       pattern used in the PROSITE database. For example:
                       '[DE](2)HS{P}X(2)PX(2,4)C' means two Asps or Glus in
                       any order followed by His, Ser, any residue other
                       than Pro, then two of any residue followed by Pro
                       followed by two to four of any residue followed by
                       Cys. The search is case-independent, so 'AAA'
                       matches 'aaa'.
      Allowed values:  A string of at least 2 characters
      Default:         Required

  [-outfile] (Parameter 3)
      Description:     Output file name
      Allowed values:  Output file
      Default:         <sequence>.patmatdb

Optional qualifiers: (none)

Advanced qualifiers: (none)

Input file format

It reads in one or more protein sequences.

Patterns for patmatdb are based on the format of pattern used in the PROSITE database, with the difference that the terminating dot '.' and the hyphens, '-', between the characters are optional.

The PROSITE pattern definition from the PROSITE documentation follows.

For example, in SWISSPROT entry 100K_RAT you can look for the pattern:

[DE](2)HS{P}X(2)PX(2,4)C

This means: two Asps or Glus in any order, followed by His, Ser, any residue other than Pro, then two of any residue, followed by Pro, followed by two to four of any residue, followed by Cys.

The search is case-independent, so 'AAA' matches 'aaa'.
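
To make the pattern rules above concrete, here is a minimal Python sketch that translates such a pattern into a regular expression and reports 1-based match positions. It is an illustration only, not the algorithm patmatdb itself uses. The function names prosite_to_regex and find_matches are hypothetical, and only the elements described in this section are handled: residue sets in [], exclusions in {}, repeat counts in (), X for any residue, optional hyphens and an optional terminating dot.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    pattern = pattern.strip().rstrip(".")  # the terminating '.' is optional
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "-":  # hyphens between elements are optional
            i += 1
        elif ch == "[":  # [DE]: any one of the listed residues
            end = pattern.index("]", i)
            out.append("[" + pattern[i + 1:end].upper() + "]")
            i = end + 1
        elif ch == "{":  # {P}: any residue other than those listed
            end = pattern.index("}", i)
            out.append("[^" + pattern[i + 1:end].upper() + "]")
            i = end + 1
        elif ch == "(":  # (2) or (2,4): repeat count for the previous element
            end = pattern.index(")", i)
            out.append("{" + pattern[i + 1:end] + "}")
            i = end + 1
        elif ch in "xX":  # X: any residue
            out.append(".")
            i += 1
        elif ch.isalpha():  # a single literal residue
            out.append(ch.upper())
            i += 1
        else:
            raise ValueError("unexpected character %r in pattern" % ch)
    return "".join(out)


def find_matches(pattern, sequence):
    """Return (start, end) pairs, 1-based and inclusive, for each match."""
    regex = re.compile(prosite_to_regex(pattern), re.IGNORECASE)
    return [(m.start() + 1, m.end()) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    print(prosite_to_regex("[DE](2)HS{P}X(2)PX(2,4)C"))
    print(find_matches("st[ty]s", "STPASSTTSSAVDP"))
```

Note that re.finditer reports only non-overlapping matches. Whether patmatdb reports overlapping matches is not stated here, so the sketch may differ from the program in that respect.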

Output file format

The output from the above example follows:


Number of matches found in HD_FUGRU = 1
Length of the sequence = 3148 basepairs
Length of match = 4
Start of match = position 1037 of sequence
End of match = position 1040 of sequence

patmatDB of HD_FUGRU from 1037 to 1040 using pattern st[ty]s

STPASSTTSSAVDP
     |  |
  1037  1040



Number of matches found in GCN4_YEAST = 1
Length of the sequence = 281 basepairs
Length of match = 4
Start of match = position 157 of sequence
End of match = position 160 of sequence

patmatDB of GCN4_YEAST from 157 to 160 using pattern st[ty]s

SNLEVSTTSFLPTP
     |  |
   157  160
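
For scripted use, the report can be read back into a simple data structure. The sketch below is a hedged example that relies only on the line wording seen in the sample output above. The function name parse_patmatdb is hypothetical, and the assumption that a sequence with several matches repeats the "Start of match" and "End of match" lines once per match has not been checked against the program.

```python
import re

# Line patterns copied from the wording of the sample output.
HEADER = re.compile(r"Number of matches found in (\S+) = (\d+)")
SEQ_LENGTH = re.compile(r"Length of the sequence = (\d+)")
MATCH_START = re.compile(r"Start of match = position (\d+)")
MATCH_END = re.compile(r"End of match = position (\d+)")


def parse_patmatdb(text):
    """Parse a patmatdb report into a list of per-sequence records."""
    records = []
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            # A new sequence block starts here.
            records.append({
                "name": header.group(1),
                "reported": int(header.group(2)),
                "sequence_length": None,
                "matches": [],
            })
            continue
        if not records:
            continue  # ignore anything before the first header line
        current = records[-1]
        m = SEQ_LENGTH.match(line)
        if m:
            current["sequence_length"] = int(m.group(1))
            continue
        m = MATCH_START.match(line)
        if m:
            current["matches"].append([int(m.group(1)), None])
            continue
        m = MATCH_END.match(line)
        if m and current["matches"]:
            current["matches"][-1][1] = int(m.group(1))
    return records
```

Alignment lines and the "patmatDB of ..." summary line are ignored, because the start and end positions already carry the same information.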

Data files

None.

Notes

None.

References

  1. Bairoch, A., Bucher, P. (1994) PROSITE: recent developments. Nucleic Acids Research, Vol 22, No. 17, 3583-3589.
  2. Bairoch, A. (1992) PROSITE: a dictionary of sites and patterns in proteins. Nucleic Acids Research, Vol 20, Supplement, 2013-2018.
  3. Peek, J., O'Reilly, T., Loukides, M. (1997) Unix Power Tools, 2nd Edition.
  4. Gusfield, D. (1997) Algorithms on Strings, Trees and Sequences.
  5. Sedgewick, R. (1990) Algorithms in C.

Warnings

None.

Diagnostic Error Messages

None.

Exit status

It always exits with status 0.

Known bugs

None.

See also

Program name     Description
antigenic        Finds antigenic sites in proteins
digest           Protein proteolytic enzyme or reagent cleavage digest
fuzzpro          Protein pattern search
fuzztran         Protein pattern search after translation
helixturnhelix   Report nucleic acid binding motifs
oddcomp          Finds protein sequence regions with a biased composition
patmatmotifs     Search a PROSITE motif database with a protein sequence
pepcoil          Predicts coiled coil regions
preg             Regular expression search of a protein sequence
pscan            Scans proteins using PRINTS
sigcleave        Reports protein signal cleavage sites

fuzzpro - Protein pattern search
This does much the same as patmatdb, but has a slightly different style of output and allows mismatches.

patmatmotifs - Compares a protein sequence to the PROSITE motif database.
This does the opposite to patmatdb, comparing a database of motifs to one sequence.

Author(s)

This application was written by Sinead O'Leary (soleary@hgmp.mrc.ac.uk)

History

Completed April 30 1999.

Target users

This program is intended to be used by everyone and everything, from naive users to embedded scripts.

Comments