EMBOSS: preg
A regular expression is a way of specifying an ambiguous pattern to search for, that is, a single pattern that can match many different strings. Regular expressions are widely used in programming languages and text-processing tools, so some users will already be familiar with them.
The following is a short guide to regular expressions in EMBOSS:
The following quantifier characters specify the number of times that the character before them (in this case 'x') can match:
Quantifiers can follow any of the following types of character specification:
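As a hedged illustration of how quantifiers behave, the sketch below uses Python's `re` module. Its basic quantifier syntax (`*`, `+`, `?`, `{n}`, `{n,}`, `{n,m}`) follows the common Perl/POSIX style. It is an assumption, not something stated here, that preg accepts exactly the same set, so check the EMBOSS guide before relying on any particular form. The helper `whole_match` is hypothetical.

```python
import re

# Quantifiers in Python's re syntax; assumed (not confirmed) to match preg's
QUANTIFIERS = {
    "x*": "zero or more x",
    "x+": "one or more x",
    "x?": "zero or one x",
    "x{3}": "exactly three x",
    "x{2,}": "two or more x",
    "x{2,4}": "two to four x",
}

def whole_match(pattern: str, text: str) -> bool:
    """True if the pattern matches the entire text, not just part of it."""
    return re.fullmatch(pattern, text) is not None

candidates = ["", "x", "xx", "xxx", "xxxxx"]
for pattern, meaning in QUANTIFIERS.items():
    hits = [s for s in candidates if whole_match(pattern, s)]
    print(f"{pattern:7} {meaning:15} matches {hits}")

# Quantifiers can also follow a bracketed class or the '.' wildcard
print(whole_match("[ST]{2,3}G", "STSG"))
print(whole_match("A.{2}C", "AKLC"))
```

`re.fullmatch` is used so that each line shows exactly which strings a quantified pattern accounts for. A plain search would also report partial matches inside longer strings.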
Combining some of these features gives these examples from the PROSITE patterns database:
'[STAGCN][RKH][LIVMAFY]$'
which is the 'Microbodies C-terminal targeting signal'.
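The trailing `$` is what makes this a C-terminal signal: the three residues must be the last ones in the sequence. The following minimal Python sketch uses the same pattern text with `re`. The function name and the test sequences are hypothetical and are not taken from PROSITE.

```python
import re

# Pattern quoted above; '$' anchors the match to the end of the sequence
MICROBODY_CTERM = re.compile(r"[STAGCN][RKH][LIVMAFY]$")

def has_cterm_signal(sequence: str) -> bool:
    """True if the final three residues fit the microbody targeting signal."""
    return MICROBODY_CTERM.search(sequence) is not None

print(has_cterm_signal("MKTAYIAKQRSKL"))   # ends in SKL
print(has_cterm_signal("MKTAYIAKQRSKLA"))  # SKL present, but not at the end
```

In Python, `$` also matches just before a trailing newline. This does not matter for sequence strings without line breaks.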
'LP.TG[STGAVDE]'
which is the 'Gram-positive cocci surface proteins anchoring hexapeptide'.
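This pattern is not anchored, so it can occur anywhere in a sequence, and `.` stands for any single residue. The sketch below scans a made-up sequence with Python's `re.finditer` and reports 1-based start positions, which is the usual convention for sequence coordinates. Note that `finditer` returns non-overlapping hits only. Whether preg also reports overlapping matches is not stated here.

```python
import re
from typing import List, Tuple

# Pattern quoted above; '.' matches any single residue
ANCHOR = re.compile(r"LP.TG[STGAVDE]")

def find_motifs(pattern: "re.Pattern[str]", sequence: str) -> List[Tuple[int, str]]:
    """Return (1-based start, matched text) for each non-overlapping hit."""
    return [(m.start() + 1, m.group()) for m in pattern.finditer(sequence)]

# Hypothetical sequence containing LPETGE and LPKTGS
print(find_motifs(ANCHOR, "MAALPETGEKKLPKTGSQ"))
```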
Regular expressions are case-sensitive. The pattern 'AAAA' will not match the sequence 'aaaa'.
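By default, Python's `re` module is case-sensitive in the same way. If case should not matter, you can either normalise the sequence or ask the regex engine to ignore case. Both options shown below are Python features and are not preg qualifiers.

```python
import re

# re.search is case-sensitive by default: prints None (no match)
print(re.search("AAAA", "aaaa"))
# Option 1: upper-case the sequence before searching
print(re.search("AAAA", "aaaa".upper()) is not None)
# Option 2: let the regex engine ignore case
print(re.search("AAAA", "aaaa", flags=re.IGNORECASE) is not None)
```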
Here is a sample session with preg:

```
% preg
Input sequence: sw:*
Output file [5h1d_fugru.preg]:
Regular expression pattern: gc[^g]
```
```
   Mandatory qualifiers:
  [-sequence]          seqall     Sequence database USA
  [-outfile]           outfile    Output file name
  [-pattern]           regexp     Regular expression pattern

   Optional qualifiers: (none)
   Advanced qualifiers: (none)
```
Mandatory qualifiers | Allowed values | Default
---|---|---
[-sequence] (Parameter 1): Sequence database USA | Readable sequence(s) | Required
[-outfile] (Parameter 2): Output file name | Output file | <sequence>.preg
[-pattern] (Parameter 3): Regular expression pattern | Any regular expression pattern is accepted | Required

Optional qualifiers | Allowed values | Default
---|---|---
(none) | |

Advanced qualifiers | Allowed values | Default
---|---|---
(none) | |
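When preg is driven from a script, each of the three qualifiers can be passed explicitly. The sketch below assumes EMBOSS is installed and `preg` is on the `PATH`. The helper names are hypothetical. The arguments follow the parameter order in the table above, and the example values are taken from the sample session.

```python
import shlex
import subprocess
from typing import List

def build_preg_command(sequence: str, outfile: str, pattern: str) -> List[str]:
    """Argument list for preg using the three qualifiers in the table above.

    Order mirrors the parameters: (1) -sequence, (2) -outfile, (3) -pattern.
    With all three supplied, none of them should need to be prompted for.
    """
    return ["preg", "-sequence", sequence, "-outfile", outfile, "-pattern", pattern]

def run_preg(sequence: str, outfile: str, pattern: str) -> None:
    """Run preg; assumes EMBOSS is installed and on PATH (not checked here)."""
    # A list argv bypasses the shell, so '*' and '[^g]' need no quoting here
    subprocess.run(build_preg_command(sequence, outfile, pattern), check=True)

cmd = build_preg_command("sw:*", "5h1d_fugru.preg", "gc[^g]")
# shlex.join adds the quoting you would need when typing this at a shell
print(shlex.join(cmd))
```

When typing the command at a shell prompt, quote the pattern. Otherwise the shell may treat characters such as `*`, `[` and `^` as filename wildcards.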
```
Matches in CO9_FUGRU
CO9_FUGRU 522 GCQ
Matches in D1DR_FUGRU
D1DR_FUGRU 27 GCF
D1DR_FUGRU 345 GCH
Matches in D5DR_FUGRU
D5DR_FUGRU 43 GCV
D5DR_FUGRU 349 GCS
Matches in HD_FUGRU
HD_FUGRU 982 GCC
Matches in SYH_FUGRU
SYH_FUGRU 15 GCR
Matches in SYV_FUGRU
SYV_FUGRU 329 GCD
SYV_FUGRU 1128 GCA
Matches in TCPD_FUGRU
TCPD_FUGRU 291 GCN
TCPD_FUGRU 375 GCA
Matches in ACH2_DROME
ACH2_DROME 4 GCC
ACH2_DROME 433 GCN
Matches in LACY_ECOLI
LACY_ECOLI 147 GCV
LACY_ECOLI 175 GCA
LACY_ECOLI 332 GCF
Matches in BGAL_ECOLI
BGAL_ECOLI 121 GCY
Matches in 12S1_ARATH
12S1_ARATH 111 GCA
Matches in OPSD_HUMAN
OPSD_HUMAN 109 GCN
Matches in AMIC_PSEAE
AMIC_PSEAE 80 GCY
Matches in AMIR_PSEAE
AMIR_PSEAE 36 GCS
```
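The report above gives each sequence that matched a "Matches in NAME" line, followed by one line per hit showing the name, the position and the matched text. The minimal Python sketch below approximates that layout. It is not preg's implementation. It makes several assumptions: positions are 1-based, hits are non-overlapping (`re.finditer`), sequences without hits are omitted, and matching is case-sensitive, so it uses the upper-case pattern `GC[^G]` on upper-case sequences. The demo sequences are made up.

```python
import re
from typing import Dict, List

def format_preg_style(pattern: str, sequences: Dict[str, str]) -> str:
    """Approximate the report layout above for a dict of name -> sequence."""
    regex = re.compile(pattern)
    lines: List[str] = []
    for name, seq in sequences.items():
        hits = [(m.start() + 1, m.group()) for m in regex.finditer(seq)]
        if not hits:
            continue  # sequences without hits do not appear in the report above
        lines.append(f"Matches in {name}")
        lines.extend(f"{name} {pos} {text}" for pos, text in hits)
    return "\n".join(lines)

demo = {"SEQ1_TEST": "MGCAKGCGLGCW", "SEQ2_TEST": "MKLV"}
print(format_preg_style("GC[^G]", demo))
```

In the demo sequence, `GCG` at position 6 is skipped because `[^G]` excludes a following G. This mirrors why each hit in the real report ends in a residue other than G.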
Other EMBOSS programs let you search for simpler patterns and may be easier to use if you have never worked with regular expressions before:

Program name | Description
---|---
dreg | Regular expression search of a nucleotide sequence
fuzznuc | Nucleic acid pattern search
fuzzpro | Protein pattern search
fuzztran | Protein pattern search after translation
patmatdb | Search a protein sequence with a motif
patmatmotifs | Search a PROSITE motif database with a protein sequence
pscan | Scans proteins using PRINTS
tfscan | Scans DNA sequences for transcription factors