EMBOSS: preg
A regular expression is a way of specifying an ambiguous pattern to search for. Regular expressions are commonly used in some computer programming languages and may be more familiar to some users than to others.
The following is a short guide to regular expressions in EMBOSS:
The following quantifier characters specify the number of times that the preceding character (in this case 'x') matches:
Quantifiers can follow any of the following types of character specification:
Combining some of these features gives these examples from the PROSITE patterns database:
'[STAGCN][RKH][LIVMAFY]$'
which is the 'Microbodies C-terminal targeting signal'.
'LP.TG[STGAVDE]'
which is the 'Gram-positive cocci surface proteins anchoring hexapeptide'.
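The two PROSITE-derived patterns above can be tried out in any regular expression engine. The following is a minimal sketch using Python's `re` module, which is an assumption for illustration only: preg uses its own regular expression library and its syntax may differ in places. The toy sequences are invented and are not real database entries.

```python
import re

# The two patterns quoted in the text above.
MICROBODY_SIGNAL = re.compile(r"[STAGCN][RKH][LIVMAFY]$")
ANCHOR_HEXAPEPTIDE = re.compile(r"LP.TG[STGAVDE]")

# Hypothetical toy sequences, invented for illustration only.
toy_c_terminal = "MKTAYIAKQRSKL"  # ends in S-K-L
toy_anchor = "GGSLPETGEEK"        # contains L-P-E-T-G-E
toy_internal = "MSKLAAQ"          # S-K-L is present, but not at the end

m1 = MICROBODY_SIGNAL.search(toy_c_terminal)
m2 = ANCHOR_HEXAPEPTIDE.search(toy_anchor)
m3 = MICROBODY_SIGNAL.search(toy_internal)

# Python offsets are 0-based, so add 1 to get a residue position.
print(m1.group(), m1.start() + 1)
print(m2.group(), m2.start() + 1)
print(m3)  # the '$' anchor rejects an internal S-K-L
```

The `$` at the end of the first pattern restricts it to the C-terminus, while the `.` in the second pattern accepts any single residue at that position.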
Regular expressions are case-sensitive. The pattern 'AAAA' will not match the sequence 'aaaa'.
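Case sensitivity can be illustrated with a short sketch in Python's `re` module (an assumption for illustration; it is not the engine preg uses). One possible workaround, shown here, is to convert the sequence to upper case before matching; whether that is appropriate depends on how your sequences are stored.

```python
import re

pattern = "AAAA"

lower_hit = re.search(pattern, "aaaa")          # no match: case differs
upper_hit = re.search(pattern, "AAAA")          # match
normalised_hit = re.search(pattern, "aaaa".upper())  # match after upper-casing

print(lower_hit is None)
print(upper_hit.group())
print(normalised_hit.group())
```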
    % preg
    regular expression search of a protein sequence
    Input sequence(s): sw:*_rat
    Output file [100k_rat.preg]: stdout
    Regular expression pattern: IA[QWF]A
    Mandatory qualifiers:
      [-sequence]    seqall     Sequence database USA
      [-outfile]     outfile    Output file name
      [-pattern]     regexp     Regular expression pattern

    Optional qualifiers: (none)
    Advanced qualifiers: (none)
    General qualifiers:
      -help          bool       report command line options. More
                                information on associated and general
                                qualifiers can be found with -help -verbose
Mandatory qualifiers | Description | Allowed values | Default |
---|---|---|---|
[-sequence] (Parameter 1) | Sequence database USA | Readable sequence(s) | Required |
[-outfile] (Parameter 2) | Output file name | Output file | <sequence>.preg |
[-pattern] (Parameter 3) | Regular expression pattern | Any regular expression pattern is accepted | Required |

Optional qualifiers: (none)

Advanced qualifiers: (none)
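When all three mandatory qualifiers are given on the command line, preg should not need to prompt for them. The sketch below assembles such a command from Python using only the qualifier names listed above. The helper names `build_preg_command` and `run_preg` are hypothetical, and the example assumes that a `preg` executable is on the PATH and that the `sw` database is configured locally. Passing the arguments as a list avoids shell expansion of the `*` and `[...]` characters.

```python
import shutil
import subprocess

def build_preg_command(sequence, outfile, pattern):
    """Return an argument list for preg (hypothetical helper)."""
    return ["preg",
            "-sequence", sequence,
            "-outfile", outfile,
            "-pattern", pattern]

def run_preg(cmd):
    """Run preg if it is installed; return its stdout, else None."""
    if shutil.which(cmd[0]) is None:
        return None  # preg is not installed on this machine
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.stdout

# Same inputs as the interactive session shown earlier.
cmd = build_preg_command("sw:*_rat", "stdout", "IA[QWF]A")
print(" ".join(cmd))
```

Calling `run_preg(cmd)` would then execute the search, returning `None` if preg is not available.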
    preg search of sw:*_rat with pattern IA[QWF]A

    Matches in 100K_RAT
    100K_RAT 390 IAQA
    Matches in 5H6_RAT
    5H6_RAT 289 IAQA
    Matches in ACDS_RAT
    ACDS_RAT 282 IAQA
    Matches in ANX2_RAT
    ANX2_RAT 70 IAFA
    Matches in APB3_RAT
    APB3_RAT 336 IAQA
    Matches in AQP9_RAT
    AQP9_RAT 44 IAQA
    Matches in ATHA_RAT
    ATHA_RAT 122 IAFA
    Matches in CD14_RAT
    CD14_RAT 178 IAQA
    Matches in CIKE_RAT
    CIKE_RAT 231 IAFA
    Matches in CLCB_RAT
    CLCB_RAT 90 IAQA
    Matches in CTR1_RAT
    CTR1_RAT 590 IAFA
    Matches in CYGF_RAT
    CYGF_RAT 359 IAQA
    Matches in DPY2_RAT
    DPY2_RAT 264 IAQA
    Matches in ENOB_RAT
    ENOB_RAT 327 IAQA
    Matches in ERBP_RAT
    ERBP_RAT 40 IAFA
    Matches in GLPK_RAT
    GLPK_RAT 392 IAFA
    Matches in GPV_RAT
    GPV_RAT 529 IAQA
    Matches in IRKB_RAT
    IRKB_RAT 93 IAFA
    Matches in KGP2_RAT
    KGP2_RAT 477 IAFA
    Matches in NPX1_RAT
    NPX1_RAT 407 IAWA
    Matches in NTDO_RAT
    NTDO_RAT 160 IAWA
    Matches in NTSE_RAT
    NTSE_RAT 180 IAWA
    Matches in PAX8_RAT
    PAX8_RAT 188 IAQA
    Matches in SRA4_RAT
    SRA4_RAT 491 IAWA
    Matches in SYNP_RAT
    SYNP_RAT 43 IAFA
    Matches in TGN3_RAT
    TGN3_RAT 330 IAFA
    Matches in TGR3_RAT
    TGR3_RAT 792 IAFA
    Matches in UDB2_RAT
    UDB2_RAT 325 IAWA
    Matches in UDB3_RAT
    UDB3_RAT 325 IAWA
    Matches in UDB6_RAT
    UDB6_RAT 325 IAWA
    Matches in UDBC_RAT
    UDBC_RAT 325 IAWA
    Matches in VMT2_RAT
    VMT2_RAT 462 IAFA
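To make the shape of this report concrete, the sketch below produces similar lines in Python for a few invented sequences. It is not preg's implementation. It assumes that the number in each hit line is the 1-based start position of the match, which the output above suggests but does not state, and it uses `re.finditer`, which reports non-overlapping matches only. The function name `preg_like_report` and the sequence names are hypothetical.

```python
import re

def preg_like_report(sequences, pattern):
    """Return report lines in the style of the preg output above."""
    regex = re.compile(pattern)
    lines = []
    for name, seq in sequences.items():
        # Assumption: positions are 1-based starts of each match.
        hits = [(m.start() + 1, m.group()) for m in regex.finditer(seq)]
        if hits:
            lines.append(f"Matches in {name}")
            for pos, text in hits:
                lines.append(f"{name} {pos} {text}")
    return lines

# Hypothetical toy sequences, not real database entries.
toy_sequences = {
    "TOY_A": "MKIAQAGG",
    "TOY_B": "GGGGGGGG",
    "TOY_C": "IAWAKKIAFA",
}

report = preg_like_report(toy_sequences, "IA[QWF]A")
for line in report:
    print(line)
```

Sequences without a hit, such as `TOY_B` here, produce no lines at all, which matches the way the report above lists only entries that contain the pattern.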
Program name | Description |
---|---|
antigenic | Finds antigenic sites in proteins |
digest | Protein proteolytic enzyme or reagent cleavage digest |
fuzzpro | Protein pattern search |
fuzztran | Protein pattern search after translation |
helixturnhelix | Report nucleic acid binding motifs |
oddcomp | Finds protein sequence regions with a biased composition |
patmatdb | Search a protein sequence with a motif |
patmatmotifs | Search a PROSITE motif database with a protein sequence |
pepcoil | Predicts coiled coil regions |
pscan | Scans proteins using PRINTS |
sigcleave | Reports protein signal cleavage sites |
Other EMBOSS programs allow you to search for simple patterns and may be easier for the user who has never used regular expressions before: