EMBOSS: showseq
By default the output is sent to the screen for the user to view, but it can also be written to a file.
The display style of the output is very flexible. The user can select a style from the pre-set choice of formats, or can design a style to suit their purposes and aesthetic tastes.
The displayed sequence can be numbered either by numbering the start and ending positions, or by placing a ruler with ticks above or below the sequence.
The width of a line can be set. The width of a margin around the sequence reserved for numbering can be set. The initial position to start numbering from can be set.
The sequence can be translated, using the selectable genetic code tables. The translation can be done in one, three or six frames. The translation can be displayed in one-letter or three-letter amino acid codes. The translation can optionally be displayed only when it is in open reading frames (ORFs) of a specified minimum size. One or more specified regions of the sequence can be individually translated.
Specified regions of the sequence can be displayed in uppercase to highlight them.
The output can be formatted for HTML.
If the output is being formatted for HTML, then specified regions of the sequence can be displayed in any valid HTML colours.
The Restriction Enzyme database (REBASE) is a collection of information about restriction enzymes and related proteins. It contains published and unpublished references, recognition and cleavage sites, isoschizomers, commercial availability, methylation sensitivity, crystal and sequence data. DNA methyltransferases, homing endonucleases, nicking enzymes, specificity subunits and control proteins are also included. Most recently, putative DNA methyltransferases and restriction enzymes, as predicted from analysis of genomic sequences, are also listed.
The home page of REBASE is: http://rebase.neb.com/rebase/rebase.html
This program can use REBASE data to find the recognition sites and/or cut sites of restriction enzymes in a nucleic acid sequence.
This program can display the cut sites on both strands.
One potentially very useful option is '-flatreformat', which displays not only the cut sites that many other restriction cut-site programs show, but also the recognition sites.
If the sequence is in EMBL, Genbank or SwissProt format, the feature table of the sequence can be displayed with the sequence. GFF file features can also be displayed if they are included on the command line with the -ufo=file qualifier.
% showseq embl:eclac -sbeg 1 -send 100
Display a sequence with features, translation etc..
Output file [stdout]:
Things to display
         0 : Enter your own list of things to display
         1 : Sequence only
         2 : Default sequence with features
         3 : Pretty sequence
         4 : One frame translation
         5 : Three frame translations
         6 : Six frame translations
         7 : Restriction enzyme map
         8 : Baroque
Display format [2]:

ECLAC
E.coli lactose operon with lacI, lacZ, lacY and lacA genes.

        10        20        30        40        50        60
----:----|----:----|----:----|----:----|----:----|----:----|
gacaccatcgaatggcgcaaaacctttcgcggtatggcatgatagcgcccggaagagagt
               |
               variation note="c in wild-type; t in 'up' promoter mutant I-Q [11]"
                                                  |=========
                                                  mRNA note="lacI (repressor) mRNA; preferred in vivo 3' end [12],[29]"

        70        80        90       100       110       120
----:----|----:----|----:----|----:----|----:----|----:----|
caattcagggtggtgaatgtgaaaccagtaacgttatacgatgtcgcagagtatgccggt
============================================================
mRNA note="lacI (repressor) mRNA; preferred in vivo 3' end [12],[29]"
                  |=========================================
                  CDS codon_start="1" db_xref="SWISS-PROT:P03023" note="lac repressor p
Note that although we asked for the sequence display to end at position '100', it has displayed the sequence up to the end of the line - position '120'. This is a feature of this program to make the display of things like restriction enzyme cutting sites easier.
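The rounding described above can be illustrated with a little arithmetic. This is only a sketch, assuming that display lines are filled in blocks of '-width' characters (default 60) counted from position 1; the function name `displayed_end` is hypothetical and not part of EMBOSS.

```python
import math


def displayed_end(send: int, width: int = 60) -> int:
    """Return the last position shown when the display is padded out
    to the end of the line that contains the requested end position."""
    return math.ceil(send / width) * width


# A request to stop at 100 with 60 characters per line ends at 120.
print(displayed_end(100))
```

With the default width, any requested end position from 61 to 120 gives the same displayed end.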
The standard list of output formats is only a small selection of the possible ways in which a sequence might be displayed. Precise control over the output format is achieved by selecting the qualifier '-format 0' (Option 0 in the list of things to display). For example:
% showseq embl:eclac -sbeg 1 -send 120
Display a sequence with features, translation etc..
Output file [stdout]:
Things to display
         0 : Enter your own list of things to display
         1 : Sequence only
         2 : Default sequence with features
         3 : Pretty sequence
         4 : One frame translation
         5 : Three frame translations
         6 : Six frame translations
         7 : Restriction enzyme map
         8 : Baroque
Display format [2]: 0
Specify your own things to display
         S : Sequence
         B : Blank line
         1 : Frame1 translation
         2 : Frame2 translation
         3 : Frame3 translation
        -1 : CompFrame1 translation
        -2 : CompFrame2 translation
        -3 : CompFrame3 translation
         T : Ticks line
         N : Number ticks line
         C : Complement sequence
         F : Features
         R : Restriction enzyme cut sites in forward sense
        -R : Restriction enzyme cut sites in reverse sense
Enter a list of things to display [B N T S F]: b,s,t,c

ECLAC
E.coli lactose operon with lacI, lacZ, lacY and lacA genes.

gacaccatcgaatggcgcaaaacctttcgcggtatggcatgatagcgcccggaagagagt
----:----|----:----|----:----|----:----|----:----|----:----|
ctgtggtagcttaccgcgttttggaaagcgccataccgtactatcgcgggccttctctca

caattcagggtggtgaatgtgaaaccagtaacgttatacgatgtcgcagagtatgccggt
----:----|----:----|----:----|----:----|----:----|----:----|
gttaagtcccaccacttacactttggtcattgcaatatgctacagcgtctcatacggcca
By choosing format '0' and then specifying the things 'b,s,t,c', we output the sequence in the following way:
For every new line of sequence, the display contains first a blank line ('b'), then the sequence itself ('s'), then a line with ticks every 10 characters ('t'), then the complement of the sequence ('c'). Subsequent lines of the sequence output repeat this pattern.
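The way a list of 'things' is repeated for each line of sequence can be sketched in a few lines of Python. This is an illustration only, not the showseq implementation: it handles just the codes B, S, T and C, leaves ambiguity codes such as 'n' uncomplemented, and the names `ticks` and `render` are hypothetical.

```python
# Complement table for the four unambiguous bases (an assumption of this
# sketch; other characters are passed through unchanged).
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def ticks(width: int) -> str:
    """Build a ruler such as ----:----| with ':' at every 5th column
    and '|' at every 10th column."""
    return "".join(
        "|" if col % 10 == 0 else ":" if col % 5 == 0 else "-"
        for col in range(1, width + 1)
    )


def render(seq: str, things: list[str], width: int = 60) -> list[str]:
    """Emit the requested things, in order, for each line of sequence."""
    lines = []
    for start in range(0, len(seq), width):
        chunk = seq[start:start + width]
        for code in things:
            code = code.upper()
            if code == "B":
                lines.append("")
            elif code == "S":
                lines.append(chunk)
            elif code == "T":
                lines.append(ticks(width))
            elif code == "C":
                lines.append(chunk.translate(COMPLEMENT))
            else:
                raise ValueError(f"thing code not handled in this sketch: {code}")
    return lines


for line in render("gacaccatcgaatggcgcaa", ["b", "s", "t", "c"], width=10):
    print(line)
```

Keeping the thing codes as an ordered list mirrors the description above: the same sequence of display rows is produced for every block of sequence.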
The 'thing' codes used in the list of standard formats are:
Sequence only:             S
Default sequence:          B N T S F
Pretty sequence:           B N T S
One frame translation:     B N T S B 1 F
Three frame translations:  B N T S B 1 2 3 F
Six frame translations:    B N T S B 1 2 3 T -3 -2 -1 F
Restriction enzyme map:    B R S N T C -R B 1 2 3 T -3 -2 -1
Baroque:                   B 1 2 3 N T R S T C -R T -3 -2 -1 F
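The relationship between a '-format' number and its list of thing codes can be written down as a simple lookup. The sketch below pairs the format numbers from the 'Things to display' menu with the code lists given above; the names `STANDARD_FORMATS` and `things_for` are hypothetical, and the handling of format 0 is an assumption about how a user-supplied list would be split.

```python
# Format number -> thing codes, taken from the menu and the list above.
STANDARD_FORMATS = {
    1: "S",
    2: "B N T S F",
    3: "B N T S",
    4: "B N T S B 1 F",
    5: "B N T S B 1 2 3 F",
    6: "B N T S B 1 2 3 T -3 -2 -1 F",
    7: "B R S N T C -R B 1 2 3 T -3 -2 -1",
    8: "B 1 2 3 N T R S T C -R T -3 -2 -1 F",
}


def things_for(fmt: int, custom: str = "") -> list[str]:
    """Return the thing codes for a format; format 0 uses the custom list."""
    spec = custom if fmt == 0 else STANDARD_FORMATS[fmt]
    # Accept either spaces or commas between codes, and ignore case.
    return [code.upper() for code in spec.replace(",", " ").split()]


print(things_for(4))
print(things_for(0, "b,s,t,c"))
```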
The following are some examples of different formats:
Just sequence:

% showseq embl:eclac stdout -sbeg 1 -send 120 -noname -nodesc -format 0 -thing S
Display a sequence with features, translation etc..

gacaccatcgaatggcgcaaaacctttcgcggtatggcatgatagcgcccggaagagagt
caattcagggtggtgaatgtgaaaccagtaacgttatacgatgtcgcagagtatgccggt
Protein sequence displayed in three-letter codes. (The codes are displayed downwards, so the first code is 'Met'):
% showseq sw:rs24_fugru stdout -three -format 2

RS24_FUGRU
40S RIBOSOMAL PROTEIN S24.

        10        20        30        40        50        60
----:----|----:----|----:----|----:----|----:----|----:----|
MAATVTVATALPMTAALLGALGMVVAVLHPGLATVPLTGIAGLLALMTLTTPAVVPVPGP
esshaharhryhehsreelryleaasaeirlylharyhllrlyelyeyyhhrsaahahlh
tnprlrlgrgsetrnguungsntllplusoysarlosruegusuastrsrroplleleye

        70        80        90       100       110       120
----:----|----:----|----:----|----:----|----:----|----:----|
ATGPGGGLTTGPAMVTASLATALLAGPLHALAAHGLPGLLLTSALGALGALAAMLLVAGT
rhlhlllyhhlhleayseesylyyslryirelrilehlyyyherylrylrysreyyarlh
grneyyysrryeatlrpruprassnuossguagsyueusssrrgsngsugsngtsslgyr

       130       140       150       160       170       180
----:----|----:----|----:----|----:----|----:----|----:----|
LLASVGASLLLA
yylealleyyys
ssarlyarsssp
Number the sequence lines in the margin:
% showseq embl:mmam stdout -format 1 -number
Display a sequence with features, translation etc..
Output file [stdout]:

MMAM
Mus musculus (cell line C3H/F2-11) chromosome 12 anti-DNA antibody heavy chain mRNA.

  1 gagnnccagctgcagcagtctggacctgagctggtaaagcctggggcttcagtgaagatg 60
 61 tcctgcaaggcttctggatacacattcactagctatgttatgcactgggtgaatcagaag 120
121 cctgggcagggccttgagtggattggatatattaatccttacaatgatggtactaactac 180
181 aatgagaagttcaaaggcaaggccacactgacttcagacaaatcctccagcacagcctac 240
241 atggagttcagcagcctgacctctgaggactctgcggtctattactgtgcaagaaaaact 300
301 tcctactatagtaacctatattactttgactactggggccaaggcaccactctcacagtc 360
361 tcctca 366
Start the numbering at a specified value ('123' in this case):
% showseq embl:mmam stdout -format 1 -number -offset 123
Display a sequence with features, translation etc..

MMAM
Mus musculus (cell line C3H/F2-11) chromosome 12 anti-DNA antibody heavy chain mRNA.

123 gagnnccagctgcagcagtctggacctgagctggtaaagcctggggcttcagtgaagatg 182
183 tcctgcaaggcttctggatacacattcactagctatgttatgcactgggtgaatcagaag 242
243 cctgggcagggccttgagtggattggatatattaatccttacaatgatggtactaactac 302
303 aatgagaagttcaaaggcaaggccacactgacttcagacaaatcctccagcacagcctac 362
363 atggagttcagcagcctgacctctgaggactctgcggtctattactgtgcaagaaaaact 422
423 tcctactatagtaacctatattactttgactactggggccaaggcaccactctcacagtc 482
483 tcctca 488
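The margin numbers in the two numbered examples follow directly from the sequence length, the line width and the '-offset' value. The sketch below reproduces that arithmetic; `line_numbers` is a hypothetical helper written for this illustration, not an EMBOSS function.

```python
def line_numbers(length: int, width: int = 60, offset: int = 1) -> list[tuple[int, int]]:
    """Return the (first, last) number printed in the margins of each line.

    'offset' is the number given to the first base of the sequence.
    """
    pairs = []
    for start in range(0, length, width):
        end = min(start + width, length)
        pairs.append((offset + start, offset + end - 1))
    return pairs


# The 366-base MMAM example, numbered from 123.
for first, last in line_numbers(366, offset=123):
    print(first, last)
```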
Make selected regions uppercase. (Use '-slower' to force the rest of the sequence to be lowercase).
% showseq embl:mmam stdout -format 1 -slower -upper '25-45,101-203,333-362'
Display a sequence with features, translation etc..

MMAM
Mus musculus (cell line C3H/F2-11) chromosome 12 anti-DNA antibody heavy chain mRNA.

gagnnccagctgcagcagtctggaCCTGAGCTGGTAAAGCCTGGGgcttcagtgaagatg
tcctgcaaggcttctggatacacattcactagctatgttaTGCACTGGGTGAATCAGAAG
CCTGGGCAGGGCCTTGAGTGGATTGGATATATTAATCCTTACAATGATGGTACTAACTAC
AATGAGAAGTTCAAAGGCAAGGCcacactgacttcagacaaatcctccagcacagcctac
atggagttcagcagcctgacctctgaggactctgcggtctattactgtgcaagaaaaact
tcctactatagtaacctatattactttgactaCTGGGGCCAAGGCACCACTCTCACAGTC
TCctca
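The effect of '-slower' combined with '-upper' can be sketched as follows. The range syntax assumed here is the one documented for the '-uppercase' qualifier (pairs of integer positions separated by non-digit characters); as a simplification this sketch treats every non-digit character, including letters, as a separator. The function names are hypothetical.

```python
import re


def parse_ranges(spec: str) -> list[tuple[int, int]]:
    """Split a range specification such as '24-45, 56-78' into pairs."""
    numbers = [int(n) for n in re.findall(r"\d+", spec)]
    if len(numbers) % 2:
        raise ValueError("positions must come in start/end pairs")
    return list(zip(numbers[0::2], numbers[1::2]))


def uppercase_regions(seq: str, spec: str) -> str:
    """Lowercase the sequence, then uppercase the 1-based inclusive ranges."""
    chars = list(seq.lower())
    for start, end in parse_ranges(spec):
        for i in range(start - 1, min(end, len(chars))):
            chars[i] = chars[i].upper()
    return "".join(chars)


print(uppercase_regions("gagnnccagc", "3-5,8-9"))
```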
Translate selected regions:
% showseq embl:mmam stdout -format 4 -send 120 -trans 25-49,66-76
Display a sequence with features, translation etc..

MMAM
Mus musculus (cell line C3H/F2-11) chromosome 12 anti-DNA antibody heavy chain mRNA.

        10        20        30        40        50        60
----:----|----:----|----:----|----:----|----:----|----:----|
gagnnccagctgcagcagtctggacctgagctggtaaagcctggggcttcagtgaagatg
P E L V K P G A S

        70        80        90       100       110       120
----:----|----:----|----:----|----:----|----:----|----:----|
tcctgcaaggcttctggatacacattcactagctatgttatgcactgggtgaatcagaag
R L L
   Mandatory qualifiers (* if not always prompted):
  [-sequence]          seqall     Sequence database USA
  [-outfile]           outfile    If you enter the name of a file here then
                                  this program will write the sequence details
                                  into that file.
   -format             list       Display format
*  -things             list       Specify a list of one or more code
                                  characters in the order in which you wish
                                  things to be displayed one above the other
                                  down the page. For example if you wish to
                                  see things displayed in the order: sequence,
                                  complement sequence, ticks line, frame 1
                                  translation, blank line; then you should
                                  enter 'S,C,T,1,B'.

   Optional qualifiers:
   -translate          range      Regions to translate (if translating).
                                  If this is left blank the complete sequence
                                  is translated.
                                  A set of regions is specified by a set of
                                  pairs of positions.
                                  The positions are integers.
                                  They are separated by any non-digit,
                                  non-alpha character.
                                  Examples of region specifications are:
                                  24-45, 56-78
                                  1:45, 67=99;765..888
                                  1,5,8,10,23,45,57,99
   -uppercase          range      Regions to put in uppercase.
                                  If this is left blank, then the sequence
                                  case is left alone.
                                  A set of regions is specified by a set of
                                  pairs of positions.
                                  The positions are integers.
                                  They are separated by any non-digit,
                                  non-alpha character.
                                  Examples of region specifications are:
                                  24-45, 56-78
                                  1:45, 67=99;765..888
                                  1,5,8,10,23,45,57,99
   -highlight          range      Regions to colour if formatting for HTML.
                                  If this is left blank, then the sequence is
                                  left alone.
                                  A set of regions is specified by a set of
                                  pairs of positions.
                                  The positions are integers.
                                  They are followed by any valid HTML font
                                  colour.
                                  Examples of region specifications are:
                                  24-45 blue 56-78 orange
                                  1-100 green 120-156 red
                                  A file of ranges to colour (one range per
                                  line) can be specified as '@filename'.
   -enzymes            string     The name 'all' reads in all enzyme names
                                  from the REBASE database. You can specify
                                  enzymes by giving their names with commas
                                  between them, such as:
                                  'HincII,hinfI,ppiI,hindiii'.
                                  The case of the names is not important. You
                                  can specify a file of enzyme names to read
                                  in by giving the name of the file holding
                                  the enzyme names with a '@' character in
                                  front of it, for example, '@enz.list'.
                                  Blank lines and lines starting with a hash
                                  character or '!' are ignored and all other
                                  lines are concatenated together with a comma
                                  character ',' and then treated as the list
                                  of enzymes to search for.
                                  An example of a file of enzyme names is:
                                  ! my enzymes
                                  HincII, ppiII
                                  ! other enzymes
                                  hindiii
                                  HinfI
                                  PpiI

   Advanced qualifiers:
   -orfminsize         integer    Minimum size of Open Reading Frames (ORFs)
                                  to display in the translations.
   -flatreformat       bool       Display RE sites in flat format
   -mincuts            integer    Minimum cuts per RE
   -maxcuts            integer    Maximum cuts per RE
   -sitelen            integer    Minimum recognition site length
   -single             bool       Force single RE site only cuts
   -[no]blunt          bool       Allow blunt end RE cutters
   -[no]sticky         bool       Allow sticky end RE cutters
   -[no]ambiguity      bool       Allow ambiguous RE matches
   -plasmid            bool       Allow circular DNA
   -[no]commercial     bool       Only use restriction enzymes with suppliers
   -[no]limit          bool       Limits RE hits to one isoschizomer
   -preferred          bool       Report preferred isoschizomers
   -table              list       Translation table genetic code to use
   -threeletter        bool       Display protein sequences in three-letter
                                  code
   -number             bool       Number the sequences
   -width              integer    Width of sequence to display
   -length             integer    Line length of page (0 for indefinite)
   -margin             integer    Margin around sequence for numbering
   -[no]name           bool       Set this to be false if you do not wish to
                                  display the ID name of the sequence
   -[no]description    bool       Set this to be false if you do not wish to
                                  display the description of the sequence
   -offset             integer    Offset to start numbering the sequence from
   -html               bool       Use HTML formatting
Qualifier | Description | Allowed values | Default
---|---|---|---
Mandatory qualifiers | | |
[-sequence] (Parameter 1) | Sequence database USA | Readable sequence(s) | Required
[-outfile] (Parameter 2) | If you enter the name of a file here then this program will write the sequence details into that file. | Output file | stdout
-format | Display format | | 2
-things | Specify a list of one or more code characters in the order in which you wish things to be displayed one above the other down the page. For example if you wish to see things displayed in the order: sequence, complement sequence, ticks line, frame 1 translation, blank line; then you should enter 'S,C,T,1,B'. | | B N T S F
Optional qualifiers | | |
-translate | Regions to translate (if translating). If this is left blank the complete sequence is translated. A set of regions is specified by a set of pairs of positions. The positions are integers. They are separated by any non-digit, non-alpha character. Examples of region specifications are: 24-45, 56-78 1:45, 67=99;765..888 1,5,8,10,23,45,57,99 | Sequence range | If this is left blank the complete sequence is translated.
-uppercase | Regions to put in uppercase. If this is left blank, then the sequence case is left alone. A set of regions is specified by a set of pairs of positions. The positions are integers. They are separated by any non-digit, non-alpha character. Examples of region specifications are: 24-45, 56-78 1:45, 67=99;765..888 1,5,8,10,23,45,57,99 | Sequence range | If this is left blank, then the sequence case is left alone.
-highlight | Regions to colour if formatting for HTML. If this is left blank, then the sequence is left alone. A set of regions is specified by a set of pairs of positions. The positions are integers. They are followed by any valid HTML font colour. Examples of region specifications are: 24-45 blue 56-78 orange 1-100 green 120-156 red A file of ranges to colour (one range per line) can be specified as '@filename'. | Sequence range | full sequence
-enzymes | The name 'all' reads in all enzyme names from the REBASE database. You can specify enzymes by giving their names with commas between them, such as: 'HincII,hinfI,ppiI,hindiii'. The case of the names is not important. You can specify a file of enzyme names to read in by giving the name of the file holding the enzyme names with a '@' character in front of it, for example, '@enz.list'. Blank lines and lines starting with a hash character or '!' are ignored and all other lines are concatenated together with a comma character ',' and then treated as the list of enzymes to search for. An example of a file of enzyme names is: ! my enzymes HincII, ppiII ! other enzymes hindiii HinfI PpiI | Any string is accepted | all
Advanced qualifiers | | |
-orfminsize | Minimum size of Open Reading Frames (ORFs) to display in the translations. | Integer 0 or more | 0
-flatreformat | Display RE sites in flat format | Yes/No | No
-mincuts | Minimum cuts per RE | Integer from 1 to 1000 | 1
-maxcuts | Maximum cuts per RE | Integer up to 2000000000 | 2000000000
-sitelen | Minimum recognition site length | Integer from 2 to 20 | 4
-single | Force single RE site only cuts | Yes/No | No
-[no]blunt | Allow blunt end RE cutters | Yes/No | Yes
-[no]sticky | Allow sticky end RE cutters | Yes/No | Yes
-[no]ambiguity | Allow ambiguous RE matches | Yes/No | Yes
-plasmid | Allow circular DNA | Yes/No | No
-[no]commercial | Only use restriction enzymes with suppliers | Yes/No | Yes
-[no]limit | Limits RE hits to one isoschizomer | Yes/No | Yes
-preferred | Report preferred isoschizomers | Yes/No | No
-table | Translation table genetic code to use | | 0
-threeletter | Display protein sequences in three-letter code | Yes/No | No
-number | Number the sequences | Yes/No | No
-width | Width of sequence to display | Integer 1 or more | 60
-length | Line length of page (0 for indefinite) | Integer 0 or more | 0
-margin | Margin around sequence for numbering | Integer 0 or more | 10
-[no]name | Set this to be false if you do not wish to display the ID name of the sequence | Yes/No | Yes
-[no]description | Set this to be false if you do not wish to display the description of the sequence | Yes/No | Yes
-offset | Offset to start numbering the sequence from | Any integer value | 1
-html | Use HTML formatting | Yes/No | No
The format of the range file is:
An example range file is:
# this is my set of ranges
12   23
 4   5       this is like 12-23, but smaller
67   10348   interesting region
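A range file of this shape is straightforward to read. The sketch below is based only on what the example shows (comment lines starting with '#', two integers per line, optional free text after them); any stricter rules of the real format are not modelled, and `parse_range_file` is a hypothetical name.

```python
def parse_range_file(text: str) -> list[tuple[int, int, str]]:
    """Return (start, end, annotation) for each range line in the text."""
    ranges = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split(None, 2)
        if len(fields) < 2:
            raise ValueError(f"expected two positions, got: {raw!r}")
        start, end = int(fields[0]), int(fields[1])
        note = fields[2].strip() if len(fields) > 2 else ""
        ranges.append((start, end, note))
    return ranges


EXAMPLE = """# this is my set of ranges
12   23
 4   5       this is like 12-23, but smaller
67   10348   interesting region
"""

for entry in parse_range_file(EXAMPLE):
    print(entry)
```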
You can specify a file of ranges to highlight in a different colour when outputting in HTML format (using the '-html' qualifier) by giving the '-highlight' qualifier the value '@' followed by the name of the file containing the ranges. (eg: '-highlight @myfile').
The format of this file is very similar to the format of the above
uppercase range file, except that the text after the start and end
positions is used as the HTML colour name. This colour name is used 'as
is' when specifying the colour in HTML in a '<FONT COLOR=xxx>'
construct, (where 'xxx' is the name of the colour).
The standard names of HTML font colours are given in:
http://www.iconbazaar.com/color_tables/named_colors.html
An example highlight range file is:
You can specify a file of enzyme names to read in by giving the
'-enzymes' qualifier the name of the file holding the enzyme names with
a '@' character in front of it, for example, '@enz.list'.
Blank lines and lines starting with a '#' or '!' character are ignored
and all other lines are concatenated together with a comma character ','
and then treated as the list of enzymes to search for.
An example of a file of enzyme names is:
The output format is extremely variable and under the control of the
qualifiers used.
The sequence can be formatted for HTML display by using the '-html'
qualifier. The top and tail html tags <HEAD>, <BODY> etc. are not
included as it is expected that the output of this program will be
included in a more extensive HTML page and so these parts are left to
the user to provide.
The name of the sequence is displayed, followed by the description of
the sequence. These can be turned off with the '-noname' and
'-nodescription' qualifiers.
Then the sequence is output, one line at a time. Any associated
information to be displayed is also output above and below the sequence
line, as specified by the '-format' and/or '-things' qualifiers. (See
the 'Description' section for details).
The margins around the sequence are specified by the use of the
'-margin' qualifier and any numbering of the sequence and its
translations is placed in the margin.
A display of the restriction enzyme cut sites can be selected via the
'-format 7' option or the '-format 0 -thing b,r,s,-r' style of options.
The option '-format 7' will produce a formatted display of cut sites on
the sequence, with the six-frame translation below it. The cut sites
are indicated by a backslash character '\' (or a slash '/' for cuts on
the reverse strand) that points to the position between the nucleotides
where the cut occurs. Cuts by many enzymes at the same position are
indicated by stacking the enzyme names on top of each other.
At the end the section header 'Enzymes that cut' is displayed followed
by a list of the enzymes that cut the specified sequence and the number
of times that they cut.
The '-flatreformat' qualifier changes the display to emphasise the
recognition site of the restriction enzyme, which is indicated by a row
of '=' characters. The cut site is pointed to by a '>' or '<' character
and if the cut site is not within or immediately adjacent to the
recognition site, they are linked by a row of '.' characters.
The name of the enzyme is displayed above (or below when the reverse
sense site is displayed) the recognition site. The name of the enzyme
is also displayed above the cut site if this occurs on a different
display line to the recognition site (i.e. if it wraps onto the next
line of sequence).
An example of this display follows:
Users can provide their own data files in their own directories.
Project specific files can be put in the current directory, or for
tidier directory listings in a subdirectory called ".embossdata". Files
for all EMBOSS runs can be put in the user's home directory, or again in
a subdirectory called ".embossdata".
The directories are searched in the following order:
The Genetic Code data files are based on the NCBI genetic code tables.
Their names and descriptions are:
The format of these files is very simple.
It consists of several lines of optional comments, each starting with a
'#' character.
These are followed by the line: 'Genetic Code [n]', where 'n' is the
number of the genetic code file.
This is followed by the description of the code and then by four lines
giving the IUPAC one-letter code of the translated amino acid, the start
codons (indicated by an 'M') and the three bases of the codon, lined up
one on top of the other.
For example:
showseq uses the EMBOSS REBASE data files in 'data/REBASE/*' under the
EMBOSS installation directory.
These files must first be set up using the program 'rebaseextract'.
Running 'rebaseextract' may be the job of your system manager.
If you ask for the sequence display to end at position '100', with the
qualifier '-send 100', it will display the sequence up to the end of the
line - position '120'. This is a feature of this program to make the
display of things like restriction enzyme cutting sites easier.
It is not a bug. Please don't report it.
# this is my set of ranges
12 23 red
4 5 darkturquoise
67 10348 #FFE4E1
# my enzymes
HincII, ppiI
# other enzymes
hindiii
HinfI
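The five lines above have the shape of an enzyme-name file as read with '-enzymes @file'. The rule stated for such files (blank lines and lines starting with '#' or '!' are ignored, the remaining lines are joined with commas and treated as the enzyme list) can be sketched as below; `read_enzyme_list` is a hypothetical helper, and stripping spaces around names is an assumption of this sketch.

```python
def read_enzyme_list(text: str) -> list[str]:
    """Collapse the contents of an enzyme-name file into a list of names."""
    kept = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue  # blank line or comment
        kept.append(line)
    joined = ",".join(kept)  # lines are concatenated with commas
    return [name.strip() for name in joined.split(",") if name.strip()]


EXAMPLE = """# my enzymes
HincII, ppiI

# other enzymes
hindiii
HinfI
"""

print(read_enzyme_list(EXAMPLE))
```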
Output file format
Most of the variants of the output format have already been described in
the 'Description' and 'Usage' sections, but here is some more just to
fill out this section ;-)
% showseq embl:eclac stdout -send 60 -format 6 -enz TaqI,Hin6I,AciI,Hin6I,BssKI,Bsu6
Display a sequence with features, translation etc..
ECLAC
E.coli lactose operon with lacI, lacZ, lacY and lacA genes.
Hin6I
| BssKI
TaqI Hin6I AciI | Bsu6I
\ \ \ \ \
GACACCATCGAATGGCGCAAAACCTTTCGCGGTATGGCATGATAGCGCCCGGAAGAGAGT
10 20 30 40 50 60
----:----|----:----|----:----|----:----|----:----|----:----|
CTGTGGTAGCTTACCGCGTTTTGGAAAGCGCCATACCGTACTATCGCGGGCCTTCTCTCA
/ / / / / /
TaqI Hin6I AciI | | BssKI
| Bsu6I
Hin6I
D T I E W R K T F R G M A * * R P E E S
T P S N G A K P F A V W H D S A R K R V
H H R M A Q N L S R Y G M I A P G R E S
----:----|----:----|----:----|----:----|----:----|----:----|
V G D F P A F G K A T H C S L A R F L T
S V M S H R L V K R P I A H Y R G S S L
X C W R I A C F R E R Y P M I A G P L S
|
variation note="c in wild-type; t in 'up' promoter mutant I-Q [11]"
|=========
mRNA note="lacI (repressor) mRNA; preferred in vivo 3' end [12],[29]"
Data files
EMBOSS data files are distributed with the application and stored in the
standard EMBOSS data directory, which is defined by EMBOSS environment
variable EMBOSS_DATA.
. (your current directory)
.embossdata (under your current directory)
~/ (your home directory)
~/.embossdata
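A lookup that tries these four user directories in the listed order might look like the sketch below. It covers only the directories listed here, not the standard EMBOSS data directory, and the function names `search_dirs` and `find_data_file` are hypothetical.

```python
from pathlib import Path
from typing import Optional


def search_dirs(cwd: Optional[str] = None, home: Optional[str] = None) -> list[Path]:
    """Return the user data directories in the order they are searched."""
    cwd_path = Path(cwd) if cwd else Path.cwd()
    home_path = Path(home) if home else Path.home()
    return [
        cwd_path,
        cwd_path / ".embossdata",
        home_path,
        home_path / ".embossdata",
    ]


def find_data_file(name: str, cwd: Optional[str] = None, home: Optional[str] = None) -> Optional[Path]:
    """Return the first existing copy of a data file, or None if absent."""
    for directory in search_dirs(cwd, home):
        candidate = directory / name
        if candidate.is_file():
            return candidate
    return None


print(find_data_file("EGC.0"))
```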
EGC.0 Standard (Differs from GC.1 in that it only has initiation site 'AUG')
EGC.1 Standard
EGC.2 Vertebrate Mitochondrial
EGC.3 Yeast Mitochondrial
EGC.4 Mold, Protozoan, Coelenterate Mitochondrial and Mycoplasma/Spiroplasma
EGC.5 Invertebrate Mitochondrial
EGC.6 Ciliate Macronuclear and Dasycladacean
EGC.9 Echinoderm Mitochondrial
EGC.10 Euplotid Nuclear
EGC.11 Bacterial
EGC.12 Alternative Yeast Nuclear
EGC.13 Ascidian Mitochondrial
EGC.14 Flatworm Mitochondrial
EGC.15 Blepharisma Macronuclear
# Genetic Code Table
#
# Obtained from: http://www.ncbi.nlm.nih.gov/collab/FT/genetic_codes.html
# and: http://www3.ncbi.nlm.nih.gov/htbin-post/Taxonomy/wprintgc?mode=c
#
# Differs from Genetic Code [1] only in that the initiation sites have been
# changed to only 'AUG'
Genetic Code [0]
Standard
AAs = FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
Starts = -----------------------------------M----------------------------
Base1 = TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
Base2 = TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
Base3 = TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
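A file laid out like the one above can be turned into a codon table by reading the five 'name = value' lines column by column. The following is a sketch only: the sample text rebuilds the regular Starts and Base lines with string repetition rather than quoting them, unknown or incomplete codons are rendered as 'X' by assumption, and the function names are hypothetical.

```python
SAMPLE = "\n".join([
    "# Genetic Code Table",
    "Genetic Code [0]",
    "Standard",
    "AAs    = FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
    "Starts = " + "-" * 35 + "M" + "-" * 28,
    "Base1  = " + "T" * 16 + "C" * 16 + "A" * 16 + "G" * 16,
    "Base2  = " + "TTTTCCCCAAAAGGGG" * 4,
    "Base3  = " + "TCAG" * 16,
])


def parse_genetic_code(text: str) -> tuple[dict[str, str], set[str]]:
    """Return (codon -> amino acid, set of start codons) from the file text."""
    fields = {}
    for line in text.splitlines():
        if line.startswith("#") or "=" not in line:
            continue  # comments, the 'Genetic Code [n]' line, the description
        key, value = line.split("=", 1)
        fields[key.strip()] = value.strip()
    codons = ["".join(bases) for bases in
              zip(fields["Base1"], fields["Base2"], fields["Base3"])]
    table = dict(zip(codons, fields["AAs"]))
    starts = {codon for codon, flag in zip(codons, fields["Starts"]) if flag == "M"}
    return table, starts


def translate(dna: str, table: dict[str, str]) -> str:
    """Translate frame 1 of a DNA string; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(table.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


table, starts = parse_genetic_code(SAMPLE)
print(translate("cctgagctggtaaagcctggggct", table))
```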
Notes
None.
References
None.
Warnings
None.
Diagnostic Error Messages
None.
Exit status
It always exits with status 0.
Known bugs
None known.
Program name | Description
---|---
antigenic | Finds antigenic sites in proteins
backtranseq | Back translate a protein sequence
chaos | Create a chaos game representation plot for a sequence
cirdna | Draws circular maps of DNA constructs
cpgplot | Plot CpG rich areas
cpgreport | Reports all CpG rich regions
cusp | Create a codon usage table
diffseq | Find differences (SNPs) between nearly identical sequences
dotmatcher | Displays a thresholded dotplot of two sequences
dotpath | Displays a non-overlapping wordmatch dotplot of two sequences
dottup | Displays a wordmatch dotplot of two sequences
einverted | Finds DNA inverted repeats
equicktandem | Finds tandem repeats
etandem | Looks for tandem repeats in a nucleotide sequence
garnier | Predicts protein secondary structure
getorf | Finds and extracts open reading frames (ORFs)
helixturnhelix | Report nucleic acid binding motifs
isochore | Plots isochores in large DNA sequences
lindna | Draws linear maps of DNA constructs
newcpgreport | Report CpG rich areas
newcpgseek | Reports CpG rich regions
oddcomp | Finds protein sequence regions with a biased composition
palindrome | Looks for inverted repeats in a nucleotide sequence
pepcoil | Predicts coiled coil regions
plotorf | Plot potential open reading frames
polydot | Displays all-against-all dotplots of a set of sequences
prettyseq | Output sequence with translated ranges
pscan | Scans proteins using PRINTS
rebaseextract | Extract data from REBASE
redata | Search REBASE for enzyme name, references, suppliers etc
remap | Display a sequence with restriction cut sites, translation etc
restrict | Finds restriction enzyme cleavage sites
seqret | Reads and writes (returns) sequences
seqretall | Reads and writes (returns) a set of sequences one at a time
seqretset | Reads and writes (returns) a set of sequences all at once
seqretsplit | Reads and writes (returns) sequences
showfeat | Show features of a sequence
showorf | Pretty output of DNA translations
sigcleave | Reports protein signal cleavage sites
silent | Silent mutation restriction enzyme scan
tfscan | Scans DNA sequences for transcription factors
tmap | Displays membrane spanning regions
transeq | Translate nucleic acid sequences
Author(s)
This application was written by Gary Williams (gwilliam@hgmp.mrc.ac.uk)
History
Written 1999 - GWW
23 Aug 2000 - features display added - GWW
Target users
This program is intended to be used by everyone and everything,
from naive users to embedded scripts.
Comments