EMBOSS: chips
The Nc statistic has problems with very short sequences (20 amino acids or fewer) that are yet to be fully resolved. They arise because amino acids that are missing from the sequence still have to be taken into account.
This calculation was originally in the EGCG package as "codfish" (codon usage for fission yeast). As Frank Wright is a vegan, we looked for a meat-free name for the EMBOSS version: "chips". The official explanation is "Codon Heterozygosity (Inverse of) in a Protein-coding Sequence".
A sample session:

    % chips -sbeg 135 -send 1292
    Input sequence: embl:paamir
    Output file [paamir.chips]:
Command line arguments:

    Mandatory qualifiers:
      [-seqall]    seqall    Sequence database USA
      [-outfile]   outfile   Output file name
    Optional qualifiers: (none)
    Advanced qualifiers:
      -cfile       codon     Codon usage file
      -window      integer   Averaging window
Qualifier | Description | Allowed values | Default |
---|---|---|---|
Mandatory qualifiers | | | |
[-seqall] (Parameter 1) | Sequence database USA | Readable sequence(s) | Required |
[-outfile] (Parameter 2) | Output file name | Output file | <sequence>.chips |
Optional qualifiers | | | |
(none) | | | |
Advanced qualifiers | | | |
-cfile | Codon usage file | Codon usage file in EMBOSS data path | Ehum.cut |
-window | Averaging window | Any integer value | 30 |
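When chips is run from a script rather than interactively, the prompts in the sample session get in the way. The following is a minimal Python sketch, assuming EMBOSS is installed and `chips` is on `PATH`. It passes the `-seqall`, `-outfile` and `-cfile` qualifiers from the table above, plus the standard EMBOSS `-auto` qualifier to suppress prompting. The helpers `chips_command`, `parse_nc` and `run_chips` are hypothetical names, and `parse_nc` assumes the output file contains a single `Nc = <value>` line, as in the example output file.

```python
import re
import subprocess
from pathlib import Path


def chips_command(usa, outfile, sbeg=None, send=None, cfile=None):
    """Build a non-interactive chips command line as an argument list."""
    cmd = ["chips", "-seqall", usa, "-outfile", outfile, "-auto"]
    if sbeg is not None:
        cmd += ["-sbeg", str(sbeg)]  # first base to read, as in the sample session
    if send is not None:
        cmd += ["-send", str(send)]  # last base to read
    if cfile is not None:
        cmd += ["-cfile", cfile]  # alternative codon usage table
    return cmd


def parse_nc(text):
    """Return the Nc value from chips output text, or None if none is found."""
    match = re.search(r"Nc\s*=\s*(\d+(?:\.\d+)?)", text)
    return float(match.group(1)) if match else None


def run_chips(usa, outfile, **kwargs):
    """Run chips (requires EMBOSS on PATH) and return the parsed Nc value."""
    subprocess.run(chips_command(usa, outfile, **kwargs), check=True)
    return parse_nc(Path(outfile).read_text())
```

For the sample session, `run_chips("embl:paamir", "paamir.chips", sbeg=135, send=1292)` would run the same analysis without prompts. Keeping command construction separate from execution makes the argument list easy to inspect before anything is run.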
Example output (paamir.chips):

    # CHIPS codon usage statistics
    Nc = 32.951

If all 61 sense codons are used equally often, the Nc value will be 61. If only one codon is used for each amino acid, the Nc value will be 20. Low values therefore indicate a strong codon bias, and high values indicate a low bias and possibly a non-coding region.
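The limits of 20 and 61 follow from the standard definition of the effective number of codons (Wright, 1990). For each amino acid with n observed codons and codon frequencies p_i, the homozygosity is F = (n·Σp_i² − 1)/(n − 1). F is averaged within each degeneracy class, and Nc = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6. The sketch below is an illustration of that textbook formula, not the EMBOSS source. It assumes the standard genetic code and a single reading frame starting at the first base, and it ignores stop and ambiguous codons. Filling a missing Ile class with the mean of F2 and F4 is a commonly used approximation. EMBOSS's own handling of missing amino acids (the unresolved problem noted above) and its use of the `-cfile` table may differ. The function names are hypothetical.

```python
from collections import Counter

# Standard genetic code: codon -> one-letter amino acid ("*" marks a stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Synonymous sense codons grouped by amino acid.
SYNONYMS = {}
for codon, aa in CODE.items():
    if aa != "*":
        SYNONYMS.setdefault(aa, []).append(codon)


def homozygosity(counts):
    """Wright's F for one amino acid: (n * sum(p_i ** 2) - 1) / (n - 1)."""
    n = sum(counts)
    if n < 2:
        return None  # F is undefined for fewer than two observed codons
    sum_p2 = sum((c / n) ** 2 for c in counts)
    return (n * sum_p2 - 1) / (n - 1)


def effective_number_of_codons(seq):
    """Sketch of Nc for one reading frame starting at the first base."""
    seq = seq.upper().replace("U", "T")
    codons = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    # Collect F values per degeneracy class (2, 3, 4 or 6 synonymous codons).
    f_by_class = {2: [], 3: [], 4: [], 6: []}
    for syn in SYNONYMS.values():
        if len(syn) == 1:
            continue  # Met and Trp each contribute a fixed 1 to Nc
        f = homozygosity([codons[c] for c in syn])
        if f is not None:
            f_by_class[len(syn)].append(f)
    mean_f = {k: sum(v) / len(v) for k, v in f_by_class.items() if v}
    # Common approximation when Ile (the only 3-fold class) is absent.
    if 3 not in mean_f and 2 in mean_f and 4 in mean_f:
        mean_f[3] = (mean_f[2] + mean_f[4]) / 2
    if set(mean_f) != {2, 3, 4, 6}:
        return None  # too few amino acids observed: the short-sequence problem
    if min(mean_f.values()) == 0:
        return 61.0  # 1/F is unbounded, so Nc reaches its cap
    nc = 2 + 9 / mean_f[2] + 1 / mean_f[3] + 5 / mean_f[4] + 3 / mean_f[6]
    return min(nc, 61.0)  # Nc is conventionally capped at 61
```

A sequence that uses exactly one codon per amino acid gives F = 1 in every class, so Nc = 2 + 9 + 1 + 5 + 3 = 20. With perfectly even usage each F approaches 1/k, the formula reaches (or slightly exceeds) 61, and the cap applies.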
By default, the codon usage table is the file "CODONS/Ehum.cut" in the EMBOSS distribution directory. The -cfile qualifier selects a different table.
EMBOSS data files are distributed with the application and stored in the standard EMBOSS data directory, which is defined by the EMBOSS environment variable EMBOSS_DATA.
Users can provide their own data files in their own directories. Project-specific files can be put in the current directory or, for tidier directory listings, in a subdirectory called ".embossdata". Files for all EMBOSS runs can be put in the user's home directory or, again, in a subdirectory called ".embossdata".
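The directories are searched in the following order:

- . (your current directory)
- .embossdata (under your current directory)
- ~/ (your home directory)
- ~/.embossdata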
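The lookup order for user data files (current directory, its ".embossdata" subdirectory, home directory, then "~/.embossdata") can be illustrated with a short Python sketch. This is not EMBOSS's own lookup code. The function `find_data_file` and its `data_subdir` parameter are hypothetical, and falling back to the EMBOSS_DATA directory last is an assumption, based on distributed files living there and user copies being meant to override them.

```python
import os
from pathlib import Path


def find_data_file(name, cwd=None, home=None, emboss_data=None, data_subdir=""):
    """Return the first existing copy of a data file, or None.

    Search order: current directory, ./.embossdata, home directory,
    ~/.embossdata, then (assumed) EMBOSS_DATA/<data_subdir>.
    """
    cwd = Path(cwd) if cwd else Path.cwd()
    home = Path(home) if home else Path.home()
    emboss_data = emboss_data or os.environ.get("EMBOSS_DATA")
    candidates = [cwd, cwd / ".embossdata", home, home / ".embossdata"]
    if emboss_data:
        # Assumption: the distributed copy is consulted only after user copies.
        candidates.append(Path(emboss_data) / data_subdir)
    for directory in candidates:
        path = directory / name
        if path.is_file():
            return path
    return None
```

For example, `find_data_file("Ehum.cut", data_subdir="CODONS")` would return a project copy of the default codon table if one exists, and otherwise fall through to the user-wide and distributed copies.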
See also

Program name | Description |
---|---|
chaos | Create a chaos game representation plot for a sequence |
codcmp | Codon usage table comparison |
compseq | Counts the composition of dimer/trimer/etc words in a sequence |
cusp | Create a codon usage table |
freak | Residue/base frequency table or plot |
geecee | Calculates the fractional GC content of nucleic acid sequences |
getorf | Finds and extracts open reading frames (ORFs) |
isochore | Plots isochores in large DNA sequences |
newcpgreport | Report CpG rich areas |
newcpgseek | Reports CpG rich regions |
syco | Synonymous codon usage Gribskov statistic plot |
wobble | Wobble base plot |
wordcount | Counts words of a specified size in a DNA sequence |