EMBOSS: wordcount
Program wordcount
Function
Counts words of a specified size in a DNA sequence
Description
Displays every word of the specified length found in the sequence,
together with the number of times each word occurs.
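The counting described above can be sketched in Python. This is an illustration only, not the EMBOSS implementation: the function name count_words is hypothetical, and the sketch assumes that words are counted in overlapping windows, that the sequence is lowercased first, and that results are listed by descending count. The lower bound of 2 on the word size follows the qualifier table in this document.

```python
from collections import Counter


def count_words(sequence: str, wordsize: int) -> list[tuple[str, int]]:
    """Count every word of length `wordsize` in `sequence`.

    Assumptions (not confirmed by the wordcount documentation):
    overlapping windows, case-insensitive, sorted by descending count.
    """
    if wordsize < 2:
        raise ValueError("wordsize must be 2 or more")
    seq = sequence.lower()
    # Slide a window of length `wordsize` along the sequence one base at a time.
    counts = Counter(seq[i:i + wordsize] for i in range(len(seq) - wordsize + 1))
    # most_common() returns (word, count) pairs, highest count first.
    return counts.most_common()


if __name__ == "__main__":
    for word, n in count_words("acgtacgtac", 3):
        print(f"{word}\t{n}")
```

The order of words that share the same count is not specified here and may differ from the order wordcount itself reports.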
Usage
Here is a sample session with wordcount.
% wordcount embl:rnu68037 -wordsize=3
Counts words of a specified size in a DNA sequence
ctg 54
gcc 53
tgg 53
ggc 51
cgc 47
gct 47
gtg 40
tgc 39
cct 38
gcg 36
cca 29
ggg 26
ctt 25
tcc 25
cag 25
ggt 24
ccc 24
tgt 23
ctc 23
ccg 22
cac 22
gca 22
cgt 22
agc 21
cgg 19
acg 19
ttg 19
tcg 18
agg 17
ttc 17
cat 17
gag 16
act 16
gtc 16
aac 15
gga 14
tct 14
atc 14
cta 13
tca 13
atg 12
gtt 11
gta 11
acc 11
aca 10
tga 10
caa 10
tac 10
tag 9
gac 9
agt 9
ttt 8
cga 7
taa 6
gat 6
aga 5
tat 5
gaa 4
tta 3
aat 3
ata 3
att 3
aag 2
aaa 1
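For scripted use, a listing like the one above can be read back into a dictionary. The following is a minimal sketch, assuming each result line holds a word and an integer count separated by whitespace, as in the sample listing; the function name parse_wordcount is hypothetical, and any line that does not match that shape (such as the program banner) is skipped.

```python
def parse_wordcount(text: str) -> dict[str, int]:
    """Parse 'word count' lines into a dict mapping word -> count.

    Assumes whitespace-separated word/count pairs, one per line.
    Lines that do not fit that shape are ignored.
    """
    counts: dict[str, int] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2 or not fields[1].isdigit():
            continue  # banner, blank, or otherwise unrecognised line
        counts[fields[0]] = int(fields[1])
    return counts


if __name__ == "__main__":
    sample = "ctg 54\ngcc 53\ntgg 53\n"
    for word, n in parse_wordcount(sample).items():
        print(word, n)
```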
Command line arguments
Mandatory qualifiers:
[-sequence] sequence Sequence USA
-wordsize integer Word size
-outfile outfile Output file name
Optional qualifiers: (none)
Advanced qualifiers: (none)
Mandatory qualifiers | Allowed values | Default |
[-sequence] (Parameter 1) | Sequence USA | Readable sequence | Required |
-wordsize | Word size | Integer 2 or more | 4 |
-outfile | Output file name | Output file | <sequence>.wordcount |
Optional qualifiers | Allowed values | Default |
(none) |
Advanced qualifiers | Allowed values | Default |
(none) |
Input file format
Any sequence USA (Uniform Sequence Address).
Output file format
Data files
Notes
References
Warnings
Diagnostic Error Messages
Exit status
0 if successful.
Known bugs
See also
Program name | Description |
chaos | Create a chaos game representation plot for a sequence |
chips | Codon usage statistics |
codcmp | Codon usage table comparison |
compseq | Counts the composition of dimer/trimer/etc words in a sequence |
cusp | Create a codon usage table |
freak | Residue/base frequency table or plot |
geecee | Calculates the fractional GC content of nucleic acid sequences |
isochore | Plots isochores in large DNA sequences |
newcpgreport | Report CpG rich areas |
newcpgseek | Reports CpG rich regions |
wobble | Wobble base plot |
Author(s)
This application was written by Ian Longden (il@sanger.ac.uk) Informatics
Division, The Sanger Centre, Wellcome Trust Genome Campus, Hinxton,
Cambridge, CB10 1SA, UK.
History
Completed 27th November 1998.
Target users
This program is intended to be used by everyone and everything,
from naive users to embedded scripts.
Comments