EMBOSS: wordcount


Program wordcount

Function

Counts words of a specified size in a DNA sequence

Description

Displays all the words of the specified length found in a DNA sequence,
together with the number of times each one occurs.
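The following is a minimal Python sketch of the counting described above. It assumes that words are taken at every position (overlapping), that case is ignored, and that the sequence is already a plain string with no ambiguity-code handling. The function `count_words` is hypothetical and is not the EMBOSS implementation. Unlike this sketch, which breaks ties alphabetically, the sample session shows equal counts listed in other orders.

```python
from __future__ import annotations

from collections import Counter


def count_words(sequence: str, wordsize: int) -> list[tuple[str, int]]:
    """Count overlapping words of length `wordsize` in `sequence`.

    Hypothetical helper illustrating what wordcount reports; it is not
    the EMBOSS source code.
    """
    if wordsize < 2:
        # The -wordsize qualifier only accepts 2 or more.
        raise ValueError("wordsize must be 2 or more")
    seq = sequence.lower()
    # Slide a window one base at a time, so consecutive words overlap.
    counts = Counter(seq[i:i + wordsize] for i in range(len(seq) - wordsize + 1))
    # Most frequent words first; ties broken alphabetically (an assumption).
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    for word, n in count_words("ACGTACGTAC", 3):
        print(f"{word}\t{n}")
```

Running it prints acg, cgt, gta and tac, each with a count of 2.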

Usage

Here is a sample session with wordcount.
% wordcount embl:rnu68037 -wordsize=3

Counts words of a specified size in a DNA sequence
ctg     54
gcc     53
tgg     53
ggc     51
cgc     47
gct     47
gtg     40
tgc     39
cct     38
gcg     36
cca     29
ggg     26
ctt     25
tcc     25
cag     25
ggt     24
ccc     24
tgt     23
ctc     23
ccg     22
cac     22
gca     22
cgt     22
agc     21
cgg     19
acg     19
ttg     19
tcg     18
agg     17
ttc     17
cat     17
gag     16
act     16
gtc     16
aac     15
gga     14
tct     14
atc     14
cta     13
tca     13
atg     12
gtt     11
gta     11
acc     11
aca     10
tga     10
caa     10
tac     10
tag     9
gac     9
agt     9
ttt     8
cga     7
taa     6
gat     6
aga     5
tat     5
gaa     4
tta     3
aat     3
ata     3
att     3
aag     2
aaa     1

Command line arguments

   Mandatory qualifiers:
  [-sequence]          sequence   Sequence USA
   -wordsize           integer    Word size
   -outfile            outfile    Output file name

   Optional qualifiers: (none)
   Advanced qualifiers: (none)

Mandatory qualifiers        Description        Allowed values      Default
[-sequence] (Parameter 1)   Sequence USA       Readable sequence   Required
-wordsize                   Word size          Integer 2 or more   4
-outfile                    Output file name   Output file         <sequence>.wordcount

Optional qualifiers         Description        Allowed values      Default
(none)

Advanced qualifiers         Description        Allowed values      Default
(none)
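For a sense of scale: over the four bases a, c, g and t there are 4^k possible words of length k. Assuming that wordcount reads a word starting at every position (overlapping words, which is an assumption here rather than something this page states), a sequence of length L yields L - k + 1 words in total. The symbols k (the -wordsize value) and L are introduced only for this note.

```latex
\begin{aligned}
N_{\mathrm{possible}}(k) &= 4^{k}, \qquad 4^{3} = 64, \qquad 4^{4} = 256 \\
N_{\mathrm{words}}(L, k) &= L - k + 1
\end{aligned}
```

The sample session uses -wordsize=3 and lists 64 distinct words, so every possible trimer occurs in that entry. At the default word size of 4 there are up to 256 distinct a/c/g/t words.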

Input file format

Any sequence USA.

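Output file format

As in the sample session above, the output lists one word per line followed
by the number of times it occurs, with the most frequent words first.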
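The following is a minimal Python sketch for reading wordcount output back into a program. It assumes the layout shown in the sample session: one word and its count per line, separated by whitespace. The function `parse_wordcount` is hypothetical and not part of EMBOSS. Lines that are not a word/count pair, such as the program's one-line description, are skipped.

```python
from __future__ import annotations


def parse_wordcount(text: str) -> dict[str, int]:
    """Parse wordcount-style output into a {word: count} dict.

    Hypothetical helper. It assumes one word and one integer count per
    line, separated by whitespace; any other line is skipped.
    """
    counts: dict[str, int] = {}
    for line in text.splitlines():
        fields = line.split()
        # Keep only "word count" pairs whose second field is a whole number.
        if len(fields) == 2 and fields[1].isdigit():
            counts[fields[0]] = int(fields[1])
    return counts


if __name__ == "__main__":
    # A few lines taken from the sample session's output.
    sample = "ctg     54\ngcc     53\naaa     1\n"
    counts = parse_wordcount(sample)
    print(counts["ctg"], sum(counts.values()))  # 54 108
```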

Data files

Notes

References

Warnings

Diagnostic Error Messages

Exit status

Exits with status 0 if successful.

Known bugs

See also

Program name   Description
chaos          Create a chaos game representation plot for a sequence
chips          Codon usage statistics
codcmp         Codon usage table comparison
compseq        Counts the composition of dimer/trimer/etc words in a sequence
cusp           Create a codon usage table
freak          Residue/base frequency table or plot
geecee         Calculates the fractional GC content of nucleic acid sequences
isochore       Plots isochores in large DNA sequences
newcpgreport   Report CpG rich areas
newcpgseek     Reports CpG rich regions
wobble         Wobble base plot

Author(s)

This application was written by Ian Longden (il@sanger.ac.uk) Informatics Division, The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

History

Completed 27th November 1998.

Target users

This program is intended to be used by everyone and everything, from naive users to embedded scripts.

Comments