EMBOSS: maskseq
Why would you wish to mask regions of a sequence? Database searches commonly mask out low-complexity or biased-composition regions of a sequence so that spurious matches do not occur. You may have a program that has reported such biased regions but has not masked the sequence itself. In that case, you can use this program to do the masking.
You may find other uses for it.
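To mask off bases 10 to 12 from a sequence 'gatta.seq' and write to the new sequence file 'gatta2.seq':
% maskseq gatta.seq gatta2.seq -reg=10-12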
To mask off bases 20 to 30 from a sequence 'hdh.seq' using the character 'n' and write to the new sequence file 'hdh2.seq':
% maskseq hdh.seq hdh2.seq -reg=20-30 -mask=n
To mask off the regions 20 to 23, 34 to 45 and 88 to 90 in 'yuy.seq':
% maskseq yuy.seq yuy2.seq -reg=20-23,34-45,88-90
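The effect of the examples above can be illustrated with a minimal Python sketch. This is not EMBOSS code: `mask_regions` is a hypothetical helper, positions are taken to be 1-based and inclusive as in the examples, and clamping out-of-range positions to the sequence ends is an assumption rather than documented maskseq behaviour.

```python
def mask_regions(seq, regions, mask_char="N"):
    """Return seq with every (start, end) region replaced by mask_char.

    Positions are 1-based and inclusive, as in '-reg=10-12'.
    """
    chars = list(seq)
    for start, end in regions:
        # Assumption: positions outside the sequence are clamped, not rejected.
        first = max(start, 1)
        last = min(end, len(chars))
        for i in range(first - 1, last):
            chars[i] = mask_char
    return "".join(chars)


# Roughly what '-reg=10-12' does to a 16-base nucleic sequence.
print(mask_regions("ACGTACGTACGTACGT", [(10, 12)]))  # ACGTACGTANNNACGT
```

Several regions, as in the 'yuy.seq' example, are passed as a list of pairs such as `[(20, 23), (34, 45), (88, 90)]`.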
    Mandatory qualifiers:
    [-sequence]   sequence   Sequence USA
    [-outseq]     seqout     Output sequence USA
    -regions      range      Regions to mask.
                             A set of regions is specified by a set of
                             pairs of positions.
                             The positions are integers.
                             They are separated by any non-digit,
                             non-alpha character.
                             Examples of region specifications are:
                             24-45, 56-78
                             1:45, 67=99;765..888
                             1,5,8,10,23,45,57,99

    Optional qualifiers:
    -maskchar     string     Character to use when masking.
                             Default is 'X' for protein sequences, 'N'
                             for nucleic sequences.

    Advanced qualifiers: (none)
Qualifier | Description | Allowed values | Default |
---|---|---|---|
Mandatory qualifiers | | | |
[-sequence] (Parameter 1) | Sequence USA | Readable sequence | Required |
[-outseq] (Parameter 2) | Output sequence USA | Writeable sequence | <sequence>.format |
-regions | Regions to mask. A set of regions is specified by a set of pairs of positions. The positions are integers. They are separated by any non-digit, non-alpha character. Examples of region specifications are: '24-45, 56-78', '1:45, 67=99;765..888' and '1,5,8,10,23,45,57,99' | Sequence range | None |
Optional qualifiers | | | |
-maskchar | Character to use when masking. Default is 'X' for protein sequences, 'N' for nucleic sequences. | Any string is accepted | 'X' for protein, 'N' for nucleic |
Advanced qualifiers | | | |
(none) | | | |
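The separator rule for '-regions' (integers separated by any non-digit, non-alpha character) can be sketched in Python as follows. `parse_regions` is a hypothetical helper, not part of EMBOSS. Rejecting tokens that contain letters and rejecting an odd number of positions are assumptions, since the description does not say how maskseq treats those cases.

```python
import re


def parse_regions(spec):
    """Split a region specification into (start, end) integer pairs."""
    # Any run of characters that are neither digits nor letters is a separator.
    tokens = [t for t in re.split(r"[^0-9A-Za-z]+", spec) if t]
    for token in tokens:
        if not token.isdigit():
            # Assumption: letters are not valid inside a position.
            raise ValueError(f"not an integer position: {token!r}")
    if len(tokens) % 2 != 0:
        # Assumption: every start position needs a matching end position.
        raise ValueError("positions must come in pairs")
    numbers = [int(t) for t in tokens]
    return list(zip(numbers[0::2], numbers[1::2]))


print(parse_regions("24-45, 56-78"))          # [(24, 45), (56, 78)]
print(parse_regions("1:45, 67=99;765..888"))  # [(1, 45), (67, 99), (765, 888)]
```

Under this reading, the third documented example, '1,5,8,10,23,45,57,99', yields four regions: 1-5, 8-10, 23-45 and 57-99.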
You can specify a file of ranges to mask by giving the '-regions' qualifier the value '@' followed by the name of the file containing the ranges (e.g. '-regions @myfile').
The format of the range file is:
An example range file is:
    # this is my set of ranges
    12 23
    4 5 this is like 12-23, but smaller
    67 10348 interesting region
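A range file like the example above could be read with a short Python sketch such as the one below. The rules it applies are inferred from the example only and are assumptions: lines starting with '#' are comments, blank lines are skipped, each remaining line holds two integers, and any trailing text is an annotation. `parse_range_file` is a hypothetical helper, not an EMBOSS function.

```python
def parse_range_file(text):
    """Return a list of (start, end, annotation) tuples from range-file text."""
    regions = []
    for line in text.splitlines():
        stripped = line.strip()
        # Assumption: '#' starts a comment line; blank lines are ignored.
        if not stripped or stripped.startswith("#"):
            continue
        fields = stripped.split(None, 2)  # two numbers, then optional free text
        if len(fields) < 2:
            raise ValueError(f"expected two positions: {line!r}")
        start, end = int(fields[0]), int(fields[1])
        annotation = fields[2] if len(fields) > 2 else ""
        regions.append((start, end, annotation))
    return regions


example = """# this is my set of ranges
12 23
4 5 this is like 12-23, but smaller
67 10348 interesting region
"""

for start, end, annotation in parse_range_file(example):
    print(start, end, annotation)
```

The annotation is kept only for the reader's benefit; just the two positions on each line describe the region.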
See also
Program name | Description |
---|---|
cutseq | Removes a specified section from a sequence |
descseq | Alter the name or description of a sequence |
extractseq | Extract regions from a sequence |
maskfeat | Mask off features of a sequence |
megamerger | Merge two large overlapping nucleic acid sequences |
merger | Merge two overlapping nucleic acid sequences |
newseq | Type in a short new sequence |
noreturn | Removes carriage return from ASCII files |
nthseq | Writes one sequence from a multiple set of sequences |
pasteseq | Insert one sequence into another |
revseq | Reverse and complement a sequence |
splitter | Split a sequence into (overlapping) smaller sequences |
trimseq | Trim ambiguous bits off the ends of sequences |
vectorstrip | Strips out DNA between a pair of vector sequences |