EMBOSS: splitter


Program splitter

Function

Split a sequence into (overlapping) smaller sequences

Description

This simple editing program allows you to split a long sequence into smaller, optionally overlapping, subsequences.

Usage

To split a sequence into sub-sequences of 10,000 bases (the default size) with no overlap between the sub-sequences:
	% splitter one_huge.seq many_small.seq
To split a sequence every 50,000 bases, with each sub-sequence overlapping the next by 3,000 bases:
	% splitter one_huge.seq many_small.seq -size=50000 -over=3000
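For scripts, the usage lines above can be reproduced from Python. The sketch below only assembles the command line shown in those examples (positional input and output, then optional `-size=` and `-overlap=`) and checks the value ranges from the qualifier table. The helper names `splitter_command` and `run_splitter` are hypothetical, and the sketch assumes `splitter` is on the `PATH` and that the output is a plain file name rather than a full USA.

```python
import os
import shlex
import subprocess


def splitter_command(in_usa, out_usa, size=None, overlap=None):
    """Build an argv list for splitter in the same form as the usage examples.

    None leaves a qualifier at splitter's default (size 10000, overlap 0).
    The range checks follow the 'Allowed values' column of the qualifier table.
    """
    if size is not None and size < 1:
        raise ValueError("-size must be 1 or more")
    if overlap is not None and overlap < 0:
        raise ValueError("-overlap must be 0 or more")
    argv = ["splitter", in_usa, out_usa]
    if size is not None:
        argv.append(f"-size={size}")
    if overlap is not None:
        argv.append(f"-overlap={overlap}")
    return argv


def run_splitter(in_usa, out_file, **qualifiers):
    """Run splitter; assumes out_file is a plain file path, not a full USA."""
    subprocess.run(splitter_command(in_usa, out_file, **qualifiers), check=True)
    # splitter is documented to always exit with status 0, so also check the output.
    if not os.path.exists(out_file) or os.path.getsize(out_file) == 0:
        raise RuntimeError(f"splitter produced no output in {out_file}")


# Prints: splitter one_huge.seq many_small.seq -size=50000 -overlap=3000
print(shlex.join(splitter_command("one_huge.seq", "many_small.seq", size=50000, overlap=3000)))
```

Because splitter always exits with status 0, `check=True` alone cannot detect a failed run; checking the output file is a simple extra safeguard.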

Command line arguments

   Mandatory qualifiers:
  [-sequence]          seqall     Sequence database USA
  [-outseq]            seqoutall  Output sequence(s) USA

   Optional qualifiers:
   -size               integer    Size to split at
   -overlap            integer    Overlap between split sequences

   Advanced qualifiers: (none)
   General qualifiers:
  -help                bool       report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose


Mandatory qualifiers       Description                       Allowed values         Default
[-sequence] (Parameter 1)  Sequence database USA             Readable sequence(s)   Required
[-outseq] (Parameter 2)    Output sequence(s) USA            Writeable sequence(s)  <sequence>.format

Optional qualifiers        Description                       Allowed values         Default
-size                      Size to split at                  Integer 1 or more      10000
-overlap                   Overlap between split sequences   Integer 0 or more      0

Advanced qualifiers
(none)

Input File Format

One or more sequences.

Output File Format

Many sequences.

Each output sequence is named after the original sequence with '_start-end' appended, where start and end are the positions of the sub-sequence in the original. For example, if HSHBB (73,308 bases) is split at a size of 50000 with no overlap, the sub-sequences are named HSHBB_1-50000 and HSHBB_50001-73308.
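A minimal sketch of this naming convention, assuming 1-based inclusive coordinates as in the HSHBB example. `split_records` is a hypothetical helper, not part of EMBOSS, and the sequence used here is a placeholder of the right length rather than real HSHBB data.

```python
def split_records(name, seq, ranges):
    """Yield (name, subsequence) pairs for 1-based, inclusive (start, end) ranges."""
    for start, end in ranges:
        # 1-based inclusive [start, end] maps to the 0-based Python slice [start-1, end).
        yield f"{name}_{start}-{end}", seq[start - 1:end]


seq = "A" * 73308  # placeholder residues; only the length matters here
for rec_name, sub in split_records("HSHBB", seq, [(1, 50000), (50001, 73308)]):
    print(rec_name, len(sub))
# HSHBB_1-50000 50000
# HSHBB_50001-73308 23308
```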

Data files

None.

Notes

Splitting sequences into smaller sub-sequences is rarely necessary in EMBOSS, but it can help when memory usage becomes restrictive with very large sequences.

References

None.

Warnings

None.

Diagnostic Error Messages

None.

Exit status

It always exits with status 0.

Known bugs

Bugs noted but not yet fixed: none.

See also

Program name  Description
biosed        Replace or delete sequence sections
cutseq        Removes a specified section from a sequence
degapseq      Removes gap characters from sequences
descseq       Alter the name or description of a sequence
entret        Reads and writes (returns) flatfile entries
extractfeat   Extract features from a sequence
extractseq    Extract regions from a sequence
listor        Writes a list file of the logical OR of two sets of sequences
maskfeat      Mask off features of a sequence
maskseq       Mask off regions of a sequence
newseq        Type in a short new sequence
noreturn      Removes carriage return from ASCII files
notseq        Excludes a set of sequences and writes out the remaining ones
nthseq        Writes one sequence from a multiple set of sequences
pasteseq      Insert one sequence into another
revseq        Reverse and complement a sequence
seqret        Reads and writes (returns) sequences
seqretsplit   Reads and writes (returns) sequences in individual files
swissparse    Retrieves sequences from swissprot using keyword search
trimest       Trim poly-A tails off EST sequences
trimseq       Trim ambiguous bits off the ends of sequences
union         Reads sequence fragments and builds one sequence
vectorstrip   Strips out DNA between a pair of vector sequences
yank          Reads a sequence range, appends the full USA to a list file

Author(s)

This application was written by Gary Williams (gwilliam@hgmp.mrc.ac.uk)

History

Completed 22 March 1999
23 May 2001 - Gary Williams - changed the overlap so that each new sequence section starts one base after the split point and ends at the next split point plus the overlap. For example, splitting at 10000 with an overlap of 100 gives 1-10100, 10001-20100, etc.
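The coordinate rule in this entry can be sketched in a few lines of Python. This is an illustration of the documented rule, not splitter's source: `split_ranges` is a hypothetical helper, and stopping as soon as a piece reaches the end of the sequence (so no piece lies wholly inside the previous one's overlap) is an assumption about how the final piece is handled.

```python
def split_ranges(length, size=10000, overlap=0):
    """Return 1-based inclusive (start, end) pieces for a sequence of `length` bases.

    Each piece starts one base after a split point and ends at the next
    split point plus the overlap, clipped to the end of the sequence.
    """
    if size < 1 or overlap < 0:
        raise ValueError("size must be >= 1 and overlap >= 0")
    ranges = []
    start = 1
    while start <= length:
        end = min(start - 1 + size + overlap, length)
        ranges.append((start, end))
        if end == length:
            break  # assumption: stop once a piece reaches the end of the sequence
        start += size
    return ranges


print(split_ranges(30000, size=10000, overlap=100))
# [(1, 10100), (10001, 20100), (20001, 30000)]
```

With no overlap, the same function reproduces the HSHBB example: `split_ranges(73308, size=50000)` gives `(1, 50000)` and `(50001, 73308)`.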

Target users

This program is intended to be used by everyone and everything, from naive users to embedded scripts.

Comments