org.biojava.bio.seq.io
Class SeqIOTools

java.lang.Object
  extended by org.biojava.bio.seq.io.SeqIOTools

public final class SeqIOTools
extends java.lang.Object

A set of convenience methods for handling common file formats.

Since:
1.1
Author:
Thomas Down, Mark Schreiber, Nimesh Singh, Matthew Pocock, Keith James

Method Summary
static void biojavaToFile(int fileType, java.io.OutputStream os, java.lang.Object biojava)
          Converts a Biojava object to the given filetype.
static void biojavaToFile(java.lang.String formatName, java.lang.String alphabetName, java.io.OutputStream os, java.lang.Object biojava)
          Writes a Biojava SequenceIterator, SequenceDB, Sequence or Alignment to an OutputStream.
static java.lang.Object fileToBiojava(int fileType, java.io.BufferedReader br)
          Reads a file and returns the corresponding Biojava object.
static java.lang.Object fileToBiojava(java.lang.String formatName, java.lang.String alphabetName, java.io.BufferedReader br)
          Reads a file with the specified format and alphabet.
static SequenceBuilderFactory formatToFactory(SequenceFormat format, Alphabet alpha)
          Deprecated. as this essentially duplicates the operation available in the method identifyBuilderFactory.
static FiniteAlphabet getAlphabet(int identifier)
          getAlphabet accepts a value which represents a sequence format and returns the relevant FiniteAlphabet object.
static SequenceBuilderFactory getBuilderFactory(int identifier)
          getBuilderFactory accepts a value which represents a sequence format and returns the relevant SequenceBuilderFactory object.
static SequenceBuilderFactory getEmblBuilderFactory()
          Get a default SequenceBuilderFactory for handling EMBL files.
static SequenceBuilderFactory getFastaBuilderFactory()
          Get a default SequenceBuilderFactory for handling FASTA files.
static SequenceBuilderFactory getGenbankBuilderFactory()
          Get a default SequenceBuilderFactory for handling GenBank files.
static SequenceBuilderFactory getGenpeptBuilderFactory()
          Get a default SequenceBuilderFactory for handling Genpept files.
static SequenceFormat getSequenceFormat(int identifier)
          getSequenceFormat accepts a value which represents a sequence format and returns the relevant SequenceFormat object.
static SequenceBuilderFactory getSwissprotBuilderFactory()
          Get a default SequenceBuilderFactory for handling Swissprot files.
static int guessFileType(java.io.File seqFile)
          Deprecated. There is no standard file naming convention, so guessing by file name is inherently error-prone.
static int identifyFormat(java.lang.String formatName, java.lang.String alphabetName)
          identifyFormat performs a case-insensitive mapping of a common sequence format name (such as 'embl', 'genbank' or 'fasta') paired with an alphabet name (such as 'dna', 'rna', 'protein', 'aa') to an integer.
static SequenceIterator readEmbl(java.io.BufferedReader br)
          Iterate over the sequences in an EMBL-format stream.
static SequenceIterator readEmblNucleotide(java.io.BufferedReader br)
          Iterate over the sequences in an EMBL-format stream.
static SequenceIterator readEmblRNA(java.io.BufferedReader br)
          Iterate over the sequences in an EMBL-format stream, but for RNA.
static SequenceDB readFasta(java.io.InputStream seqFile, Alphabet alpha)
          Create a sequence database from a fasta file provided as an input stream.
static SequenceIterator readFastaDNA(java.io.BufferedReader br)
          Iterate over the sequences in a FASTA-format stream of DNA sequences.
static SequenceIterator readFastaProtein(java.io.BufferedReader br)
          Iterate over the sequences in a FASTA-format stream of Protein sequences.
static SequenceIterator readFastaRNA(java.io.BufferedReader br)
          Iterate over the sequences in a FASTA-format stream of RNA sequences.
static SequenceIterator readGenbank(java.io.BufferedReader br)
          Iterate over the sequences in a GenBank-format stream.
static SequenceIterator readGenpept(java.io.BufferedReader br)
          Iterate over the sequences in a Genpept-format stream.
static SequenceIterator readSwissprot(java.io.BufferedReader br)
          Iterate over the sequences in a Swissprot-format stream.
static void writeEmbl(java.io.OutputStream os, Sequence seq)
          Writes a single Sequence to an OutputStream in EMBL format.
static void writeEmbl(java.io.OutputStream os, SequenceIterator in)
          Writes a stream of Sequences to an OutputStream in EMBL format.
static void writeFasta(java.io.OutputStream os, Sequence seq)
          Writes a single Sequence to an OutputStream in Fasta format.
static void writeFasta(java.io.OutputStream os, SequenceDB db)
          Writes a SequenceDB to an OutputStream in Fasta format.
static void writeFasta(java.io.OutputStream os, SequenceIterator in)
          Writes sequences from a SequenceIterator to an OutputStream in Fasta Format.
static void writeGenbank(java.io.OutputStream os, Sequence seq)
          Writes a single Sequence to an OutputStream in Genbank format.
static void writeGenbank(java.io.OutputStream os, SequenceIterator in)
          Writes a stream of Sequences to an OutputStream in Genbank format.
static void writeGenpept(java.io.OutputStream os, Sequence seq)
          Writes a single Sequence to an OutputStream in Genpept format.
static void writeGenpept(java.io.OutputStream os, SequenceIterator in)
          Writes a stream of Sequences to an OutputStream in Genpept format.
static void writeSwissprot(java.io.OutputStream os, Sequence seq)
          Writes a single Sequence to an OutputStream in SwissProt format.
static void writeSwissprot(java.io.OutputStream os, SequenceIterator in)
          Writes a stream of Sequences to an OutputStream in SwissProt format.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

getEmblBuilderFactory

public static SequenceBuilderFactory getEmblBuilderFactory()
Get a default SequenceBuilderFactory for handling EMBL files.


readEmbl

public static SequenceIterator readEmbl(java.io.BufferedReader br)
Iterate over the sequences in an EMBL-format stream.


readEmblRNA

public static SequenceIterator readEmblRNA(java.io.BufferedReader br)
Iterate over the sequences in an EMBL-format stream, but for RNA.


readEmblNucleotide

public static SequenceIterator readEmblNucleotide(java.io.BufferedReader br)
Iterate over the sequences in an EMBL-format stream.


getGenbankBuilderFactory

public static SequenceBuilderFactory getGenbankBuilderFactory()
Get a default SequenceBuilderFactory for handling GenBank files.


readGenbank

public static SequenceIterator readGenbank(java.io.BufferedReader br)
Iterate over the sequences in a GenBank-format stream.


getGenpeptBuilderFactory

public static SequenceBuilderFactory getGenpeptBuilderFactory()
Get a default SequenceBuilderFactory for handling Genpept files.


readGenpept

public static SequenceIterator readGenpept(java.io.BufferedReader br)
Iterate over the sequences in a Genpept-format stream.


getSwissprotBuilderFactory

public static SequenceBuilderFactory getSwissprotBuilderFactory()
Get a default SequenceBuilderFactory for handling Swissprot files.


readSwissprot

public static SequenceIterator readSwissprot(java.io.BufferedReader br)
Iterate over the sequences in a Swissprot-format stream.


getFastaBuilderFactory

public static SequenceBuilderFactory getFastaBuilderFactory()
Get a default SequenceBuilderFactory for handling FASTA files.


readFastaDNA

public static SequenceIterator readFastaDNA(java.io.BufferedReader br)
Iterate over the sequences in a FASTA-format stream of DNA sequences.


readFastaRNA

public static SequenceIterator readFastaRNA(java.io.BufferedReader br)
Iterate over the sequences in a FASTA-format stream of RNA sequences.


readFastaProtein

public static SequenceIterator readFastaProtein(java.io.BufferedReader br)
Iterate over the sequences in a FASTA-format stream of Protein sequences.


readFasta

public static SequenceDB readFasta(java.io.InputStream seqFile,
                                   Alphabet alpha)
                            throws BioException
Create a sequence database from a FASTA file provided as an input stream. Note that this somewhat duplicates the functionality of the readFastaDNA and readFastaProtein methods, but it takes a stream rather than a reader and returns a SequenceDB rather than a SequenceIterator. If the returned DB is likely to be large, the iterator-based methods should be used instead.

Throws:
BioException - if problems occur during reading of the stream.
Since:
1.2
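
The trade-off described above (a whole database in memory versus one record at a time) can be sketched without BioJava. The class below is a hypothetical, simplified illustration, not the library's parser: `FastaStreamingSketch`, `next` and `loadAll` are invented names, and records are plain `{name, residues}` string pairs rather than Sequence objects.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: lazy FASTA reading versus loading everything at once. */
final class FastaStreamingSketch {
    private final BufferedReader reader;
    // Header line of the next record, consumed while finishing the previous one.
    private String pendingHeader;

    FastaStreamingSketch(BufferedReader reader) {
        this.reader = reader;
    }

    /** Returns {name, residues} for the next record, or null when the stream is exhausted. */
    String[] next() {
        try {
            String header = pendingHeader;
            pendingHeader = null;
            String line;
            // Skip any lines that precede the first '>' header.
            while (header == null && (line = reader.readLine()) != null) {
                if (line.startsWith(">")) {
                    header = line;
                }
            }
            if (header == null) {
                return null;
            }
            StringBuilder residues = new StringBuilder();
            while ((line = reader.readLine()) != null) {
                if (line.startsWith(">")) {
                    pendingHeader = line; // belongs to the following record
                    break;
                }
                residues.append(line.trim());
            }
            return new String[] { header.substring(1).trim(), residues.toString() };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Database-style access: every record is held in memory at the same time. */
    static Map<String, String> loadAll(BufferedReader br) {
        Map<String, String> db = new LinkedHashMap<>();
        FastaStreamingSketch it = new FastaStreamingSketch(br);
        for (String[] rec = it.next(); rec != null; rec = it.next()) {
            db.put(rec[0], rec[1]);
        }
        return db;
    }

    public static void main(String[] args) {
        String fasta = ">seq1\nACGT\nAC\n>seq2\nGGCC\n";

        // Iterator-style: only one record is materialised at a time.
        FastaStreamingSketch it = new FastaStreamingSketch(new BufferedReader(new StringReader(fasta)));
        for (String[] rec = it.next(); rec != null; rec = it.next()) {
            System.out.println(rec[0] + " length=" + rec[1].length());
        }

        // Database-style: convenient lookup by name, but memory grows with the input.
        Map<String, String> db = loadAll(new BufferedReader(new StringReader(fasta)));
        System.out.println(db.get("seq2"));
    }
}
```

The iterator-style loop keeps memory use roughly constant per record, which is the reason for preferring the reader-based methods on large inputs.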

writeFasta

public static void writeFasta(java.io.OutputStream os,
                              SequenceDB db)
                       throws java.io.IOException
Writes a SequenceDB to an OutputStream in Fasta format.

Throws:
java.io.IOException - if there was an error while writing.
Since:
1.2

writeFasta

public static void writeFasta(java.io.OutputStream os,
                              SequenceIterator in)
                       throws java.io.IOException
Writes sequences from a SequenceIterator to an OutputStream in Fasta Format. This makes for a useful format filter where a StreamReader can be sent to the StreamWriter after formatting.

Throws:
java.io.IOException - if there was an error while writing.
Since:
1.2

writeFasta

public static void writeFasta(java.io.OutputStream os,
                              Sequence seq)
                       throws java.io.IOException
Writes a single Sequence to an OutputStream in Fasta format.

Parameters:
os - the OutputStream.
seq - the Sequence.
Throws:
java.io.IOException - if there was an error while writing.

writeEmbl

public static void writeEmbl(java.io.OutputStream os,
                             SequenceIterator in)
                      throws java.io.IOException
Writes a stream of Sequences to an OutputStream in EMBL format.

Parameters:
os - the OutputStream.
in - a SequenceIterator.
Throws:
java.io.IOException - if there was an error while writing.

writeEmbl

public static void writeEmbl(java.io.OutputStream os,
                             Sequence seq)
                      throws java.io.IOException
Writes a single Sequence to an OutputStream in EMBL format.

Parameters:
os - the OutputStream.
seq - the Sequence.
Throws:
java.io.IOException - if there was an error while writing.

writeSwissprot

public static void writeSwissprot(java.io.OutputStream os,
                                  SequenceIterator in)
                           throws java.io.IOException,
                                  BioException
Writes a stream of Sequences to an OutputStream in SwissProt format.

Parameters:
os - the OutputStream.
in - a SequenceIterator.
Throws:
java.io.IOException - if there was an error while writing.
BioException

writeSwissprot

public static void writeSwissprot(java.io.OutputStream os,
                                  Sequence seq)
                           throws java.io.IOException,
                                  BioException
Writes a single Sequence to an OutputStream in SwissProt format.

Parameters:
os - the OutputStream.
seq - the Sequence.
Throws:
java.io.IOException - if there was an error while writing.
BioException

writeGenpept

public static void writeGenpept(java.io.OutputStream os,
                                SequenceIterator in)
                         throws java.io.IOException,
                                BioException
Writes a stream of Sequences to an OutputStream in Genpept format.

Parameters:
os - the OutputStream.
in - a SequenceIterator.
Throws:
java.io.IOException - if there was an error while writing.
BioException

writeGenpept

public static void writeGenpept(java.io.OutputStream os,
                                Sequence seq)
                         throws java.io.IOException,
                                BioException
Writes a single Sequence to an OutputStream in Genpept format.

Parameters:
os - the OutputStream.
seq - the Sequence.
Throws:
java.io.IOException - if there was an error while writing.
BioException

writeGenbank

public static void writeGenbank(java.io.OutputStream os,
                                SequenceIterator in)
                         throws java.io.IOException
Writes a stream of Sequences to an OutputStream in Genbank format.

Parameters:
os - the OutputStream.
in - a SequenceIterator.
Throws:
java.io.IOException - if there was an error while writing.

writeGenbank

public static void writeGenbank(java.io.OutputStream os,
                                Sequence seq)
                         throws java.io.IOException
Writes a single Sequence to an OutputStream in Genbank format.

Parameters:
os - the OutputStream.
seq - the Sequence.
Throws:
java.io.IOException - if there was an error while writing.

identifyFormat

public static int identifyFormat(java.lang.String formatName,
                                 java.lang.String alphabetName)
identifyFormat performs a case-insensitive mapping of a common sequence format name (such as 'embl', 'genbank' or 'fasta') paired with an alphabet name (such as 'dna', 'rna', 'protein', 'aa') to an integer. The value returned will be one of the public static final fields in SeqIOConstants, or a bitwise-or combination of them. The method rejects known illegal combinations of format and alphabet (such as swissprot + dna) by throwing an IllegalArgumentException. It returns the SeqIOConstants.UNKNOWN value when either the format or the alphabet is unknown.

Parameters:
formatName - a String.
alphabetName - a String.
Returns:
an int.
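
The behaviour described above (case-insensitive lookup, a bitwise-or of a format flag and an alphabet flag, an exception for a known illegal pair, and an "unknown" value otherwise) can be sketched as follows. This is a minimal illustration under stated assumptions: the class `FormatIdSketch`, its method names and all constant values are hypothetical and are not the values defined in SeqIOConstants.

```java
import java.util.Locale;

/**
 * Sketch only: the constant values below are invented for illustration
 * and are NOT the values defined in SeqIOConstants.
 */
final class FormatIdSketch {
    static final int UNKNOWN   = 0;
    // Hypothetical alphabet flags (low bits).
    static final int DNA       = 1 << 0;
    static final int RNA       = 1 << 1;
    static final int AA        = 1 << 2;
    // Hypothetical format flags (higher bits).
    static final int FASTA     = 1 << 8;
    static final int EMBL      = 1 << 9;
    static final int GENBANK   = 1 << 10;
    static final int SWISSPROT = 1 << 11;

    private FormatIdSketch() {}

    /** Case-insensitive lookup of the format flag; UNKNOWN if unrecognised. */
    static int formatBits(String name) {
        if (name == null) return UNKNOWN;
        switch (name.toLowerCase(Locale.ROOT)) {
            case "fasta":     return FASTA;
            case "embl":      return EMBL;
            case "genbank":   return GENBANK;
            case "swissprot": return SWISSPROT;
            default:          return UNKNOWN;
        }
    }

    /** Case-insensitive lookup of the alphabet flag; UNKNOWN if unrecognised. */
    static int alphabetBits(String name) {
        if (name == null) return UNKNOWN;
        switch (name.toLowerCase(Locale.ROOT)) {
            case "dna":     return DNA;
            case "rna":     return RNA;
            case "protein": // falls through: two names for the same alphabet
            case "aa":      return AA;
            default:        return UNKNOWN;
        }
    }

    /** Combines both flags, rejecting one known illegal pairing. */
    static int identify(String formatName, String alphabetName) {
        int format = formatBits(formatName);
        int alphabet = alphabetBits(alphabetName);
        if (format == UNKNOWN || alphabet == UNKNOWN) {
            return UNKNOWN;
        }
        if (format == SWISSPROT && alphabet != AA) {
            throw new IllegalArgumentException(
                "Illegal combination: " + formatName + " + " + alphabetName);
        }
        return format | alphabet;
    }

    public static void main(String[] args) {
        System.out.println(identify("FASTA", "dna") == (FASTA | DNA)); // prints true
        System.out.println(identify("nosuch", "dna") == UNKNOWN);      // prints true
    }
}
```

Keeping format and alphabet in disjoint bit ranges lets a caller recover either part from the combined value with a mask, which is the kind of scheme the identifier parameters of getSequenceFormat, getBuilderFactory and getAlphabet rely on.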

getSequenceFormat

public static SequenceFormat getSequenceFormat(int identifier)
                                        throws BioException
getSequenceFormat accepts a value which represents a sequence format and returns the relevant SequenceFormat object.

Parameters:
identifier - an int which represents a binary value with bits set according to the scheme described in SeqIOConstants.
Returns:
a SequenceFormat.
Throws:
BioException - if an error occurs.

getBuilderFactory

public static SequenceBuilderFactory getBuilderFactory(int identifier)
                                                throws BioException
getBuilderFactory accepts a value which represents a sequence format and returns the relevant SequenceBuilderFactory object.

Parameters:
identifier - an int which represents a binary value with bits set according to the scheme described in SeqIOConstants.
Returns:
a SequenceBuilderFactory.
Throws:
BioException - if an error occurs.

getAlphabet

public static FiniteAlphabet getAlphabet(int identifier)
                                  throws BioException
getAlphabet accepts a value which represents a sequence format and returns the relevant FiniteAlphabet object.

Parameters:
identifier - an int which represents a binary value with bits set according to the scheme described in SeqIOConstants.
Returns:
a FiniteAlphabet.
Throws:
BioException - if an error occurs.

guessFileType

public static int guessFileType(java.io.File seqFile)
                         throws java.io.IOException,
                                java.io.FileNotFoundException
Deprecated. There is no standard file naming convention, so guessing by file name is inherently error-prone.

Attempts to guess the filetype of a file given the name. For use with the functions below that take an int fileType as a parameter. EMBL and Genbank files are assumed to contain DNA sequence.

Throws:
java.io.IOException
java.io.FileNotFoundException

formatToFactory

public static SequenceBuilderFactory formatToFactory(SequenceFormat format,
                                                     Alphabet alpha)
                                              throws BioException
Deprecated. as this essentially duplicates the operation available in the method identifyBuilderFactory.

Attempts to retrieve the most appropriate SequenceBuilderFactory object for some combination of Alphabet and SequenceFormat.

Parameters:
format - currently supports FastaFormat, GenbankFormat, EmblLikeFormat
alpha - currently only supports the DNA and Protein alphabets
Returns:
the SequenceBuilderFactory
Throws:
BioException - if the combination of alpha and format is unrecognized.

fileToBiojava

public static java.lang.Object fileToBiojava(java.lang.String formatName,
                                             java.lang.String alphabetName,
                                             java.io.BufferedReader br)
                                      throws BioException
Reads a file with the specified format and alphabet.

Parameters:
formatName - the name of the format, e.g. genbank or swissprot (case insensitive)
alphabetName - the name of the alphabet, e.g. dna, rna or protein (case insensitive)
br - a BufferedReader for the input
Returns:
either an Alignment object or a SequenceIterator (depending on the format read)
Throws:
BioException - if an error occurs while reading or an unrecognized format/alphabet combination is used (e.g. swissprot and DNA).
Since:
1.3

fileToBiojava

public static java.lang.Object fileToBiojava(int fileType,
                                             java.io.BufferedReader br)
                                      throws BioException
Reads a file and returns the corresponding Biojava object. You need to cast it as an Alignment or a SequenceIterator as appropriate.

Throws:
BioException
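
Because the return type is Object, a caller has to check the runtime type before casting. The sketch below shows that pattern in isolation. `AlignmentLike` and `SequenceIteratorLike` are hypothetical stand-in interfaces invented for this example so that it compiles without BioJava; real code would test against the library's Alignment and SequenceIterator types instead.

```java
/** Hypothetical sketch of type-checking an Object result before casting it. */
final class ResultDispatchSketch {
    /** Hypothetical stand-in for an alignment-like result. */
    interface AlignmentLike {
        int columns();
    }

    /** Hypothetical stand-in for a sequence-iterator-like result. */
    interface SequenceIteratorLike {
        boolean hasNext();
    }

    private ResultDispatchSketch() {}

    /** Checks the runtime type first, then casts, and fails loudly on anything else. */
    static String describe(Object result) {
        if (result instanceof AlignmentLike) {
            AlignmentLike alignment = (AlignmentLike) result;
            return "alignment with " + alignment.columns() + " columns";
        }
        if (result instanceof SequenceIteratorLike) {
            SequenceIteratorLike iterator = (SequenceIteratorLike) result;
            return iterator.hasNext() ? "iterator with sequences" : "empty iterator";
        }
        throw new IllegalArgumentException(
            "Unexpected result type: " + (result == null ? "null" : result.getClass().getName()));
    }

    public static void main(String[] args) {
        AlignmentLike alignment = () -> 5;
        SequenceIteratorLike iterator = () -> true;
        System.out.println(describe(alignment)); // prints "alignment with 5 columns"
        System.out.println(describe(iterator));  // prints "iterator with sequences"
    }
}
```

Checking with instanceof before the cast avoids a ClassCastException when the format read does not produce the type the caller expected.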

biojavaToFile

public static void biojavaToFile(java.lang.String formatName,
                                 java.lang.String alphabetName,
                                 java.io.OutputStream os,
                                 java.lang.Object biojava)
                          throws BioException,
                                 java.io.IOException,
                                 IllegalSymbolException
Writes a Biojava SequenceIterator, SequenceDB, Sequence or Alignment to an OutputStream.

Parameters:
formatName - e.g. fasta, GenBank (case insensitive)
alphabetName - e.g. DNA, RNA (case insensitive)
os - where to write to
biojava - the object to write
Throws:
BioException - problems getting data from the biojava object.
java.io.IOException - if there are IO problems
IllegalSymbolException - a Symbol cannot be parsed

biojavaToFile

public static void biojavaToFile(int fileType,
                                 java.io.OutputStream os,
                                 java.lang.Object biojava)
                          throws BioException,
                                 java.io.IOException,
                                 IllegalSymbolException
Converts a Biojava object to the given filetype.

Throws:
BioException
java.io.IOException
IllegalSymbolException