java.lang.Object
  org.biojavax.bio.seq.io.RichSequenceFormat.BasicFormat
    org.biojavax.bio.seq.io.RichSequenceFormat.HeaderlessFormat
      org.biojavax.bio.seq.io.FastaFormat
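public class FastaFormat
extends RichSequenceFormat.HeaderlessFormat
Format object representing FASTA files. These files are almost pure sequence data: each record is a single description line beginning with ">", followed by lines of sequence.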
Nested Class Summary |
---|
Nested classes/interfaces inherited from interface org.biojavax.bio.seq.io.RichSequenceFormat |
---|
RichSequenceFormat.BasicFormat, RichSequenceFormat.HeaderlessFormat |
Field Summary | |
---|---|
protected static Pattern | aminoAcids |
protected static Pattern | dp |
static String | FASTA_FORMAT: the name of this format. |
protected static Pattern | hp |
protected static Pattern | readableFiles |
Constructor Summary | |
---|---|
FastaFormat() | |
Method Summary | |
---|---|
boolean | canRead(BufferedInputStream stream): Check to see if a given stream is in our format. A stream is in FASTA format if the stream starts with ">". |
boolean | canRead(File file): Check to see if a given file is in our format. Some formats may be able to determine this by filename, whilst others may have to open the file and read it to see what format it is in. A file is in FASTA format if the name ends with fa or fas, or the file starts with ">". |
String | getDefaultFormat(): getDefaultFormat returns the String identifier for the default sub-format written by a SequenceFormat implementation. |
FastaHeader | getHeader() |
SymbolTokenization | guessSymbolTokenization(BufferedInputStream stream): On the assumption that the stream is readable by this format (not checked), attempt to guess which symbol tokenization we should use to read it. Returns a protein parser if the first line of sequence contains any of F/L/I/P/Q/E, otherwise returns a DNA tokenizer. |
SymbolTokenization | guessSymbolTokenization(File file): On the assumption that the file is readable by this format (not checked), attempt to guess which symbol tokenization we should use to read it. For formats that only accept one tokenization, just return it without checking the file. For formats that accept multiple tokenizations, it's up to you how you do it. Returns a protein parser if the first line of sequence contains any of F/L/I/P/Q/E, otherwise returns a DNA tokenizer. |
void | processHeader(String line, RichSeqIOListener rsiol, Namespace ns): Parse the header information from the FASTA description line. |
boolean | readRichSequence(BufferedReader reader, SymbolTokenization symParser, RichSeqIOListener rsiol, Namespace ns): Reads a sequence from the given buffered reader using the given tokenizer to parse sequence symbols. If namespace is null, then the namespace of the sequence in the FASTA file is used. |
boolean | readSequence(BufferedReader reader, SymbolTokenization symParser, SeqIOListener listener): Read a sequence and pass data on to a SeqIOListener. |
void | setHeader(FastaHeader header) |
void | writeSequence(Sequence seq, Namespace ns): Writes a sequence out to the output stream given by beginWriting() using the default format of the implementing class. If namespace is null, then the sequence's own namespace is used. |
void | writeSequence(Sequence seq, PrintStream os): writeSequence writes a sequence to the specified PrintStream, using the default format. |
void | writeSequence(Sequence seq, String format, PrintStream os): writeSequence writes a sequence to the specified PrintStream, using the specified format. |
Methods inherited from class org.biojavax.bio.seq.io.RichSequenceFormat.HeaderlessFormat |
---|
beginWriting, finishWriting |
Methods inherited from class org.biojavax.bio.seq.io.RichSequenceFormat.BasicFormat |
---|
getElideComments, getElideFeatures, getElideReferences, getElideSymbols, getLineWidth, getPrintStream, setElideComments, setElideFeatures, setElideReferences, setElideSymbols, setLineWidth, setPrintStream |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final String FASTA_FORMAT
protected static final Pattern hp
protected static final Pattern dp
protected static final Pattern readableFiles
protected static final Pattern aminoAcids
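The method summaries for canRead and guessSymbolTokenization describe three heuristics: a file name ending in fa or fas, content that starts with ">", and a first sequence line containing any of F/L/I/P/Q/E. The sketch below re-creates those rules with java.util.regex. The regular expressions are assumptions reconstructed from that prose (the actual pattern strings behind readableFiles and aminoAcids are not shown on this page), and the class name, method names and the "PROTEIN"/"DNA" labels are hypothetical stand-ins for real SymbolTokenization objects.

```java
import java.util.regex.Pattern;

// Hypothetical re-creation of the documented FASTA detection rules; not BioJava code.
final class FastaHeuristics {
    // Assumed equivalent of readableFiles: the file name ends with "fa" or "fas".
    static final Pattern READABLE_FILES = Pattern.compile(".*(fa|fas)$");
    // Assumed equivalent of aminoAcids: any of the letters F, L, I, P, Q or E.
    static final Pattern AMINO_ACIDS = Pattern.compile("[FLIPQE]");

    private FastaHeuristics() {}

    // Name-based half of canRead(File).
    static boolean nameLooksLikeFasta(String fileName) {
        return READABLE_FILES.matcher(fileName).matches();
    }

    // Content check used by canRead(BufferedInputStream) and as the fallback of canRead(File).
    static boolean contentLooksLikeFasta(String content) {
        return content.startsWith(">");
    }

    // guessSymbolTokenization: inspect only the first line of sequence after the header.
    static String guessAlphabet(String content) {
        String[] lines = content.split("\\R", 3); // header, first sequence line, rest
        if (lines.length > 1 && AMINO_ACIDS.matcher(lines[1]).find()) {
            return "PROTEIN";
        }
        return "DNA";
    }
}
```

Note that a name rule of this shape accepts `reads.fas` but not `reads.fasta`, which is consistent with the documented "fa or fas" wording; such files would still be accepted by the content check.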
Constructor Detail |
---|
public FastaFormat()
Method Detail |
---|
public boolean canRead(File file) throws IOException
Specified by:
canRead in interface RichSequenceFormat
Overrides:
canRead in class RichSequenceFormat.BasicFormat
Parameters:
file - the File to check.
Throws:
IOException - in case the file is inaccessible.
public SymbolTokenization guessSymbolTokenization(File file) throws IOException
Specified by:
guessSymbolTokenization in interface RichSequenceFormat
Overrides:
guessSymbolTokenization in class RichSequenceFormat.BasicFormat
Parameters:
file - the File object to guess the format of.
Returns:
a SymbolTokenization to read the file with.
Throws:
IOException - if the file is unrecognisable or inaccessible.
public boolean canRead(BufferedInputStream stream) throws IOException
Parameters:
stream - the BufferedInputStream to check.
Throws:
IOException - in case the stream is inaccessible.
public SymbolTokenization guessSymbolTokenization(BufferedInputStream stream) throws IOException
Parameters:
stream - the BufferedInputStream object to guess the format of.
Returns:
a SymbolTokenization to read the stream with.
Throws:
IOException - if the stream is unrecognisable or inaccessible.
public boolean readSequence(BufferedReader reader, SymbolTokenization symParser, SeqIOListener listener) throws IllegalSymbolException, IOException, ParseException
Parameters:
reader - the stream of data to parse.
symParser - a SymbolParser defining a mapping from character data to Symbols.
listener - a listener to notify when data is extracted from the stream.
Throws:
IllegalSymbolException - if it is not possible to translate character data from the stream into valid BioJava symbols.
IOException - if an error occurs while reading from the stream.
ParseException
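readSequence parses one record and passes what it finds to a listener. The sketch below shows a common way to read a single FASTA record from a BufferedReader: take the ">" line as the header, then collect sequence lines until the next ">" or end of input, peeking one character with mark/reset so the next header stays unread. FastaRecordReader and its Listener interface are hypothetical stand-ins for readSequence and SeqIOListener, and treating the boolean result as "another record follows" is an assumption, since this page does not document the return value.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical single-record FASTA reader; not the BioJava readSequence implementation.
final class FastaRecordReader {
    // Minimal stand-in for SeqIOListener: receives the header text and the residues.
    interface Listener {
        void record(String header, String residues);
    }

    private FastaRecordReader() {}

    // Reads one record. Returns true if another '>' line follows (assumed meaning of the result).
    static boolean readOne(BufferedReader reader, Listener listener) throws IOException {
        String header = reader.readLine();
        if (header == null || !header.startsWith(">")) {
            throw new IOException("Expected a '>' description line, found: " + header);
        }
        StringBuilder residues = new StringBuilder();
        while (true) {
            reader.mark(1);           // remember the start of the next line
            int next = reader.read(); // peek at its first character
            if (next == -1 || next == '>') {
                if (next != -1) {
                    reader.reset();   // leave the next header for the following call
                }
                listener.record(header.substring(1).trim(), residues.toString());
                return next != -1;
            }
            reader.reset();
            residues.append(reader.readLine().trim());
        }
    }

    // Convenience wrapper: parse every record in a string into {header, residues} pairs.
    static List<String[]> readAll(String text) {
        List<String[]> out = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            boolean more = true;
            while (more) {
                more = readOne(reader, (h, r) -> out.add(new String[] {h, r}));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }
}
```

Peeking a single character keeps the mark/reset read-ahead limit at 1 regardless of how long a sequence line is.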
public boolean readRichSequence(BufferedReader reader, SymbolTokenization symParser, RichSeqIOListener rsiol, Namespace ns) throws IllegalSymbolException, IOException, ParseException
Parameters:
reader - the input source.
symParser - the tokenizer which understands the sequence being read.
rsiol - the listener to send sequence events to.
ns - the namespace to read sequences into.
Throws:
IllegalSymbolException - if the tokenizer couldn't understand one of the sequence symbols in the file.
IOException - if there was a read error.
ParseException
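Two rules from the readRichSequence documentation fit in a few lines: every sequence character must be understood by the tokenizer (otherwise IllegalSymbolException is thrown), and a null namespace argument falls back to the namespace recorded with the sequence in the FASTA file. The sketch below models the tokenizer as a string of permitted characters and throws IllegalArgumentException in place of IllegalSymbolException. The class, its methods and the alphabet strings are hypothetical illustrations, not BioJava API.

```java
// Hypothetical sketch of two readRichSequence rules; not BioJava code.
final class RichReadRules {
    private RichReadRules() {}

    // "If namespace is null, then the namespace of the sequence in the fasta is used."
    static String resolveNamespace(String requested, String fromFile) {
        return requested != null ? requested : fromFile;
    }

    // Stand-in for tokenizing with a SymbolTokenization: reject characters outside the alphabet.
    // IllegalArgumentException plays the role of IllegalSymbolException here.
    static String checkSymbols(String residues, String alphabet) {
        for (int i = 0; i < residues.length(); i++) {
            char c = Character.toUpperCase(residues.charAt(i));
            if (alphabet.indexOf(c) < 0) {
                throw new IllegalArgumentException(
                        "Illegal symbol '" + residues.charAt(i) + "' at position " + (i + 1));
            }
        }
        return residues;
    }
}
```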
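public void processHeader(String line, RichSeqIOListener rsiol, Namespace ns) throws IOException, ParseException
Parameters:
line - the FASTA description line to parse.
rsiol - the listener to send the parsed header information to.
ns - the namespace to read the sequence into.
Throws:
IOException
ParseException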
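processHeader is documented only as parsing the header information from the FASTA description line. A widespread convention is ">identifier description", where the identifier ends at the first whitespace; NCBI-style identifiers pack several pipe-separated fields into that token. The sketch below parses a line under that convention. The regular expression, the class and its field names are assumptions for illustration; they are not taken from the hp and dp fields, whose pattern strings this page does not show.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical description-line parser; not the BioJava processHeader implementation.
final class FastaHeaderLine {
    // ">" then a whitespace-free identifier, then an optional free-text description.
    private static final Pattern HEADER = Pattern.compile("^>(\\S+)(?:\\s+(.*))?$");

    final String identifier;
    final String description;

    private FastaHeaderLine(String identifier, String description) {
        this.identifier = identifier;
        this.description = description;
    }

    static FastaHeaderLine parse(String line) {
        Matcher m = HEADER.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a FASTA description line: " + line);
        }
        String desc = m.group(2) == null ? "" : m.group(2).trim();
        return new FastaHeaderLine(m.group(1), desc);
    }

    // Splits the identifier on '|'; String.split drops trailing empty fields.
    String[] identifierFields() {
        return identifier.split("\\|");
    }
}
```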
public void writeSequence(Sequence seq, PrintStream os) throws IOException
writeSequence writes a sequence to the specified PrintStream, using the default format.
Parameters:
seq - the sequence to write out.
os - the print stream to write to.
Throws:
IOException
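writeSequence emits a description line followed by the residues, and the inherited getLineWidth/setLineWidth pair suggests the residues are wrapped at a configurable width. The sketch below writes one record to a PrintStream with that kind of wrapping. The header layout (">" + name, then an optional space and description) is an assumption, since this page does not document the exact header FastaFormat produces; the class and method names are hypothetical, and the line width is passed in explicitly rather than read from a BasicFormat.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

// Hypothetical FASTA writer with line wrapping; not the BioJava writeSequence implementation.
final class FastaWriterSketch {
    private FastaWriterSketch() {}

    static void write(String name, String description, String residues, int lineWidth, PrintStream os) {
        if (lineWidth <= 0) {
            throw new IllegalArgumentException("lineWidth must be positive");
        }
        // Assumed header layout: ">name" plus " description" when one is present.
        os.print(">" + name);
        if (description != null && !description.isEmpty()) {
            os.print(" " + description);
        }
        os.print('\n');
        // Emit the residues in chunks of at most lineWidth characters.
        for (int start = 0; start < residues.length(); start += lineWidth) {
            int end = Math.min(start + lineWidth, residues.length());
            os.print(residues.substring(start, end));
            os.print('\n');
        }
    }

    // Convenience wrapper that captures the output as a String.
    static String toFasta(String name, String description, String residues, int lineWidth) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (PrintStream ps = new PrintStream(buf)) {
            write(name, description, residues, lineWidth, ps);
        }
        return buf.toString();
    }
}
```

Writing '\n' explicitly, rather than calling println, keeps the output independent of the platform line separator.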
public void writeSequence(Sequence seq, String format, PrintStream os) throws IOException
writeSequence writes a sequence to the specified PrintStream, using the specified format.
Parameters:
seq - a Sequence to write out.
format - a String indicating which sub-format of those available from a particular SequenceFormat implementation to use when writing.
os - a PrintStream object.
Throws:
IOException - if an error occurs.
public void writeSequence(Sequence seq, Namespace ns) throws IOException
Parameters:
seq - the sequence to write.
ns - the namespace to write it with.
Throws:
IOException - in case it couldn't write something.
public String getDefaultFormat()
getDefaultFormat returns the String identifier for the default sub-format written by a SequenceFormat implementation.
Returns:
a String.
public FastaHeader getHeader()
public void setHeader(FastaHeader header)
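getDefaultFormat names the sub-format FastaFormat writes, and writeSequence(Sequence, String, PrintStream) takes a sub-format identifier. The sketch below shows one plausible way to validate that identifier before writing. Both the constant value "FASTA" and the choice to reject other identifiers with IllegalArgumentException are assumptions; this page documents FASTA_FORMAT only as "the name of this format" and does not say how an unknown sub-format is handled. The class name is hypothetical.

```java
// Hypothetical sketch of sub-format selection; not BioJava code.
final class FastaWriteRules {
    // Assumed value; the page only calls FASTA_FORMAT "the name of this format".
    static final String FASTA_FORMAT = "FASTA";

    private FastaWriteRules() {}

    static String getDefaultFormat() {
        return FASTA_FORMAT;
    }

    // Reject any sub-format other than the single one this sketch supports.
    static void checkFormat(String format) {
        if (!getDefaultFormat().equals(format)) {
            throw new IllegalArgumentException("Unknown format: " + format);
        }
    }
}
```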