org.biojava.bio.symbol
Class SuffixTree

java.lang.Object
  |
  +--org.biojava.bio.symbol.SuffixTree

public class SuffixTree
extends java.lang.Object
implements java.io.Serializable

Suffix tree implementation. The interface is a bit strange, as it needed to be as space-efficient as possible. More work could be done on the space issue.

A suffix tree is an efficient method for encoding the frequencies of motifs in a sequence. They are sometimes used to quickly screen for similar sequences. For instance, all motifs of length up to 2 in the sequence AAGT could be encoded as:

 root(4)
 |
 A(2)--------G(1)-----T(1)
 |           |
 A(1)--G(1)  T(1)
 

A possible method of comparing SuffixTrees is provided as a kernel function as org.biojava.stats.svm.tools.SuffixTreeKernel.

Author:
Matthew Pocock, Thomas Down (documentation and other updates)
See Also:
Serialized Form

Inner Class Summary
static class SuffixTree.SuffixNode
          A node in the suffix tree.
 
Constructor Summary
SuffixTree(FiniteAlphabet alphabet)
          Construct a new SuffixTree to contain motifs over the specified alphabet.
 
Method Summary
 void addSymbols(SymbolList rList, int window)
          Add a count for all motifs with length of up to window to this tree.
 int frequency(int length)
          Return the number of motifs of a given length encoded in this SuffixTree.
 FiniteAlphabet getAlphabet()
          Return the Alphabet containing all Symbols which might be found in this SuffixTree.
 SuffixTree.SuffixNode getChild(SuffixTree.SuffixNode node, int i)
          Get the n'th child of a node.
 SuffixTree.SuffixNode getChild(SuffixTree.SuffixNode node, Symbol r)
          Get a child of a SuffixTree.SuffixNode, constructing a new one if need be.
 SuffixTree.SuffixNode getRoot()
          Return the node object which is the root of this suffix tree.
protected  void incCounts(int i, int c)
           
 int indexForSymbol(Symbol r)
          Return an internal index number corresponding to the given Symbol.
 int maxLength()
          Return the length of the longest motif represented in this SuffixTree
 Symbol symbolForIndex(int i)
          Return the Symbol corresponding to a specified index number.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

SuffixTree

public SuffixTree(FiniteAlphabet alphabet)
Construct a new SuffixTree to contain motifs over the specified alphabet.
Parameters:
alphabet - The alphabet of this SuffixTree (must be finite).
Method Detail

getAlphabet

public FiniteAlphabet getAlphabet()
Return the Alphabet containing all Symbols which might be found in this SuffixTree.

getRoot

public SuffixTree.SuffixNode getRoot()
Return the node object which is the root of this suffix tree. This represents the set of all motifs found in this tree.

getChild

public SuffixTree.SuffixNode getChild(SuffixTree.SuffixNode node,
                                      Symbol r)
Get a child of a SuffixTree.SuffixNode, constructing a new one if need be. This method is here due to memory optimisations.

getChild

public SuffixTree.SuffixNode getChild(SuffixTree.SuffixNode node,
                                      int i)
Get the n'th child of a node.

addSymbols

public void addSymbols(SymbolList rList,
                       int window)
Add a count for all motifs with length of up to window to this tree.
Parameters:
rList - a SymbolList whose motifs should be added to the tree.
window - The maximum motif length to count.

incCounts

protected void incCounts(int i,
                         int c)

maxLength

public int maxLength()
Return the length of the longest motif represented in this SuffixTree

frequency

public int frequency(int length)
Return the number of motifs of a given length encoded in this SuffixTree.

symbolForIndex

public Symbol symbolForIndex(int i)
Return the Symbol corresponding to a specified index number.

indexForSymbol

public int indexForSymbol(Symbol r)
Return an internal index number corresponding to the given Symbol. These numbers are used by the SuffixTree implementation to build the tree efficiently.