org.biojava.bio.search
Class SeqContentPattern

java.lang.Object
  extended by org.biojava.bio.search.SeqContentPattern
All Implemented Interfaces:
BioPattern

public class SeqContentPattern
extends Object
implements BioPattern

A pattern that can be used to find regions with given sequence content.

Regular expressions can be used to find sequence patterns, but some constraints are hard to express that way: for example, a region of length 10 that contains at least 8 Gs, at most 2 Ts and no other symbols. A SeqContentPattern can represent this directly.

 SeqContentPattern scp = new SeqContentPattern(DNATools.getDNA());
 scp.setLength(10);
 scp.setMinCounts(DNATools.g(), 8);
 scp.setMaxCounts(DNATools.t(), 2);
 scp.setMaxCounts(DNATools.c(), 0);
 scp.setMaxCounts(DNATools.a(), 0);
 

The minimum count for each symbol defaults to 0, and the maximum count defaults to the pattern length. Until you explicitly set the maximum count for a symbol, it tracks the length and changes whenever the length changes. Once you have set it, it stays fixed, even if you later change the length. To make a maximum count track the length again, set it to -1.
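The defaulting rules above can be modelled in a few lines of plain Java. This is only a sketch of the documented behaviour, not BioJava's source: the class name CountLimits and the use of char keys in place of AtomicSymbol are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the documented defaults; not BioJava's source code.
class CountLimits {
    private int length = 0;
    private final Map<Character, Integer> minCounts = new HashMap<>();
    // Only symbols with an explicitly set maximum appear in this map.
    private final Map<Character, Integer> maxCounts = new HashMap<>();

    void setLength(int length) { this.length = length; }

    void setMinCounts(char symbol, int count) { minCounts.put(symbol, count); }

    // A count of -1 clears the explicit value, so the maximum tracks the length again.
    void setMaxCounts(char symbol, int count) {
        if (count == -1) {
            maxCounts.remove(symbol);
        } else {
            maxCounts.put(symbol, count);
        }
    }

    // An unset minimum is reported as 0.
    int getMinCounts(char symbol) { return minCounts.getOrDefault(symbol, 0); }

    // An unset maximum is reported as the current length.
    int getMaxCounts(char symbol) { return maxCounts.getOrDefault(symbol, length); }
}
```

In this model, storing only the explicitly set maxima is what lets the unset ones follow the length without any extra bookkeeping.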

All regions of the defined length that satisfy every constraint can potentially be found. The behaviour is currently undefined when several overlapping regions all satisfy the constraints.
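To make the idea of a content match concrete, here is a minimal sliding-window sketch in plain Java for the example constraints (length 10, at least 8 Gs, at most 2 Ts, nothing else). It is an illustration only and is not how BioJava's matcher is implemented; the class ContentScan, the method findWindows, and the use of a lowercase String with 0-based offsets are all assumptions. This sketch reports every satisfying window, including overlapping ones, which is one possible policy and not a documented guarantee of SeqContentPattern.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of a content scan; an illustration, not BioJava's matcher.
class ContentScan {
    // Returns the 0-based start of every window of the given length holding
    // at least minG 'g' characters, at most maxT 't' characters, and nothing else.
    static List<Integer> findWindows(String seq, int length, int minG, int maxT) {
        List<Integer> starts = new ArrayList<>();
        if (length <= 0 || seq.length() < length) {
            return starts;
        }
        int g = 0, t = 0, other = 0;
        for (int i = 0; i < seq.length(); i++) {
            // Count the character entering the window on the right.
            char in = seq.charAt(i);
            if (in == 'g') g++; else if (in == 't') t++; else other++;
            // Uncount the character leaving the window on the left.
            if (i >= length) {
                char out = seq.charAt(i - length);
                if (out == 'g') g--; else if (out == 't') t--; else other--;
            }
            // Once the window is full, test every constraint.
            if (i >= length - 1 && g >= minG && t <= maxT && other == 0) {
                starts.add(i - length + 1);
            }
        }
        return starts;
    }
}
```

Keeping running counts means each position is examined a constant number of times, so the scan is linear in the sequence length regardless of the window size.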

Since:
1.4
Author:
Matthew Pocock

Constructor Summary
SeqContentPattern(FiniteAlphabet alpha)
          Create a new SeqContentPattern over an alphabet.
 
Method Summary
 int getLength()
          Get the current length.
 int getMaxCounts(AtomicSymbol as)
          Get the maximum count allowed for a symbol.
 int getMinCounts(AtomicSymbol as)
          Get the minimum counts required for a symbol.
 BioMatcher matcher(SymbolList symList)
          Get a matcher that will use these parameters to search a SymbolList.
 void setLength(int length)
          Set the pattern length.
 void setMaxCounts(AtomicSymbol as, int count)
          Set the maximum count allowed for a symbol.
 void setMinCounts(AtomicSymbol as, int count)
          Set the minimum counts required for a symbol.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

SeqContentPattern

public SeqContentPattern(FiniteAlphabet alpha)
Create a new SeqContentPattern over an alphabet.

Parameters:
alpha - the FiniteAlphabet for this pattern

Method Detail

getLength

public int getLength()
Get the current length.

Returns:
the length

setLength

public void setLength(int length)
Set the pattern length.

Parameters:
length - the new length

setMinCounts

public void setMinCounts(AtomicSymbol as,
                         int count)
                  throws IllegalSymbolException
Set the minimum counts required for a symbol.

Parameters:
as - the AtomicSymbol to constrain
count - the minimum number of times it must occur
Throws:
IllegalSymbolException - if as is not known in this alphabet

getMinCounts

public int getMinCounts(AtomicSymbol as)
                 throws IllegalSymbolException
Get the minimum counts required for a symbol.

Parameters:
as - the AtomicSymbol to check
Returns:
the minimum number of times it must occur
Throws:
IllegalSymbolException - if as is not known in this alphabet

setMaxCounts

public void setMaxCounts(AtomicSymbol as,
                         int count)
                  throws IllegalSymbolException
Set the maximum count allowed for a symbol. Use -1 to make it track the length again.

Parameters:
as - the AtomicSymbol to constrain
count - the maximum number of times it may occur, or -1 to track the length
Throws:
IllegalSymbolException - if as is not known in this alphabet

getMaxCounts

public int getMaxCounts(AtomicSymbol as)
                 throws IllegalSymbolException
Get the maximum count allowed for a symbol.

Parameters:
as - the AtomicSymbol to check
Returns:
the maximum number of times it may occur
Throws:
IllegalSymbolException - if as is not known in this alphabet

matcher

public BioMatcher matcher(SymbolList symList)
                   throws IllegalAlphabetException
Description copied from interface: BioPattern
Get a matcher that will use these parameters to search a SymbolList.

The resulting BioMatcher is independent of this BioPattern. In particular, calling any mutator method on this pattern after the matcher has been created will not affect the matcher.

Specified by:
matcher in interface BioPattern
Parameters:
symList - the SymbolList to match against
Returns:
a BioMatcher that will perform the search
Throws:
IllegalAlphabetException - if symList is not over the right alphabet
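The independence contract described for matcher can be pictured as a snapshot: the matcher copies the parameters it needs when it is created and never reads the pattern again. The following plain-Java sketch models only that contract; SketchPattern and SketchMatcher are hypothetical names and do not reflect BioJava's actual classes or internals.

```java
// Hypothetical names throughout; models the documented contract only.
class SketchPattern {
    private int length;

    void setLength(int length) { this.length = length; }

    // The matcher copies the current parameters instead of keeping a
    // reference to this pattern.
    SketchMatcher matcher() { return new SketchMatcher(length); }
}

class SketchMatcher {
    private final int length; // frozen at creation time

    SketchMatcher(int length) { this.length = length; }

    int length() { return length; }
}
```

Under this model, changing the pattern after calling matcher() only influences matchers created later, so a search already in progress keeps consistent parameters.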