org.biojava.bio.symbol
Class MotifTools

java.lang.Object
  extended by org.biojava.bio.symbol.MotifTools

public class MotifTools
extends java.lang.Object

MotifTools contains utility methods for sequence motifs.

Author:
Keith James

Constructor Summary
MotifTools()
           
 
Method Summary
static java.lang.String createRegex(SymbolList motif)
          createRegex creates a regular expression which matches the SymbolList.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

MotifTools

public MotifTools()
Method Detail

createRegex

public static java.lang.String createRegex(SymbolList motif)

createRegex creates a regular expression which matches the SymbolList. Unambiguous Symbols are emitted as their character tokens, ambiguous Symbols are transformed into character classes, and runs of identical Symbols are collapsed into a counted repetition. For example, the nucleotide sequence "AAGCTT" becomes "A{2}GCT{2}" and "CTNNG" becomes "CT[ABCDGHKMNRSTVWY]{2}G".

The character class is generated in four steps: getMatches is called on the ambiguity symbol to obtain the alphabet of AtomicSymbols it matches, getAllSymbols is called on that alphabet, any gap symbols are removed, and the remaining symbols are tokenized. The tokens within a character class are sorted in ascending order of their char values, as determined by Arrays.sort(char[]).

The Alphabet of the SymbolList must be finite and must have a character token type. Regular expressions may be generated for any such SymbolList, not just DNA, RNA and protein.
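
The two transformations described above (run collapsing and sorted character classes) can be illustrated with a standalone sketch. This is not the BioJava implementation and does not use any BioJava classes: the class MotifRegexSketch, its methods, and the hard-coded AMBIGUITY table are hypothetical stand-ins for what createRegex derives from the Alphabet, and motifs are passed as plain Strings rather than SymbolLists.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.regex.Pattern;

class MotifRegexSketch {
    // Hypothetical lookup: ambiguity token -> tokens it should match.
    // The real method derives this set from the Alphabet; here it is
    // hard-coded (deliberately unsorted) for the nucleotide token 'N'.
    static final Map<Character, String> AMBIGUITY =
        Map.of('N', "TGCAYRWSKMBDHVN");

    // Returns the regex fragment for a single token.
    static String fragment(char token) {
        String matches = AMBIGUITY.get(token);
        if (matches == null) {
            return String.valueOf(token);   // unambiguous: emit as-is
        }
        char[] chars = matches.toCharArray();
        Arrays.sort(chars);                 // ascending char order
        return "[" + new String(chars) + "]";
    }

    // Builds the regex, collapsing runs of identical tokens into {n}.
    static String toRegex(String motif) {
        StringBuilder regex = new StringBuilder();
        int i = 0;
        while (i < motif.length()) {
            char token = motif.charAt(i);
            int run = 1;
            while (i + run < motif.length() && motif.charAt(i + run) == token) {
                run++;
            }
            regex.append(fragment(token));
            if (run > 1) {
                regex.append('{').append(run).append('}');
            }
            i += run;
        }
        return regex.toString();
    }

    public static void main(String[] args) {
        System.out.println(toRegex("AAGCTT")); // A{2}GCT{2}
        System.out.println(toRegex("CTNNG"));  // CT[ABCDGHKMNRSTVWY]{2}G
        // The generated string is usable with java.util.regex directly.
        System.out.println(Pattern.matches(toRegex("CTNNG"), "CTAGG")); // true
    }
}
```

With BioJava on the classpath, the equivalent call would be along the lines of MotifTools.createRegex(DNATools.createDNA("CTNNG")), with the returned String passed to java.util.regex.Pattern.compile.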

Parameters:
motif - the SymbolList for which to create a regular expression.
Returns:
a String regular expression.