org.biojava.bio.dist
Class DistributionTools

java.lang.Object
  extended by org.biojava.bio.dist.DistributionTools

public final class DistributionTools
extends java.lang.Object

Title: DistributionTools.java
Description: A class to hold static methods for calculations and manipulations using Distributions.

Author:
Mark Schreiber

Method Summary
static boolean areEmissionSpectraEqual(Distribution[] a, Distribution[] b)
          Compares the emission spectra of two distribution arrays
static boolean areEmissionSpectraEqual(Distribution a, Distribution b)
          Compares the emission spectra of two distributions
static Distribution average(Distribution[] dists)
          Averages two or more distributions.
static double bitsOfInformation(Distribution observed)
          Calculates the total bits of information for a distribution.
static Distribution countToDistribution(Count c)
          Make a distribution from a count
static Distribution[] distOverAlignment(Alignment a)
           
static Distribution[] distOverAlignment(Alignment a, boolean countGaps)
          Creates an array of distributions, one for each column of the alignment.
static Distribution[] distOverAlignment(Alignment a, boolean countGaps, double nullWeight)
          Creates an array of distributions, one for each column of the alignment
protected static Sequence generateOrderNSequence(java.lang.String name, OrderNDistribution d, int length)
           
static Sequence generateSequence(java.lang.String name, Distribution d, int length)
          Produces a sequence by randomly sampling the Distribution
static Distribution jointDistOverAlignment(Alignment a, boolean countGaps, double nullWeight, int[] cols)
          Creates a joint distribution
static java.util.HashMap KLDistance(Distribution observed, Distribution expected, double logBase)
          A method to calculate the Kullback-Leibler distance (relative entropy)
static void randomizeDistribution(Distribution d)
          Randomizes the weights of a Distribution
static Distribution readFromXML(java.io.InputStream is)
           
static java.util.HashMap shannonEntropy(Distribution observed, double logBase)
          A method to calculate the Shannon Entropy for a Distribution
static double totalEntropy(Distribution observed)
          Calculates the total Entropy for a Distribution.
static void writeToXML(Distribution d, java.io.OutputStream os)
          Writes a Distribution to XML that can be read with the readFromXML method.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

writeToXML

public static void writeToXML(Distribution d,
                              java.io.OutputStream os)
                       throws java.io.IOException
Writes a Distribution to XML that can be read with the readFromXML method.

Parameters:
d - the Distribution to write.
os - where to write it to.
Throws:
java.io.IOException - if writing fails

readFromXML

public static Distribution readFromXML(java.io.InputStream is)
                                throws java.io.IOException,
                                       org.xml.sax.SAXException
Throws:
java.io.IOException
org.xml.sax.SAXException

randomizeDistribution

public static void randomizeDistribution(Distribution d)
                                  throws ChangeVetoException
Randomizes the weights of a Distribution

Parameters:
d - the Distribution to randomize
Throws:
ChangeVetoException - if the Distribution is locked
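As a rough illustration of what randomizing weights involves, the sketch below draws a random value for each symbol and rescales so the weights sum to 1. It does not use the BioJava API: the class RandomizeSketch, the method randomize, and the use of a Map from symbol name to weight are hypothetical stand-ins, and the library's actual sampling scheme may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

// Hypothetical stand-in: a Map from symbol name to weight, not a BioJava Distribution.
final class RandomizeSketch {

    /** Overwrites every weight with a random value, then rescales so the weights sum to 1. */
    static void randomize(Map<String, Double> weights, Random rng) {
        double total = 0.0;
        for (Map.Entry<String, Double> entry : weights.entrySet()) {
            double w = 1.0 - rng.nextDouble(); // in (0, 1], so the total is never zero
            entry.setValue(w);
            total += w;
        }
        for (Map.Entry<String, Double> entry : weights.entrySet()) {
            entry.setValue(entry.getValue() / total);
        }
    }

    public static void main(String[] args) {
        Map<String, Double> weights = new LinkedHashMap<>();
        for (String symbol : new String[] {"a", "c", "g", "t"}) {
            weights.put(symbol, 0.25);
        }
        randomize(weights, new Random());
        System.out.println(weights);
    }
}
```
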

countToDistribution

public static Distribution countToDistribution(Count c)
Make a distribution from a count

Parameters:
c - the count
Returns:
a Distribution over the same FiniteAlphabet as c, trained with the counts of c

areEmissionSpectraEqual

public static final boolean areEmissionSpectraEqual(Distribution a,
                                                    Distribution b)
                                             throws BioException
Compares the emission spectra of two distributions

Parameters:
a - A Distribution with the same Alphabet as b
b - A Distribution with the same Alphabet as a
Returns:
true if alphabets and symbol weights are equal for the two distributions.
Throws:
BioException - if one or both of the Distributions are over infinite alphabets.
Since:
1.2

areEmissionSpectraEqual

public static final boolean areEmissionSpectraEqual(Distribution[] a,
                                                    Distribution[] b)
                                             throws BioException
Compares the emission spectra of two distribution arrays

Parameters:
a - A Distribution[] consisting of Distributions over a FiniteAlphabet
b - A Distribution[] consisting of Distributions over a FiniteAlphabet
Returns:
true if alphabets and symbol weights are equal for each pair of distributions. Will return false if the arrays are of unequal length.
Throws:
BioException - if one of the Distributions is over an infinite alphabet.
Since:
1.3
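The comparison rules described for both overloads can be sketched without the BioJava types. The example below is an assumption-laden illustration: SpectraSketch, sameSpectrum and sameSpectra are hypothetical names, distributions are modelled as Maps from symbol name to weight, and weights are compared exactly, whereas the library may or may not apply a tolerance.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in: each distribution is a Map from symbol name to weight.
final class SpectraSketch {

    /** True if both maps cover the same symbols and assign identical weights (exact comparison). */
    static boolean sameSpectrum(Map<String, Double> a, Map<String, Double> b) {
        if (!a.keySet().equals(b.keySet())) {
            return false;
        }
        for (Map.Entry<String, Double> entry : a.entrySet()) {
            if (Double.compare(entry.getValue(), b.get(entry.getKey())) != 0) {
                return false;
            }
        }
        return true;
    }

    /** Pairwise comparison of two lists; lists of unequal length are never equal. */
    static boolean sameSpectra(List<Map<String, Double>> a, List<Map<String, Double>> b) {
        if (a.size() != b.size()) {
            return false;
        }
        for (int i = 0; i < a.size(); i++) {
            if (!sameSpectrum(a.get(i), b.get(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Double> x = Collections.singletonMap("a", 1.0);
        Map<String, Double> y = Collections.singletonMap("a", 1.0);
        System.out.println(sameSpectrum(x, y));
    }
}
```
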

KLDistance

public static final java.util.HashMap KLDistance(Distribution observed,
                                                 Distribution expected,
                                                 double logBase)
A method to calculate the Kullback-Leibler distance (relative entropy)

Parameters:
logBase - the log base for the entropy calculation. 2 is standard.
observed - the observed frequency of Symbols.
expected - the expected or background frequency.
Returns:
a HashMap mapping Symbol to (Double) relative entropy.
Since:
1.2
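For orientation, the per-symbol relative entropy can be sketched as below, assuming the textbook term p * log_b(p / q) with a zero observed weight contributing 0. This is not the library's code: RelativeEntropySketch and klTerms are hypothetical names, distributions are modelled as Maps from symbol name to weight, and both maps are assumed to cover the same symbols.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in: distributions are Maps from symbol name to weight.
final class RelativeEntropySketch {

    /** Per-symbol term p * log_b(p / q); a zero observed weight contributes 0. */
    static Map<String, Double> klTerms(Map<String, Double> observed,
                                       Map<String, Double> expected,
                                       double logBase) {
        Map<String, Double> terms = new LinkedHashMap<>();
        for (Map.Entry<String, Double> entry : observed.entrySet()) {
            double p = entry.getValue();
            double q = expected.get(entry.getKey()); // assumes expected has every observed symbol
            double term = (p == 0.0) ? 0.0 : p * Math.log(p / q) / Math.log(logBase);
            terms.put(entry.getKey(), term);
        }
        return terms;
    }

    public static void main(String[] args) {
        Map<String, Double> observed = new LinkedHashMap<>();
        observed.put("a", 0.5);
        observed.put("c", 0.5);
        Map<String, Double> expected = new LinkedHashMap<>();
        expected.put("a", 0.25);
        expected.put("c", 0.75);
        System.out.println(klTerms(observed, expected, 2.0));
    }
}
```

Summing the returned values gives the total relative entropy between the two distributions under the same assumption.
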

shannonEntropy

public static final java.util.HashMap shannonEntropy(Distribution observed,
                                                     double logBase)
A method to calculate the Shannon Entropy for a Distribution

Parameters:
logBase - the log base for the entropy calculation. 2 is standard.
observed - the observed frequency of Symbols.
Returns:
a HashMap mapping Symbol to (Double) entropy.
Since:
1.2

totalEntropy

public static double totalEntropy(Distribution observed)
Calculates the total Entropy for a Distribution. Entropies for individual Symbols are weighted by their probability of occurrence.

Parameters:
observed - the observed frequency of Symbols.
Returns:
the total entropy of the Distribution.
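The relationship between per-symbol entropy and the probability-weighted total can be sketched as follows. The sketch assumes the per-symbol value is -log_b(p) and that a zero weight is reported as 0 rather than infinity; both are assumptions, not confirmed library behaviour. EntropySketch, perSymbol and total are hypothetical names, and the explicit logBase parameter on total is an addition for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in: a distribution is a Map from symbol name to weight.
final class EntropySketch {

    /** Per-symbol entropy -log_b(p); a zero weight is reported as 0 (assumption). */
    static Map<String, Double> perSymbol(Map<String, Double> observed, double logBase) {
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, Double> entry : observed.entrySet()) {
            double p = entry.getValue();
            result.put(entry.getKey(), p == 0.0 ? 0.0 : -Math.log(p) / Math.log(logBase));
        }
        return result;
    }

    /** Sum of per-symbol entropies, each weighted by the symbol's probability. */
    static double total(Map<String, Double> observed, double logBase) {
        double sum = 0.0;
        for (Map.Entry<String, Double> entry : perSymbol(observed, logBase).entrySet()) {
            sum += observed.get(entry.getKey()) * entry.getValue();
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Double> uniform = new LinkedHashMap<>();
        for (String symbol : new String[] {"a", "c", "g", "t"}) {
            uniform.put(symbol, 0.25);
        }
        System.out.println(total(uniform, 2.0));
    }
}
```
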

bitsOfInformation

public static final double bitsOfInformation(Distribution observed)
Calculates the total bits of information for a distribution.

Parameters:
observed - the observed frequency of Symbols.
Returns:
the total information content of the Distribution.
Since:
1.2
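Information content is commonly defined as the maximum possible entropy for the alphabet minus the observed entropy, in bits. The sketch below uses that definition, log2(N) - H, as an assumption about what this method reports. InformationSketch and its array-of-weights input are hypothetical and do not use the BioJava types.

```java
// Hypothetical stand-in: a distribution is an array of weights, one per symbol.
final class InformationSketch {

    /** log2(alphabet size) minus the Shannon entropy of the weights, in bits. */
    static double bitsOfInformation(double[] weights) {
        double entropy = 0.0;
        for (double p : weights) {
            if (p > 0.0) {
                entropy -= p * Math.log(p) / Math.log(2.0);
            }
        }
        double maxEntropy = Math.log(weights.length) / Math.log(2.0);
        return maxEntropy - entropy;
    }

    public static void main(String[] args) {
        System.out.println(bitsOfInformation(new double[] {0.7, 0.1, 0.1, 0.1}));
    }
}
```

Under this definition a uniform distribution carries no information and a distribution concentrated on one of four symbols carries 2 bits.
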

distOverAlignment

public static Distribution[] distOverAlignment(Alignment a)
                                        throws IllegalAlphabetException
Throws:
IllegalAlphabetException

jointDistOverAlignment

public static final Distribution jointDistOverAlignment(Alignment a,
                                                        boolean countGaps,
                                                        double nullWeight,
                                                        int[] cols)
                                                 throws IllegalAlphabetException
Creates a joint distribution

Parameters:
a - the Alignment to build the Distribution over.
countGaps - if true, gaps will be included in the distribution (not yet implemented; currently either option produces the same result)
nullWeight - the number of pseudo counts to add to the distribution
Returns:
a Distribution
Throws:
IllegalAlphabetException - if the sequences do not all use the same alphabet
Since:
1.2

distOverAlignment

public static final Distribution[] distOverAlignment(Alignment a,
                                                     boolean countGaps,
                                                     double nullWeight)
                                              throws IllegalAlphabetException
Creates an array of distributions, one for each column of the alignment

Parameters:
a - the Alignment to build the Distribution[] over.
countGaps - if true, gaps will be included in the distributions (not yet implemented; currently either option produces the same result)
nullWeight - the number of pseudo counts to add to each distribution
Returns:
a Distribution[] where each member of the array is a Distribution of the Symbols found at that position of the Alignment.
Throws:
IllegalAlphabetException - if the sequences do not all use the same alphabet
Since:
1.2
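The idea of one distribution per alignment column, smoothed with pseudo counts, can be sketched without the BioJava types. In the example below the alignment is an array of equal-length Strings, gaps are any character outside the given alphabet and are skipped, and nullWeight is shared uniformly across the alphabet before normalizing. All of these are assumptions for illustration; in particular, the library may apply its null model differently. ColumnDistSketch and columnDistributions are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in: rows are aligned Strings, distributions are Maps of symbol to weight.
final class ColumnDistSketch {

    /** Builds one normalized symbol distribution per column of the alignment. */
    static List<Map<Character, Double>> columnDistributions(String[] rows,
                                                            String alphabet,
                                                            double nullWeight) {
        List<Map<Character, Double>> result = new ArrayList<>();
        int columns = rows[0].length(); // assumes at least one row and equal row lengths
        double pseudoPerSymbol = nullWeight / alphabet.length();
        for (int col = 0; col < columns; col++) {
            Map<Character, Double> counts = new LinkedHashMap<>();
            for (char symbol : alphabet.toCharArray()) {
                counts.put(symbol, pseudoPerSymbol);
            }
            for (String row : rows) {
                char symbol = row.charAt(col);
                if (counts.containsKey(symbol)) { // characters outside the alphabet (gaps) are skipped
                    counts.merge(symbol, 1.0, Double::sum);
                }
            }
            double total = 0.0;
            for (double count : counts.values()) {
                total += count;
            }
            // Note: a column with no counted symbols and nullWeight 0 yields NaN weights.
            for (Map.Entry<Character, Double> entry : counts.entrySet()) {
                entry.setValue(entry.getValue() / total);
            }
            result.add(counts);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] rows = {"ac", "ag", "at"};
        System.out.println(columnDistributions(rows, "acgt", 4.0));
    }
}
```
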

distOverAlignment

public static final Distribution[] distOverAlignment(Alignment a,
                                                     boolean countGaps)
                                              throws IllegalAlphabetException
Creates an array of distributions, one for each column of the alignment. No pseudo counts are used.

Parameters:
countGaps - if true gaps will be included in the distributions
a - the Alignment to build the Distribution[] over.
Returns:
a Distribution[] where each member of the array is a Distribution of the Symbols found at that position of the Alignment.
Throws:
IllegalAlphabetException - if the alignment is not composed from sequences all with the same alphabet
Since:
1.2

average

public static final Distribution average(Distribution[] dists)
Averages two or more distributions. NOTE: the current implementation ignores the null model.

Parameters:
dists - the Distributions to average
Returns:
a Distribution where the weight of each Symbol is the average of the weights of that Symbol in each Distribution.
Since:
1.2
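Averaging by symbol can be sketched as below. This is an illustration only: AverageSketch and average are hypothetical names, distributions are modelled as Maps from symbol name to weight, and a symbol missing from one of the maps is treated as having weight 0 there, which is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in: each distribution is a Map from symbol name to weight.
final class AverageSketch {

    /** For each symbol, the mean of its weights across all input distributions. */
    static Map<String, Double> average(List<Map<String, Double>> dists) {
        Map<String, Double> sums = new LinkedHashMap<>();
        for (Map<String, Double> dist : dists) {
            for (Map.Entry<String, Double> entry : dist.entrySet()) {
                sums.merge(entry.getKey(), entry.getValue(), Double::sum);
            }
        }
        for (Map.Entry<String, Double> entry : sums.entrySet()) {
            entry.setValue(entry.getValue() / dists.size());
        }
        return sums;
    }

    public static void main(String[] args) {
        Map<String, Double> first = new LinkedHashMap<>();
        first.put("a", 0.9);
        first.put("c", 0.1);
        Map<String, Double> second = new LinkedHashMap<>();
        second.put("a", 0.3);
        second.put("c", 0.7);
        List<Map<String, Double>> dists = new ArrayList<>();
        dists.add(first);
        dists.add(second);
        System.out.println(average(dists));
    }
}
```
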

generateSequence

public static final Sequence generateSequence(java.lang.String name,
                                              Distribution d,
                                              int length)
Produces a sequence by randomly sampling the Distribution

Parameters:
name - the name for the sequence
d - the distribution to sample. If this distribution is of order N, a seed sequence is generated, allowed to 'burn in' for 1000 iterations, and used to produce a sequence over the conditioned alphabet.
length - the number of symbols in the sequence.
Returns:
a SimpleSequence whose name and URN are both set to name, with an empty Annotation.
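For a simple (order 0) distribution, random sampling amounts to drawing each symbol independently in proportion to its weight. The sketch below shows one common way to do that, by walking cumulative weights. It is not the library's implementation and does not cover the order N case or the burn-in step; SamplingSketch and sample are hypothetical names, and the weights are assumed to be non-empty and to sum to 1.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

// Hypothetical stand-in: a distribution is a Map from symbol character to weight.
final class SamplingSketch {

    /** Draws length symbols independently, each with probability equal to its weight. */
    static String sample(Map<Character, Double> weights, int length, Random rng) {
        StringBuilder sequence = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            double r = rng.nextDouble(); // uniform in [0, 1)
            double cumulative = 0.0;
            char chosen = '?'; // only kept if the map is empty
            for (Map.Entry<Character, Double> entry : weights.entrySet()) {
                chosen = entry.getKey();
                cumulative += entry.getValue();
                if (r < cumulative) {
                    break;
                }
            }
            // If rounding leaves the total slightly below r, the last symbol is used.
            sequence.append(chosen);
        }
        return sequence.toString();
    }

    public static void main(String[] args) {
        Map<Character, Double> weights = new LinkedHashMap<>();
        weights.put('a', 0.4);
        weights.put('c', 0.1);
        weights.put('g', 0.1);
        weights.put('t', 0.4);
        System.out.println(sample(weights, 20, new Random()));
    }
}
```
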

generateOrderNSequence

protected static final Sequence generateOrderNSequence(java.lang.String name,
                                                       OrderNDistribution d,
                                                       int length)