org.apache.commons.math3.random
Class EmpiricalDistribution

java.lang.Object
  extended by org.apache.commons.math3.random.EmpiricalDistribution
All Implemented Interfaces:
Serializable

public class EmpiricalDistribution
extends Object
implements Serializable

Represents an empirical probability distribution -- a probability distribution derived from observed data without making any assumptions about the functional form of the population distribution that the data come from.

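An EmpiricalDistribution maintains data structures, called distribution digests, that describe empirical distributions and support the operations listed in the Method Summary: loading data from an array, a file, or a URL; reporting statistics for the full sample and for each bin; and generating random values from the distribution.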

Applications can use EmpiricalDistribution to build grouped frequency histograms representing the input data or to generate random values "like" those in the input file -- i.e., the values generated will follow the distribution of the values in the file.

The implementation uses what amounts to the Variable Kernel Method with Gaussian smoothing:

Digesting the input file

  1. Pass the file once to compute min and max.
  2. Divide the range from min to max into binCount "bins."
  3. Pass the data file again, computing bin counts and univariate statistics (mean, std dev.) for each of the bins.
  4. Divide the interval (0,1) into subintervals associated with the bins, with the length of a bin's subinterval proportional to its count.
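
The four digesting steps above can be sketched in plain Java. This is a simplified illustration, not the library's implementation: the class name DigestSketch and its fields are hypothetical, the input is an in-memory array rather than a file, the data are assumed to be non-empty, and the bin-assignment rule is an assumption chosen to match the bin layout documented for getUpperBounds().

```java
final class DigestSketch {
    final double min;
    final double max;
    final long[] counts;                 // observations per bin
    final double[] means;                // mean of each bin
    final double[] stdDevs;              // sample std dev of each bin (0 if fewer than 2 values)
    final double[] generatorUpperBounds; // cumulative proportions ending at 1.0

    // Hypothetical digest over a non-empty array; not the library's code.
    DigestSketch(double[] data, int binCount) {
        // Step 1: first pass, find min and max.
        double lo = Double.POSITIVE_INFINITY;
        double hi = Double.NEGATIVE_INFINITY;
        for (double v : data) {
            lo = Math.min(lo, v);
            hi = Math.max(hi, v);
        }
        min = lo;
        max = hi;

        // Step 2: equal-width bins over [min, max].
        double delta = (max - min) / binCount;

        // Step 3: second pass, accumulate count, sum, and sum of squares per bin.
        counts = new long[binCount];
        double[] sums = new double[binCount];
        double[] sumSquares = new double[binCount];
        for (double v : data) {
            int bin = 0;
            if (delta > 0) {
                bin = (int) Math.ceil((v - min) / delta) - 1;
                bin = Math.min(Math.max(bin, 0), binCount - 1);
            }
            counts[bin]++;
            sums[bin] += v;
            sumSquares[bin] += v * v;
        }

        // Step 4: subinterval lengths proportional to bin counts,
        // stored as cumulative upper bounds.
        means = new double[binCount];
        stdDevs = new double[binCount];
        generatorUpperBounds = new double[binCount];
        long cumulative = 0;
        for (int i = 0; i < binCount; i++) {
            long n = counts[i];
            if (n > 0) {
                means[i] = sums[i] / n;
            }
            if (n > 1) {
                double variance = (sumSquares[i] - n * means[i] * means[i]) / (n - 1);
                stdDevs[i] = Math.sqrt(Math.max(variance, 0.0));
            }
            cumulative += n;
            generatorUpperBounds[i] = (double) cumulative / data.length;
        }
    }
}
```

For example, new DigestSketch(new double[] {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, 2) places five values in each bin, so the two subintervals of (0,1) have equal length.
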
Generating random values from the distribution
  1. Generate a uniformly distributed value in (0,1).
  2. Select the subinterval to which the value belongs.
  3. Generate a random Gaussian value with mean = mean of the associated bin and std dev = std dev of associated bin.
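
The three generation steps above can be sketched as follows. This is a minimal illustration under stated assumptions: GeneratorSketch, selectBin, and nextValue are hypothetical names, java.util.Random stands in for the library's random data source, and any special handling the library applies to sparse or zero-variance bins is not reproduced.

```java
import java.util.Random;

final class GeneratorSketch {
    // Step 2: find the first subinterval whose upper bound is >= u.
    static int selectBin(double u, double[] generatorUpperBounds) {
        for (int i = 0; i < generatorUpperBounds.length; i++) {
            if (u <= generatorUpperBounds[i]) {
                return i;
            }
        }
        return generatorUpperBounds.length - 1; // guard against rounding in the last bound
    }

    // Steps 1-3: uniform draw, bin selection, Gaussian draw with the bin's mean and std dev.
    static double nextValue(Random rng, double[] generatorUpperBounds,
                            double[] means, double[] stdDevs) {
        double u = rng.nextDouble();
        int bin = selectBin(u, generatorUpperBounds);
        return means[bin] + stdDevs[bin] * rng.nextGaussian();
    }
}
```

Because a bin's subinterval length equals its share of the observations, bins holding more data are selected proportionally more often.
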

USAGE NOTES:

Version:
$Id: EmpiricalDistribution.java 1244107 2012-02-14 16:17:55Z erans $
See Also:
Serialized Form

Nested Class Summary
private  class EmpiricalDistribution.ArrayDataAdapter
          DataAdapter for data provided as array of doubles.
private  class EmpiricalDistribution.DataAdapter
          Provides methods for computing sampleStats and binStats abstracting the source of data.
private  class EmpiricalDistribution.DataAdapterFactory
          Factory of DataAdapter objects.
private  class EmpiricalDistribution.StreamDataAdapter
          DataAdapter for data provided through some input stream.
 
Field Summary
private  int binCount
          number of bins
private  List<SummaryStatistics> binStats
          List of SummaryStatistics objects characterizing the bins
static int DEFAULT_BIN_COUNT
          Default bin count
private  double delta
          Grid size
private  boolean loaded
          is the distribution loaded?
private  double max
          Max loaded value
private  double min
          Min loaded value
private  RandomDataImpl randomData
          RandomDataImpl instance to use in repeated calls to getNextValue()
private  SummaryStatistics sampleStats
          Sample statistics
private static long serialVersionUID
          Serializable version identifier
private  double[] upperBounds
          upper bounds of subintervals in (0,1) "belonging" to the bins
 
Constructor Summary
EmpiricalDistribution()
          Creates a new EmpiricalDistribution with the default bin count.
EmpiricalDistribution(int binCount)
          Creates a new EmpiricalDistribution with the specified bin count.
EmpiricalDistribution(int binCount, RandomDataImpl randomData)
          Creates a new EmpiricalDistribution with the specified bin count using the provided RandomDataImpl instance as the source of random data.
EmpiricalDistribution(int binCount, RandomGenerator generator)
          Creates a new EmpiricalDistribution with the specified bin count using the provided RandomGenerator as the source of random data.
EmpiricalDistribution(RandomDataImpl randomData)
          Creates a new EmpiricalDistribution with default bin count using the provided RandomDataImpl as the source of random data.
EmpiricalDistribution(RandomGenerator generator)
          Creates a new EmpiricalDistribution with default bin count using the provided RandomGenerator as the source of random data.
 
Method Summary
private  void fillBinStats(Object in)
          Fills the binStats list (second pass through the data).
private  int findBin(double value)
          Returns the index of the bin to which the given value belongs
 int getBinCount()
          Returns the number of bins.
 List<SummaryStatistics> getBinStats()
          Returns a List of SummaryStatistics instances containing statistics describing the values in each of the bins.
 double[] getGeneratorUpperBounds()
          Returns a fresh copy of the array of upper bounds of the subintervals of [0,1] used in generating data from the empirical distribution.
 double getNextValue()
          Generates a random value from this distribution.
 StatisticalSummary getSampleStats()
          Returns a StatisticalSummary describing this distribution.
 double[] getUpperBounds()
          Returns a fresh copy of the array of upper bounds for the bins.
 boolean isLoaded()
          Property indicating whether or not the distribution has been loaded.
 void load(double[] in)
          Computes the empirical distribution from the provided array of numbers.
 void load(File file)
          Computes the empirical distribution from the input file.
 void load(URL url)
          Computes the empirical distribution using data read from a URL.
 void reSeed(long seed)
          Reseeds the random number generator used by getNextValue().
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

DEFAULT_BIN_COUNT

public static final int DEFAULT_BIN_COUNT
Default bin count

See Also:
Constant Field Values

serialVersionUID

private static final long serialVersionUID
Serializable version identifier

See Also:
Constant Field Values

binStats

private final List<SummaryStatistics> binStats
List of SummaryStatistics objects characterizing the bins


sampleStats

private SummaryStatistics sampleStats
Sample statistics


max

private double max
Max loaded value


min

private double min
Min loaded value


delta

private double delta
Grid size


binCount

private final int binCount
number of bins


loaded

private boolean loaded
is the distribution loaded?


upperBounds

private double[] upperBounds
upper bounds of subintervals in (0,1) "belonging" to the bins


randomData

private final RandomDataImpl randomData
RandomDataImpl instance to use in repeated calls to getNextValue()

Constructor Detail

EmpiricalDistribution

public EmpiricalDistribution()
Creates a new EmpiricalDistribution with the default bin count.


EmpiricalDistribution

public EmpiricalDistribution(int binCount)
Creates a new EmpiricalDistribution with the specified bin count.

Parameters:
binCount - number of bins

EmpiricalDistribution

public EmpiricalDistribution(int binCount,
                             RandomGenerator generator)
Creates a new EmpiricalDistribution with the specified bin count using the provided RandomGenerator as the source of random data.

Parameters:
binCount - number of bins
generator - random data generator (may be null, resulting in default JDK generator)
Since:
3.0

EmpiricalDistribution

public EmpiricalDistribution(RandomGenerator generator)
Creates a new EmpiricalDistribution with default bin count using the provided RandomGenerator as the source of random data.

Parameters:
generator - random data generator (may be null, resulting in default JDK generator)
Since:
3.0

EmpiricalDistribution

public EmpiricalDistribution(int binCount,
                             RandomDataImpl randomData)
Creates a new EmpiricalDistribution with the specified bin count using the provided RandomDataImpl instance as the source of random data.

Parameters:
binCount - number of bins
randomData - random data generator (may be null, resulting in default JDK generator)
Since:
3.0

EmpiricalDistribution

public EmpiricalDistribution(RandomDataImpl randomData)
Creates a new EmpiricalDistribution with default bin count using the provided RandomDataImpl as the source of random data.

Parameters:
randomData - random data generator (may be null, resulting in default JDK generator)
Since:
3.0
Method Detail

load

public void load(double[] in)
          throws NullArgumentException
Computes the empirical distribution from the provided array of numbers.

Parameters:
in - the input data array
Throws:
NullArgumentException - if in is null

load

public void load(URL url)
          throws IOException,
                 NullArgumentException
Computes the empirical distribution using data read from a URL.

Parameters:
url - URL of the input file
Throws:
IOException - if an IO error occurs
NullArgumentException - if url is null

load

public void load(File file)
          throws IOException,
                 NullArgumentException
Computes the empirical distribution from the input file.

Parameters:
file - the input file
Throws:
IOException - if an IO error occurs
NullArgumentException - if file is null

fillBinStats

private void fillBinStats(Object in)
                   throws IOException
Fills the binStats list (second pass through the data).

Parameters:
in - object providing access to the data
Throws:
IOException - if an IO error occurs

findBin

private int findBin(double value)
Returns the index of the bin to which the given value belongs

Parameters:
value - the value whose bin we are trying to find
Returns:
the index of the bin containing the value

getNextValue

public double getNextValue()
                    throws MathIllegalStateException
Generates a random value from this distribution. Precondition: the distribution must be loaded before this method is invoked.

Returns:
the random value.
Throws:
MathIllegalStateException - if the distribution has not been loaded

getSampleStats

public StatisticalSummary getSampleStats()
Returns a StatisticalSummary describing this distribution. Precondition: the distribution must be loaded before this method is invoked.

Returns:
the sample statistics
Throws:
IllegalStateException - if the distribution has not been loaded

getBinCount

public int getBinCount()
Returns the number of bins.

Returns:
the number of bins.

getBinStats

public List<SummaryStatistics> getBinStats()
Returns a List of SummaryStatistics instances containing statistics describing the values in each of the bins. The list is indexed on the bin number.

Returns:
List of bin statistics.

getUpperBounds

public double[] getUpperBounds()

Returns a fresh copy of the array of upper bounds for the bins. Bins are:
[min,upperBounds[0]],(upperBounds[0],upperBounds[1]],..., (upperBounds[binCount-2], upperBounds[binCount-1] = max].

Note: In versions 1.0-2.0 of commons-math, this method incorrectly returned the array of probability generator upper bounds now returned by getGeneratorUpperBounds().

Returns:
array of bin upper bounds
Since:
2.1
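
The bin layout described for getUpperBounds() can be illustrated with a short plain-Java sketch. The names BinBoundsSketch, upperBounds, and findBin are hypothetical, and the lookup formula is an assumption chosen to agree with the half-open intervals listed above (first bin closed on both ends); it is not quoted from the library source.

```java
final class BinBoundsSketch {
    // Upper bounds of binCount equal-width bins over [min, max]; the last bound is max itself.
    static double[] upperBounds(double min, double max, int binCount) {
        double delta = (max - min) / binCount;
        double[] bounds = new double[binCount];
        for (int i = 0; i < binCount - 1; i++) {
            bounds[i] = min + delta * (i + 1);
        }
        bounds[binCount - 1] = max;
        return bounds;
    }

    // Index of the bin containing value: bin 0 is [min, ub0], bin i is (ub(i-1), ub(i)].
    static int findBin(double value, double min, double delta, int binCount) {
        int bin = (int) Math.ceil((value - min) / delta) - 1;
        return Math.min(Math.max(bin, 0), binCount - 1);
    }
}
```

With min = 0, max = 10, and four bins, the bounds are 2.5, 5.0, 7.5, and 10.0; the value 2.5 falls in bin 0, while 2.6 falls in bin 1.
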

getGeneratorUpperBounds

public double[] getGeneratorUpperBounds()

Returns a fresh copy of the array of upper bounds of the subintervals of [0,1] used in generating data from the empirical distribution. Subintervals correspond to bins with lengths proportional to bin counts.

In versions 1.0-2.0 of commons-math, this array was (incorrectly) returned by getUpperBounds().

Returns:
array of upper bounds of subintervals used in data generation
Since:
2.1

isLoaded

public boolean isLoaded()
Property indicating whether or not the distribution has been loaded.

Returns:
true if the distribution has been loaded

reSeed

public void reSeed(long seed)
Reseeds the random number generator used by getNextValue().

Parameters:
seed - random generator seed
Since:
3.0
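
Reseeding is typically used to make a generated sequence reproducible. The sketch below shows the general idea with java.util.Random as a stand-in; ReseedSketch and draw are hypothetical names, and the generator behind an EmpiricalDistribution instance may differ from java.util.Random.

```java
import java.util.Random;

final class ReseedSketch {
    // Draws n Gaussian values from a generator initialized with the given seed.
    static double[] draw(long seed, int n) {
        Random rng = new Random(seed);
        double[] out = new double[n];
        for (int i = 0; i < n; i++) {
            out[i] = rng.nextGaussian();
        }
        return out;
    }
}
```

Two calls to draw with the same seed return identical arrays, which is the property a fixed seed provides when repeatable test data are needed.
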


Copyright (c) 2003-2013 Apache Software Foundation