org.apache.commons.math3.stat.descriptive
Class MultivariateSummaryStatistics

java.lang.Object
  extended by org.apache.commons.math3.stat.descriptive.MultivariateSummaryStatistics
All Implemented Interfaces:
Serializable, StatisticalMultivariateSummary
Direct Known Subclasses:
SynchronizedMultivariateSummaryStatistics

public class MultivariateSummaryStatistics
extends Object
implements StatisticalMultivariateSummary, Serializable

Computes summary statistics for a stream of n-tuples added using the addValue method. The data values are not stored in memory, so this class can be used to compute statistics for very large n-tuple streams.

The StorelessUnivariateStatistic instances used to maintain summary state and compute statistics are configurable via setters. For example, the default implementation for the mean can be overridden by calling setMeanImpl(StorelessUnivariateStatistic[]). Actual parameters to these methods must implement the StorelessUnivariateStatistic interface and configuration must be completed before addValue is called. No configuration is necessary to use the default, commons-math provided implementations.

To compute statistics for a stream of n-tuples, construct a MultivariateSummaryStatistics instance with dimension n and then use addValue(double[]) to add n-tuples. The getXxx methods, where Xxx is a statistic, return an array of double values, where for i = 0,...,n-1 the ith array element is the value of the given statistic for the data range consisting of the ith element of each of the input n-tuples. For example, if addValue is called with actual parameters {0, 1, 2}, then {3, 4, 5} and finally {6, 7, 8}, getSum will return a three-element array with values {0+3+6, 1+4+7, 2+5+8}.
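
The component-wise behaviour described above can be sketched without the library. The class below is a hypothetical stand-in (ComponentSums is not part of Commons Math) that keeps one running sum per component and discards each tuple after folding it in. It only illustrates the idea behind addValue(double[]) and getSum(); it is not the class's actual implementation.

```java
import java.util.Arrays;

/** Hypothetical stand-in for a storeless, per-component sum; not the Commons Math class. */
final class ComponentSums {
    private final double[] sums; // one running sum per component
    private long n;              // number of tuples folded in so far

    ComponentSums(int dimension) {
        this.sums = new double[dimension];
    }

    /** Folds one tuple into the running sums; the tuple itself is not stored. */
    void addValue(double[] value) {
        if (value.length != sums.length) {
            throw new IllegalArgumentException(
                "expected " + sums.length + " components, got " + value.length);
        }
        for (int i = 0; i < sums.length; i++) {
            sums[i] += value[i];
        }
        n++;
    }

    /** Returns a copy so callers cannot modify the internal state. */
    double[] getSum() {
        return sums.clone();
    }

    long getN() {
        return n;
    }

    /** Runs the three-tuple example from the class description. */
    static double[] demo() {
        ComponentSums stats = new ComponentSums(3);
        stats.addValue(new double[] {0, 1, 2});
        stats.addValue(new double[] {3, 4, 5});
        stats.addValue(new double[] {6, 7, 8});
        return stats.getSum();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(demo())); // prints [9.0, 12.0, 15.0]
    }
}
```

Memory use depends only on the dimension, not on the number of tuples added, which is why this style of computation suits very large streams.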

Note: This class is not thread-safe. Use SynchronizedMultivariateSummaryStatistics if concurrent access from multiple threads is required.

Since:
1.2
Version:
$Id: MultivariateSummaryStatistics.java 1244107 2012-02-14 16:17:55Z erans $
See Also:
Serialized Form

Field Summary
private  VectorialCovariance covarianceImpl
          Covariance statistic implementation - cannot be reset.
private  StorelessUnivariateStatistic[] geoMeanImpl
          Geometric mean statistic implementation - can be reset by setter.
private  int k
          Dimension of the data.
private  StorelessUnivariateStatistic[] maxImpl
          Maximum statistic implementation - can be reset by setter.
private  StorelessUnivariateStatistic[] meanImpl
          Mean statistic implementation - can be reset by setter.
private  StorelessUnivariateStatistic[] minImpl
          Minimum statistic implementation - can be reset by setter.
private  long n
          Count of values that have been added
private static long serialVersionUID
          Serialization UID
private  StorelessUnivariateStatistic[] sumImpl
          Sum statistic implementation - can be reset by setter.
private  StorelessUnivariateStatistic[] sumLogImpl
          Sum of log statistic implementation - can be reset by setter.
private  StorelessUnivariateStatistic[] sumSqImpl
          Sum of squares statistic implementation - can be reset by setter.
 
Constructor Summary
MultivariateSummaryStatistics(int k, boolean isCovarianceBiasCorrected)
          Construct a MultivariateSummaryStatistics instance
 
Method Summary
 void addValue(double[] value)
          Add an n-tuple to the data
private  void append(StringBuilder buffer, double[] data, String prefix, String separator, String suffix)
          Append a text representation of an array to a buffer.
private  void checkDimension(int dimension)
          Throws DimensionMismatchException if dimension != k.
private  void checkEmpty()
          Throws MathIllegalStateException if the statistic is not empty.
 void clear()
          Resets all statistics and storage
 boolean equals(Object object)
          Returns true iff object is a MultivariateSummaryStatistics instance and all statistics have the same values as this.
 RealMatrix getCovariance()
          Returns the covariance matrix of the values that have been added.
 int getDimension()
          Returns the dimension of the data
 StorelessUnivariateStatistic[] getGeoMeanImpl()
          Returns the currently configured geometric mean implementation
 double[] getGeometricMean()
          Returns an array whose ith entry is the geometric mean of the ith entries of the arrays that have been added using addValue(double[])
 double[] getMax()
          Returns an array whose ith entry is the maximum of the ith entries of the arrays that have been added using addValue(double[])
 StorelessUnivariateStatistic[] getMaxImpl()
          Returns the currently configured maximum implementation
 double[] getMean()
          Returns an array whose ith entry is the mean of the ith entries of the arrays that have been added using addValue(double[])
 StorelessUnivariateStatistic[] getMeanImpl()
          Returns the currently configured mean implementation
 double[] getMin()
          Returns an array whose ith entry is the minimum of the ith entries of the arrays that have been added using addValue(double[])
 StorelessUnivariateStatistic[] getMinImpl()
          Returns the currently configured minimum implementation
 long getN()
          Returns the number of available values
private  double[] getResults(StorelessUnivariateStatistic[] stats)
          Returns an array of the results of a statistic.
 double[] getStandardDeviation()
          Returns an array whose ith entry is the standard deviation of the ith entries of the arrays that have been added using addValue(double[])
 double[] getSum()
          Returns an array whose ith entry is the sum of the ith entries of the arrays that have been added using addValue(double[])
 StorelessUnivariateStatistic[] getSumImpl()
          Returns the currently configured Sum implementation
 double[] getSumLog()
          Returns an array whose ith entry is the sum of logs of the ith entries of the arrays that have been added using addValue(double[])
 StorelessUnivariateStatistic[] getSumLogImpl()
          Returns the currently configured sum of logs implementation
 double[] getSumSq()
          Returns an array whose ith entry is the sum of squares of the ith entries of the arrays that have been added using addValue(double[])
 StorelessUnivariateStatistic[] getSumsqImpl()
          Returns the currently configured sum of squares implementation
 int hashCode()
          Returns hash code based on values of statistics
 void setGeoMeanImpl(StorelessUnivariateStatistic[] geoMeanImpl)
          Sets the implementation for the geometric mean.
private  void setImpl(StorelessUnivariateStatistic[] newImpl, StorelessUnivariateStatistic[] oldImpl)
          Sets statistics implementations.
 void setMaxImpl(StorelessUnivariateStatistic[] maxImpl)
          Sets the implementation for the maximum.
 void setMeanImpl(StorelessUnivariateStatistic[] meanImpl)
          Sets the implementation for the mean.
 void setMinImpl(StorelessUnivariateStatistic[] minImpl)
          Sets the implementation for the minimum.
 void setSumImpl(StorelessUnivariateStatistic[] sumImpl)
          Sets the implementation for the Sum.
 void setSumLogImpl(StorelessUnivariateStatistic[] sumLogImpl)
          Sets the implementation for the sum of logs.
 void setSumsqImpl(StorelessUnivariateStatistic[] sumsqImpl)
          Sets the implementation for the sum of squares.
 String toString()
          Generates a text report displaying summary statistics from values that have been added.
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Field Detail

serialVersionUID

private static final long serialVersionUID
Serialization UID

See Also:
Constant Field Values

k

private int k
Dimension of the data.


n

private long n
Count of values that have been added


sumImpl

private StorelessUnivariateStatistic[] sumImpl
Sum statistic implementation - can be reset by setter.


sumSqImpl

private StorelessUnivariateStatistic[] sumSqImpl
Sum of squares statistic implementation - can be reset by setter.


minImpl

private StorelessUnivariateStatistic[] minImpl
Minimum statistic implementation - can be reset by setter.


maxImpl

private StorelessUnivariateStatistic[] maxImpl
Maximum statistic implementation - can be reset by setter.


sumLogImpl

private StorelessUnivariateStatistic[] sumLogImpl
Sum of log statistic implementation - can be reset by setter.


geoMeanImpl

private StorelessUnivariateStatistic[] geoMeanImpl
Geometric mean statistic implementation - can be reset by setter.


meanImpl

private StorelessUnivariateStatistic[] meanImpl
Mean statistic implementation - can be reset by setter.


covarianceImpl

private VectorialCovariance covarianceImpl
Covariance statistic implementation - cannot be reset.

Constructor Detail

MultivariateSummaryStatistics

public MultivariateSummaryStatistics(int k,
                                     boolean isCovarianceBiasCorrected)
Construct a MultivariateSummaryStatistics instance

Parameters:
k - dimension of the data
isCovarianceBiasCorrected - if true, the unbiased sample covariance is computed, otherwise the biased population covariance is computed
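
The effect of isCovarianceBiasCorrected can be illustrated with a small sketch. It assumes the usual definitions: the bias-corrected (sample) covariance divides by n - 1 and the population covariance divides by n. CovarianceSketch and its method are hypothetical names, and the running-sums formula is one common way to compute a covariance in a single pass, not necessarily what VectorialCovariance does internally.

```java
import java.util.Arrays;

/** Hypothetical helper contrasting the two covariance divisors; not library code. */
final class CovarianceSketch {

    /**
     * Computes the k x k covariance matrix from running sums and sums of products.
     * Assumes at least two tuples of equal length.
     */
    static double[][] covariance(double[][] tuples, boolean biasCorrected) {
        int n = tuples.length;
        if (n < 2) {
            throw new IllegalArgumentException("need at least two tuples");
        }
        int k = tuples[0].length;
        double[] sums = new double[k];
        double[][] productSums = new double[k][k];
        for (double[] t : tuples) {
            for (int i = 0; i < k; i++) {
                sums[i] += t[i];
                for (int j = 0; j < k; j++) {
                    productSums[i][j] += t[i] * t[j];
                }
            }
        }
        // n - 1 gives the unbiased sample estimate, n the population value
        double divisor = biasCorrected ? n - 1 : n;
        double[][] cov = new double[k][k];
        for (int i = 0; i < k; i++) {
            for (int j = 0; j < k; j++) {
                cov[i][j] = (productSums[i][j] - sums[i] * sums[j] / n) / divisor;
            }
        }
        return cov;
    }

    public static void main(String[] args) {
        double[][] data = {{0, 1, 2}, {3, 4, 5}, {6, 7, 8}};
        System.out.println(Arrays.deepToString(covariance(data, true)));  // every entry 9.0
        System.out.println(Arrays.deepToString(covariance(data, false))); // every entry 6.0
    }
}
```

For large n the two results converge; the choice matters mostly for small samples.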
Method Detail

addValue

public void addValue(double[] value)
Add an n-tuple to the data

Parameters:
value - the n-tuple to add
Throws:
DimensionMismatchException - if the length of the array does not match the one used at construction

getDimension

public int getDimension()
Returns the dimension of the data

Specified by:
getDimension in interface StatisticalMultivariateSummary
Returns:
The dimension of the data

getN

public long getN()
Returns the number of available values

Specified by:
getN in interface StatisticalMultivariateSummary
Returns:
The number of available values

getResults

private double[] getResults(StorelessUnivariateStatistic[] stats)
Returns an array of the results of a statistic.

Parameters:
stats - univariate statistic array
Returns:
results array

getSum

public double[] getSum()
Returns an array whose ith entry is the sum of the ith entries of the arrays that have been added using addValue(double[])

Specified by:
getSum in interface StatisticalMultivariateSummary
Returns:
the array of component sums

getSumSq

public double[] getSumSq()
Returns an array whose ith entry is the sum of squares of the ith entries of the arrays that have been added using addValue(double[])

Specified by:
getSumSq in interface StatisticalMultivariateSummary
Returns:
the array of component sums of squares

getSumLog

public double[] getSumLog()
Returns an array whose ith entry is the sum of logs of the ith entries of the arrays that have been added using addValue(double[])

Specified by:
getSumLog in interface StatisticalMultivariateSummary
Returns:
the array of component log sums

getMean

public double[] getMean()
Returns an array whose ith entry is the mean of the ith entries of the arrays that have been added using addValue(double[])

Specified by:
getMean in interface StatisticalMultivariateSummary
Returns:
the array of component means

getStandardDeviation

public double[] getStandardDeviation()
Returns an array whose ith entry is the standard deviation of the ith entries of the arrays that have been added using addValue(double[])

Specified by:
getStandardDeviation in interface StatisticalMultivariateSummary
Returns:
the array of component standard deviations

getCovariance

public RealMatrix getCovariance()
Returns the covariance matrix of the values that have been added.

Specified by:
getCovariance in interface StatisticalMultivariateSummary
Returns:
the covariance matrix

getMax

public double[] getMax()
Returns an array whose ith entry is the maximum of the ith entries of the arrays that have been added using addValue(double[])

Specified by:
getMax in interface StatisticalMultivariateSummary
Returns:
the array of component maxima

getMin

public double[] getMin()
Returns an array whose ith entry is the minimum of the ith entries of the arrays that have been added using addValue(double[])

Specified by:
getMin in interface StatisticalMultivariateSummary
Returns:
the array of component minima

getGeometricMean

public double[] getGeometricMean()
Returns an array whose ith entry is the geometric mean of the ith entries of the arrays that have been added using addValue(double[])

Specified by:
getGeometricMean in interface StatisticalMultivariateSummary
Returns:
the array of component geometric means
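
As an illustration of how a geometric mean can be maintained without storing the data, the sketch below keeps a per-component sum of natural logs and returns exp(sumLog / n) for each component. GeometricMeanSketch is a hypothetical class, and returning NaN when no values have been added is an assumption of this sketch rather than documented behaviour.

```java
import java.util.Arrays;

/** Hypothetical per-component geometric mean built on a running sum of logs. */
final class GeometricMeanSketch {
    private final double[] sumLog; // running sum of ln(x) per component
    private long n;                // number of tuples added

    GeometricMeanSketch(int dimension) {
        this.sumLog = new double[dimension];
    }

    void addValue(double[] value) {
        if (value.length != sumLog.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        for (int i = 0; i < sumLog.length; i++) {
            sumLog[i] += Math.log(value[i]);
        }
        n++;
    }

    double[] getSumLog() {
        return sumLog.clone();
    }

    /** exp of the mean log per component; NaN when empty (assumption of this sketch). */
    double[] getGeometricMean() {
        double[] result = new double[sumLog.length];
        for (int i = 0; i < result.length; i++) {
            result[i] = n == 0 ? Double.NaN : Math.exp(sumLog[i] / n);
        }
        return result;
    }

    public static void main(String[] args) {
        GeometricMeanSketch stats = new GeometricMeanSketch(2);
        stats.addValue(new double[] {1, 2});
        stats.addValue(new double[] {4, 8});
        // Approximately [2.0, 4.0], since sqrt(1*4) = 2 and sqrt(2*8) = 4
        System.out.println(Arrays.toString(stats.getGeometricMean()));
    }
}
```

Working in log space avoids the overflow that multiplying many values directly would risk.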

toString

public String toString()
Generates a text report displaying summary statistics from values that have been added.

Overrides:
toString in class Object
Returns:
String with line feeds displaying statistics

append

private void append(StringBuilder buffer,
                    double[] data,
                    String prefix,
                    String separator,
                    String suffix)
Append a text representation of an array to a buffer.

Parameters:
buffer - buffer to fill
data - data array
prefix - text prefix
separator - elements separator
suffix - text suffix

clear

public void clear()
Resets all statistics and storage


equals

public boolean equals(Object object)
Returns true iff object is a MultivariateSummaryStatistics instance and all statistics have the same values as this.

Overrides:
equals in class Object
Parameters:
object - the object to test equality against.
Returns:
true if object equals this

hashCode

public int hashCode()
Returns hash code based on values of statistics

Overrides:
hashCode in class Object
Returns:
hash code

setImpl

private void setImpl(StorelessUnivariateStatistic[] newImpl,
                     StorelessUnivariateStatistic[] oldImpl)
Sets statistics implementations.

Parameters:
newImpl - new implementations for statistics
oldImpl - old implementations for statistics
Throws:
DimensionMismatchException - if the array dimension does not match the one used at construction
IllegalStateException - if data has already been added (i.e., if n > 0)

getSumImpl

public StorelessUnivariateStatistic[] getSumImpl()
Returns the currently configured Sum implementation

Returns:
the StorelessUnivariateStatistic implementing the sum

setSumImpl

public void setSumImpl(StorelessUnivariateStatistic[] sumImpl)

Sets the implementation for the Sum.

This method must be called before any data has been added, i.e., before addValue has been used to add data; otherwise an IllegalStateException will be thrown.

Parameters:
sumImpl - the StorelessUnivariateStatistic instance to use for computing the Sum
Throws:
DimensionMismatchException - if the array dimension does not match the one used at construction
IllegalStateException - if data has already been added (i.e., if n > 0)
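
The configure-before-add rule shared by the setXxxImpl methods can be sketched as follows. ConfigurableStats is a hypothetical class, and DoubleBinaryOperator is used here as a simplified stand-in for StorelessUnivariateStatistic. The sketch throws plain IllegalStateException and IllegalArgumentException, whereas the library documents MathIllegalStateException and DimensionMismatchException.

```java
import java.util.Arrays;
import java.util.function.DoubleBinaryOperator;

/** Hypothetical sketch of the "configure before adding data" rule; not the library class. */
final class ConfigurableStats {
    private final int k;                       // dimension fixed at construction
    private final DoubleBinaryOperator[] impl; // stand-in for StorelessUnivariateStatistic[]
    private final double[] state;              // one running value per component
    private long n;                            // number of tuples added

    ConfigurableStats(int k) {
        this.k = k;
        this.impl = new DoubleBinaryOperator[k];
        this.state = new double[k];
        for (int i = 0; i < k; i++) {
            impl[i] = Double::sum; // default implementation: running sum
        }
    }

    /** Replaces the per-component implementations; only legal while empty. */
    void setImpl(DoubleBinaryOperator[] newImpl) {
        if (n > 0) {
            throw new IllegalStateException("values have already been added: n = " + n);
        }
        if (newImpl.length != k) {
            throw new IllegalArgumentException(
                "expected length " + k + ", got " + newImpl.length);
        }
        System.arraycopy(newImpl, 0, impl, 0, k);
    }

    void addValue(double[] value) {
        if (value.length != k) {
            throw new IllegalArgumentException(
                "expected length " + k + ", got " + value.length);
        }
        for (int i = 0; i < k; i++) {
            state[i] = impl[i].applyAsDouble(state[i], value[i]);
        }
        n++;
    }

    double[] getResult() {
        return state.clone();
    }

    public static void main(String[] args) {
        ConfigurableStats stats = new ConfigurableStats(2);
        stats.setImpl(new DoubleBinaryOperator[] {Double::sum, Math::max}); // allowed: n == 0
        stats.addValue(new double[] {1, 5});
        stats.addValue(new double[] {2, 3});
        System.out.println(Arrays.toString(stats.getResult()));
        try {
            stats.setImpl(new DoubleBinaryOperator[] {Double::sum, Double::sum});
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Rejecting late reconfiguration keeps every reported statistic consistent with all of the values added so far.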

getSumsqImpl

public StorelessUnivariateStatistic[] getSumsqImpl()
Returns the currently configured sum of squares implementation

Returns:
the StorelessUnivariateStatistic implementing the sum of squares

setSumsqImpl

public void setSumsqImpl(StorelessUnivariateStatistic[] sumsqImpl)

Sets the implementation for the sum of squares.

This method must be called before any data has been added, i.e., before addValue has been used to add data; otherwise an IllegalStateException will be thrown.

Parameters:
sumsqImpl - the StorelessUnivariateStatistic instance to use for computing the sum of squares
Throws:
DimensionMismatchException - if the array dimension does not match the one used at construction
IllegalStateException - if data has already been added (i.e., if n > 0)

getMinImpl

public StorelessUnivariateStatistic[] getMinImpl()
Returns the currently configured minimum implementation

Returns:
the StorelessUnivariateStatistic implementing the minimum

setMinImpl

public void setMinImpl(StorelessUnivariateStatistic[] minImpl)

Sets the implementation for the minimum.

This method must be called before any data has been added, i.e., before addValue has been used to add data; otherwise an IllegalStateException will be thrown.

Parameters:
minImpl - the StorelessUnivariateStatistic instance to use for computing the minimum
Throws:
DimensionMismatchException - if the array dimension does not match the one used at construction
IllegalStateException - if data has already been added (i.e., if n > 0)

getMaxImpl

public StorelessUnivariateStatistic[] getMaxImpl()
Returns the currently configured maximum implementation

Returns:
the StorelessUnivariateStatistic implementing the maximum

setMaxImpl

public void setMaxImpl(StorelessUnivariateStatistic[] maxImpl)

Sets the implementation for the maximum.

This method must be called before any data has been added, i.e., before addValue has been used to add data; otherwise an IllegalStateException will be thrown.

Parameters:
maxImpl - the StorelessUnivariateStatistic instance to use for computing the maximum
Throws:
DimensionMismatchException - if the array dimension does not match the one used at construction
IllegalStateException - if data has already been added (i.e., if n > 0)

getSumLogImpl

public StorelessUnivariateStatistic[] getSumLogImpl()
Returns the currently configured sum of logs implementation

Returns:
the StorelessUnivariateStatistic implementing the log sum

setSumLogImpl

public void setSumLogImpl(StorelessUnivariateStatistic[] sumLogImpl)

Sets the implementation for the sum of logs.

This method must be called before any data has been added, i.e., before addValue has been used to add data; otherwise an IllegalStateException will be thrown.

Parameters:
sumLogImpl - the StorelessUnivariateStatistic instance to use for computing the log sum
Throws:
DimensionMismatchException - if the array dimension does not match the one used at construction
IllegalStateException - if data has already been added (i.e., if n > 0)

getGeoMeanImpl

public StorelessUnivariateStatistic[] getGeoMeanImpl()
Returns the currently configured geometric mean implementation

Returns:
the StorelessUnivariateStatistic implementing the geometric mean

setGeoMeanImpl

public void setGeoMeanImpl(StorelessUnivariateStatistic[] geoMeanImpl)

Sets the implementation for the geometric mean.

This method must be called before any data has been added, i.e., before addValue has been used to add data; otherwise an IllegalStateException will be thrown.

Parameters:
geoMeanImpl - the StorelessUnivariateStatistic instance to use for computing the geometric mean
Throws:
DimensionMismatchException - if the array dimension does not match the one used at construction
IllegalStateException - if data has already been added (i.e., if n > 0)

getMeanImpl

public StorelessUnivariateStatistic[] getMeanImpl()
Returns the currently configured mean implementation

Returns:
the StorelessUnivariateStatistic implementing the mean

setMeanImpl

public void setMeanImpl(StorelessUnivariateStatistic[] meanImpl)

Sets the implementation for the mean.

This method must be called before any data has been added, i.e., before addValue has been used to add data; otherwise an IllegalStateException will be thrown.

Parameters:
meanImpl - the StorelessUnivariateStatistic instance to use for computing the mean
Throws:
DimensionMismatchException - if the array dimension does not match the one used at construction
IllegalStateException - if data has already been added (i.e., if n > 0)

checkEmpty

private void checkEmpty()
Throws MathIllegalStateException if the statistic is not empty.

Throws:
MathIllegalStateException - if n > 0.

checkDimension

private void checkDimension(int dimension)
Throws DimensionMismatchException if dimension != k.

Parameters:
dimension - dimension to check
Throws:
DimensionMismatchException - if dimension != k


Copyright (c) 2003-2013 Apache Software Foundation