statistics-0.6.0.2: A library of statistical types, data, and functions
Statistics.Sample
Portability: portable
Stability: experimental
Maintainer: bos@serpentine.com
Contents
Types
Descriptive functions
Statistics of location
Statistics of dispersion
Functions over central moments
Two-pass functions (numerically robust)
Single-pass functions (faster, less safe)
References
Description
Commonly used sample statistics, also known as descriptive statistics.
Synopsis
type Sample = Vector Double
type WeightedSample = Vector (Double, Double)
range :: Vector v Double => v Double -> Double
mean :: Vector v Double => v Double -> Double
meanWeighted :: Vector v (Double, Double) => v (Double, Double) -> Double
harmonicMean :: Vector v Double => v Double -> Double
geometricMean :: Vector v Double => v Double -> Double
centralMoment :: Vector v Double => Int -> v Double -> Double
centralMoments :: Vector v Double => Int -> Int -> v Double -> (Double, Double)
skewness :: Vector v Double => v Double -> Double
kurtosis :: Vector v Double => v Double -> Double
variance :: Vector v Double => v Double -> Double
varianceUnbiased :: Vector v Double => v Double -> Double
stdDev :: Vector v Double => v Double -> Double
varianceWeighted :: Vector v (Double, Double) => v (Double, Double) -> Double
fastVariance :: Vector v Double => v Double -> Double
fastVarianceUnbiased :: Vector v Double => v Double -> Double
fastStdDev :: Vector v Double => v Double -> Double
Types
type Sample = Vector Double
Sample data.
type WeightedSample = Vector (Double, Double)
Sample with weights. The first element of each pair is the data value; the second is its weight.
Descriptive functions
range :: Vector v Double => v Double -> Double
Statistics of location
mean :: Vector v Double => v Double -> Double
Arithmetic mean. This uses Welford's algorithm to provide numerical stability, using a single pass over the sample data.
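The incremental update behind a Welford-style mean can be sketched as follows. This is a minimal illustration over plain lists, not the library's code: `MeanAcc` and `welfordMean` are hypothetical names, and returning 0 for an empty sample is an assumption.

```haskell
import Data.List (foldl')

-- Running state: number of elements seen so far and the current mean.
data MeanAcc = MeanAcc !Int !Double

-- Hypothetical helper (not part of Statistics.Sample): single-pass mean
-- using the update m' = m + (x - m) / n'.
-- Assumption: an empty sample yields 0.
welfordMean :: [Double] -> Double
welfordMean = done . foldl' step (MeanAcc 0 0)
  where
    step (MeanAcc n m) x =
      let n' = n + 1
      in MeanAcc n' (m + (x - m) / fromIntegral n')
    done (MeanAcc _ m) = m
```

Because each step adjusts the running mean by a small correction rather than accumulating one large sum, intermediate values stay on the scale of the data.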
meanWeighted :: Vector v (Double, Double) => v (Double, Double) -> Double
Arithmetic mean of a weighted sample. It uses an algorithm analogous to the one used by mean.
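One way to carry the incremental mean update over to weighted data is sketched below. It works on plain lists of (value, weight) pairs; `WMeanAcc` and `weightedMeanL` are hypothetical names, and returning 0 while the total weight is zero is an assumption rather than documented behaviour.

```haskell
import Data.List (foldl')

-- Running state: current weighted mean and total weight seen so far.
data WMeanAcc = WMeanAcc !Double !Double

-- Hypothetical helper: single-pass weighted mean using the update
-- m' = m + w * (x - m) / totalWeight'.
-- Assumption: the mean is reported as 0 while the total weight is 0.
weightedMeanL :: [(Double, Double)] -> Double
weightedMeanL = done . foldl' step (WMeanAcc 0 0)
  where
    step (WMeanAcc m tw) (x, w) =
      let tw' = tw + w
          m'  = if tw' == 0 then 0 else m + w * (x - m) / tw'
      in WMeanAcc m' tw'
    done (WMeanAcc m _) = m
```

For example, `weightedMeanL [(1, 1), (3, 3)]` weights the value 3 three times as heavily as the value 1.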
harmonicMean :: Vector v Double => v Double -> Double
Harmonic mean. This algorithm performs a single pass over the sample.
geometricMean :: Vector v Double => v Double -> Double
Geometric mean of a sample containing no negative values.
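To make the harmonic and geometric means concrete, here is a small list-based sketch using the textbook definitions. The names `harmonicMeanL` and `geometricMeanL` are hypothetical, and nothing here is claimed to match the library's implementation.

```haskell
import Data.List (foldl')

-- Hypothetical helper: harmonic mean as n divided by the sum of
-- reciprocals, counting and summing in a single pass.
harmonicMeanL :: [Double] -> Double
harmonicMeanL xs = fromIntegral n / s
  where
    (n, s) = foldl' step (0 :: Int, 0) xs
    step (k, acc) x = (k + 1, acc + recip x)

-- Hypothetical helper: geometric mean as the exponential of the
-- arithmetic mean of the logarithms. Every element must be
-- non-negative; a zero element makes the result 0.
geometricMeanL :: [Double] -> Double
geometricMeanL xs = exp (sum (map log xs) / fromIntegral (length xs))
```

Working in log space avoids overflow from multiplying many large values together, which is why the restriction to non-negative values arises.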
Statistics of dispersion
The variance of a sample of fewer than two elements is defined to be zero, and hence so is its standard deviation.
Functions over central moments
centralMoment :: Vector v Double => Int -> v Double -> Double

Compute the kth central moment of a sample. The central moment is also known as the moment about the mean.

This function performs two passes over the sample, so is not subject to stream fusion.

For samples containing many values very close to the mean, this function is subject to inaccuracy due to catastrophic cancellation.
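The two passes described above can be sketched over a plain list as follows. `centralMomentL` is a hypothetical name, and the handling of negative orders and empty samples is an assumption made for the sketch.

```haskell
-- Hypothetical helper: kth central moment of a list in two passes.
-- Assumptions: a negative order is an error; an empty sample yields 0.
centralMomentL :: Int -> [Double] -> Double
centralMomentL k xs
  | k < 0     = error "centralMomentL: negative moment order"
  | null xs   = 0
  | otherwise = sum [ (x - m) ^ k | x <- xs ] / n  -- second pass
  where
    n = fromIntegral (length xs)
    m = sum xs / n                                 -- first pass: the mean
```

The second pass cannot start until the first has produced the mean, which is why a computation of this shape needs to traverse the sample twice.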

centralMoments :: Vector v Double => Int -> Int -> v Double -> (Double, Double)

Compute the kth and jth central moments of a sample.

This function performs two passes over the sample, so is not subject to stream fusion.

For samples containing many values very close to the mean, this function is subject to inaccuracy due to catastrophic cancellation.

skewness :: Vector v Double => v Double -> Double

Compute the skewness of a sample. This is a measure of the asymmetry of its distribution.

A sample with negative skew is said to be left-skewed. Most of its mass is on the right of the distribution, with the tail on the left.

 skewness $ U.fromList [1,100,101,102,103]
 ==> -1.497681449918257

A sample with positive skew is said to be right-skewed.

 skewness $ U.fromList [1,2,3,4,100]
 ==> 1.4975367033335198

A sample's skewness is not defined if its variance is zero.

This function performs two passes over the sample, so is not subject to stream fusion.

For samples containing many values very close to the mean, this function is subject to inaccuracy due to catastrophic cancellation.
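A common definition of skewness divides the third central moment by the 3/2 power of the second. The sketch below applies that definition to plain lists; `momentL` and `skewnessL` are hypothetical names. The two example values quoted above are consistent with this definition, but that it is exactly what the library computes is an assumption.

```haskell
-- Hypothetical helper: kth central moment of a list (two passes).
momentL :: Int -> [Double] -> Double
momentL k xs = sum [ (x - m) ^ k | x <- xs ] / n
  where
    n = fromIntegral (length xs)
    m = sum xs / n

-- Hypothetical helper: third central moment scaled by the 3/2 power
-- of the second. The result is NaN when the variance is zero (0 / 0).
skewnessL :: [Double] -> Double
skewnessL xs = momentL 3 xs / momentL 2 xs ** 1.5
```

The NaN result for a constant sample reflects the statement that skewness is not defined when the variance is zero.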

kurtosis :: Vector v Double => v Double -> Double

Compute the excess kurtosis of a sample. This is a measure of the "peakedness" of its distribution. A high kurtosis indicates that more of the sample's variance is due to infrequent severe deviations, rather than more frequent modest deviations.

A sample's excess kurtosis is not defined if its variance is zero.

This function performs two passes over the sample, so is not subject to stream fusion.

For samples containing many values very close to the mean, this function is subject to inaccuracy due to catastrophic cancellation.
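Excess kurtosis is conventionally the fourth central moment divided by the square of the second, minus 3, so that a normal distribution scores zero. A list-based sketch of that convention follows; `moment4and2` and `excessKurtosisL` are hypothetical names, and agreement with the library's exact formula is an assumption.

```haskell
-- Hypothetical helper: fourth and second central moments of a list,
-- computed together after a first pass for the mean.
moment4and2 :: [Double] -> (Double, Double)
moment4and2 xs = (avg [ d ^ (4 :: Int) | d <- ds ], avg [ d * d | d <- ds ])
  where
    n      = fromIntegral (length xs)
    m      = sum xs / n
    ds     = [ x - m | x <- xs ]
    avg ys = sum ys / n

-- Hypothetical helper: m4 / m2^2 - 3.
-- The result is NaN when the variance is zero (0 / 0).
excessKurtosisL :: [Double] -> Double
excessKurtosisL xs = m4 / (m2 * m2) - 3
  where (m4, m2) = moment4and2 xs
```

For the evenly spread sample `[1,2,3,4,5]` this gives a negative value, reflecting a flatter shape than a normal distribution.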

Two-pass functions (numerically robust)

These functions use the compensated summation algorithm of Chan et al. for numerical robustness, but require two passes over the sample data as a result.

Because of the need for two passes, these functions are not subject to stream fusion.

variance :: Vector v Double => v Double -> Double
Maximum likelihood estimate of a sample's variance. Also known as the population variance, where the denominator is n.
varianceUnbiased :: Vector v Double => v Double -> Double
Unbiased estimate of a sample's variance. Also known as the sample variance, where the denominator is n-1.
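The difference between the two estimates is only the denominator. The sketch below shows this on plain lists with a straightforward two-pass sum of squared deviations; it does not include any compensated summation, and `sumSqDev`, `varianceL`, and `varianceUnbiasedL` are hypothetical names.

```haskell
-- Hypothetical helper: sum of squared deviations from the mean.
-- The mean is computed first, then the deviations (two passes).
sumSqDev :: [Double] -> Double
sumSqDev xs = sum [ (x - m) ^ (2 :: Int) | x <- xs ]
  where m = sum xs / fromIntegral (length xs)

-- Maximum likelihood estimate: denominator n.
-- Samples of fewer than two elements yield 0.
varianceL :: [Double] -> Double
varianceL xs
  | n > 1     = sumSqDev xs / fromIntegral n
  | otherwise = 0
  where n = length xs

-- Unbiased estimate: denominator n - 1.
varianceUnbiasedL :: [Double] -> Double
varianceUnbiasedL xs
  | n > 1     = sumSqDev xs / fromIntegral (n - 1)
  | otherwise = 0
  where n = length xs
```

For `[2,4,4,4,5,5,7,9]` the sum of squared deviations is 32, so the two estimates are 32/8 and 32/7 respectively.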
stdDev :: Vector v Double => v Double -> Double
Standard deviation. This is simply the square root of the unbiased estimate of the variance.
varianceWeighted :: Vector v (Double, Double) => v (Double, Double) -> Double
Weighted variance. This is a biased estimate.
Single-pass functions (faster, less safe)

The functions prefixed with the name fast below perform a single pass over the sample data using Knuth's algorithm. They usually work well, but see below for caveats. These functions are subject to array fusion.

Note: in cases where most sample data is close to the sample's mean, Knuth's algorithm gives inaccurate results due to catastrophic cancellation.
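A single-pass variance of the kind presented by Knuth (and often attributed to Welford) keeps a running count, mean, and sum of squared deviations. The sketch below is a list-based illustration under that assumption; `VarAcc`, `fastVarAcc`, and `fastVarianceL` are hypothetical names, and the library's actual recurrence may differ.

```haskell
import Data.List (foldl')

-- Running state: count, mean, and sum of squared deviations from the mean.
data VarAcc = VarAcc !Int !Double !Double

-- Hypothetical helper: fold the sample once, updating all three fields.
fastVarAcc :: [Double] -> VarAcc
fastVarAcc = foldl' step (VarAcc 0 0 0)
  where
    step (VarAcc n m s) x =
      let n' = n + 1
          d  = x - m                      -- deviation from the old mean
          m' = m + d / fromIntegral n'    -- updated mean
          s' = s + d * (x - m')           -- updated sum of squared deviations
      in VarAcc n' m' s'

-- Hypothetical helper: maximum likelihood variance (denominator n).
-- Samples of fewer than two elements yield 0.
fastVarianceL :: [Double] -> Double
fastVarianceL xs = case fastVarAcc xs of
  VarAcc n _ s
    | n > 1     -> s / fromIntegral n
    | otherwise -> 0
```

Since the state is updated element by element, the whole computation is a single left fold, which is the shape that fusion frameworks can combine with upstream producers.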

fastVariance :: Vector v Double => v Double -> Double
Maximum likelihood estimate of a sample's variance.
fastVarianceUnbiased :: Vector v Double => v Double -> Double
Unbiased estimate of a sample's variance.
fastStdDev :: Vector v Double => v Double -> Double
Standard deviation. This is simply the square root of the maximum likelihood estimate of the variance.
References