|
Statistics.Sample | Portability: portable | Stability: experimental | Maintainer: bos@serpentine.com |
|
|
|
|
|
Description |
Commonly used sample statistics, also known as descriptive
statistics.
|
|
Synopsis |
|
|
|
|
Types
|
|
|
Sample data.
|
|
|
Sample with weights. The first element of each pair is the data value, the second is its weight.
|
|
Descriptive functions
|
|
|
|
Statistics of location
|
|
|
Arithmetic mean. This uses Welford's algorithm to provide
numerical stability, using a single pass over the sample data.
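The running-mean update behind this kind of algorithm can be sketched over a plain list. This is a minimal illustration, not the library's implementation: the names MeanAcc and welfordMean are hypothetical, and returning 0 for an empty sample is an assumption of this sketch.

```haskell
import Data.List (foldl')

-- Running state: current mean estimate and number of elements seen.
data MeanAcc = MeanAcc !Double !Int

-- | Single-pass running mean (hypothetical helper, not the library's code).
-- Each step moves the estimate toward the new element by (x - m) / n,
-- so no large intermediate sum is ever formed.
welfordMean :: [Double] -> Double
welfordMean = finish . foldl' step (MeanAcc 0 0)
  where
    step (MeanAcc m n) x =
      let n' = n + 1
      in MeanAcc (m + (x - m) / fromIntegral n') n'
    -- Assumption of this sketch: an empty sample yields 0.
    finish (MeanAcc m _) = m
```

Because the state is a strict pair updated once per element, the whole computation is a single left fold over the input.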
|
|
|
Arithmetic mean for a weighted sample. It uses an algorithm analogous
to the one in mean.
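One way to carry the running-mean idea over to weighted data is to scale each correction by the element's share of the total weight seen so far. The sketch below works on a list of (value, weight) pairs; WAcc and weightedMeanSketch are hypothetical names, and skipping the update while the total weight is zero is an assumption made here to avoid dividing by zero.

```haskell
import Data.List (foldl')

-- Running state: current weighted mean and total weight seen so far.
data WAcc = WAcc !Double !Double

-- | Single-pass weighted mean over (value, weight) pairs
-- (hypothetical helper, not the library's code).
weightedMeanSketch :: [(Double, Double)] -> Double
weightedMeanSketch = finish . foldl' step (WAcc 0 0)
  where
    step (WAcc m w) (x, xw)
      -- Assumption: while the total weight is zero, leave the mean alone.
      | w' == 0   = WAcc m w'
      | otherwise = WAcc (m + xw * (x - m) / w') w'
      where
        w' = w + xw
    finish (WAcc m _) = m
```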
|
|
|
Harmonic mean. This algorithm performs a single pass over the
sample.
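As an illustration of the single pass, the harmonic mean can be computed by accumulating the sum of reciprocals and the element count together in one fold. HAcc and harmonicMeanSketch are hypothetical names for this sketch, which works on plain lists rather than the library's sample type.

```haskell
import Data.List (foldl')

-- Running state: sum of reciprocals and number of elements seen.
data HAcc = HAcc !Double !Int

-- | Harmonic mean, n / sum (1 / x_i), in one pass
-- (hypothetical helper, not the library's code).
-- An empty sample gives 0 / 0, which is NaN.
harmonicMeanSketch :: [Double] -> Double
harmonicMeanSketch = finish . foldl' step (HAcc 0 0)
  where
    step (HAcc s n) x = HAcc (s + 1 / x) (n + 1)
    finish (HAcc s n) = fromIntegral n / s
```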
|
|
|
Geometric mean of a sample containing no negative values.
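A common way to compute a geometric mean is to exponentiate the arithmetic mean of the logarithms, which is why negative values are excluded. The sketch below assumes that approach; geometricMeanSketch is a hypothetical name, and for brevity it traverses the list twice (once for the sum, once for the length).

```haskell
-- | Geometric mean as exp of the mean of logs
-- (hypothetical helper, not the library's code).
-- A zero element makes the result 0, since log 0 is -Infinity;
-- a negative element makes it NaN.
geometricMeanSketch :: [Double] -> Double
geometricMeanSketch xs = exp (sum (map log xs) / fromIntegral (length xs))
```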
|
|
Statistics of dispersion
|
|
The variance, and hence the standard deviation, of a
sample of fewer than two elements is defined to be zero.
|
|
Functions over central moments
|
|
|
Compute the kth central moment of a sample. The central moment
is also known as the moment about the mean.
This function performs two passes over the sample, so is not subject
to stream fusion.
For samples containing many values very close to the mean, this
function is subject to inaccuracy due to catastrophic cancellation.
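The two passes can be seen in a direct list-based sketch: the first pass computes the mean, the second averages the kth powers of the deviations from it. centralMomentSketch is a hypothetical name, and the shortcuts for k = 0 and k = 1 are assumptions of this sketch that follow from the definition.

```haskell
-- | kth central moment: the mean of (x - mean)^k
-- (hypothetical helper, not the library's code).
centralMomentSketch :: Int -> [Double] -> Double
centralMomentSketch k xs
  | k < 0     = error "centralMomentSketch: negative k"
  | k == 0    = 1   -- every deviation raised to 0 is 1
  | k == 1    = 0   -- deviations from the mean sum to zero
  | otherwise = sum [ (x - m) ^ k | x <- xs ] / n   -- second pass
  where
    n = fromIntegral (length xs)
    m = sum xs / n                                   -- first pass
```

The subtraction x - m is where cancellation occurs when x is very close to the mean.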
|
|
|
Compute the kth and jth central moments of a sample.
This function performs two passes over the sample, so is not subject
to stream fusion.
For samples containing many values very close to the mean, this
function is subject to inaccuracy due to catastrophic cancellation.
|
|
|
Compute the skewness of a sample. This is a measure of the
asymmetry of its distribution.
A sample with negative skew is said to be left-skewed. Most of
its mass is on the right of the distribution, with the tail on the
left.
skewness $ U.to [1,100,101,102,103]
==> -1.497681449918257
A sample with positive skew is said to be right-skewed.
skewness $ U.to [1,2,3,4,100]
==> 1.4975367033335198
A sample's skewness is not defined if its variance is zero.
This function performs two passes over the sample, so is not subject
to stream fusion.
For samples containing many values very close to the mean, this
function is subject to inaccuracy due to catastrophic cancellation.
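The two example values above are consistent with the usual definition of skewness as the third central moment divided by the second central moment raised to the power 3/2. The sketch below assumes that definition and works on plain lists; skewnessSketch is a hypothetical name.

```haskell
-- | Skewness as c3 / c2^1.5, where c2 and c3 are the second and third
-- central moments (hypothetical helper, not the library's code).
-- The result is NaN when the variance c2 is zero.
skewnessSketch :: [Double] -> Double
skewnessSketch xs = c3 / c2 ** 1.5
  where
    n  = fromIntegral (length xs)
    m  = sum xs / n                                   -- first pass: mean
    c2 = sum [ (x - m) ^ (2 :: Int) | x <- xs ] / n   -- second pass
    c3 = sum [ (x - m) ^ (3 :: Int) | x <- xs ] / n
```

Applied to [1,100,101,102,103] and [1,2,3,4,100], this should agree with the two results shown above up to floating-point rounding.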
|
|
|
Compute the excess kurtosis of a sample. This is a measure of
the "peakedness" of its distribution. A high kurtosis indicates
that more of the sample's variance is due to infrequent severe
deviations, rather than more frequent modest deviations.
A sample's excess kurtosis is not defined if its variance is
zero.
This function performs two passes over the sample, so is not subject
to stream fusion.
For samples containing many values very close to the mean, this
function is subject to inaccuracy due to catastrophic cancellation.
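Excess kurtosis is conventionally the fourth central moment divided by the square of the second, minus 3 (so that a normal distribution scores zero). The sketch below assumes that convention; kurtosisSketch is a hypothetical name.

```haskell
-- | Excess kurtosis as c4 / c2^2 - 3, where c2 and c4 are the second
-- and fourth central moments (hypothetical helper, not the library's code).
-- The result is NaN when the variance c2 is zero.
kurtosisSketch :: [Double] -> Double
kurtosisSketch xs = c4 / (c2 * c2) - 3
  where
    n  = fromIntegral (length xs)
    m  = sum xs / n                                   -- first pass: mean
    c2 = sum [ (x - m) ^ (2 :: Int) | x <- xs ] / n   -- second pass
    c4 = sum [ (x - m) ^ (4 :: Int) | x <- xs ] / n
```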
|
|
Two-pass functions (numerically robust)
|
|
These functions use the compensated summation algorithm of Chan et
al. for numerical robustness, but require two passes over the
sample data as a result.
Because of the need for two passes, these functions are not
subject to stream fusion.
|
|
|
Maximum likelihood estimate of a sample's variance. Also known
as the population variance, where the denominator is n.
|
|
|
Unbiased estimate of a sample's variance. Also known as the
sample variance, where the denominator is n-1.
|
|
|
Standard deviation. This is simply the square root of the
unbiased estimate of the variance.
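The three functions above differ only in the denominator and the final square root. The sketch below shows that relationship using a plain two-pass sum of squared deviations; it does not reproduce the compensated algorithm of Chan et al., and all names in it are hypothetical.

```haskell
-- | Sum of squared deviations from the mean, in two passes
-- (hypothetical helper; a plain version, without compensation).
sumSqDev :: [Double] -> Double
sumSqDev xs = sum [ (x - m) ^ (2 :: Int) | x <- xs ]
  where
    m = sum xs / fromIntegral (length xs)

-- | Maximum likelihood estimate: denominator n.
-- Samples of fewer than two elements give zero.
varianceSketch :: [Double] -> Double
varianceSketch xs
  | n > 1     = sumSqDev xs / fromIntegral n
  | otherwise = 0
  where
    n = length xs

-- | Unbiased estimate: denominator n-1.
varianceUnbiasedSketch :: [Double] -> Double
varianceUnbiasedSketch xs
  | n > 1     = sumSqDev xs / fromIntegral (n - 1)
  | otherwise = 0
  where
    n = length xs

-- | Standard deviation: square root of the unbiased estimate.
stdDevSketch :: [Double] -> Double
stdDevSketch = sqrt . varianceUnbiasedSketch
```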
|
|
|
Weighted variance. This is a biased estimate.
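A biased weighted variance is commonly defined as the weighted mean of squared deviations from the weighted mean, with the total weight as denominator. The sketch below assumes that definition over a list of (value, weight) pairs; weightedVarianceSketch is a hypothetical name, and returning 0 when the total weight is not positive is an assumption of this sketch.

```haskell
-- | Biased weighted variance: sum (w * (x - m)^2) / sum w, where m is
-- the weighted mean (hypothetical helper, not the library's code).
weightedVarianceSketch :: [(Double, Double)] -> Double
weightedVarianceSketch xws
  | totalW > 0 = sum [ w * (x - m) ^ (2 :: Int) | (x, w) <- xws ] / totalW
  | otherwise  = 0   -- assumption: no usable weight means zero variance
  where
    totalW = sum (map snd xws)
    m      = sum [ w * x | (x, w) <- xws ] / totalW
```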
|
|
Single-pass functions (faster, less safe)
|
|
The functions below whose names begin with fast perform a single
pass over the sample data using Knuth's algorithm. They usually
work well, but see the note below for caveats. These functions are
subject to array fusion.
Note: in cases where most sample data is close to the sample's
mean, Knuth's algorithm gives inaccurate results due to
catastrophic cancellation.
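A single-pass variance of this kind keeps a running mean and a running sum of squared deviations, updating both for each element. The sketch below follows the recurrence given in the Knuth and Welford references and works on plain lists; VarAcc and fastVarianceSketch are hypothetical names, and this is an illustration rather than the library's code.

```haskell
import Data.List (foldl')

-- Running state: element count, running mean, running sum of squared
-- deviations from the mean.
data VarAcc = VarAcc !Int !Double !Double

-- | Single-pass maximum likelihood variance (denominator n)
-- (hypothetical helper, not the library's code).
-- Samples of fewer than two elements give zero.
fastVarianceSketch :: [Double] -> Double
fastVarianceSketch = finish . foldl' step (VarAcc 0 0 0)
  where
    step (VarAcc n m s) x =
      let n' = n + 1
          d  = x - m                      -- deviation from the old mean
          m' = m + d / fromIntegral n'    -- updated mean
          s' = s + d * (x - m')           -- uses old and new deviations
      in VarAcc n' m' s'
    finish (VarAcc n _ s)
      | n > 1     = s / fromIntegral n
      | otherwise = 0
```

Dividing by n - 1 instead of n in finish would give the unbiased variant.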
|
|
|
Maximum likelihood estimate of a sample's variance.
|
|
|
Unbiased estimate of a sample's variance.
|
|
|
Standard deviation. This is simply the square root of the
maximum likelihood estimate of the variance.
|
|
References
|
|
- Chan, T.F.; Golub, G.H.; LeVeque, R.J. (1979) Updating formulae
and a pairwise algorithm for computing sample
variances. Technical Report STAN-CS-79-773, Department of
Computer Science, Stanford
University. ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/79/773/CS-TR-79-773.pdf
- Knuth, D.E. (1998) The art of computer programming, volume 2:
seminumerical algorithms, 3rd ed., p. 232.
- Welford, B.P. (1962) Note on a method for calculating corrected
sums of squares and products. Technometrics
4(3):419–420. http://www.jstor.org/stable/1266577
- West, D.H.D. (1979) Updating mean and variance estimates: an
improved method. Communications of the ACM
22(9):532–535. http://doi.acm.org/10.1145/359146.359153
|
|