org.apache.commons.math.stat.regression
Class OLSMultipleLinearRegression

java.lang.Object
  extended by org.apache.commons.math.stat.regression.AbstractMultipleLinearRegression
      extended by org.apache.commons.math.stat.regression.OLSMultipleLinearRegression
All Implemented Interfaces:
MultipleLinearRegression

public class OLSMultipleLinearRegression
extends AbstractMultipleLinearRegression

Implements ordinary least squares (OLS) to estimate the parameters of a multiple linear regression model.

OLS assumes that the errors are uncorrelated and have equal variance, i.e. that the covariance matrix of the error vector u is diagonal with a common variance σ^2 on the diagonal:

u ~ N(0, σ^2 I)

The regression coefficients, b, satisfy the normal equations:

X^T X b = X^T y

To solve the normal equations, this implementation uses QR decomposition of the X matrix. (See QRDecompositionImpl for details on the decomposition algorithm.)

X^T X b = X^T y
(QR)^T (QR) b = (QR)^T y
R^T (Q^T Q) R b = R^T Q^T y
R^T R b = R^T Q^T y
(R^T)^{-1} R^T R b = (R^T)^{-1} R^T Q^T y
R b = Q^T y

Given Q and R, the last equation is solved by back-substitution.
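
The derivation above can be illustrated with a minimal, self-contained sketch in plain Java. It is not the library's code: the class name QrLeastSquaresSketch and its methods are hypothetical, the factorization uses modified Gram-Schmidt only for brevity (which may differ from the algorithm in QRDecompositionImpl), and X is assumed to have full column rank.

```java
/** Illustrative sketch only: solves R b = Q^T y for b, where X = QR. */
final class QrLeastSquaresSketch {

    /** Returns the least-squares coefficients b; assumes x has full column rank. */
    static double[] solve(double[][] x, double[] y) {
        int n = x.length;
        int p = x[0].length;
        double[][] q = new double[n][];
        double[][] r = new double[p][p];
        for (int i = 0; i < n; i++) {
            q[i] = x[i].clone(); // columns of q are orthonormalised in place
        }
        // Modified Gram-Schmidt: builds Q (n x p) and upper-triangular R (p x p).
        for (int j = 0; j < p; j++) {
            double norm = 0.0;
            for (int i = 0; i < n; i++) {
                norm += q[i][j] * q[i][j];
            }
            norm = Math.sqrt(norm);
            r[j][j] = norm;
            for (int i = 0; i < n; i++) {
                q[i][j] /= norm;
            }
            for (int k = j + 1; k < p; k++) {
                double dot = 0.0;
                for (int i = 0; i < n; i++) {
                    dot += q[i][j] * q[i][k];
                }
                r[j][k] = dot;
                for (int i = 0; i < n; i++) {
                    q[i][k] -= dot * q[i][j];
                }
            }
        }
        // Right-hand side Q^T y.
        double[] qty = new double[p];
        for (int j = 0; j < p; j++) {
            for (int i = 0; i < n; i++) {
                qty[j] += q[i][j] * y[i];
            }
        }
        // Back-substitution on R b = Q^T y, from the last row upwards.
        double[] b = new double[p];
        for (int j = p - 1; j >= 0; j--) {
            double s = qty[j];
            for (int k = j + 1; k < p; k++) {
                s -= r[j][k] * b[k];
            }
            b[j] = s / r[j][j];
        }
        return b;
    }

    public static void main(String[] args) {
        // First column of ones models an intercept; the data lie on y = 1 + 2x.
        double[][] x = {{1, 1}, {1, 2}, {1, 3}};
        double[] y = {3, 5, 7};
        double[] b = solve(x, y);
        System.out.println(b[0] + " " + b[1]);
    }
}
```

Working from Q and R avoids forming X^T X explicitly, which is the usual numerical motivation for solving least-squares problems this way.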

Since:
2.0
Version:
$Revision: 783702 $ $Date: 2009-06-11 04:54:02 -0400 (Thu, 11 Jun 2009) $

Field Summary
private  QRDecomposition qr
          Cached QR decomposition of X matrix
 
Fields inherited from class org.apache.commons.math.stat.regression.AbstractMultipleLinearRegression
X, Y
 
Constructor Summary
OLSMultipleLinearRegression()
           
 
Method Summary
protected  RealVector calculateBeta()
          Calculates regression coefficients using OLS.
protected  RealMatrix calculateBetaVariance()
          Calculates the variance on the beta by OLS.
 RealMatrix calculateHat()
          Compute the "hat" matrix.
protected  double calculateYVariance()
          Calculates the variance on the Y by OLS.
private static void checkUpperTriangular(RealMatrix m, double epsilon)
          Check if a matrix is upper-triangular.
 void newSampleData(double[] y, double[][] x)
          Loads model x and y sample data, overriding any previous sample.
 void newSampleData(double[] data, int nobs, int nvars)
          Loads model x and y sample data from a flat array of data, overriding any previous sample.
protected  void newXSampleData(double[][] x)
          Loads new x sample data, overriding any previous sample.
private static RealVector solveUpperTriangular(RealMatrix coefficients, RealVector constants)
          Uses back substitution to solve the system
 
Methods inherited from class org.apache.commons.math.stat.regression.AbstractMultipleLinearRegression
calculateResiduals, estimateRegressandVariance, estimateRegressionParameters, estimateRegressionParametersStandardErrors, estimateRegressionParametersVariance, estimateResiduals, newYSampleData, validateCovarianceData, validateSampleData
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

qr

private QRDecomposition qr
Cached QR decomposition of X matrix

Constructor Detail

OLSMultipleLinearRegression

public OLSMultipleLinearRegression()
Method Detail

newSampleData

public void newSampleData(double[] y,
                          double[][] x)
Loads model x and y sample data, overriding any previous sample. Computes and caches QR decomposition of the X matrix.

Parameters:
y - the [n,1] array representing the y sample
x - the [n,k] array representing the x sample
Throws:
java.lang.IllegalArgumentException - if the x and y array data are not compatible for the regression

newSampleData

public void newSampleData(double[] data,
                          int nobs,
                          int nvars)
Loads model x and y sample data from a flat array of data, overriding any previous sample. Assumes that rows are concatenated with y values first in each row. Computes and caches QR decomposition of the X matrix.

Overrides:
newSampleData in class AbstractMultipleLinearRegression
Parameters:
data - input data array
nobs - number of observations (rows)
nvars - number of independent variables (columns, not counting y)
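
The flat layout described above (rows concatenated, y first in each row) can be sketched as follows. This is a hypothetical helper, FlatSampleLayoutSketch, written only to show the indexing; it assumes each row holds exactly nvars + 1 values and does not reproduce any further processing the library method may perform on the data.

```java
/** Illustrative sketch only: unpacks a flat [y, x1, ..., x_nvars] row layout. */
final class FlatSampleLayoutSketch {

    /** Extracts the y value stored at the start of each row. */
    static double[] extractY(double[] data, int nobs, int nvars) {
        double[] y = new double[nobs];
        for (int i = 0; i < nobs; i++) {
            y[i] = data[i * (nvars + 1)];
        }
        return y;
    }

    /** Extracts the nvars independent-variable values that follow y in each row. */
    static double[][] extractX(double[] data, int nobs, int nvars) {
        double[][] x = new double[nobs][nvars];
        for (int i = 0; i < nobs; i++) {
            for (int j = 0; j < nvars; j++) {
                x[i][j] = data[i * (nvars + 1) + 1 + j];
            }
        }
        return x;
    }

    public static void main(String[] args) {
        // Two observations, two independent variables: rows are (3, 1, 10) and (5, 2, 20).
        double[] data = {3, 1, 10, 5, 2, 20};
        double[] y = extractY(data, 2, 2);
        double[][] x = extractX(data, 2, 2);
        System.out.println(y[1] + " " + x[1][0] + " " + x[1][1]);
    }
}
```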

calculateHat

public RealMatrix calculateHat()

Compute the "hat" matrix.

The hat matrix is defined in terms of the design matrix X by X (X^T X)^{-1} X^T.

The implementation here uses the QR decomposition to compute the hat matrix as Q I_p Q^T, where I_p is the p-dimensional identity matrix augmented by 0's. This computational formula is from "The Hat Matrix in Regression and ANOVA", David C. Hoaglin and Roy E. Welsch, The American Statistician, Vol. 32, No. 1 (Feb., 1978), pp. 17-22.

Returns:
the hat matrix
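
As a rough illustration of the formula, multiplying Q by the zero-augmented identity and then by Q^T is the same as multiplying the first p columns of Q by their own transpose. The sketch below takes that shortcut. HatMatrixSketch is a hypothetical name, the orthonormal columns are obtained with modified Gram-Schmidt for brevity, and X is assumed to have full column rank; none of this is taken from the library's source.

```java
/** Illustrative sketch only: hat matrix H = Q1 Q1^T from the first p columns of Q. */
final class HatMatrixSketch {

    /** Returns the n x n hat matrix of the n x p design matrix x. */
    static double[][] hat(double[][] x) {
        int n = x.length;
        int p = x[0].length;
        double[][] q = new double[n][];
        for (int i = 0; i < n; i++) {
            q[i] = x[i].clone();
        }
        // Orthonormalise the columns of x in place (modified Gram-Schmidt).
        for (int j = 0; j < p; j++) {
            double norm = 0.0;
            for (int i = 0; i < n; i++) {
                norm += q[i][j] * q[i][j];
            }
            norm = Math.sqrt(norm);
            for (int i = 0; i < n; i++) {
                q[i][j] /= norm;
            }
            for (int k = j + 1; k < p; k++) {
                double dot = 0.0;
                for (int i = 0; i < n; i++) {
                    dot += q[i][j] * q[i][k];
                }
                for (int i = 0; i < n; i++) {
                    q[i][k] -= dot * q[i][j];
                }
            }
        }
        // H[i][k] = sum over the p orthonormal columns of q[i][j] * q[k][j].
        double[][] h = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int k = 0; k < n; k++) {
                for (int j = 0; j < p; j++) {
                    h[i][k] += q[i][j] * q[k][j];
                }
            }
        }
        return h;
    }

    public static void main(String[] args) {
        double[][] h = hat(new double[][] {{1, 1}, {1, 2}, {1, 3}});
        // The diagonal entries are the leverages; their sum equals p.
        System.out.println(h[0][0] + h[1][1] + h[2][2]);
    }
}
```

The diagonal of the result gives the leverage of each observation, and applying the matrix to y yields the fitted values.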

newXSampleData

protected void newXSampleData(double[][] x)
Loads new x sample data, overriding any previous sample.

Overrides:
newXSampleData in class AbstractMultipleLinearRegression
Parameters:
x - the [n,k] array representing the x sample

calculateBeta

protected RealVector calculateBeta()
Calculates regression coefficients using OLS.

Specified by:
calculateBeta in class AbstractMultipleLinearRegression
Returns:
beta

calculateBetaVariance

protected RealMatrix calculateBetaVariance()

Calculates the variance on the beta by OLS.

Var(b) = (X^T X)^{-1}

Uses QR decomposition to reduce (X^T X)^{-1} to (R^T R)^{-1}, with only the top p rows of R included, where p = the length of the beta vector.

Specified by:
calculateBetaVariance in class AbstractMultipleLinearRegression
Returns:
The beta variance
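
One way to evaluate the inverse of R^T R without forming the product is to invert the upper-triangular p x p block of R by back-substitution and multiply that inverse by its own transpose. The sketch below shows this route under the assumption that the block is square and nonsingular. RtRInverseSketch and its methods are hypothetical and are not taken from the library's source.

```java
/** Illustrative sketch only: inverse of R^T R computed as R^{-1} (R^{-1})^T. */
final class RtRInverseSketch {

    /** Inverts a square, nonsingular upper-triangular matrix column by column. */
    static double[][] invertUpper(double[][] r) {
        int p = r.length;
        double[][] inv = new double[p][p];
        for (int c = 0; c < p; c++) {
            // Solve R z = e_c by back-substitution; z becomes column c of the inverse.
            for (int i = p - 1; i >= 0; i--) {
                double s = (i == c) ? 1.0 : 0.0;
                for (int k = i + 1; k < p; k++) {
                    s -= r[i][k] * inv[k][c];
                }
                inv[i][c] = s / r[i][i];
            }
        }
        return inv;
    }

    /** Returns the inverse of R^T R for the upper-triangular block r. */
    static double[][] inverseOfRtR(double[][] r) {
        int p = r.length;
        double[][] inv = invertUpper(r);
        double[][] out = new double[p][p];
        for (int i = 0; i < p; i++) {
            for (int j = 0; j < p; j++) {
                for (int k = 0; k < p; k++) {
                    out[i][j] += inv[i][k] * inv[j][k];
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] v = inverseOfRtR(new double[][] {{2, 1}, {0, 1}});
        System.out.println(v[0][0] + " " + v[0][1] + " " + v[1][1]);
    }
}
```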

calculateYVariance

protected double calculateYVariance()

Calculates the variance on the Y by OLS.

Var(y) = Tr(u^T u) / (n - k)

Specified by:
calculateYVariance in class AbstractMultipleLinearRegression
Returns:
The Y variance
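
For a single response vector, u^T u is a scalar, so the formula amounts to the residual sum of squares divided by n - k. A minimal sketch, assuming u = y - X b, n observations and k regression parameters, might look like this. YVarianceSketch is a hypothetical name and the coefficient vector b is passed in rather than estimated.

```java
/** Illustrative sketch only: residual sum of squares divided by (n - k). */
final class YVarianceSketch {

    /** Returns u^T u / (n - k), where u = y - X b. */
    static double yVariance(double[][] x, double[] y, double[] b) {
        int n = x.length;
        int k = b.length;
        double sumSquares = 0.0;
        for (int i = 0; i < n; i++) {
            double fitted = 0.0;
            for (int j = 0; j < k; j++) {
                fitted += x[i][j] * b[j];
            }
            double residual = y[i] - fitted;
            sumSquares += residual * residual;
        }
        return sumSquares / (n - k);
    }

    public static void main(String[] args) {
        double[][] x = {{1, 1}, {1, 2}, {1, 3}, {1, 4}};
        double[] y = {3.1, 4.9, 6.9, 9.1};
        double[] b = {1, 2}; // coefficients of the line y = 1 + 2x
        System.out.println(yVariance(x, y, b));
    }
}
```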

solveUpperTriangular

private static RealVector solveUpperTriangular(RealMatrix coefficients,
                                               RealVector constants)

Uses back substitution to solve the system

coefficients X = constants

coefficients must be upper-triangular and constants must be a column matrix. The solution is returned as a column matrix.

The number of columns in coefficients determines the length of the returned solution vector (column matrix). If constants has more rows than coefficients has columns, excess rows are ignored. Similarly, extra (zero) rows in coefficients are ignored.

Parameters:
coefficients - upper-triangular coefficients matrix
constants - column RHS constants vector
Returns:
solution matrix as a column vector
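
Since this method is private, a stand-alone sketch of the technique may help. The version below uses plain arrays instead of RealMatrix and RealVector, takes the solution length from the column count, and ignores any extra rows, as described above. BackSubstitutionSketch is a hypothetical name, and nonzero diagonal entries are assumed.

```java
/** Illustrative sketch only: back substitution for an upper-triangular system. */
final class BackSubstitutionSketch {

    /** Solves coefficients * x = constants using only the top p rows, p = column count. */
    static double[] solve(double[][] coefficients, double[] constants) {
        int p = coefficients[0].length;
        double[] x = new double[p];
        // Start from the last unknown and substitute already-solved values upwards.
        for (int i = p - 1; i >= 0; i--) {
            double s = constants[i];
            for (int k = i + 1; k < p; k++) {
                s -= coefficients[i][k] * x[k];
            }
            x[i] = s / coefficients[i][i];
        }
        return x;
    }

    public static void main(String[] args) {
        // The third row of zeros and the third constant are ignored.
        double[][] r = {{2, 1}, {0, 4}, {0, 0}};
        double[] rhs = {4, 8, 99};
        double[] x = solve(r, rhs);
        System.out.println(x[0] + " " + x[1]);
    }
}
```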

checkUpperTriangular

private static void checkUpperTriangular(RealMatrix m,
                                         double epsilon)

Check if a matrix is upper-triangular.

Makes sure all below-diagonal elements are within epsilon of 0.

Parameters:
m - matrix to check
epsilon - maximum allowable absolute value for elements below the main diagonal
Throws:
java.lang.IllegalArgumentException - if m is not upper-triangular
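
The check described above can be sketched with plain arrays as follows. UpperTriangularCheckSketch is a hypothetical stand-in for the private method, and the exception message is invented for illustration.

```java
/** Illustrative sketch only: verifies that below-diagonal entries are within epsilon of 0. */
final class UpperTriangularCheckSketch {

    /** Throws IllegalArgumentException if any element below the main diagonal exceeds epsilon in absolute value. */
    static void check(double[][] m, double epsilon) {
        for (int i = 0; i < m.length; i++) {
            // Below-diagonal entries of row i are the columns j < i.
            int limit = Math.min(i, m[i].length);
            for (int j = 0; j < limit; j++) {
                if (Math.abs(m[i][j]) > epsilon) {
                    throw new IllegalArgumentException(
                        "matrix is not upper-triangular, entry (" + i + "," + j + ") is " + m[i][j]);
                }
            }
        }
    }

    public static void main(String[] args) {
        check(new double[][] {{1, 2}, {1e-12, 3}}, 1e-9); // passes: 1e-12 is within tolerance
        System.out.println("upper-triangular within tolerance");
    }
}
```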


Copyright (c) 2003-2010 Apache Software Foundation