mgcv {mgcv}                                                R Documentation

Multiple Smoothing Parameter Estimation by GCV or UBRE
Description:

Function to efficiently estimate smoothing parameters in generalized
ridge regression problems with multiple (quadratic) penalties, by GCV
or UBRE. The function uses Newton's method in multiple dimensions,
backed up by steepest descent, to iteratively adjust a set of relative
smoothing parameters for each penalty. To ensure that the overall level
of smoothing is optimal, and to guard against trapping by local minima,
a highly efficient global minimisation with respect to one overall
smoothing parameter is also performed at each iteration.

The other functions in the mgcv package are listed under `See Also'.
Usage:

mgcv(M)
Arguments:

M             The single argument to mgcv: a list with the elements
              listed below.
M$y           The response data vector.

M$w           A vector of weights for the data (often proportional to
              the reciprocal of the variance).

M$X           The design matrix for the problem. Note that ncol(M$X)
              must give the number of model parameters, while nrow(M$X)
              gives the number of data.

M$C           Matrix containing any linear equality constraints on the
              problem (i.e. C in Cp = 0).
M$S           A one dimensional array containing the non-zero elements
              of the penalty matrices. Let start <- sum(M$df[1:(k-1)]^2),
              or start <- 0 if k==1. Then penalty matrix k has
              M$S[start+i+M$df[k]*(j-1)] on its ith row and jth column.
              The kth full penalty matrix is constructed by placing the
              matrix just derived into an appropriate matrix of zeroes,
              with its 1,1 element at M$off[k],M$off[k] (see the packing
              sketch after this argument list).
M$m           The number of penalty terms that are to be applied.
M$off         Offset values locating the elements of M$S in the correct
              location within each penalty coefficient matrix. Indexing
              starts at zero.
M$fix         M$fix[i] is FALSE if the corresponding smoothing parameter
              is to be estimated, or TRUE if it is to be fixed at zero.
M$df          M$df[i] gives the number of rows and columns of the ith
              penalty matrix that are non-zero.
M$sig2        This serves two purposes. If it is strictly positive then
              UBRE is used with sig2 taken as the known error variance.
              If it is non-positive then GCV is used (and on return it
              will contain an estimate of the error variance). Note
              that the variance of y_i is assumed to be given by
              sig2/w_i.
M$sp          An array of initial smoothing parameter estimates; if any
              of these is not strictly positive then automatic
              generation of initial values is used. If M$fixed.sp is
              non-zero then these parameters will be used as known
              smoothing parameters.
M$fixed.sp    Non-zero to indicate that all smoothing parameters are
              supplied as known, and are not to be estimated.
M$conv.tol    The convergence tolerance to use for the multiple
              smoothing parameter search. 1e-6 is usually adequate.
M$max.half    Each step of the GCV minimisation is repeatedly halved if
              it fails to improve the GCV score. This is the maximum
              number of halvings to try at each step. 15 is usually
              enough.
M$min.edf     The minimum estimated degrees of freedom for the model:
              set to less than or equal to 0 to ignore. Supplying this
              quantity helps to maintain numerical stability in cases
              in which the best model is actually very smooth.
M$target.edf  Set this to a negative value for it to be ignored. If it
              is positive then it is taken as the target edf for the
              model: if the GCV/UBRE function has multiple minima
              w.r.t. the overall smoothing parameter then the minimum
              closest to the target edf will be selected. This feature
              is primarily useful when selecting smoothing parameters
              as part of iterative least squares methods, where the
              GCV/UBRE score can develop spurious minima that are not
              representative of the problem being approximated.
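To make the M$S packing scheme concrete, here is a small sketch. The
penalty matrices S1 and S2 and the offsets are invented for
illustration; only the indexing rule itself comes from the description
of M$S above.

## Illustrative sketch of the M$S packing scheme (S1, S2 and the offsets
## are made up for this example; only the indexing rule is as documented).
S1 <- crossprod(diff(diag(3)))   # a 3x3 first-difference penalty matrix
S2 <- crossprod(diff(diag(2)))   # a 2x2 first-difference penalty matrix
M <- list(m = 2, df = c(3, 2), off = c(0, 3))  # penalties starting at
                                               # parameters 0 and 3 (0-based)
M$S <- c(as.numeric(S1), as.numeric(S2))       # column-major packing

## Recover element (i,j) of penalty matrix k, e.g. k = 2, i = 1, j = 2:
k <- 2; i <- 1; j <- 2
start <- if (k == 1) 0 else sum(M$df[1:(k-1)]^2)
M$S[start + i + M$df[k] * (j - 1)] == S2[i, j]  # TRUE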
Details:

This is documentation for the code implementing the method described in
section 4 of Wood (2000). The method is a computationally efficient
means of applying GCV to the problem of smoothing parameter selection
in generalized ridge regression problems of the form:
minimise  || W^0.5 (Xp - y) ||^2 rho + lambda_1 p'S_1 p + lambda_2 p'S_2 p + ...
possibly subject to constraints Cp=0. X is a design matrix, p a parameter vector, y a data vector, W a diagonal weight matrix, S_i a positive semi-definite matrix of coefficients defining the ith penalty and C a matrix of coefficients defining any linear equality constraints on the problem. The smoothing parameters are the lambda_i but there is an overall smoothing parameter rho as well. Note that X must be of full column rank, at least when projected into the null space of any equality constraints.
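To fix ideas, the following naive R sketch evaluates the fitted
coefficients and the GCV score for a given set of effective smoothing
parameters theta_i = lambda_i/rho. The function and argument names are
invented, and the brute-force linear algebra is purely to show the
quantities involved, not the efficient method used by the C code. X, y,
a weight vector w and a list of full penalty matrices are assumed to be
set up already.

## Naive evaluation of the penalized fit and GCV score (illustrative
## sketch only; 'gcv.score' and its arguments are not part of mgcv).
gcv.score <- function(X, y, w, S.list, theta) {
  n <- nrow(X)
  XtWX <- t(X) %*% (w * X)                   # X'WX, with W = diag(w)
  P <- XtWX
  for (i in seq_along(S.list)) P <- P + theta[i] * S.list[[i]]
  p <- solve(P, t(X) %*% (w * y))            # penalized least squares p
  edf <- sum(diag(solve(P, XtWX)))           # tr(A), the whole-model edf
  rss <- sum(w * (y - X %*% p)^2)            # || W^0.5 (Xp - y) ||^2
  n * rss / (n - edf)^2                      # the GCV score
}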
The method operates by alternating very efficient direct searches for
rho with Newton or steepest descent updates of the logs of the
lambda_i. Because the GCV/UBRE scores are flat w.r.t. very large or
very small lambda_i, it is important to get good starting parameters,
and to be careful not to step into a flat region of the smoothing
parameter space. For this reason the algorithm rescales any Newton step
that would result in a log(lambda_i) change of more than 5. Newton
steps are only used if the Hessian of the GCV/UBRE score is positive
definite; otherwise steepest descent is used. Similarly, steepest
descent is used if the Newton step has to be contracted too far
(indicating that the quadratic model underlying Newton is poor). All
initial steepest descent steps are scaled so that their largest
component is 1. However a step is calculated, it is never expanded if
it is successful (to avoid flat portions of the objective), but steps
are successively halved if they do not decrease the GCV/UBRE score,
until they do, or the direction is deemed to have failed (see the
step-halving sketch below). M$conv provides some convergence
diagnostics.
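The step control described above amounts to a simple backtracking loop.
A minimal sketch follows, assuming a function score() that evaluates
the GCV/UBRE criterion at a vector of log smoothing parameters; both
names are hypothetical, not mgcv functions.

## Minimal sketch of the step rescaling and halving described above
## ('score' and 'try.step' are hypothetical names, not part of mgcv).
try.step <- function(score, log.sp, step, max.half = 15) {
  step <- step * min(1, 5 / max(abs(step)))  # cap any log(lambda) change at 5
  old <- score(log.sp)
  for (half in 0:max.half) {
    trial <- log.sp + step
    if (score(trial) < old) return(trial)    # accept the first improving step
    step <- step / 2                         # otherwise halve and retry
  }
  log.sp                                     # direction failed: keep old point
}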
The method is coded in C
and is intended to be portable. It should be
noted that seriously ill conditioned problems (i.e. with close to column rank
deficiency in the design matrix) may cause problems, especially if weights vary
wildly between observations.
Value:

The function returns a list of the same form as the input list with the
following elements added/modified:
M$p           The best fit parameters given the estimated smoothing
              parameters.

M$sp          The estimated smoothing parameters (lambda_i/rho).

M$sig2        The estimated (GCV) or supplied (UBRE) error variance.
M$edf         The estimated degrees of freedom associated with each
              penalized term. If p_i is the vector of parameters
              penalized by the ith penalty then we can write
              p_i = P_i y, and the edf for the ith term is tr(X P_i).
              While this is sensible in the non-overlapping penalty
              case (e.g. GAMs), it makes a lot less sense if penalties
              overlap! (See the sketch after this list.)
M$Vp          Estimated covariance matrix of the model parameters.

M$gcv.ubre    The minimized GCV/UBRE score.
M$conv        A list of convergence diagnostics, including: the
              gradient and leading diagonal of the Hessian of the
              GCV/UBRE score w.r.t. the log smoothing parameters at
              termination; a flag that is TRUE if the second smoothing
              parameter guess improved the GCV/UBRE score (please
              report examples where this is FALSE); and a flag that is
              TRUE if the algorithm terminated by failing to improve
              the GCV/UBRE score rather than by "converging". The
              latter is not necessarily a problem, but check the
              derivative information quite carefully in that case.
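A sketch of the per-term edf definition given under M$edf above: with
Fmat = (X'WX + sum_i theta_i S_i)^{-1} X'WX, the whole-model edf is
tr(Fmat), and the edf of the ith term is the sum of the elements of
diag(Fmat) corresponding to that term's coefficients. The function name
and the index.list argument (one vector of coefficient indices per
term) are invented for illustration.

## Illustrative per-term edf computation ('term.edf' and 'index.list'
## are made-up names; the definition follows the M$edf entry above).
term.edf <- function(X, w, S.list, theta, index.list) {
  XtWX <- t(X) %*% (w * X)
  P <- XtWX
  for (i in seq_along(S.list)) P <- P + theta[i] * S.list[[i]]
  Fmat <- solve(P, XtWX)         # q x q; tr(Fmat) is the whole-model edf
  sapply(index.list, function(ix) sum(diag(Fmat)[ix]))
}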
WARNING:

The method may not behave well with a near column rank deficient X,
especially in contexts where the weights vary wildly.
Author(s):

Simon N. Wood <simon@stats.gla.ac.uk>
References:

Gu and Wahba (1991) Minimizing GCV/GML scores with multiple smoothing
parameters via the Newton method. SIAM J. Sci. Statist. Comput.
12:383-398

Wood (2000) Modelling and smoothing parameter estimation with multiple
quadratic penalties. J. R. Statist. Soc. B 62(2):413-428

http://www.stats.gla.ac.uk/~simon/
See Also:

gam, s, plot.gam, print.gam, summary.gam, predict.gam, persp.gam,
residuals.gam, gam.models, gam.selection, gam.control, gam.check,
pcls, mono.con, GAMsetup, gam.parser, gam.fit, gam.nbut, gam.setup,
QT, get.family, neg.binom, null.space.dimension, theta.maxl,
uniquecombs
Examples:

set.seed(0)
n <- 100                          # number of observations to simulate
x <- runif(4 * n, 0, 1)           # simulate covariates
x <- array(x, dim = c(4, n))      # put into array for passing to GAMsetup
pi <- asin(1) * 2                 # begin simulating some data
y <- 2 * sin(pi * x[1, ])
y <- y + exp(2 * x[2, ]) - 3.75887
y <- y + 0.2 * x[3, ]^11 * (10 * (1 - x[3, ]))^6 +
     10 * (10 * x[3, ])^3 * (1 - x[3, ])^10 - 1.396
sig2 <- -1                        # set magnitude of variance
e <- rnorm(n, 0, sqrt(abs(sig2)))
y <- y + e                        # simulated data
w <- matrix(1, n, 1)              # weight matrix
par(mfrow = c(2, 2))              # scatter plots of simulated data
plot(x[1, ], y); plot(x[2, ], y)
plot(x[3, ], y); plot(x[4, ], y)
# create list for passing to GAMsetup
G <- list(m = 4, n = n, nsdf = 0, df = c(15, 15, 15, 15),
          dim = c(1, 1, 1, 1), s.type = c(0, 0, 0, 0), by = 0,
          by.exists = c(FALSE, FALSE, FALSE, FALSE),
          p.order = c(0, 0, 0, 0), x = x, n.knots = rep(0, 4))
H <- GAMsetup(G)
H$y <- y                  # add data to H
H$sig2 <- sig2            # add variance (signalling GCV use in this case) to H
H$w <- w                  # add weights to H
H$sp <- array(-1, H$m)    # signal automatic initial smoothing parameters
H$fix <- array(FALSE, H$m)
H$conv.tol <- 1e-6; H$max.half <- 15
H$min.edf <- 5; H$fixed.sp <- 0
H <- mgcv(H)              # select smoothing parameters and fit model
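After the fit, the returned list can be inspected using the fields
documented under Value. A brief sketch, assuming GAMsetup's returned
list carries the design matrix in H$X:

## Inspect the fitted model (fields as documented in the Value section).
H$gcv.ubre                # the minimized GCV score
H$sig2                    # GCV estimate of the error variance (truth is 1 here)
H$edf                     # estimated degrees of freedom per penalized term
fv <- H$X %*% H$p         # fitted values from the estimated parameters
points(x[4, ], fv, pch = ".")  # overlay fit on the last scatter plot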