smooth.spline {modreg} | R Documentation
Fits a cubic smoothing spline to the supplied data.
smooth.spline(x, y, w = rep(1, length(x)), df = 5, spar = 0,
              cv = FALSE, all.knots = FALSE, df.offset = 0, penalty = 1)
x: a vector giving the values of the predictor variable, or a list or a two-column matrix specifying x and y.

y: responses. If y is missing, the responses are assumed to be specified by x.

w: optional vector of weights.

df: the desired equivalent number of degrees of freedom (trace of the smoother matrix).

spar: smoothing parameter, typically in (0,1]. The coefficient λ of the integral of the squared second derivative in the fit (penalized log likelihood) criterion is a monotone function of spar; see the details below.

cv: ordinary (TRUE) or `generalized' (FALSE) cross-validation.

all.knots: if TRUE, all points in x are used as knots. If FALSE, a suitably fine grid of knots is used.

df.offset: allows the degrees of freedom to be increased by df.offset in the GCV criterion.

penalty: the coefficient of the penalty for degrees of freedom in the GCV criterion.
The x vector should contain at least ten distinct values.
The computational λ used (as a function of spar) is

    lambda = r * 256^(3*spar - 1)

where r = tr(X' W^2 X) / tr(Σ), Σ is the matrix given by Sigma[i,j] = Integral B''[i](t) B''[j](t) dt, X is given by X[i,j] = B[j](x[i]), W^2 is the diagonal matrix of scaled weights with W = diag(w)/n (i.e., the identity for default weights), and B[k](.) is the k-th B-spline.
Note that with these definitions, f_i = f(x_i), and with the B-spline basis representation f = X c (i.e., c is the vector of spline coefficients), the penalized log likelihood is

    L = (y - f)' W^2 (y - f) + λ c' Σ c,

and hence c is the solution of the (ridge regression) equation

    (X' W^2 X + λ Σ) c = X' W^2 y.
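The spar-to-λ map above can be sketched numerically; here r = 1 is a hypothetical placeholder for the data-dependent ratio tr(X' W^2 X) / tr(Σ):

```r
## Illustration of lambda = r * 256^(3*spar - 1).
## r = 1 is a hypothetical value; in a real fit it is
## tr(X' W^2 X) / tr(Sigma), which depends on the data.
r <- 1
spar <- c(0, 0.5, 1)
lambda <- r * 256^(3 * spar - 1)
lambda  # monotone increasing in spar: 1/256, 16, 65536
```

The base 256 makes λ range over many orders of magnitude as spar moves through (0,1], which is why spar is the more convenient scale for users.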
If spar is missing or 0, the value of df is used to determine the degree of smoothing. If both are missing, leave-one-out cross-validation is used to determine λ.
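The three ways of choosing the amount of smoothing described above might look like this (a sketch using the standard cars data set shipped with R):

```r
data(cars)
## spar supplied: lambda is derived from spar via the formula above
fit.spar <- smooth.spline(cars$speed, cars$dist, spar = 0.5)
## spar missing but df supplied: smoothing chosen to match df
fit.df   <- smooth.spline(cars$speed, cars$dist, df = 8)
## both missing: cross-validation chooses lambda
fit.cv   <- smooth.spline(cars$speed, cars$dist)
fit.df$df  # close to the requested 8
```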
The `generalized' cross-validation method will work correctly when there are duplicated points in x. However, it is ambiguous what leave-one-out cross-validation means with duplicated points, and the internal code uses an approximation that involves leaving out groups of duplicated points. cv = TRUE is best avoided in that case.
An object of class "smooth.spline" with components

x: the distinct x values in increasing order.

y: the fitted values corresponding to x.

w: the weights used at the unique values of x.

yin: the y values used at the unique x values.

lev: leverages, the diagonal values of the smoother matrix.

cv.crit: (generalized) cross-validation score.

pen.crit: the penalized criterion.

df: equivalent degrees of freedom used.

spar: the value of spar chosen.

fit: list for use by predict.smooth.spline.

call: the matched call.
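The fit component is consumed by predict.smooth.spline; a typical round trip (sketched here with the standard cars data) inspects a few returned components and then evaluates the spline on a finer grid:

```r
data(cars)
cars.spl <- smooth.spline(cars$speed, cars$dist)
cars.spl$df        # equivalent degrees of freedom
head(cars.spl$lev) # leverages at the distinct x values
## Evaluate the fitted spline at new predictor values;
## predict.smooth.spline returns a list with components x and y.
grid <- seq(min(cars$speed), max(cars$speed), length.out = 101)
pred <- predict(cars.spl, x = grid)
str(pred)
```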
B.D. Ripley
data(cars)
attach(cars)
plot(speed, dist, main = "data(cars) & smoothing splines")
cars.spl <- smooth.spline(speed, dist)
(cars.spl)
## This example has duplicate points, so avoid cv = TRUE
lines(cars.spl, col = "blue")
lines(smooth.spline(speed, dist, df = 10), lty = 2, col = "red")
legend(5, 120, c(paste("default [C.V.] => df =", round(cars.spl$df, 1)),
                 "s( * , df = 10)"),
       col = c("blue", "red"), lty = 1:2, bg = 'bisque')
detach()