smooth.spline {modreg} | R Documentation |
Fits a cubic smoothing spline to the supplied data.
smooth.spline(x, y, w = rep(1, length(x)), df = 5, spar = NULL,
              cv = FALSE, all.knots = FALSE, df.offset = 0,
              penalty = 1, control.spar = list())
x: a vector giving the values of the predictor variable, or a list
   or a two-column matrix specifying x and y.

y: responses. If y is missing, the responses are assumed to be
   specified by x.

w: optional vector of weights.

df: the desired equivalent number of degrees of freedom (trace of
   the smoother matrix).

spar: smoothing parameter, typically (but not necessarily) in
   (0,1]. The coefficient λ of the integral of the squared second
   derivative in the fit (penalized log likelihood) criterion is a
   monotone function of spar; see the details below.

cv: ordinary (TRUE) or "generalized" (FALSE) cross-validation.

all.knots: if TRUE, all distinct points in x are used as knots.
   If FALSE, a suitably fine grid of knots is used.

df.offset: allows the degrees of freedom to be increased by
   df.offset in the GCV criterion.

penalty: the coefficient of the penalty for degrees of freedom in
   the GCV criterion.

control.spar: optional list with named components controlling the
   root finding when the smoothing parameter spar is computed.
   Note that this is partly experimental and may change with
   general spar computation improvements!  spar is only searched
   for in the interval [low, high].
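The arguments above offer three ways to control the amount of smoothing: fix df, fix spar, or let cross-validation choose. A minimal sketch (the data here are made up purely for illustration):

```r
## Hypothetical data, for illustration only
set.seed(1)
x <- seq(0, 10, length = 50)
y <- sin(x) + rnorm(50, sd = 0.2)

fit.df   <- smooth.spline(x, y, df = 5)      # fix equivalent degrees of freedom
fit.spar <- smooth.spline(x, y, spar = 0.5)  # fix the smoothing parameter directly
fit.cv   <- smooth.spline(x, y)              # spar chosen by cross-validation

fit.df$df        # close to the requested 5
```

Specifying both df and spar is possible, but spar takes precedence in determining the fit.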
The x vector should contain at least ten distinct values.

The computational λ used (as a function of spar) is

    lambda = r * 256^(3*spar - 1)
where

    r = tr(X' W^2 X) / tr(Σ),

Σ is the matrix given by Sigma[i,j] = Integral B''[i](t) B''[j](t) dt,
X is given by X[i,j] = B[j](x[i]), W^2 is the diagonal matrix of
scaled weights, W = diag(w)/n (i.e., the identity for default
weights), and B[k](.) is the k-th B-spline.

Note that with these definitions, f_i = f(x_i), and with the
B-spline basis representation f = X c (i.e., c is the vector of
spline coefficients), the penalized log likelihood is

    L = (y - f)' W^2 (y - f) + λ c' Σ c,

and hence c is the solution of the (ridge regression) equation

    (X' W^2 X + λ Σ) c = X' W^2 y.
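Because r in lambda = r * 256^(3*spar - 1) depends only on the data and knots, the ratio of lambdas for two fits of the same data depends only on the difference in spar. A small sketch checking this relation (the data are made up for illustration):

```r
## Hypothetical data, for illustration only
set.seed(1)
x <- runif(40)
y <- x^2 + rnorm(40, sd = 0.1)

f1 <- smooth.spline(x, y, spar = 0.4)
f2 <- smooth.spline(x, y, spar = 0.8)

## lambda2 / lambda1 = 256^(3 * (spar2 - spar1)), independent of r:
f2$lambda / f1$lambda
256^(3 * (0.8 - 0.4))
```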
If spar is missing or NULL, the value of df is used to determine the
degree of smoothing. If both are missing, leave-one-out
cross-validation is used to determine λ.

Note that from the above relation, spar is

    spar = s0 + 0.0601 * log(lambda),

which is intentionally different from the S-PLUS implementation of
smooth.spline (where spar is proportional to λ). In R's (log λ)
scale, it makes more sense to vary spar linearly.
Note, however, that currently the results may become very unreliable
for spar values smaller than about -1 or -2. The same may happen for
values larger than 2 or so. Don't set spar or the controls low and
high outside such a safe range, unless you know what you are doing!
The "generalized" cross-validation method will work correctly when
there are duplicated points in x. However, it is ambiguous what
leave-one-out cross-validation means with duplicated points, and the
internal code uses an approximation that involves leaving out groups
of duplicated points. cv = TRUE is best avoided in that case.
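The built-in cars data set has duplicated speed values, so it illustrates the safe choice: keep the default cv = FALSE (GCV) rather than leave-one-out cross-validation:

```r
data(cars)
any(duplicated(cars$speed))                  # TRUE: ties are present

## GCV (the default, cv = FALSE) handles the duplicated x values correctly
fit <- smooth.spline(cars$speed, cars$dist)
fit$df
```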
An object of class "smooth.spline" with components

x: the distinct x values in increasing order.

y: the fitted values corresponding to x.

w: the weights used at the unique values of x.

yin: the y values used at the unique x values.

lev: leverages, the diagonal values of the smoother matrix.

cv.crit: the (generalized) cross-validation score.

pen.crit: the penalized criterion.

crit: the criterion value minimized in the underlying .Fortran
   routine 'sslvrg'.

df: equivalent degrees of freedom used. Note that (currently) this
   value may become quite imprecise when the true df is between 1
   and 2.

spar: the value of spar computed or given.

lambda: the value of λ corresponding to spar; see the details above.

iparms: named integer(3) vector where ..$ipars["iter"] gives the
   number of spar-computing iterations used.

fit: list for use by predict.smooth.spline.

call: the matched call.
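A short sketch of inspecting these components and evaluating the fit at new points via predict.smooth.spline:

```r
data(cars)
fit <- smooth.spline(cars$speed, cars$dist, df = 6)

fit$lambda                                    # the λ corresponding to fit$spar
length(fit$x) == length(unique(cars$speed))   # x holds the distinct values only

## predict() evaluates the fitted spline at arbitrary new points
p <- predict(fit, seq(5, 25, by = 5))
p$y                                           # fitted values at the new speeds
```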
B. D. Ripley and Martin Maechler (spar/lambda, etc.).
data(cars)
attach(cars)
plot(speed, dist, main = "data(cars) & smoothing splines")
cars.spl <- smooth.spline(speed, dist)
(cars.spl)
## This example has duplicate points, so avoid cv = TRUE
lines(cars.spl, col = "blue")
lines(smooth.spline(speed, dist, df = 10), lty = 2, col = "red")
legend(5, 120, c(paste("default [C.V.] => df =", round(cars.spl$df, 1)),
                 "s( * , df = 10)"),
       col = c("blue", "red"), lty = 1:2, bg = 'bisque')
detach()

##-- artificial example
y18 <- c(1:3, 5, 4, 7:3, 2*(2:5), rep(10, 4))
xx  <- seq(1, length(y18), len = 201)
(s2  <- smooth.spline(y18))             # GCV
(s02 <- smooth.spline(y18, spar = 0.2))
plot(y18, main = deparse(s2$call), col.main = 2)
lines(s2, col = "gray"); lines(predict(s2, xx), col = 2)
lines(predict(s02, xx), col = 3); mtext(deparse(s02$call), col = 3)

## The following shows the problematic behavior of `spar' searching:
(s2  <- smooth.spline(y18, con = list(trace = TRUE, tol = 1e-6, low = -1.5)))
(s2m <- smooth.spline(y18, cv = TRUE,
                      con = list(trace = TRUE, tol = 1e-6, low = -1.5)))
## both above do quite similarly (Df = 8.5 +- 0.2)