Question

Here is the code I ran:

fun <- function(x) {1 + 3*sin(4*pi*x-pi)}
set.seed(1)
num.samples <- 1000
x <- runif(num.samples)
y <- fun(x) + rnorm(num.samples) * 1.5
fit <- smooth.spline(x, y, all.knots=TRUE, df=3)

Even though I set df=3, when I inspect the fitted model the output is:

Call:
smooth.spline(x = x, y = y, df = 3, all.knots = TRUE)
Smoothing Parameter  spar= 1.499954  lambda= 0.002508571 (26 iterations)
Equivalent Degrees of Freedom (Df): 9.86422

Can anyone help? Thanks!

Answer 1

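Note that since R-3.4.0 (2017-04-21), smooth.spline accepts a direct specification of λ through the newly added argument lambda. It is still converted to the internal spar during estimation, so the answer below is not affected.

The smoothing parameter λ / spar is at the centre of smoothness control

Smoothness is controlled by the smoothing parameter λ. smooth.spline() uses an internal smoothing parameter spar rather than λ: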

spar = s0 + 0.0601 * log(λ)

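This logarithmic transform is needed for unconstrained minimization, as in GCV / CV. Users can specify spar to set λ indirectly. When spar grows linearly, λ grows exponentially, so there is rarely any need for large spar values.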
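As a quick check of the relation above: ?smooth.spline documents λ = r * 256^(3*spar - 1), where r depends only on the data and knots, so the slope of spar against log(λ) should be 1/(3*log(256)) ≈ 0.0601. A minimal sketch, assuming the simulated data from the question; fit_a, fit_b and slope are names introduced here for illustration only:

```r
# Recreate the data from the question
fun <- function(x) {1 + 3*sin(4*pi*x-pi)}
set.seed(1)
x <- runif(1000)
y <- fun(x) + rnorm(1000) * 1.5

# Two fits with a fixed spar (no search is run when spar is given)
fit_a <- smooth.spline(x, y, all.knots = TRUE, spar = 0.5)
fit_b <- smooth.spline(x, y, all.knots = TRUE, spar = 1.0)

# Empirical slope of spar against log(lambda)
slope <- (fit_b$spar - fit_a$spar) / (log(fit_b$lambda) - log(fit_a$lambda))
slope

# Slope implied by the documented formula
1 / (3 * log(256))
```

The two printed values should agree up to floating-point error, which is where the 0.0601 in the formula comes from.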

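The degrees of freedom df are also defined in terms of λ:

df = tr(X (X'X + λS)^(-1) X')

where X is the model matrix of the B-spline basis and S is the penalty matrix.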
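To make the effective degrees of freedom concrete, here is a rough sketch that evaluates tr(X (X'X + λS)^(-1) X') for a small B-spline basis. It is not what smooth.spline() does internally: the basis size (10 columns) and the second-order difference penalty used for S are assumptions chosen to keep the example short, and edf is a hypothetical helper name.

```r
library(splines)  # ships with the standard R distribution

set.seed(1)
x <- runif(200)

# B-spline model matrix with 10 columns (assumed size, for illustration)
X <- bs(x, df = 10, intercept = TRUE)

# Stand-in penalty: squared second-order differences of the coefficients
D <- diff(diag(ncol(X)), differences = 2)
S <- crossprod(D)

# Effective degrees of freedom: tr(X (X'X + lambda S)^(-1) X'),
# computed as tr((X'X + lambda S)^(-1) X'X), which has the same trace
edf <- function(lambda) {
  XtX <- crossprod(X)
  sum(diag(solve(XtX + lambda * S, XtX)))
}

lambdas <- c(0, 0.01, 1, 100, 1e6)
vals <- sapply(lambdas, edf)
vals
```

With λ = 0 the trace equals the number of basis functions, and it shrinks towards the dimension of the penalty's null space (2 for this stand-in penalty) as λ grows.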

You can check how they relate on your dataset:

spar <- seq(1, 2.5, by = 0.1)
a <- sapply(spar, function (spar_i) unlist(smooth.spline(x, y, all.knots=TRUE, spar = spar_i)[c("df","lambda")]))

Let's plot df ~ spar, λ ~ spar and log(λ) ~ spar:

par(mfrow = c(1,3))
plot(spar, a[1, ], type = "b", main = "df ~ spar",
     xlab = "spar", ylab = "df")
plot(spar, a[2, ], type = "b", main = "lambda ~ spar",
     xlab = "spar", ylab = "lambda")
plot(spar, log(a[2,]), type = "b", main = "log(lambda) ~ spar",
     xlab = "spar", ylab = "log(lambda)")

Note the rapid growth of λ with spar, the linear relationship between log(λ) and spar, and the relatively smooth relationship between df and spar.

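Fitting iterations for spar in smooth.spline()

If we specify the value of spar manually, as we did in the sapply() call, no fitting iterations are run to select spar; otherwise smooth.spline() has to iterate over a number of spar values. If we

specify cv = TRUE / FALSE, the fitting iterations aim to minimize the CV / GCV score;
specify df = mydf, the fitting iterations aim to minimize (df(spar) - mydf)^2.

Minimizing GCV is easy to follow: we do not care about the GCV score itself, only about the corresponding spar. By contrast, when minimizing (df(spar) - mydf)^2, we usually care about the df value at the end of the iterations rather than spar! But bear in mind that this is a minimization problem, so there is no guarantee that the final df matches the target value mydf.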

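Why you set df = 3 but get df = 9.864

The end of the iterations can mean that a minimum was reached, that the search boundary was reached, or that the maximum number of iterations was reached.

We are far from the iteration limit (500 by default), yet we did not reach the minimum. So we may well have hit the boundary.

Do not focus on df; think about spar.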

smooth.spline(x, y, all.knots=TRUE, df=3)$spar   # 1.4999

According to ?smooth.spline, smooth.spline() by default searches spar within [-1.5, 1.5]. That is, when you set df = 3, the minimization stops at the search boundary instead of hitting df = 3.
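The boundary effect can be reproduced by hand. The sketch below minimizes (df(spar) - 3)^2 with optimize(), first on the default interval [-1.5, 1.5] and then on a wider one. optimize() is only a stand-in for the search smooth.spline() runs internally, and df_gap is a hypothetical helper introduced for this example.

```r
# Recreate the data from the question
fun <- function(x) {1 + 3*sin(4*pi*x-pi)}
set.seed(1)
x <- runif(1000)
y <- fun(x) + rnorm(1000) * 1.5

# Hypothetical helper: squared gap between the fitted df and the target df
df_gap <- function(spar, target = 3) {
  (smooth.spline(x, y, all.knots = TRUE, spar = spar)$df - target)^2
}

# Default search interval: the minimizer is pushed against the upper bound
inside <- optimize(df_gap, interval = c(-1.5, 1.5))

# Wider interval: an interior minimum with df close to 3 becomes reachable
wider <- optimize(df_gap, interval = c(-1.5, 2.5))

c(inside = inside$minimum, wider = wider$minimum)
```

On the default interval the objective is still decreasing at spar = 1.5, so the search can only stop at the edge; widening the interval lets it reach the spar value where df is about 3.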

Look again at our plot of the relationship between df and spar. From the figure, it looks like we need a spar value near 2 to get df = 3.

Let's use the control.spar argument:

fit <- smooth.spline(x, y, all.knots=TRUE, df=3, control.spar = list(high = 2.5))
# Smoothing Parameter  spar= 1.859066  lambda= 0.9855336 (14 iterations)
# Equivalent Degrees of Freedom (Df): 3.000305

Now you see, you end up with df = 3, and it takes spar = 1.86.

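A better suggestion: do not use all.knots = TRUE

Look, you have 1000 data points. With all.knots = TRUE you use 1000 parameters. Wanting to end up with df = 3 means that 997 of the 1000 parameters are suppressed. Imagine how large a λ, and hence spar, that takes!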
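To see how many parameters are actually involved, you can count the B-spline coefficients stored in the fitted object. A small sketch, assuming the data from the question; spar = 1 is an arbitrary fixed value used only to skip the search, and n_all / n_200 are illustrative names:

```r
# Recreate the data from the question
fun <- function(x) {1 + 3*sin(4*pi*x-pi)}
set.seed(1)
x <- runif(1000)
y <- fun(x) + rnorm(1000) * 1.5

# Fixed spar so that no search is run; only the basis size matters here
fit_all <- smooth.spline(x, y, all.knots = TRUE, spar = 1)
fit_200 <- smooth.spline(x, y, nknots = 200, spar = 1)

# Number of B-spline coefficients in each fit
n_all <- length(fit_all$fit$coef)
n_200 <- length(fit_200$fit$coef)
c(all.knots = n_all, nknots200 = n_200)
```

The first count is roughly the number of distinct x values, the second is close to 200, which is why far less shrinkage is needed to get down to df = 3 with nknots = 200.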

Try a penalized regression spline instead. Suppressing 200 parameters down to 3 is definitely much easier:

fit <- smooth.spline(x, y, nknots = 200, df=3)  ## using 200 knots
# Smoothing Parameter  spar= 1.317883  lambda= 0.9853648 (16 iterations)
# Equivalent Degrees of Freedom (Df): 3.000386

Now you end up with df = 3 without any spar control.

smooth.spline(): fitted model does not match user-specified degrees of freedom
