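I'm having trouble with the smooth.spline() function in R.

Expected behavior: sm <- smooth.spline(data$x, data$y) should create an object in which all of the data used to build it are returned as part of the model; for example, sm$x and sm$y should have the same lengths as data$x and data$y. Predictions on a new data set should work the same way, so that the vector returned by the predict function has the same length as the data used for prediction.

Observed behavior: smooth.spline() does not retain all of the data used to build the model, and it does not produce a prediction for every observation.

I'm sure I'm overlooking something silly here, but I can't figure it out.

Reproducible example: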
##create data set for smooth.spline model
y<-c(24.5292685, 24.5292685, 24.5292685, 24.5292685, 20.6326320, 20.6326320, 20.6326320, 22.3128140, 22.3128140, 22.3128140, 22.3128140, 22.3128140, 20.4381464, 20.4381464,
20.4381464, 20.7528465, 20.7528465, 20.7528465, 22.8431767, 22.8431767, 22.8431767, 22.8431767, 22.8431767, 21.5483605, 21.5483605, 16.3262066, 16.3262066, 16.3262066,
16.3262066, 16.3262066, 18.4767226, 18.4767226, 18.4767226, 7.6244328, 7.6244328, 7.6244328, -3.9726025, -3.9726025, -3.9726025, -3.9726025, -3.9726025, -3.9726025,
1.2090523, 1.2090523, 1.2090523, 19.5545305, 19.5545305, 19.5545305, 20.2714809, 20.5635124, 20.5635124, 18.0657058, 18.0657058, 18.0657058, 17.5258674, 13.6653809,
13.6653809, 13.6653809, 8.8839465, 8.8839465, 8.2448148, 8.2448148, 8.2448148, 8.7948831, 8.7948831, 2.5137371, 8.6105971, 8.6105971, 14.7650620, 14.7650620,
14.8774259, 14.8774259, 16.9564789, 20.2428563, 20.2428563, 20.8039368, 20.8039368, 21.4189956, 21.4189956, 16.5872965, 16.5872965, 13.1912207, 12.6576378, 12.6576378,
5.2589847, 5.2589847, -0.1451702, 11.0045202, 11.0045202, 10.8005181, 10.8005181, 15.6980449, 15.6980449, 14.9910402, 14.9910402, 18.7319224, 18.7319224, 18.3692496,
18.3692496, 18.3692496, 9.2911955, 9.2911955, 20.3091283, 20.3091283, 21.9794811, 21.9794811, 21.9794811, 21.0654857, 21.0654857, 19.7033713, 19.7033713, 19.7033713,
18.9868848, 18.9868848, 3.9505643, 3.9505643, 3.9505643, 7.1717265, 7.1717265, 6.4622461, 6.4622461, 6.4622461, 9.5197519, 9.5197519, 9.5197519, 15.1281635,
20.7016476, 20.7016476, 20.7016476, 20.1041949, 19.5109579, 19.5109579, 22.5606698, 22.5606698, 22.5606698, 22.1459763, 17.9437652, 19.0612696,
10.5874702, 10.5874702, -0.5577768, -0.5577768, -2.4302378, -0.7752072, -0.7752072)
date<-as.Date(c(
"2013-07-03", "2013-07-03", "2013-07-03", "2013-07-03", "2013-07-12", "2013-07-12", "2013-07-12", "2013-07-19", "2013-07-19", "2013-07-19", "2013-07-19", "2013-07-19",
"2013-07-28", "2013-07-28", "2013-07-28", "2013-08-13", "2013-08-13", "2013-08-13", "2013-08-20", "2013-08-20", "2013-08-20", "2013-08-20", "2013-08-20", "2013-09-05",
"2013-09-05", "2013-09-21", "2013-09-21", "2013-09-21", "2013-09-21", "2013-09-21", "2013-09-30", "2013-09-30", "2013-09-30", "2013-10-16", "2013-10-16", "2013-10-16",
"2013-11-24", "2013-11-24", "2013-11-24", "2013-11-24", "2013-11-24", "2013-11-24", "2014-04-10", "2014-04-10", "2014-04-10", "2014-06-29", "2014-06-29", "2014-06-29",
"2014-07-06", "2014-07-31", "2014-07-31", "2014-09-01", "2014-09-01", "2014-09-01", "2014-09-08", "2014-09-17", "2014-09-17", "2014-09-17", "2014-10-10", "2014-10-10",
"2014-10-19", "2014-10-19", "2014-10-19", "2014-10-26", "2014-10-26", "2015-04-04", "2015-04-13", "2015-04-13", "2015-04-29", "2015-04-29", "2015-05-22", "2015-05-22",
"2015-06-07", "2015-07-09", "2015-07-09", "2015-07-18", "2015-07-18", "2015-08-03", "2015-08-03", "2015-09-20", "2015-09-20", "2015-10-06", "2015-10-13", "2015-10-13",
"2015-11-07", "2015-11-07", "2015-11-23", "2016-04-22", "2016-04-22", "2016-05-01", "2016-05-01", "2016-05-08", "2016-05-08", "2016-05-17", "2016-05-17", "2016-05-24",
"2016-05-24", "2016-06-02", "2016-06-02", "2016-06-02", "2016-06-09", "2016-06-09", "2016-07-27", "2016-07-27", "2016-08-05", "2016-08-05", "2016-08-05", "2016-08-12",
"2016-08-12", "2016-08-21", "2016-08-21", "2016-08-21", "2016-08-28", "2016-08-28", "2016-10-24", "2016-10-24", "2016-10-24", "2016-10-31", "2016-10-31", "2016-11-09",
"2016-11-09", "2016-11-09", "2017-05-04", "2017-05-04", "2017-05-04", "2017-05-11", "2017-06-05", "2017-06-05", "2017-06-05", "2017-06-12", "2017-06-21", "2017-06-21",
"2017-07-07", "2017-07-07", "2017-07-07", "2017-07-14", "2017-08-24", "2017-08-31", "2017-10-11", "2017-10-11", "2017-11-12", "2017-11-12", "2017-11-19", "2017-11-28",
"2017-11-28"
))
tmp<-data.frame(date=date, y=y)
tmp$x<-as.numeric(tmp$date)
##build smooth.spline model
sm<-smooth.spline(tmp$x,tmp$y)
##show that smooth.spline doesn't retain all of the observations
##used to build the model
length(sm$x)==length(tmp$x)
##show that predict from smooth.spline doesn't give a prediction
##for each "newdata" point
p<-predict(sm, newdata=tmp$x)
length(p)==length(tmp$x)
Answer 0 (score: 2)
smooth.spline() only produces values for the unique values of x:
length(sm$x)==length(unique(tmp$x)) ## TRUE
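To see the collapsing of repeated x values in isolation, here is a minimal sketch on a small made-up data set (the vectors x_toy and y_toy are illustrative assumptions, not the data from the question):

```r
# Toy data with repeated x values (illustrative only, not the question's data)
x_toy <- c(1, 1, 2, 2, 2, 3, 4, 4, 5, 6)
y_toy <- c(2.1, 1.9, 3.2, 2.8, 3.0, 4.1, 4.9, 5.2, 6.1, 6.8)

sm_toy <- smooth.spline(x_toy, y_toy)

length(x_toy)                              # number of observations passed in
length(sm_toy$x)                           # number of entries stored in the fit
length(sm_toy$x) == length(unique(x_toy))  # one stored entry per distinct x
sm_toy$w                                   # weights used at the distinct x values
```

The fitted object therefore describes the curve at the distinct x values only, which is why its length differs from the input whenever x contains ties.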
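predict() does not take a newdata argument (it is silently ignored [!]); it takes an x argument instead. It also returns a list with $x and $y elements, so length(p) gives 2, which is probably not what you wanted:
length(predict(sm, x=tmp$x)$x)==length(tmp$x) ## TRUE
In this case we do get predicted values for all of the repeated elements.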
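Putting the two points together, one way to get a fitted value for every original row is to pass the x values through the x argument and keep the $y component of the result. This is a sketch on made-up data (the data frame dat and the column name fit are assumptions for illustration):

```r
# Toy data frame with repeated x values (illustrative only)
dat <- data.frame(
  x = c(1, 1, 2, 2, 2, 3, 4, 4, 5, 6),
  y = c(2.1, 1.9, 3.2, 2.8, 3.0, 4.1, 4.9, 5.2, 6.1, 6.8)
)

sm <- smooth.spline(dat$x, dat$y)

# predict() returns a list with components x and y,
# so take $y to get a plain vector of fitted values
p <- predict(sm, x = dat$x)
dat$fit <- p$y

length(p)                 # length of the list, not the number of predictions
length(p$y) == nrow(dat)  # one prediction per original row
dat
```

Rows that share the same x receive the same fitted value, since the spline is a function of x alone.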