smooth.spline() does not predict every data point supplied via newdata =

Time: 2019-01-07 16:56:49

Tags: r

I'm having trouble with the smooth.spline() function in R.

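Expected behavior: sm <- smooth.spline(data$x, data$y) should create an object that returns all of the data used to build it as part of the model; for example, sm$x and sm$y should have the same length as data$x and data$y. Predictions from a new data set should work the same way, so that the vector returned by the predict function has the same length as the data used for prediction.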

Observed behavior: smooth.spline() does not retain all of the data used to build the model, and it does not produce a prediction for every observation.

I'm sure I'm overlooking something silly here, but I can't figure it out.

Reproducible example:

##create data set for smooth.spline model
y<-c(24.5292685, 24.5292685, 24.5292685, 24.5292685, 20.6326320, 20.6326320, 20.6326320, 22.3128140, 22.3128140, 22.3128140, 22.3128140, 22.3128140, 20.4381464, 20.4381464,
 20.4381464, 20.7528465, 20.7528465, 20.7528465, 22.8431767, 22.8431767, 22.8431767, 22.8431767, 22.8431767, 21.5483605, 21.5483605, 16.3262066, 16.3262066, 16.3262066,
 16.3262066, 16.3262066, 18.4767226, 18.4767226, 18.4767226,  7.6244328,  7.6244328,  7.6244328, -3.9726025, -3.9726025, -3.9726025, -3.9726025, -3.9726025, -3.9726025,
 1.2090523,  1.2090523,  1.2090523, 19.5545305, 19.5545305, 19.5545305, 20.2714809, 20.5635124, 20.5635124, 18.0657058, 18.0657058, 18.0657058, 17.5258674, 13.6653809,
 13.6653809, 13.6653809,  8.8839465,  8.8839465,  8.2448148,  8.2448148,  8.2448148,  8.7948831,  8.7948831,  2.5137371,  8.6105971,  8.6105971, 14.7650620, 14.7650620,
 14.8774259, 14.8774259, 16.9564789, 20.2428563, 20.2428563, 20.8039368, 20.8039368, 21.4189956, 21.4189956, 16.5872965, 16.5872965, 13.1912207, 12.6576378, 12.6576378,
 5.2589847,  5.2589847, -0.1451702, 11.0045202, 11.0045202, 10.8005181, 10.8005181, 15.6980449, 15.6980449, 14.9910402, 14.9910402, 18.7319224, 18.7319224, 18.3692496,
 18.3692496, 18.3692496,  9.2911955,  9.2911955, 20.3091283, 20.3091283, 21.9794811, 21.9794811, 21.9794811, 21.0654857, 21.0654857, 19.7033713, 19.7033713, 19.7033713,
 18.9868848, 18.9868848,  3.9505643,  3.9505643,  3.9505643,  7.1717265,  7.1717265,  6.4622461,  6.4622461,  6.4622461,  9.5197519,  9.5197519,  9.5197519, 15.1281635,
 20.7016476, 20.7016476, 20.7016476, 20.1041949, 19.5109579, 19.5109579, 22.5606698, 22.5606698, 22.5606698, 22.1459763, 17.9437652, 19.0612696, 
 10.5874702, 10.5874702,  -0.5577768, -0.5577768, -2.4302378, -0.7752072, -0.7752072)

date<-as.Date(c(
 "2013-07-03", "2013-07-03", "2013-07-03", "2013-07-03", "2013-07-12", "2013-07-12", "2013-07-12", "2013-07-19", "2013-07-19", "2013-07-19", "2013-07-19", "2013-07-19",
 "2013-07-28", "2013-07-28", "2013-07-28", "2013-08-13", "2013-08-13", "2013-08-13", "2013-08-20", "2013-08-20", "2013-08-20", "2013-08-20", "2013-08-20", "2013-09-05",
 "2013-09-05", "2013-09-21", "2013-09-21", "2013-09-21", "2013-09-21", "2013-09-21", "2013-09-30", "2013-09-30", "2013-09-30", "2013-10-16", "2013-10-16", "2013-10-16",
 "2013-11-24", "2013-11-24", "2013-11-24", "2013-11-24", "2013-11-24", "2013-11-24", "2014-04-10", "2014-04-10", "2014-04-10", "2014-06-29", "2014-06-29", "2014-06-29",
 "2014-07-06", "2014-07-31", "2014-07-31", "2014-09-01", "2014-09-01", "2014-09-01", "2014-09-08", "2014-09-17", "2014-09-17", "2014-09-17", "2014-10-10", "2014-10-10",
 "2014-10-19", "2014-10-19", "2014-10-19", "2014-10-26", "2014-10-26", "2015-04-04", "2015-04-13", "2015-04-13", "2015-04-29", "2015-04-29", "2015-05-22", "2015-05-22",
 "2015-06-07", "2015-07-09", "2015-07-09", "2015-07-18", "2015-07-18", "2015-08-03", "2015-08-03", "2015-09-20", "2015-09-20", "2015-10-06", "2015-10-13", "2015-10-13",
 "2015-11-07", "2015-11-07", "2015-11-23", "2016-04-22", "2016-04-22", "2016-05-01", "2016-05-01", "2016-05-08", "2016-05-08", "2016-05-17", "2016-05-17", "2016-05-24",
 "2016-05-24", "2016-06-02", "2016-06-02", "2016-06-02", "2016-06-09", "2016-06-09", "2016-07-27", "2016-07-27", "2016-08-05", "2016-08-05", "2016-08-05", "2016-08-12",
 "2016-08-12", "2016-08-21", "2016-08-21", "2016-08-21", "2016-08-28", "2016-08-28", "2016-10-24", "2016-10-24", "2016-10-24", "2016-10-31", "2016-10-31", "2016-11-09",
 "2016-11-09", "2016-11-09", "2017-05-04", "2017-05-04", "2017-05-04", "2017-05-11", "2017-06-05", "2017-06-05", "2017-06-05", "2017-06-12", "2017-06-21", "2017-06-21",
 "2017-07-07", "2017-07-07", "2017-07-07", "2017-07-14", "2017-08-24", "2017-08-31", "2017-10-11", "2017-10-11", "2017-11-12", "2017-11-12", "2017-11-19", "2017-11-28",
 "2017-11-28"
))

tmp<-data.frame(date=date, y=y)
tmp$x<-as.numeric(tmp$date)

##build smooth.spline model
sm<-smooth.spline(tmp$x,tmp$y)

##show that smooth.spline doesn't retain all of the observations 
##used to build the model
length(sm$x)==length(tmp$x)

##show that predict from smooth.spline doesn't give a prediction 
##for each "newdata" point

p<-predict(sm, newdata=tmp$x)
length(p)==length(tmp$x)

1 Answer:

Answer 0 (score: 2)

  • smooth.spline() only produces fitted values for the unique values of x:
length(sm$x)==length(unique(tmp$x))  ## TRUE
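  • Confusingly, predict() does not take a newdata argument (it is silently ignored [!]); it takes an x argument instead. It returns a list with elements $x and $y (so length(p) gives you 2, which is probably not what you wanted...):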
length(predict(sm, x=tmp$x)$x)==length(tmp$x) ## TRUE

That is what we want in this case: a predicted value for every element, including the repeated ones.
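The two points above can be checked on a small synthetic data set. This is a minimal sketch rather than the question's data: the vectors x and y, the seed, and the noise level are made up for illustration, and only smooth.spline() and predict() from base R's stats package are used.

```r
## Synthetic data (hypothetical): 60 observations but only 20 distinct x values
set.seed(1)
x <- rep(1:20, each = 3)
y <- sin(x / 3) + rnorm(length(x), sd = 0.1)

sm <- smooth.spline(x, y)

## The fitted object stores one entry per unique x, not one per observation
length(sm$x) == length(unique(x))

## 'newdata' is not an argument of predict() for smooth.spline fits; it is
## swallowed by '...' and ignored, so the stored unique-x fit comes back as a list
p_wrong <- predict(sm, newdata = x)
length(p_wrong)      # number of list elements (x and y), not number of predictions
length(p_wrong$y)    # one value per unique x

## Passing the points through 'x' gives one prediction per supplied value
p <- predict(sm, x = x)
length(p$y) == length(x)

## Predictions aligned row by row with the original observations
fitted_all <- p$y
```

With the question's data, the equivalent call would be predict(sm, x = tmp$x)$y, which can be assigned directly to a new column of tmp because it has one value per row.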