Run regressions by group and paste the predicted values back into the original data frame - example code provided

Time: 2018-06-07 14:28:12

Tags: r rstudio regression predict

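I have code that runs regressions automatically for each group. However, I am struggling with how to use the predict function so that the predictions are pasted into the original data set for each date.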

Thanks,

My code so far:

test = df[(df$key==1 | df$key==2),]

df_list=split(test, test$key)
reg_results = lapply(df_list,function(temp) {

  good_cols=sapply(temp,function(x){
    is.numeric(x) && ((max(x)-min(x))>10000)
  })

  temp=temp[,good_cols]
  fit=step(lm(y~.,data=temp))
  return(fit)
})

df_list_summary = lapply(reg_results, function(model_output){
  broom::tidy(model_output)
})
final_step2 = dplyr::bind_rows(df_list_summary, .id="Key's")
readr::write_csv(final_step2,"test2.csv")

Sample df:

   Key  Date                     y   x1   x2   x3
   1    1/10/2018 12:00:00 AM    2   3    2    5
   1    1/11/2018 12:00:00 AM    3   5    7    2
   1    1/12/2018 12:00:00 AM    5   7    4    7 
   1    1/13/2018 12:00:00 AM    7   2    7    6
   2    1/10/2018 12:00:00 AM    2   6    3    8
   2    1/11/2018 12:00:00 AM    3   7    7    3
   2    1/12/2018 12:00:00 AM    3   2    3    4
   2    1/13/2018 12:00:00 AM    7   6    2    7

Desired result:

   Key  Date                     y   x1   x2   x3  predicted values for each date
   1    1/10/2018 12:00:00 AM    2   3    2    5   ...
   1    1/11/2018 12:00:00 AM    3   5    7    2   ...
   1    1/12/2018 12:00:00 AM    5   7    4    7   ...
   1    1/13/2018 12:00:00 AM    7   2    7    6   ...
   2    1/10/2018 12:00:00 AM    2   6    3    8   ...
   2    1/11/2018 12:00:00 AM    3   7    7    3   ...
   2    1/12/2018 12:00:00 AM    3   2    3    4   ...
   2    1/13/2018 12:00:00 AM    7   6    2    7   ...

What I have tried so far, to no avail:

test2 = df[(df$key==1 | df$key==2),]

unsplit(lapply(split(test, test$key),function(w){
reg_results = lapply(df_list,function(temp) {

  good_cols=sapply(temp,function(x){
    is.numeric(x) && ((max(x)-min(x))>10000)
  })

  temp=temp[,good_cols]
  fit=lm(y~.,data=temp)
})
  cbind(w,predict(fit,subset(df, key=="1" | key=="2")))
}),test$key)

df_list_summary = lapply(reg_results, function(model_output){
  broom::tidy(model_output)
})
final_step2 = dplyr::bind_rows(df_list_summary, .id="key's")
readr::write_csv(final_step2,"test2.csv")

Update: MrFlick's code works. However, I am now trying to figure out how to apply it to out_of_sample_df. Can anyone help?

test = df[(df$key==1 | df$key==2),]

df_list=split(test, test$key)
reg_results = lapply(df_list,function(temp) {

  good_cols=sapply(temp,function(x){
    is.numeric(x) && ((max(x)-min(x))>10000)
  })

  temp=temp[,good_cols]
  fit=step(lm(y~.,data=temp))
  return(fit)
})


# MrFlick's contribution. I need help adjusting this code so that it produces
# predictions for out-of-sample data. Currently it adds a pred column to the
# original data set.

reg_predict = dplyr::bind_rows(Map(function(data, model) {
    data.frame(data, pred=predict(model))
}, df_list, reg_results))


df_list_summary = lapply(reg_results, function(model_output){
  broom::tidy(model_output)
})
final_step2 = dplyr::bind_rows(df_list_summary, .id="Key's")
readr::write_csv(final_step2,"test2.csv")

Thanks,

1 answer:

Answer 0: (score: 0)

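You can use Map() to iterate over the data and the models together to get the result you are after. Starting from your original code, you could do something like this: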

reg_predict = dplyr::bind_rows(Map(function(data, model) {
    data.frame(data, pred=predict(model))
}, df_list, reg_results))
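
For the out-of-sample case raised in the update, one possible approach is to pass the new rows to predict() through its newdata argument. The following is only a minimal sketch under stated assumptions: the toy data, the predictor names x1 and x2, and the result name oos_predict are made up for illustration, step() is left out for brevity, and base rbind stands in for dplyr::bind_rows so the snippet runs without extra packages.

```r
# Toy data standing in for df and out_of_sample_df (values are invented).
set.seed(1)
train <- data.frame(
  key = rep(c(1, 2), each = 10),
  x1  = rnorm(20),
  x2  = rnorm(20)
)
train$y <- 1 + 2 * train$x1 - train$x2 + rnorm(20, sd = 0.1)

out_of_sample_df <- data.frame(
  key = rep(c(1, 2), each = 3),
  x1  = rnorm(6),
  x2  = rnorm(6)
)

# One model per key, as in the question (step() omitted in this sketch).
df_list     <- split(train, train$key)
reg_results <- lapply(df_list, function(temp) lm(y ~ x1 + x2, data = temp))

# Split the new data the same way and keep only keys that have a fitted model.
new_list <- split(out_of_sample_df, out_of_sample_df$key)
keys     <- intersect(names(new_list), names(reg_results))

# predict(model, newdata = ...) scores rows the model was not fitted on.
oos_predict <- do.call(rbind, Map(function(data, model) {
  data.frame(data, pred = predict(model, newdata = data))
}, new_list[keys], reg_results[keys]))
```

Indexing both lists by the shared keys pairs each group with its own model by name rather than by position. Every column a fitted model uses must also be present in out_of_sample_df, and keys without a fitted model are skipped here rather than raising an error.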