How to calculate the predictive power of each independent variable on a new data frame

Time: 2015-08-27 14:38:35

Tags: r linear-regression

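I want to calculate the predictive power of each independent variable. I have a training data frame called df and a test data frame called df1. I wrote code that is supposed to append the predictions based on each column to the test data frame. My code gives a strange result: it only provides the prediction for one variable, without its name. I would like to see the predictions for all variables, along with their names. I am new to writing functions, so any help is welcome.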

df <- read.table(text = " target birds    wolfs     
                            32         9         7 
                            56         8         4 
                            11         2         8 
                            22         2         3 
                            33         8         3 
                            54         1         2 
                            34         7         16 
                            66         1         5 
                            74         17        7 
                            52         8         7 
                            45         2         7 
                            65         20        3 
                            99         6         3 
                            88         1         1 
                            77         3         11 
                            55         30         1  ",header = TRUE)

df1 <- read.table(text = " target birds    wolfs     
                            34         9         7 
                            23         8         4 
                            43         2         8 
                            45         2         3 
                            65         8         3 
                            23         1         2 
                            22         7         16 
                            99         1         5 
                            56         17        7 
                            32         8         7 
                            19         2         7 
                            91         20        3 
                            78         6         3 
                            62         1         1 
                            78         3         11 
                            69         30         1  ",header = TRUE)

Here is the code I used:

for(i in names(df))
     { 
             if(is.numeric(df[3,i]))  ##if row 3 is numeric, the entire column is 
                 {       
                         fit_pred <- predict(lm(df[,i] ~ target, data=df), newdata=df1)

                             res <- fit_pred
                         g<-as.data.frame(cbind(df1,res))
                         g
                     }
         }

The output I get is:

 userid target birds wolfs   res
10    321      45     8     7  0.0515967
8     608      33     1     5  0.1696638
3     234      23     2     8  0.1696638
7     294      44     7     1  0.0515967
2     444      46     8     4  0.0515967
11    226      90     2     7  0.1696638
9     123      89     9     7  0.0515967
1     222      67     9     7  0.0515967
5     678      43     8     3  0.0515967
15    999      12     3     9  0.1696638
6     987      33     1     2  0.1696638
14    225      18     1     1  0.1696638
16    987      83     1     1  0.1696638
12    556      77     2     3  0.1696638

2 answers:

Answer 0 (score: 2)

You shouldn't use a for loop here. You should use one of the xxapply family of functions. Here is the R way:

fit_pred <- function(x)predict(lm(x ~ target, data=df), newdata=df1)
do.call(cbind,lapply(df,fit_pred))
  1. I wrap the code in a function
  2. I use lapply to loop over all columns
  3. I use do.call with cbind to combine the results
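
To get predictions that carry the variable names, as the question asks, one possible variation is to fit target against each independent variable on df and then predict on df1. This is only a sketch under stated assumptions, not the code above: the helper name predict_one and the pred_ column prefix are hypothetical, and the data frames are the question's data retyped in compact form.

```r
# Training and test data from the question, retyped compactly
df <- data.frame(
  target = c(32, 56, 11, 22, 33, 54, 34, 66, 74, 52, 45, 65, 99, 88, 77, 55),
  birds  = c(9, 8, 2, 2, 8, 1, 7, 1, 17, 8, 2, 20, 6, 1, 3, 30),
  wolfs  = c(7, 4, 8, 3, 3, 2, 16, 5, 7, 7, 7, 3, 3, 1, 11, 1)
)
df1 <- data.frame(
  target = c(34, 23, 43, 45, 65, 23, 22, 99, 56, 32, 19, 91, 78, 62, 78, 69),
  birds  = df$birds,   # the question's test set repeats the same birds values
  wolfs  = df$wolfs    # and the same wolfs values
)

# Every column except the response is treated as an independent variable
predictors <- setdiff(names(df), "target")

# Hypothetical helper: fit target ~ <one predictor> on df, predict on df1
predict_one <- function(v) {
  fit <- lm(reformulate(v, response = "target"), data = df)
  unname(predict(fit, newdata = df1))
}

# sapply keeps the predictor names, giving one named column per variable
preds <- sapply(predictors, predict_one)
colnames(preds) <- paste0("pred_", predictors)

# Append the named prediction columns to the test data frame
result <- cbind(df1, preds)
head(result)
```

Building the formula with reformulate keeps the variable name inside the model, so predict can look up the matching column in newdata.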

Answer 1 (score: 1)

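Here is a process that uses the dplyr and tidyr packages to create models based on y ~ x combinations (the dependent variables you specify ~ the independent variables you specify) and then uses those models to predict new data.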

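The idea behind it is that both the y and the x variables may change (even though here you only have "target" as y). I'm using the data frames df and df1 you specified at the beginning (I don't know why "target" becomes binary in your output).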

Run the process step by step to see how it works, and modify it to better suit your goals.

library(dplyr)
library(tidyr)

# input what you want as dependent variables y and independent variables x
ynames = c("target")
xnames = c("birds","wolfs")


###### build models

# create and reshape train y dataframes
dty = df[ynames]
dty = dty %>% gather(yvariable, yvalue)

# create and reshape train x dataframes
dtx = df[xnames]
dtx = dtx %>% gather(xvariable, xvalue)

# build model for each y~x combination
dt_model =
    dty %>% do(data.frame(.,dtx)) %>%         # create combinations of y and x variables
      group_by(yvariable, xvariable) %>%      # for each pair y and x
      do(model = lm(yvalue~xvalue, data=.))   # build the lm y~x

# you've managed to create a model for each combination and it's stored in a dataframe
dt_model

#   yvariable xvariable   model
# 1    target     birds <S3:lm>
# 2    target     wolfs <S3:lm>



####### predict

# create and reshape test y dataframes
dty = df1[ynames]
dty = dty %>% gather(yvariable, yvalue)

# create and reshape test x dataframes
dtx = df1[xnames]
dtx = dtx %>% gather(xvariable, xvalue)


dty %>% do(data.frame(.,dtx)) %>%            # create combinations of y and x variables
  group_by(yvariable, xvariable) %>%         # for each pair y and x
  do(data.frame(., pred = predict(dt_model$model[dt_model$yvariable==.$yvariable[1] &
                                                 dt_model$xvariable==.$xvariable[1]][[1]],
                                  newdata = .))) %>%     # get the corresponding model and predict the new data (without newdata, predict returns the training fitted values)
  ungroup()

#    yvariable yvalue xvariable xvalue     pred
# 1     target     34     birds      9 54.30627
# 2     target     23     birds      8 53.99573
# 3     target     43     birds      2 52.13249
# 4     target     45     birds      2 52.13249
# 5     target     65     birds      8 53.99573
# 6     target     23     birds      1 51.82195
# 7     target     22     birds      7 53.68519
# 8     target     99     birds      1 51.82195
# 9     target     56     birds     17 56.79059
# 10    target     32     birds      8 53.99573
# 11    target     19     birds      2 52.13249
# 12    target     91     birds     20 57.72220
# 13    target     78     birds      6 53.37465
# 14    target     62     birds      1 51.82195
# 15    target     78     birds      3 52.44303
# 16    target     69     birds     30 60.82760
# 17    target     34     wolfs      7 51.49364
# 18    target     23     wolfs      4 56.38136
# 19    target     43     wolfs      8 49.86441
# 20    target     45     wolfs      3 58.01059
# 21    target     65     wolfs      3 58.01059
# 22    target     23     wolfs      2 59.63983
# 23    target     22     wolfs     16 36.83051
# 24    target     99     wolfs      5 54.75212
# 25    target     56     wolfs      7 51.49364
# 26    target     32     wolfs      7 51.49364
# 27    target     19     wolfs      7 51.49364
# 28    target     91     wolfs      3 58.01059
# 29    target     78     wolfs      3 58.01059
# 30    target     62     wolfs      1 61.26907
# 31    target     78     wolfs     11 44.97669
# 32    target     69     wolfs      1 61.26907
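
The predictions above can be condensed into one score per independent variable. The following is a minimal base R sketch, assuming that "predictive power" can be summarised by the root mean squared error (RMSE) on the test data. The choice of metric and the helper name test_rmse are assumptions and not part of the original answer; the data frames are the question's data retyped in compact form.

```r
# Training and test data from the question, retyped compactly
df <- data.frame(
  target = c(32, 56, 11, 22, 33, 54, 34, 66, 74, 52, 45, 65, 99, 88, 77, 55),
  birds  = c(9, 8, 2, 2, 8, 1, 7, 1, 17, 8, 2, 20, 6, 1, 3, 30),
  wolfs  = c(7, 4, 8, 3, 3, 2, 16, 5, 7, 7, 7, 3, 3, 1, 11, 1)
)
df1 <- data.frame(
  target = c(34, 23, 43, 45, 65, 23, 22, 99, 56, 32, 19, 91, 78, 62, 78, 69),
  birds  = df$birds,   # the question's test set repeats the same birds values
  wolfs  = df$wolfs    # and the same wolfs values
)

# Hypothetical helper: out-of-sample RMSE of the model target ~ <one predictor>
test_rmse <- function(v) {
  fit  <- lm(reformulate(v, response = "target"), data = df)  # train on df
  pred <- predict(fit, newdata = df1)                         # predict on df1
  sqrt(mean((df1$target - pred)^2))                           # error on df1
}

xnames <- c("birds", "wolfs")

# Named numeric vector: one test RMSE per independent variable
scores <- sapply(xnames, test_rmse)

# Lower RMSE means the variable predicted the new data more closely
sort(scores)
```

Other metrics, such as the correlation between predicted and observed target values, could be swapped into test_rmse in the same way.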