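I want to compute the predictive power of each independent variable. I have a training data frame named df and a test data frame named df1. I wrote code that should append the predictions based on each column to the test data frame. My code gives a strange result: it only returns the predictions for one variable, and without its name. I'd like to see the predictions for all variables, together with their names. I'm new to writing functions, so any help is welcome.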
df <- read.table(text = " target birds wolfs
32 9 7
56 8 4
11 2 8
22 2 3
33 8 3
54 1 2
34 7 16
66 1 5
74 17 7
52 8 7
45 2 7
65 20 3
99 6 3
88 1 1
77 3 11
55 30 1 ",header = TRUE)
df1 <- read.table(text = " target birds wolfs
34 9 7
23 8 4
43 2 8
45 2 3
65 8 3
23 1 2
22 7 16
99 1 5
56 17 7
32 8 7
19 2 7
91 20 3
78 6 3
62 1 1
78 3 11
69 30 1 ",header = TRUE)
Here is the code I used:
for(i in names(df))
{
  if(is.numeric(df[3,i])) ##if row 3 is numeric, the entire column is
  {
    fit_pred <- predict(lm(df[,i] ~ target, data=df), newdata=df1)
    res <- fit_pred
    g<-as.data.frame(cbind(df1,res))
    g
  }
}
The output I get is:
userid target birds wolfs res
10 321 45 8 7 0.0515967
8 608 33 1 5 0.1696638
3 234 23 2 8 0.1696638
7 294 44 7 1 0.0515967
2 444 46 8 4 0.0515967
11 226 90 2 7 0.1696638
9 123 89 9 7 0.0515967
1 222 67 9 7 0.0515967
5 678 43 8 3 0.0515967
15 999 12 3 9 0.1696638
6 987 33 1 2 0.1696638
14 225 18 1 1 0.1696638
16 987 83 1 1 0.1696638
12 556 77 2 3 0.1696638
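The loop above returns a single unnamed column for two reasons: g is rebuilt from scratch on every iteration, so only the last column's predictions survive, and the column is always called res. The formula df[,i] ~ target also predicts each column from target rather than target from each column. A minimal sketch of the loop that keeps every prediction under its own name, assuming target is the response and every other numeric column is a candidate predictor (the pred_ prefix is a hypothetical naming choice):

```r
# Training and test data from the question (birds and wolfs are identical in df and df1)
df <- data.frame(
  target = c(32, 56, 11, 22, 33, 54, 34, 66, 74, 52, 45, 65, 99, 88, 77, 55),
  birds  = c(9, 8, 2, 2, 8, 1, 7, 1, 17, 8, 2, 20, 6, 1, 3, 30),
  wolfs  = c(7, 4, 8, 3, 3, 2, 16, 5, 7, 7, 7, 3, 3, 1, 11, 1)
)
df1 <- data.frame(
  target = c(34, 23, 43, 45, 65, 23, 22, 99, 56, 32, 19, 91, 78, 62, 78, 69),
  birds  = df$birds,
  wolfs  = df$wolfs
)

res <- list()  # one element per predictor, filled inside the loop
for (i in setdiff(names(df), "target")) {
  if (is.numeric(df[[i]])) {
    # Build the formula target ~ i (e.g. target ~ birds) so predict() can find
    # the same column name in df1
    fit <- lm(reformulate(i, response = "target"), data = df)
    res[[paste0("pred_", i)]] <- unname(predict(fit, newdata = df1))
  }
}

# g is built once, after the loop, from all collected predictions
g <- cbind(df1, as.data.frame(res))
print(g)
```

Collecting into a list and binding once avoids overwriting; the explicit print() is needed because a bare g inside a for loop is never auto-printed.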
Answer 0 (score: 2)
You shouldn't use a for loop here; use one of the *apply family of functions instead. Here is the R way:
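# lm(x ~ target) regresses each column of df (target included) on target, as in the question's loop
fit_pred <- function(x) predict(lm(x ~ target, data=df), newdata=df1)
do.call(cbind, lapply(df, fit_pred))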
lapply loops over all the columns, and do.call with cbind combines the results.
Answer 1 (score: 1)
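Here is a process using the dplyr and tidyr packages that builds a model for each y~x combination (a dependent variable you specify ~ an independent variable you specify) and then uses those models to predict new data.
The idea is that both the y and the x variables can vary (even though here you only have "target" as y). I'm using the data frames df and df1 you gave at the beginning (I'm not sure why "target" turns binary in your output).
Run the process step by step to see how it works, and adapt it to better fit your goal.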
library(dplyr)
library(tidyr)
# input what you want as the dependent variable y and the independent variables x
ynames = c("target")
xnames = c("birds","wolfs")
###### build models
# create and reshape train y dataframes
dty = df[ynames]
dty = dty %>% gather(yvariable, yvalue)
# create and reshape train x dataframes
dtx = df[xnames]
dtx = dtx %>% gather(xvariable, xvalue)
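gather() stacks the selected columns one after another, so dtx has 32 rows: the 16 birds values followed by the 16 wolfs values. Because dty has only 16 rows, the next step repeats the target values alongside each predictor. To inspect that long layout without tidyr, base R's stack() produces the same column-by-column ordering; a small sketch (the names dtx_wide and dtx_long are illustrative only):

```r
# x columns of the training data from the question
dtx_wide <- data.frame(
  birds = c(9, 8, 2, 2, 8, 1, 7, 1, 17, 8, 2, 20, 6, 1, 3, 30),
  wolfs = c(7, 4, 8, 3, 3, 2, 16, 5, 7, 7, 7, 3, 3, 1, 11, 1)
)

# stack() concatenates the columns: all birds rows first, then all wolfs rows
dtx_long <- stack(dtx_wide)
names(dtx_long) <- c("xvalue", "xvariable")              # stack() names them "values" and "ind"
dtx_long$xvariable <- as.character(dtx_long$xvariable)   # ind is returned as a factor
head(dtx_long)
```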
# build model for each y~x combination
dt_model =
dty %>% do(data.frame(.,dtx)) %>% # create combinations of y and x variables
group_by(yvariable, xvariable) %>% # for each pair y and x
do(model = lm(yvalue~xvalue, data=.)) # build the lm y~x
# you've managed to create a model for each combination and it's stored in a dataframe
dt_model
# yvariable xvariable model
# 1 target birds <S3:lm>
# 2 target wolfs <S3:lm>
####### predict
# create and reshape test y dataframes
dty = df1[ynames]
dty = dty %>% gather(yvariable, yvalue)
# create and reshape test x dataframes
dtx = df1[xnames]
dtx = dtx %>% gather(xvariable, xvalue)
dty %>% do(data.frame(.,dtx)) %>% # create combinations of y and x variables
group_by(yvariable, xvariable) %>% # for each pair y and x
do(data.frame(., pred = predict(dt_model$model[dt_model$yvariable==.$yvariable[1] &
dt_model$xvariable==.$xvariable[1]][[1]], newdata = .))) %>% # get the corresponding model and predict the test rows
ungroup()
# yvariable yvalue xvariable xvalue pred
# 1 target 34 birds 9 54.30627
# 2 target 23 birds 8 53.99573
# 3 target 43 birds 2 52.13249
# 4 target 45 birds 2 52.13249
# 5 target 65 birds 8 53.99573
# 6 target 23 birds 1 51.82195
# 7 target 22 birds 7 53.68519
# 8 target 99 birds 1 51.82195
# 9 target 56 birds 17 56.79059
# 10 target 32 birds 8 53.99573
# 11 target 19 birds 2 52.13249
# 12 target 91 birds 20 57.72220
# 13 target 78 birds 6 53.37465
# 14 target 62 birds 1 51.82195
# 15 target 78 birds 3 52.44303
# 16 target 69 birds 30 60.82760
# 17 target 34 wolfs 7 51.49364
# 18 target 23 wolfs 4 56.38136
# 19 target 43 wolfs 8 49.86441
# 20 target 45 wolfs 3 58.01059
# 21 target 65 wolfs 3 58.01059
# 22 target 23 wolfs 2 59.63983
# 23 target 22 wolfs 16 36.83051
# 24 target 99 wolfs 5 54.75212
# 25 target 56 wolfs 7 51.49364
# 26 target 32 wolfs 7 51.49364
# 27 target 19 wolfs 7 51.49364
# 28 target 91 wolfs 3 58.01059
# 29 target 78 wolfs 3 58.01059
# 30 target 62 wolfs 1 61.26907
# 31 target 78 wolfs 11 44.97669
# 32 target 69 wolfs 1 61.26907
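The question's goal is to compare the predictive power of each independent variable, and the predictions alone don't summarise that. A minimal base-R sketch that rebuilds the same long table without dplyr and then scores each predictor by root-mean-squared error on df1. RMSE is an assumption here, since the question doesn't name a metric; any other error measure could be swapped in:

```r
# Training and test data from the question (birds and wolfs are identical in df and df1)
df <- data.frame(
  target = c(32, 56, 11, 22, 33, 54, 34, 66, 74, 52, 45, 65, 99, 88, 77, 55),
  birds  = c(9, 8, 2, 2, 8, 1, 7, 1, 17, 8, 2, 20, 6, 1, 3, 30),
  wolfs  = c(7, 4, 8, 3, 3, 2, 16, 5, 7, 7, 7, 3, 3, 1, 11, 1)
)
df1 <- data.frame(
  target = c(34, 23, 43, 45, 65, 23, 22, 99, 56, 32, 19, 91, 78, 62, 78, 69),
  birds  = df$birds,
  wolfs  = df$wolfs
)
xnames <- c("birds", "wolfs")

# One row per (predictor, test row), same layout as the dplyr result above
long <- do.call(rbind, lapply(xnames, function(v) {
  fit <- lm(reformulate(v, response = "target"), data = df)  # target ~ v on training data
  data.frame(
    yvariable = "target",
    yvalue    = df1$target,
    xvariable = v,
    xvalue    = df1[[v]],
    pred      = unname(predict(fit, newdata = df1))           # predictions for the test rows
  )
}))

# Test-set RMSE per predictor (lower means better predictive power)
rmse <- tapply((long$yvalue - long$pred)^2, long$xvariable, function(se) sqrt(mean(se)))
print(rmse)
```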