Question

I'm building on a similar question that was asked and answered a year ago. It relates to this post: how to merge two linear regression prediction models (each per data frame's subset) into one column of the data frame

I'll use the same data as that post, but with a new column. I created the data:

dat = read.table(text = " cats birds    wolfs     snakes     trees
0        3        8         7        2
1        3        8         7        3
1        1        2         3        2
0        1        2         3        1
0        1        2         3        2
1        6        1         1        3
0        6        1         1        1
1        6        1         1        1   " ,header = TRUE)

Model the number of wolves, using two subsets of the data to distinguish the conditions. The equation is different for each subset.

f0 = lm(wolfs~snakes,data = dat,subset=dat$cats==0)
f1 = lm(wolfs~snakes + trees,data = dat,subset=dat$cats==1)

Predict the number of wolves for each subset.

f0_predict = predict(f0, newdata = dat[dat$cats==1, ], type='response')
f1_predict = predict(f1, newdata = dat[dat$cats==0, ], type='response')
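For comparison, here is a minimal base-R sketch that skips the split step entirely. Note that predict.lm() receives new data through its newdata argument; data and subset are not among its arguments, so passing them does not restrict which rows are predicted. The sketch assumes the goal is within-subset predictions (each model applied to the rows it was fitted on) and fills one preallocated column, full_prediction, by logical indexing; the helper name is0 is illustrative only.

```r
# Same data as in the question
dat <- read.table(text = "cats birds wolfs snakes trees
0 3 8 7 2
1 3 8 7 3
1 1 2 3 2
0 1 2 3 1
0 1 2 3 2
1 6 1 1 3
0 6 1 1 1
1 6 1 1 1", header = TRUE)

# One model per subset; `subset` is evaluated inside `data`
f0 <- lm(wolfs ~ snakes, data = dat, subset = cats == 0)
f1 <- lm(wolfs ~ snakes + trees, data = dat, subset = cats == 1)

# Hypothetical helper: logical index of the cats == 0 rows
is0 <- dat$cats == 0

# Preallocate a single prediction column, then fill each subset's rows
# from the model that was fitted to that subset
dat$full_prediction <- NA_real_
dat$full_prediction[is0]  <- predict(f0, newdata = dat[is0, ])
dat$full_prediction[!is0] <- predict(f1, newdata = dat[!is0, ])
```

Because the rows are written back by position, dat keeps its original row order and no unsplit() is needed.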

Then (again, following the 2015 post) I split the data by the cats variable.

dat.l = split(dat, dat$cats)
dat.l

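...and this is where it gets a bit tricky. The 2015 post suggested using lapply to append the two sets of predictions to the data set. Here, however, the answerer's function doesn't work, because it assumes the two regression equations are essentially the same. Here is my attempt (it's close to the original, just tweaked):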

dat.l = lapply(dat.l, function(x){
    mod = ifelse(dat$cats==0, lm(wolfs~snakes,data=x), lm(wolfs~snakes+trees,data=x))
    x$full_prediction = predict(mod,data=x,type='response')
    return(x)
})
unsplit(dat.l, dat$cats)
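One way to make the split/lapply/unsplit route work is sketched below. In the attempt above, ifelse() is vectorised over dat$cats (the whole data set, not the subset x) and returns a plain list built from the models' components rather than an lm object, so predict() cannot use it. The sketch instead looks up a formula by the subset's name; forms and with_pred are hypothetical names, and forms is keyed by the level names that split() gives the pieces ("0" and "1").

```r
dat <- read.table(text = "cats birds wolfs snakes trees
0 3 8 7 2
1 3 8 7 3
1 1 2 3 2
0 1 2 3 1
0 1 2 3 2
1 6 1 1 3
0 6 1 1 1
1 6 1 1 1", header = TRUE)

# Hypothetical lookup table: one formula per level of cats
forms <- list("0" = wolfs ~ snakes,
              "1" = wolfs ~ snakes + trees)

dat.l <- split(dat, dat$cats)

# Iterate over the piece names so each subset is fitted with its own formula
with_pred <- lapply(names(dat.l), function(lvl) {
  x <- dat.l[[lvl]]
  mod <- lm(forms[[lvl]], data = x)
  x$full_prediction <- unname(predict(mod, newdata = x))
  x
})

# unsplit() matches the pieces to the levels of dat$cats by position
# and restores the original row order
dat <- unsplit(with_pred, dat$cats)
```

The key design choice is that the formula is chosen once per piece (by name), not once per row, so lm() always receives a single formula.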

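Any ideas on these last few steps? I'm still relatively new to SO and an intermediate R user, so please go easy on me if I haven't posted exactly the way the community prefers.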

Answer 1

Here's a dplyr solution, based on the previous post you referenced:

library(dplyr)

# create a new column defining the lm formula for each level of cats
dat <- dat %>% mutate(formula = ifelse(cats==0, "wolfs ~ snakes", 
        "wolfs ~ snakes + trees"))

# build model and find predicted values for each value of cats
dat <- dat %>% group_by(cats) %>%
    do({
        mod <- lm(as.formula(.$formula[1]), data = .)
        pred <- predict(mod)
        data.frame(., pred)
    })

> dat
Source: local data frame [8 x 7]
Groups: cats [2]
   cats birds wolfs snakes trees                formula      pred
  (int) (int) (int)  (int) (int)                  (chr)     (dbl)
1     0     3     8      7     2         wolfs ~ snakes 7.5789474
2     0     1     2      3     1         wolfs ~ snakes 2.6315789
3     0     1     2      3     2         wolfs ~ snakes 2.6315789
4     0     6     1      1     1         wolfs ~ snakes 0.1578947
5     1     3     8      7     3 wolfs ~ snakes + trees 7.6800000
6     1     1     2      3     2 wolfs ~ snakes + trees 2.9600000
7     1     6     1      1     3 wolfs ~ snakes + trees 0.8400000
8     1     6     1      1     1 wolfs ~ snakes + trees 0.5200000
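In current dplyr releases do() is marked as superseded. A sketch of the same per-group fit with group_modify(), assuming dplyr 1.0 or later is installed, is below; inside group_modify() the grouping column cats is dropped from .x, which is harmless here because neither formula uses it. As in the output above, rows come back ordered by cats rather than in their original order.

```r
library(dplyr)

dat <- read.table(text = "cats birds wolfs snakes trees
0 3 8 7 2
1 3 8 7 3
1 1 2 3 2
0 1 2 3 1
0 1 2 3 2
1 6 1 1 3
0 6 1 1 1
1 6 1 1 1", header = TRUE)

dat <- dat %>%
  # store the lm formula for each level of cats as a string
  mutate(formula = ifelse(cats == 0, "wolfs ~ snakes",
                          "wolfs ~ snakes + trees")) %>%
  group_by(cats) %>%
  # fit one model per group and attach its fitted values
  group_modify(~ {
    mod <- lm(as.formula(.x$formula[1]), data = .x)
    mutate(.x, pred = unname(predict(mod)))
  }) %>%
  ungroup()
```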

Merge two regression prediction models (fitted on subsets of a data frame) back into the data frame (one column)
