Predicting linear regression by group with dplyr

Time: 2019-05-14 19:12:49

Tags: r dplyr

This builds on the question: Add Column of Predicted Values to Data Frame with dplyr

If I run the code from the accepted answer:

library(dplyr)
library(purrr)
library(tidyr)

# generate the inputs like in the question
example_table <- data.frame(x = c(1:5, 1:5),
                            y = c((1:5) + rnorm(5), 2*(5:1)),
                            groups = rep(LETTERS[1:2], each = 5))

models <- example_table %>% 
  group_by(groups) %>% 
  do(model = lm(y ~ x, data = .)) %>%
  ungroup()
example_table <- left_join(tbl_df(example_table ), models, by = "groups")


# generate the extra column
example_table %>%
  group_by(groups) %>%
  do(modelr::add_predictions(., first(.$model))) %>% mutate(model = NULL)

I end up with the predictions stored as a list in every row:

   x         y groups                                             pred
1  1  1.798848      A 1.645775, 2.233358, 2.820940, 3.408523, 3.996105
2  2  2.936818      A 1.645775, 2.233358, 2.820940, 3.408523, 3.996105
3  3  1.513431      A 1.645775, 2.233358, 2.820940, 3.408523, 3.996105
4  4  3.300870      A 1.645775, 2.233358, 2.820940, 3.408523, 3.996105
5  5  4.554734      A 1.645775, 2.233358, 2.820940, 3.408523, 3.996105
6  1 10.000000      B                                   10, 8, 6, 4, 2
7  2  8.000000      B                                   10, 8, 6, 4, 2
8  3  6.000000      B                                   10, 8, 6, 4, 2
9  4  4.000000      B                                   10, 8, 6, 4, 2
10 5  2.000000      B                                   10, 8, 6, 4, 2

Is there a way to get a single predicted value per row (y ~ x), instead of the list of predictions for the whole group?

1 Answer:

Answer 0: (score: 0)

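There was a conflict with the xts package, which also exports a function named first. With xts attached after dplyr, the unqualified first() call resolves to xts::first rather than dplyr::first. Qualifying the call explicitly solved it: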

example_table %>%
  group_by(groups) %>%
  do(modelr::add_predictions(., dplyr::first(.$model))) %>% mutate(model = NULL)
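As an alternative, the name clash can be avoided altogether by never storing the models in a list column, so that first() is not needed. The following is only a minimal sketch, not the method from the linked answer: it assumes there are no missing values in x or y (otherwise the fitted vector could be shorter than the group), set.seed(1) is added here purely for reproducibility, and with_pred is a name made up for illustration.

```r
library(dplyr)

set.seed(1)  # assumption: added only so the random noise is reproducible

# Same inputs as in the question
example_table <- data.frame(x = c(1:5, 1:5),
                            y = c((1:5) + rnorm(5), 2 * (5:1)),
                            groups = rep(LETTERS[1:2], each = 5))

# Fit one lm per group inside mutate() and keep its predictions
# as a plain numeric column (one value per row).
# Assumes x and y contain no NA values.
with_pred <- example_table %>%
  group_by(groups) %>%
  mutate(pred = unname(predict(lm(y ~ x)))) %>%
  ungroup()

print(with_pred)
```

Because nothing here calls first(), the result should not depend on whether xts is attached. If the fitted model objects are needed later, the do()-based approach with dplyr::first is still the one to use.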