Using group_by() from dplyr together with predict.lm and do() in a pipe for yearly linear extrapolation

Time: 2018-08-03 17:37:41

Tags: r dplyr

I would like to apply a yearly linear extrapolation within a pipe. What I want to do is very similar to this simple example without grouping, but within a pipe and using dplyr::group_by(). There are some examples, like this one, this one, and this one, but I cannot get the desired output.

Reproducible example:

library(dplyr)

test.frame <- data.frame(
  Country = rep(c("Austria", "Brazil", "Canada"), each = 3, times = 3),
  Entity = rep(c("CO2", "CH4", "N2O"), times = 9),
  Year = rep(c(1990:1992), each = 9),
  value = runif(27, 1, 5))

test.frame2 <- data.frame(
  Country = rep(c("Austria", "Brazil", "Canada"), each = 3),
  Entity = rep(c("CO2", "CH4", "N2O"), times = 3),
  Year = rep(c(1993), each = 3),
  value = 0)

results_frame <- test.frame %>% 
  dplyr::bind_rows(test.frame2)

I have two grouping categories ("Country" and "Entity"), and I want to use the values from 1990 to 1992 to fill in the value for 1993 by linear extrapolation. Following one of the linked examples, I can estimate the linear model:

linear_model <- test.frame %>% 
  dplyr::group_by(Country, Entity) %>% 
  lm(value ~ Year, data = .)
results <- predict.lm(linear_model, test.frame2)

However, results does not show the desired output. Therefore, following a solution proposed elsewhere, I tried the following:

results_frame <- test.frame %>% 
  dplyr::group_by(Country, Entity) %>% 
  do(lm(value ~ Year, data = test.frame)) %>% 
  predict.lm(linear_model, test.frame2) %>% 
  bind_rows(test.frame)

But this does not work either.

Any help would be greatly appreciated!

2 answers:

Answer 0: (score: 2)

You can do the following using nested data.frames. This solution is more general, because test.frame2 does not need to be recreated after predicting, and there can be multiple independent variables:

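library(tidyverse)
test.frame %>%
  group_by(Country, Entity) %>%
  nest() %>%
  # attach the rows to predict for each group; both list-columns are named
  # "data", so the join renames them to data.x (training) and data.y (new)
  inner_join(test.frame2 %>% select(-value) %>% group_by(Country, Entity) %>% nest(),
             by = c("Country", "Entity")) %>%
  # fit one model per group, then predict on that group's new rows
  mutate(model = data.x %>% map(~lm(value ~ Year, data=.)),
         value = map2(model, data.y, predict)) %>%
  select(-data.x, -model) %>%
  # with tidyr >= 1.0, write unnest(cols = c(data.y, value)) here
  unnest() %>%
  bind_rows(test.frame, .)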
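
The point about generality can be illustrated with a small sketch that predicts several years at once from the same nested structure. It assumes tidyr 1.0 or later (named nest() columns and unnest(cols = ...)). The names obs, targets, fit_data and new_data, and the toy values, are made up for illustration and are not part of the question's data:

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical toy data: perfectly linear per country, so the
# extrapolated values are easy to check by hand.
obs <- data.frame(
  Country = rep(c("Austria", "Brazil"), each = 3),
  Year    = rep(1990:1992, times = 2),
  value   = c(1, 2, 3, 9, 7, 5),
  stringsAsFactors = FALSE
)

# Rows to predict: two future years per country.
targets <- data.frame(
  Country = rep(c("Austria", "Brazil"), each = 2),
  Year    = rep(1993:1994, times = 2),
  stringsAsFactors = FALSE
)

pred <- obs %>%
  nest(fit_data = c(Year, value)) %>%                  # one row per country
  inner_join(nest(targets, new_data = c(Year)), by = "Country") %>%
  mutate(
    model = map(fit_data, ~ lm(value ~ Year, data = .x)),
    value = map2(model, new_data, ~ unname(predict(.x, newdata = .y)))
  ) %>%
  select(Country, new_data, value) %>%
  unnest(cols = c(new_data, value))                    # back to one row per prediction
```

Because each toy series lies exactly on a line, pred$value should come out as 4, 5 for Austria and 3, 1 for Brazil, up to floating-point error.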

Answer 1: (score: 0)

When fitting and predicting, you have to be careful to use the right data:

library(dplyr)
set.seed(42)
test.frame <- data.frame(Country = rep(c("Austria", "Brazil", "Canada"), each = 3, times = 3), 
                         Entity = rep(c("CO2","CH4","N2O"), times = 9),
                         Year = rep(c(1990:1992), each = 9),
                         value = runif(27, 1,5))

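test.frame %>%
  group_by(Country, Entity) %>% 
  # inside do(), `.` is the data of the current group only
  do(lm( value ~ Year , data = .) %>% 
       predict(., data.frame(Year = 1993)) %>%
       # tibble() replaces the deprecated data_frame()
       tibble(Year = 1993, value = .)) %>%
  bind_rows(test.frame)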
#> # A tibble: 36 x 4
#> # Groups:   Country, Entity [9]
#>    Country Entity  Year value
#>    <fct>   <fct>  <dbl> <dbl>
#>  1 Austria CH4     1993 2.10 
#>  2 Austria CO2     1993 2.03 
#>  3 Austria N2O     1993 6.02 
#>  4 Brazil  CH4     1993 4.90 
#>  5 Brazil  CO2     1993 0.771
#>  6 Brazil  N2O     1993 5.28 
#>  7 Canada  CH4     1993 4.69 
#>  8 Canada  CO2     1993 0.729
#>  9 Canada  N2O     1993 1.49 
#> 10 Austria CO2     1990 4.66 
#> # ... with 26 more rows
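
The same idea can also be sketched with group_modify(), which newer dplyr versions (0.8.1 and later) offer as an alternative to do(). This is only an illustration under that assumption, not the answer's original code. The data frame toy and its values are hypothetical and chosen to be perfectly linear so the result can be checked by hand:

```r
library(dplyr)

# Hypothetical toy data standing in for test.frame.
toy <- data.frame(
  Country = rep(c("Austria", "Brazil"), each = 3),
  Year    = rep(1990:1992, times = 2),
  value   = c(1, 2, 3, 9, 7, 5),
  stringsAsFactors = FALSE
)

extrapolated <- toy %>%
  group_by(Country) %>%
  group_modify(~ {
    # .x holds only the rows of the current group
    fit <- lm(value ~ Year, data = .x)
    tibble(Year = 1993,
           value = unname(predict(fit, data.frame(Year = 1993))))
  }) %>%
  ungroup()

# Original observations followed by the extrapolated 1993 rows.
result <- bind_rows(toy, extrapolated)
```

Here extrapolated$value should be 4 for Austria and 3 for Brazil, up to floating-point error, and result has the six original rows plus two new ones.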