How to use tidyr::complete to add the missing dates for each combination (excluding the Result column)

Time: 2019-07-10 14:25:24

Tags: r tidyverse tidyr

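How can I fill in the missing dates within groups defined by several columns? The expected output and my current attempt are shown below.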

The following questions are similar, but here the grouping spans several columns, which makes the case different:

github issue

Using tidyr::complete with group_by

library(tidyverse)
sample_data <- tribble(~A, ~B, ~C, ~ Date, ~ Result,
        "AL",123,"12", as.Date("2014-02-01"), 12345,
        "AL",123,"12", as.Date("2014-04-01"), 12349,
        "AL",123,"12", as.Date("2014-06-01"), 12977,
        "AZ",123,"12", as.Date("2014-01-01"),23435,
        "AZ",123,"12", as.Date("2014-04-01"),453454,
        "AZ",123,"12", as.Date("2014-07-01"),123976)

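# Current attempt: Date is completed over the whole table, so A, B and C come back as NA
# (%>% is used instead of %<>%, which library(tidyverse) does not attach)
sample_data %>% complete(Date = seq.Date(min(Date), max(Date), by = "month"))
# Output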
# A tibble: 8 x 5
  Date       A         B C     Result
  <date>     <chr> <dbl> <chr>  <dbl>
1 2014-01-01 AZ      123 12     23435
2 2014-02-01 AL      123 12     12345
3 2014-03-01 NA       NA NA        NA
4 2014-04-01 AL      123 12     12349
5 2014-04-01 AZ      123 12    453454
6 2014-05-01 NA       NA NA        NA
7 2014-06-01 AL      123 12     12977
8 2014-07-01 AZ      123 12    123976


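# Also tried grouping first, but this does not work: complete() is wrapped in mutate() and never receives the data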
sample_data %>% 
  group_by(A,B,C) %>% 
  mutate(tidyr::complete(Date = seq.Date(min(Date), max(Date), by="month")))

# Expected output
expected_output <-tribble(~A, ~B, ~C, ~ Date, ~ Result,
                            "AL",123,"12", as.Date("2014-01-01"), NA,
                            "AL",123,"12", as.Date("2014-02-01"), 12345,
                            "AL",123,"12", as.Date("2014-03-01"), NA,
                            "AL",123,"12", as.Date("2014-04-01"), 12349,
                            "AL",123,"12", as.Date("2014-05-01"), NA,
                            "AL",123,"12", as.Date("2014-06-01"), 12977,
                            "AL",123,"12", as.Date("2014-07-01"), NA,
                            "AZ",123,"12", as.Date("2014-01-01"),23435,
                            "AZ",123,"12", as.Date("2014-02-01"),NA,
                            "AZ",123,"12", as.Date("2014-03-01"),NA,
                            "AZ",123,"12", as.Date("2014-04-01"),453454,
                            "AZ",123,"12", as.Date("2014-05-01"),NA,
                            "AZ",123,"12", as.Date("2014-06-01"),NA,
                            "AZ",123,"12", as.Date("2014-07-01"),123976)

1 Answer:

Answer 0 (score: 1)

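One option is to use group_by and take the min and max of the whole 'Date' column (via .$Date, where . is the full data frame passed down the pipe) rather than the min and max of each group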

library(dplyr)
library(tidyr)
sample_data %>% 
   group_by(A, B, C) %>% 
   complete(Date = seq.Date(min(.$Date), max(.$Date), by="month"))
# A tibble: 14 x 5
# Groups:   A, B, C [2]
#   A         B C     Date       Result
#   <chr> <dbl> <chr> <date>      <dbl>
# 1 AL      123 12    2014-01-01     NA
# 2 AL      123 12    2014-02-01  12345
# 3 AL      123 12    2014-03-01     NA
# 4 AL      123 12    2014-04-01  12349
# 5 AL      123 12    2014-05-01     NA
# 6 AL      123 12    2014-06-01  12977
# 7 AL      123 12    2014-07-01     NA
# 8 AZ      123 12    2014-01-01  23435
# 9 AZ      123 12    2014-02-01     NA
#10 AZ      123 12    2014-03-01     NA
#11 AZ      123 12    2014-04-01 453454
#12 AZ      123 12    2014-05-01     NA
#13 AZ      123 12    2014-06-01     NA
#14 AZ      123 12    2014-07-01 123976
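
As a possible alternative, not part of the original answer, the same result can probably be reached without group_by by passing nesting(A, B, C) to complete. The sketch below assumes the data frame is ungrouped, so that min(Date) and max(Date) cover the whole column. It rebuilds sample_data with tibble() to stay self-contained, and the name completed is only illustrative.

```r
library(dplyr)
library(tidyr)

# Same six rows as sample_data in the question, rebuilt here so the block runs on its own
sample_data <- tibble(
  A = rep(c("AL", "AZ"), each = 3),
  B = 123,
  C = "12",
  Date = as.Date(c("2014-02-01", "2014-04-01", "2014-06-01",
                   "2014-01-01", "2014-04-01", "2014-07-01")),
  Result = c(12345, 12349, 12977, 23435, 453454, 123976)
)

# nesting() keeps only the A/B/C combinations that occur in the data;
# the data frame is ungrouped, so min(Date)/max(Date) span the whole table
completed <- sample_data %>%
  complete(nesting(A, B, C),
           Date = seq.Date(min(Date), max(Date), by = "month"))

completed
```

Because no grouping is applied, the result is an ordinary ungrouped tibble, which avoids a later ungroup() call.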