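How to handle duplicates when using min and mutate together in dplyr?

Asked: 2019-02-27 21:27:25

Tags: r dplyr

I have the following data frame. For each ID and subgroup, I want to add two new columns: one holding the minimum of the date1 column and one holding the maximum of the date2 column.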

    library(dplyr)      # %>%, group_by(), mutate(), if_else()
    library(lubridate)  # as_date()

    ID <- c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3)
    subgroup <- c("a", "a", "b", "b", "a", "a", "b", "a", "a", "b", "b")
    date1 <- c("2017-12-01", "2017-10-01", "2017-10-01", "2017-10-01", "2017-01-01", "2017-02-01", "2017-01-15", "2017-12-01", "2017-10-01", "2017-10-01", "2017-10-01")
    date2 <- c("2018-12-01", "2018-10-01", "2018-10-01", "2018-10-01", "2019-01-01", "2019-02-01", "2019-01-15", "2018-12-01", "2018-10-01", "2018-10-01", "2018-10-01")

    df <- data.frame(ID, subgroup, date1, date2)

    # Convert the character columns to Date
    df$date1 <- as_date(df$date1)
    df$date2 <- as_date(df$date2)


    df<- df %>% group_by(ID, subgroup) %>% mutate(min_date1 = if_else(date1 == min(date1), date1, as.Date(NA))) 
    df<- df %>% group_by(ID, subgroup) %>% mutate(max_date2 = if_else(date2 == max(date2), date2, as.Date(NA))) 

ID subgroup date1      date2      min_date1  max_date2   
 1 a        2017-12-01 2018-12-01 NA         2018-12-01
 1 a        2017-10-01 2018-10-01 2017-10-01 NA        
 1 b        2017-10-01 2018-10-01 2017-10-01 2018-10-01
 1 b        2017-10-01 2018-10-01 2017-10-01 2018-10-01
 2 a        2017-01-01 2019-01-01 2017-01-01 NA        
 2 a        2017-02-01 2019-02-01 NA         2019-02-01
 2 b        2017-01-15 2019-01-15 2017-01-15 2019-01-15
 3 a        2017-12-01 2018-12-01 NA         2018-12-01
 3 a        2017-10-01 2018-10-01 2017-10-01 NA        
 3 b        2017-10-01 2018-10-01 2017-10-01 2018-10-01
 3 b        2017-10-01 2018-10-01 2017-10-01 2018-10-01

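Sometimes the minimum date is not returned because there are many duplicate rows. How can I fix this?

I would like to convert this data frame into one like the following, with the minimum and maximum dates for each ID.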

ID a_min_date1 a_max_date2 b_min_date1 b_max_date2
1  2017-10-01  2018-12-01  2017-10-01  2018-10-01
2  2017-01-01  2019-02-01  2017-01-15  2019-01-15
3  2017-10-01  2018-12-01  2017-10-01  2018-10-01

Thanks.

1 Answer:

Answer 0 (score: 2)

Try:

library(tidyverse)

df %>%
  group_by(ID, subgroup) %>%
  # One row per ID/subgroup with its earliest date1 and latest date2
  summarise(min_date1 = min(date1),
            max_date2 = max(date2)) %>%
  # Reshape to long form, prefix each key with its subgroup, then go wide
  gather(key, val, min_date1:max_date2) %>%
  unite(new, subgroup, key) %>%
  spread(new, val)

Output:

# A tibble: 3 x 5
# Groups:   ID [3]
     ID a_max_date2 a_min_date1 b_max_date2 b_min_date1
  <dbl> <date>      <date>      <date>      <date>     
1     1 2018-12-01  2017-10-01  2018-10-01  2017-10-01 
2     2 2019-02-01  2017-01-01  2019-01-15  2017-01-15 
3     3 2018-12-01  2017-10-01  2018-10-01  2017-10-01 
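
As background on the `NA` values in the question: `if_else(date1 == min(date1), date1, as.Date(NA))` keeps the date only on rows that equal the group minimum and writes `NA` everywhere else, so the gaps come from the condition rather than from the duplicates. If the goal is to keep all rows and carry the group minimum and maximum on each of them, a minimal sketch could look like this. It rebuilds the question's data with base `as.Date()` so that it is self-contained, and the name `df_filled` is only illustrative.

```r
library(dplyr)

# Same data as in the question, rebuilt so the block runs on its own
df <- data.frame(
  ID = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  subgroup = c("a", "a", "b", "b", "a", "a", "b", "a", "a", "b", "b"),
  date1 = as.Date(c("2017-12-01", "2017-10-01", "2017-10-01", "2017-10-01",
                    "2017-01-01", "2017-02-01", "2017-01-15", "2017-12-01",
                    "2017-10-01", "2017-10-01", "2017-10-01")),
  date2 = as.Date(c("2018-12-01", "2018-10-01", "2018-10-01", "2018-10-01",
                    "2019-01-01", "2019-02-01", "2019-01-15", "2018-12-01",
                    "2018-10-01", "2018-10-01", "2018-10-01"))
)

# min()/max() are evaluated once per group and recycled to every row,
# so duplicate rows simply receive the same value and no NA is introduced
df_filled <- df %>%
  group_by(ID, subgroup) %>%
  mutate(min_date1 = min(date1),
         max_date2 = max(date2)) %>%
  ungroup()
```

Here `df_filled` still has one row per original row; `summarise()` as used in the answer is the better fit when one row per group is wanted.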

To also get counts, try the following:

df %>%
  group_by(ID, subgroup) %>%
  summarise(min_date1 = min(date1),
            max_date2 = max(date2)
            ) %>% ungroup() %>%
  # Counts rows of the summarised table per subgroup
  add_count(subgroup, name = "count") %>%
  gather(key, val, min_date1:max_date2) %>%
  mutate(countv = paste0("count_", subgroup)) %>%
  unite(new, subgroup, key) %>%
  spread(new, val) %>%
  spread(countv, count) %>%
  # Collapse the partially filled rows to one row per ID
  group_by(ID) %>%
  summarise_all(list(~ first(na.omit(.))))
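
With newer package versions the same reshaping can be written more compactly. The following is a sketch, not the answer's method: it assumes dplyr 1.0.0 or later (for `.groups`) and tidyr 1.1.0 or later (for `names_glue` in `pivot_wider()`). It also assumes that "count" means the number of original rows per ID and subgroup, computed with `n()`, which differs from what `add_count()` counts in the code above. The names `wide` and `n` are illustrative.

```r
library(dplyr)
library(tidyr)

# Same data as in the question, rebuilt so the block runs on its own
df <- data.frame(
  ID = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  subgroup = c("a", "a", "b", "b", "a", "a", "b", "a", "a", "b", "b"),
  date1 = as.Date(c("2017-12-01", "2017-10-01", "2017-10-01", "2017-10-01",
                    "2017-01-01", "2017-02-01", "2017-01-15", "2017-12-01",
                    "2017-10-01", "2017-10-01", "2017-10-01")),
  date2 = as.Date(c("2018-12-01", "2018-10-01", "2018-10-01", "2018-10-01",
                    "2019-01-01", "2019-02-01", "2019-01-15", "2018-12-01",
                    "2018-10-01", "2018-10-01", "2018-10-01"))
)

wide <- df %>%
  group_by(ID, subgroup) %>%
  # One row per ID/subgroup: earliest date1, latest date2, number of rows
  summarise(min_date1 = min(date1),
            max_date2 = max(date2),
            n = n(),
            .groups = "drop") %>%
  # One row per ID; columns are named <subgroup>_<value>, e.g. a_min_date1
  pivot_wider(names_from = subgroup,
              values_from = c(min_date1, max_date2, n),
              names_glue = "{subgroup}_{.value}")
```

Because each value column keeps its own type, the dates stay `Date` and the counts stay integer, so no final `summarise_all()` step is needed to collapse partially filled rows.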