I have the following data frame. For each ID and subgroup, I want to add two new columns: the minimum date of the date1 column and the maximum date of the date2 column.
library(dplyr)      # %>%, group_by(), mutate(), if_else()
library(lubridate)  # as_date()
ID <- c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3)
subgroup <- c("a", "a", "b", "b", "a", "a", "b", "a", "a", "b", "b")
date1 <- c("2017-12-01", "2017-10-01", "2017-10-01", "2017-10-01", "2017-01-01", "2017-02-01", "2017-01-15", "2017-12-01", "2017-10-01", "2017-10-01", "2017-10-01")
date2 <- c("2018-12-01", "2018-10-01", "2018-10-01", "2018-10-01", "2019-01-01", "2019-02-01", "2019-01-15", "2018-12-01", "2018-10-01", "2018-10-01", "2018-10-01")
df <- data.frame(ID, subgroup, date1, date2)
df$date1 <- as_date(df$date1)
df$date2 <- as_date(df$date2)
df<- df %>% group_by(ID, subgroup) %>% mutate(min_date1 = if_else(date1 == min(date1), date1, as.Date(NA)))
df<- df %>% group_by(ID, subgroup) %>% mutate(max_date2 = if_else(date2 == max(date2), date2, as.Date(NA)))
ID subgroup date1      date2      min_date1  max_date2
1  a        2017-12-01 2018-12-01 NA         2018-12-01
1  a        2017-10-01 2018-10-01 2017-10-01 NA
1  b        2017-10-01 2018-10-01 2017-10-01 2018-10-01
1  b        2017-10-01 2018-10-01 2017-10-01 2018-10-01
2  a        2017-01-01 2019-01-01 2017-01-01 NA
2  a        2017-02-01 2019-02-01 NA         2019-02-01
2  b        2017-01-15 2019-01-15 2017-01-15 2019-01-15
3  a        2017-12-01 2018-12-01 NA         2018-12-01
3  a        2017-10-01 2018-10-01 2017-10-01 NA
3  b        2017-10-01 2018-10-01 2017-10-01 2018-10-01
3  b        2017-10-01 2018-10-01 2017-10-01 2018-10-01
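Sometimes the minimum date is not returned, because there are many duplicated rows. How can I fix this?
I would also like to convert this data frame into one like the following, with the minimum and maximum dates for each ID.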
ID a_min_date1 a_max_date2 b_min_date1 b_max_date2
1 2017-10-01 2018-12-01 2017-10-01 2018-10-01
2 2017-01-01 2019-02-01 2017-01-15 2019-01-15
3 2017-10-01 2018-12-01 2017-10-01 2018-10-01
Thanks.
Answer 0: (score: 2)
Try:
library(tidyverse)
df %>%
  group_by(ID, subgroup) %>%
  # one row per ID/subgroup with its earliest date1 and latest date2
  summarise(min_date1 = min(date1),
            max_date2 = max(date2)) %>%
  # long format: one row per ID/subgroup/statistic
  gather(key, val, min_date1:max_date2) %>%
  # build column names such as "a_min_date1"
  unite(new, subgroup, key) %>%
  # back to wide format: one row per ID
  spread(new, val)
Output:
# A tibble: 3 x 5
# Groups:   ID [3]
     ID a_max_date2 a_min_date1 b_max_date2 b_min_date1
  <dbl> <date>      <date>      <date>      <date>
1     1 2018-12-01  2017-10-01  2018-10-01  2017-10-01
2     2 2019-02-01  2017-01-01  2019-01-15  2017-01-15
3     3 2018-12-01  2017-10-01  2018-10-01  2017-10-01
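In recent tidyr releases, gather() and spread() are superseded by pivot_longer() and pivot_wider(). The following is only a sketch of the same reshaping with pivot_wider(), assuming dplyr >= 1.0.0 (for the .groups argument) and tidyr >= 1.1.0 (for names_glue). The object name wide is introduced here for illustration, and the data frame is rebuilt so the block runs on its own.

```r
library(dplyr)
library(tidyr)

# Same data as in the question, with the dates parsed up front
df <- data.frame(
  ID = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  subgroup = c("a", "a", "b", "b", "a", "a", "b", "a", "a", "b", "b"),
  date1 = as.Date(c("2017-12-01", "2017-10-01", "2017-10-01", "2017-10-01",
                    "2017-01-01", "2017-02-01", "2017-01-15", "2017-12-01",
                    "2017-10-01", "2017-10-01", "2017-10-01")),
  date2 = as.Date(c("2018-12-01", "2018-10-01", "2018-10-01", "2018-10-01",
                    "2019-01-01", "2019-02-01", "2019-01-15", "2018-12-01",
                    "2018-10-01", "2018-10-01", "2018-10-01"))
)

wide <- df %>%
  group_by(ID, subgroup) %>%
  # one row per ID/subgroup; .groups = "drop" returns an ungrouped tibble
  summarise(min_date1 = min(date1),
            max_date2 = max(date2),
            .groups = "drop") %>%
  # one row per ID; columns are named "<subgroup>_<statistic>"
  pivot_wider(names_from = subgroup,
              values_from = c(min_date1, max_date2),
              names_glue = "{subgroup}_{.value}")

wide
```

The column order may differ from the spread() version, but the column names and values should be the same.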
To get counts as well, try the following:
df %>%
  group_by(ID, subgroup) %>%
  summarise(min_date1 = min(date1),
            max_date2 = max(date2)) %>%
  ungroup() %>%
  # number of summarised rows (one per ID) in each subgroup
  add_count(subgroup, name = "count") %>%
  gather(key, val, min_date1:max_date2) %>%
  mutate(countv = paste0("count_", subgroup)) %>%
  unite(new, subgroup, key) %>%
  spread(new, val) %>%
  spread(countv, count) %>%
  # collapse the rows of each ID, keeping the first non-missing value per column
  group_by(ID) %>%
  summarise_all(list(~ first(na.omit(.))))
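For the first part of the question (a minimum and maximum on every row), the NA values appear because if_else() keeps the date only on rows that are themselves the minimum or maximum. A minimal sketch, assuming the goal is to repeat the group-wise value on every row: call min() and max() directly inside mutate(). The name df_all is introduced here for illustration.

```r
library(dplyr)

# Same data as in the question, with the dates parsed up front
df <- data.frame(
  ID = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  subgroup = c("a", "a", "b", "b", "a", "a", "b", "a", "a", "b", "b"),
  date1 = as.Date(c("2017-12-01", "2017-10-01", "2017-10-01", "2017-10-01",
                    "2017-01-01", "2017-02-01", "2017-01-15", "2017-12-01",
                    "2017-10-01", "2017-10-01", "2017-10-01")),
  date2 = as.Date(c("2018-12-01", "2018-10-01", "2018-10-01", "2018-10-01",
                    "2019-01-01", "2019-02-01", "2019-01-15", "2018-12-01",
                    "2018-10-01", "2018-10-01", "2018-10-01"))
)

df_all <- df %>%
  group_by(ID, subgroup) %>%
  # inside mutate(), min()/max() return one value per group,
  # which is recycled to every row of that group
  mutate(min_date1 = min(date1),
         max_date2 = max(date2)) %>%
  ungroup()

df_all
```

All 11 rows are kept, and duplicated rows within a group simply receive the same group-wise value.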