I have a data frame with two pairs of start and end dates, as shown below:
ID G1_START G1_END G2_START G2_END LOCATION
1 1/1/2021 5/31/2021 2/1/2021 5/31/2021 A
2 12/1/2020 3/31/2021 10/1/2020 5/31/2021 B
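What I want to do is create one row per patient per month, covering only the months in which the two date ranges (all four dates) overlap. For example: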
ID MONTH ACTIVE LOCATION
1 2/1/2021 1 A
1 3/1/2021 1 A
1 4/1/2021 1 A
1 5/1/2021 1 A
2 12/1/2020 1 B
2 1/1/2021 1 B
2 2/1/2021 1 B
2 3/1/2021 1 B
ACTIVE means that the ID is in both G1 and G2 during those months.
Answer 0 (score: 1)
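Here is one option in the tidyverse. Reshape to 'long' format with pivot_longer, convert the 'START' and 'END' columns to Date class (mdy), loop over each 'START'/'END' pair with map2 to get the sequence of months by '1 month' (floor_date represents every month by its first day), and unnest. Then group by ID, LOCATION and MONTH and filter the groups where the number of distinct 'Categ' elements is 2, i.e. the month occurs in both G1 and G2. Finally, keep the distinct rows and create an 'ACTIVE' column of 1.
library(dplyr)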
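library(tidyr)
library(lubridate)
library(purrr)
# Reshape: one row per ID and group (G1/G2), with START and END columns
pivot_longer(df1, cols = contains("_"),
             names_to = c("Categ", ".value"), names_sep = "_") %>%
  # Expand each START/END pair into the first day of every month it spans.
  # Flooring both ends before seq() keeps every month in the range even when
  # a date is not the 1st (e.g. a range starting on the 31st).
  transmute(ID, LOCATION, Categ,
            MONTH = map2(floor_date(mdy(START), "month"),
                         floor_date(mdy(END), "month"),
                         ~ seq(.x, .y, by = "1 month"))) %>%
  unnest(MONTH) %>%
  # Keep only months that occur in both G1 and G2 for the same ID
  group_by(ID, LOCATION, MONTH) %>%
  filter(n_distinct(Categ) == 2) %>%
  ungroup() %>%
  # Each overlapping month appears twice (once per group); keep one copy
  distinct(ID, LOCATION, MONTH) %>%
  mutate(ACTIVE = 1) %>%
  select(ID, MONTH, ACTIVE, LOCATION)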
-output
# A tibble: 8 x 4
     ID MONTH      ACTIVE LOCATION
  <int> <date>      <dbl> <chr>
1     1 2021-02-01      1 A
2     1 2021-03-01      1 A
3     1 2021-04-01      1 A
4     1 2021-05-01      1 A
5     2 2020-12-01      1 B
6     2 2021-01-01      1 B
7     2 2021-02-01      1 B
8     2 2021-03-01      1 B
data
df1 <- structure(list(ID = 1:2, G1_START = c("1/1/2021", "12/1/2020"
), G1_END = c("5/31/2021", "3/31/2021"), G2_START = c("2/1/2021",
"10/1/2020"), G2_END = c("5/31/2021", "5/31/2021"), LOCATION = c("A",
"B")), class = "data.frame", row.names = c(NA, -2L))
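An alternative sketch, not the method used in the answer above: since ACTIVE only requires a month to fall inside both ranges, the overlapping months for each ID run from the later of the two start dates to the earlier of the two end dates, so no reshaping is needed. This assumes the same df1 data and the dplyr, tidyr, lubridate and purrr packages; OV_START and OV_END are hypothetical helper column names. Like the answer, it compares whole months, so two ranges that only share part of a calendar month count as overlapping in that month.

```r
library(dplyr)
library(tidyr)
library(lubridate)
library(purrr)

df1 <- data.frame(
  ID       = 1:2,
  G1_START = c("1/1/2021", "12/1/2020"),
  G1_END   = c("5/31/2021", "3/31/2021"),
  G2_START = c("2/1/2021", "10/1/2020"),
  G2_END   = c("5/31/2021", "5/31/2021"),
  LOCATION = c("A", "B")
)

overlap <- df1 %>%
  mutate(
    # The overlap starts at the later start date and ends at the earlier end
    # date; both are floored to the first day of their month.
    OV_START = floor_date(pmax(mdy(G1_START), mdy(G2_START)), "month"),
    OV_END   = floor_date(pmin(mdy(G1_END), mdy(G2_END)), "month")
  ) %>%
  # Drop IDs whose ranges never overlap (seq() would fail with a negative span)
  filter(OV_START <= OV_END) %>%
  # One list element per ID holding every month in the overlap
  mutate(MONTH = map2(OV_START, OV_END, ~ seq(.x, .y, by = "1 month"))) %>%
  unnest(MONTH) %>%
  mutate(ACTIVE = 1) %>%
  select(ID, MONTH, ACTIVE, LOCATION)

overlap
```

For the sample data this yields the same eight rows as the output above: February to May 2021 for ID 1 and December 2020 to March 2021 for ID 2.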