我有一个数据框df
,其中包含变量gfr
和相应的gfr_date
。我还有第二个数据框drug
,它指示何时开始和停止药物。如ID 1所示,该药物可以多次启动和停止。
我想“合并”两个数据帧(wanted
)以得到一个新变量(drug1
),如果id在gfr_date
处使用某种药物,则该变量为1,如果该变量使用了某种药物,则为0不是。
一些示例数据df
和drug
:
library(dplyr)
#>
#> Attache Paket: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
df <- structure(list(id = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2,
2, 2, 3, 3, 3, 3),
gfr = c(90, 109, 84, 81, 92, 76, 40, 64, 41,
64, 28, 82, 61, 53, 47, 31, 29, 31, 31),
gfr_date = structure(c(16812, 16996, 17178,
17372, 17542, 17731,
16686, 16741, 16898,
16909, 17098, 17120,
17295, 17442, 17668,
17283, 17435, 17568,
17758), class = "Date")),
row.names = c(NA, -19L), class = c("tbl_df", "tbl", "data.frame"))
df
#> # A tibble: 19 x 3
#> id gfr gfr_date
#> <dbl> <dbl> <date>
#> 1 1 90 2016-01-12
#> 2 1 109 2016-07-14
#> 3 1 84 2017-01-12
#> 4 1 81 2017-07-25
#> 5 1 92 2018-01-11
#> 6 1 76 2018-07-19
#> 7 2 40 2015-09-08
#> 8 2 64 2015-11-02
#> 9 2 41 2016-04-07
#> 10 2 64 2016-04-18
#> 11 2 28 2016-10-24
#> 12 2 82 2016-11-15
#> 13 2 61 2017-05-09
#> 14 2 53 2017-10-03
#> 15 2 47 2018-05-17
#> 16 3 31 2017-04-27
#> 17 3 29 2017-09-26
#> 18 3 31 2018-02-06
#> 19 3 31 2018-08-15
drug <- structure(list(id = c(1, 1, 2, 3),
drug = c("drug1", "drug1", "drug1", "drug1"),
drstartd = structure(c(16261, 17008, 17443, 16252), class = "Date"),
drstopd = structure(c(16417, NA, NA, 17412), class = "Date")),
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -4L))
drug
#> # A tibble: 4 x 4
#> id drug drstartd drstopd
#> <dbl> <chr> <date> <date>
#> 1 1 drug1 2014-07-10 2014-12-13
#> 2 1 drug1 2016-07-26 NA
#> 3 2 drug1 2017-10-04 NA
#> 4 3 drug1 2014-07-01 2017-09-03
wanted <- df %>%
mutate(drug1 = c(0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0))
wanted
#> # A tibble: 19 x 4
#> id gfr gfr_date drug1
#> <dbl> <dbl> <date> <dbl>
#> 1 1 90 2016-01-12 0
#> 2 1 109 2016-07-14 0
#> 3 1 84 2017-01-12 1
#> 4 1 81 2017-07-25 1
#> 5 1 92 2018-01-11 1
#> 6 1 76 2018-07-19 1
#> 7 2 40 2015-09-08 0
#> 8 2 64 2015-11-02 0
#> 9 2 41 2016-04-07 0
#> 10 2 64 2016-04-18 0
#> 11 2 28 2016-10-24 0
#> 12 2 82 2016-11-15 0
#> 13 2 61 2017-05-09 0
#> 14 2 53 2017-10-03 0
#> 15 2 47 2018-05-17 1
#> 16 3 31 2017-04-27 1
#> 17 3 29 2017-09-26 0
#> 18 3 31 2018-02-06 0
#> 19 3 31 2018-08-15 0
由reprex package(v0.3.0)于2019-08-19创建
我希望所需的数据帧正确无误,因为我手工完成了...谢谢您的帮助!
编辑 从reprex中删除了STATA属性。
答案 0 :(得分:1)
我相信这应该可以满足您的要求。请注意,我将'drug'的id列设置为常规数字(我摆脱了'label =“ ID”,format.stata =“%12.0g”'),以便可以对'id'进行连接>
df %>%
left_join(drug, by = 'id') %>%
mutate(drug1 = gfr_date >= drstartd & (gfr_date <= drstopd | is.na(drstopd))) %>%
group_by(id, gfr, gfr_date) %>%
summarize(drug1 = as.numeric(any(drug1))) %>%
arrange(id, gfr_date)