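I have a set of patient IDs, each with a record date, a first-visit date, and a disease status. I want to remove every row for a patient ID if that patient has a record whose record date is the same as the first-visit date. My dataset looks like this: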
p_id Record_date fvdate Disease
12 02-03-2017 02-03-2017 1
12 05-03-2017 02-03-2017 0
12 03-04-2018 02-03-2017 1
11 04-05-2016 05-06-2017 0
13 18-06-2017 18-06-2017 1
13 03-05-2018 18-06-2017 0
13 09-09-2019 18-06-2017 0
14 09-12-2017 03-01-2018 1
The output I need:
p_id Record_date fvdate Disease
11 04-05-2016 05-06-2017 0
14 09-12-2017 03-01-2018 1
Thanks in advance.
Answer 0 (score: 3)
We can group by p_id and keep only the groups whose first Record_date differs from the first fvdate.
library(dplyr)
df %>% group_by(p_id) %>% filter(first(Record_date) != first(fvdate))
# p_id Record_date fvdate Disease
# <int> <fct> <fct> <int>
#1 11 04-05-2016 05-06-2017 0
#2 14 09-12-2017 03-01-2018 1
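Or, to drop a patient when any of their records (not only the first one) has a Record_date equal to fvdate: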
df %>% group_by(p_id) %>% filter(!any(Record_date == first(fvdate)))
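For comparison, the same filtering can be sketched in base R without dplyr. This is only an illustration under some assumptions: the names drop_ids and result are made up here, the sample data is rebuilt with read.table so the block runs on its own, and the dates are assumed to be in day-month-year format, so they are parsed with as.Date before comparing rather than compared as strings.

```r
# Rebuild the sample data so this sketch is self-contained.
df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
p_id Record_date fvdate Disease
12 02-03-2017 02-03-2017 1
12 05-03-2017 02-03-2017 0
12 03-04-2018 02-03-2017 1
11 04-05-2016 05-06-2017 0
13 18-06-2017 18-06-2017 1
13 03-05-2018 18-06-2017 0
13 09-09-2019 18-06-2017 0
14 09-12-2017 03-01-2018 1
")

# Assumption: both columns use the day-month-year layout shown in the question.
df$Record_date <- as.Date(df$Record_date, format = "%d-%m-%Y")
df$fvdate      <- as.Date(df$fvdate, format = "%d-%m-%Y")

# Patient IDs that have at least one row where the two dates coincide.
drop_ids <- unique(df$p_id[df$Record_date == df$fvdate])

# Keep only the rows of patients that are not in that set.
result <- df[!df$p_id %in% drop_ids, ]
print(result)
```

Like the any() variant, this drops a patient if any of their rows matches, not only the first row. Parsing to Date is optional for this data, since identically formatted strings compare equal anyway, but it guards against mixed formats.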
Data
df <- structure(list(p_id = c(12L, 12L, 12L, 11L, 13L, 13L, 13L, 14L
), Record_date = c("02-03-2017", "05-03-2017", "03-04-2018",
"04-05-2016", "18-06-2017", "03-05-2018", "09-09-2019", "09-12-2017"
), fvdate = c("02-03-2017", "02-03-2017", "02-03-2017", "05-06-2017",
"18-06-2017", "18-06-2017", "18-06-2017", "03-01-2018"), Disease = c(1L,
0L, 1L, 0L, 1L, 0L, 0L, 1L)), row.names = c(NA, -8L), class = "data.frame")