Here is my dataset:
df <- data.frame(PatientID = c("3454","3454","3454","345","345","345"),
                 date = c("05/01/2001", "02/06/1997", "29/03/2004", "05/2/2021", "01/06/1960", "29/03/2003"),
                 infarct1 = c(TRUE, NA, TRUE, NA, NA, TRUE),
                 infarct2 = c(TRUE, TRUE, TRUE, TRUE, NA, TRUE),
                 stringsAsFactors = FALSE)
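Basically, I need to keep only one row per patient ID (i.e., eliminate the duplicate PatientID values), based on the most recent infarct (the last row with infarct == TRUE, for any type of infarct) according to date.
So the result I want has one row per PatientID, with the date of that patient's most recent infarct.
Hope this makes sense.
Thanks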
Answer 0 (score: 2)
Try this:
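library(dplyr)
df <- df %>%
  # parse the day/month/year strings so max() compares real dates, not text
  mutate(date = as.Date(date, format = "%d/%m/%Y"),
         infarct = infarct1 | infarct2) %>%
  filter(infarct) %>%  # filter() drops rows where infarct is FALSE or NA
  group_by(PatientID, infarct) %>%
  summarise(date = max(date), .groups = "drop")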
The mutate() step creates the combined infarct variable.
Answer 1 (score: 1)
You can convert date to the Date class, arrange the data by PatientID and date, and take the last date where infarct = TRUE.
library(dplyr)
# uses the single-column infarct data shown under "Data" below
df %>%
  mutate(date = lubridate::dmy(date)) %>%
  arrange(PatientID, date) %>%
  group_by(PatientID) %>%
  summarise(date = date[max(which(infarct))],
            infarct = TRUE)
#   PatientID date       infarct
#   <chr>     <date>     <lgl>
# 1 345       2003-03-29 TRUE
# 2 3454      2004-03-29 TRUE
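To illustrate why the conversion to the Date class matters, here is a small base R sketch using the three dates for PatientID "345" from the question. The variable names are only illustrative. Compared as text, the strings are ordered character by character, so the day field dominates and max() picks "29/03/2003" rather than the chronologically latest date.

```r
# Dates for PatientID "345" from the question, still stored as text
dates_chr <- c("05/2/2021", "01/06/1960", "29/03/2003")

# Character comparison runs left to right, so the leading day digits decide
latest_chr <- max(dates_chr)

# Parsed as day/month/year, the comparison becomes chronological
dates_parsed <- as.Date(dates_chr, format = "%d/%m/%Y")
latest_date <- max(dates_parsed)

print(latest_chr)
print(latest_date)
```

Once the column is a Date, max() and arrange() both order the rows by time, which is what the pipeline above relies on.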
For multiple columns, get the data in long format.
df %>%
  mutate(date = lubridate::dmy(date)) %>%
  tidyr::pivot_longer(cols = starts_with('infarct')) %>%
  arrange(PatientID, date) %>%
  group_by(PatientID) %>%
  slice(max(which(value))) %>%
  ungroup()
#   PatientID date       name     value
#   <chr>     <date>     <chr>    <lgl>
# 1 345       2021-02-05 infarct2 TRUE
# 2 3454      2004-03-29 infarct2 TRUE
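As a possible alternative to reshaping, the same "any infarct column is TRUE" rule can be sketched without pivoting. This is not part of the answer above. It assumes dplyr 1.0.4 or later (for if_any()), and the name latest is only illustrative.

```r
library(dplyr)

# The two-column data set from the question
df <- data.frame(
  PatientID = c("3454", "3454", "3454", "345", "345", "345"),
  date = c("05/01/2001", "02/06/1997", "29/03/2004",
           "05/2/2021", "01/06/1960", "29/03/2003"),
  infarct1 = c(TRUE, NA, TRUE, NA, NA, TRUE),
  infarct2 = c(TRUE, TRUE, TRUE, TRUE, NA, TRUE),
  stringsAsFactors = FALSE
)

latest <- df %>%
  mutate(date = as.Date(date, format = "%d/%m/%Y")) %>%
  # keep rows where at least one infarct column is TRUE; NA counts as not TRUE
  filter(if_any(starts_with("infarct"), ~ .x %in% TRUE)) %>%
  group_by(PatientID) %>%
  # one row per patient: the most recent remaining date
  slice_max(date, n = 1, with_ties = FALSE) %>%
  ungroup()

print(latest)
```

Unlike the long-format version, this keeps the original infarct1 and infarct2 columns side by side, so it does not record which column triggered the match.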
Data
I think you need to wrap the values in the date column in quotes.
df <- data.frame(PatientID = c("3454","3454","3454","345","345","345"),
date = c("05/01/2001", "02/06/1997", "29/03/2004", "05/2/2021", "01/06/1960", "29/03/2003"),
infarct = c(TRUE, NA, TRUE, NA, NA, TRUE), stringsAsFactors = FALSE)