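I have a large clinical dataset that I plan to populate with additional columns. The criteria for these columns are nearly identical, so they probably boil down to the same kind of problem.
So far I have worked out that I first need to group the entries by patient_id, but I can't get any further from there.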
Below is a snapshot of the data. Copying and running it in R creates a data.frame named myDF:
myDF <- structure(list(id = c(1L, 1L, 1L, 1L, 1L), date = structure(c(17167,
17168, 17169, 17170, 17171), class = "Date"), date_recruited = c("yes",
"", "", "", ""), ill = c("no", "no", "yes", "yes", "no")), class = "data.frame", .Names = c("id",
"date", "date_recruited", "ill"), row.names = c(NA, -5L))
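I would like to create a new column (let's call it "drop") such that, for each id, if the difference between the date on which ill == "yes" and the date on which date_recruited == "yes" is 3 days, drop is populated for that row.
Something like this: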
myDF2 <- structure(list(id = c(1L, 1L, 1L, 1L, 1L), date = structure(c(17167,
17168, 17169, 17170, 17171), class = "Date"), date_recruited = c("yes",
"", "", "", ""), ill = c("no", "no", "yes", "yes", "no"), drop = c("",
"", "", "drop", "")), class = "data.frame", .Names = c("id",
"date", "date_recruited", "ill", "drop"), row.names = c(NA, -5L
))
Any help is welcome.
Answer 0 (score: 1)
With dplyr, you can do the following.
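library(dplyr)
# recdate: first recruitment date per id; drop: rows where the patient is ill exactly 3 days after it
myDF2 <- myDF %>% group_by(id) %>% mutate(recdate = date[which(date_recruited == "yes")[1]],
  drop = ifelse(ill == "yes" & date == recdate + 3, "drop", ""))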
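Since the question mentions several columns with nearly identical criteria, the same pipeline could be wrapped in a small function. The following is only a sketch under some assumptions: flag_drop() and its days argument are hypothetical names, not part of dplyr, and the second patient in the sample data is invented to show what happens when an id has no recruitment row. In that case recdate is NA, and without the !is.na() guard ifelse() would return NA instead of "".

```r
library(dplyr)

# Sample data: patient 1 is taken from the question; patient 2 is a
# made-up case with no recruitment date, added for illustration only.
myDF <- data.frame(
  id = c(1L, 1L, 1L, 1L, 1L, 2L, 2L),
  date = as.Date(c(17167, 17168, 17169, 17170, 17171, 17167, 17168),
                 origin = "1970-01-01"),
  date_recruited = c("yes", "", "", "", "", "", ""),
  ill = c("no", "no", "yes", "yes", "no", "yes", "yes"),
  stringsAsFactors = FALSE
)

# flag_drop() is a hypothetical helper: it marks rows where the patient is
# ill exactly `days` days after that patient's first recruitment date.
flag_drop <- function(df, days = 3) {
  df %>%
    group_by(id) %>%
    mutate(
      # First date flagged as recruited within this id (NA if there is none)
      recdate = date[which(date_recruited == "yes")[1]],
      # Guard against a missing recdate so the result is "" rather than NA
      drop = ifelse(!is.na(recdate) & ill == "yes" & date == recdate + days,
                    "drop", "")
    ) %>%
    ungroup() %>%
    select(-recdate)  # remove the helper column again
}

myDF2 <- flag_drop(myDF, days = 3)
print(myDF2)
```

Calling ungroup() at the end returns an ungrouped result, so later operations are not silently applied per id. A different offset can be tried by changing days, for example flag_drop(myDF, days = 2).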