My dataset looks like this:
ID Diagnosis date Procedure date
1 2005-09-09 2008-04-09
1 2006-05-09 2007-08-08
2 2007-07-02 2007-08-01
2 2007-07-02 2009-08-05
2 2008-05-08 2007-08-10
I want to regroup the data as follows:
ID Diagnosis date Procedure date
1 2005-09-09 2007-08-08
1 2006-05-09 2008-04-09
2 2007-07-02 2007-08-01
2 2007-07-02 2007-08-10
2 2008-05-08 2009-08-05
Basically, each procedure date should come after its diagnosis date.
Answer 0 (score: 1)
How about this solution? Some sample data:
dat <- read.table(header=TRUE, stringsAsFactors=FALSE, text='
ID Diagnosis Procedure
1 2005-09-09 2008-04-09
1 2006-05-09 2007-08-08
2 2007-07-02 2007-08-01
2 2007-07-02 2009-08-05
2 2008-05-08 2007-08-10')
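Convert them to Date objects. (Because the strings are consistently formatted, this would also work without converting them to dates. I suppose it is just my habit to turn them into "proper date objects".)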
dat$Diagnosis <- as.Date(dat$Diagnosis)
dat$Procedure <- as.Date(dat$Procedure)
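The remark about well-formatted strings can be checked with a small sketch. The vectors `s` and `d` and the unpadded value "2008-05-8" are made up for illustration. The idea is that zero-padded YYYY-MM-DD strings order the same way as dates, while a string with an unpadded day does not.

```r
# Illustrative values only (not part of the answer's data)
s <- c("2005-09-09", "2008-04-09", "2007-08-10")
d <- as.Date(s)

# Zero-padded YYYY-MM-DD strings order the same way as the parsed dates
same_order <- identical(order(s), order(d))

# An unpadded day breaks string comparison, because "8" sorts after "1"
string_cmp <- "2008-05-8" < "2008-05-10"                    # compares characters
date_cmp   <- as.Date("2008-05-8") < as.Date("2008-05-10")  # compares calendar dates

print(c(same_order = same_order, string_cmp = string_cmp, date_cmp = date_cmp))
```

Here the string comparison and the date comparison disagree, which is one reason converting to Date first is the safer habit.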
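min returns a single minimum taken over all of its arguments. pmin returns the pairwise (element-by-element) minima of several vectors: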
min(c(1,1,3,4), c(2,2,4,3))
# [1] 1
pmin(c(1,1,3,4), c(2,2,4,3))
# [1] 1 1 3 3
We can use this to compare the two columns:
tmp1 <- pmin(dat$Diagnosis, dat$Procedure)
tmp2 <- pmax(dat$Diagnosis, dat$Procedure)
and store the results back in place:
dat$Diagnosis <- tmp1
dat$Procedure <- tmp2
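Put together, the steps above can be run as one script. This is a sketch of the row-wise swap only: it makes sure Diagnosis is not later than Procedure within each row, and it does not move values between rows of the same ID. The names `earlier`, `later` and `ok` and the final check are additions for illustration.

```r
dat <- read.table(header = TRUE, stringsAsFactors = FALSE, text = '
ID Diagnosis Procedure
1 2005-09-09 2008-04-09
1 2006-05-09 2007-08-08
2 2007-07-02 2007-08-01
2 2007-07-02 2009-08-05
2 2008-05-08 2007-08-10')

dat$Diagnosis <- as.Date(dat$Diagnosis)
dat$Procedure <- as.Date(dat$Procedure)

# Row by row, take the earlier date as Diagnosis and the later as Procedure
earlier <- pmin(dat$Diagnosis, dat$Procedure)
later   <- pmax(dat$Diagnosis, dat$Procedure)
dat$Diagnosis <- earlier
dat$Procedure <- later

# Illustrative check: no row has a procedure before its diagnosis
ok <- all(dat$Diagnosis <= dat$Procedure)
print(dat)
```

With this sample data only the last row changes: its two dates trade places.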
Answer 1 (score: 1)
Hopefully the following code solves it:
library(dplyr)
data <- data.frame(ID = c(1,1,2,2,2), Diagnosis = c("2005-09-09","2006-05-09","2007-07-02","2007-07-02","2008-05-08"),
Procedure = c("2008-04-09","2007-08-08","2007-08-01","2009-08-05","2007-08-10"))
data$Diagnosis <- as.Date(data$Diagnosis)
data$Procedure <- as.Date(data$Procedure)
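# Drop Diagnosis, then sort the remaining rows by Procedure within each ID
data1 <- data[, -2] %>%
  group_by(ID) %>%
  arrange(ID, Procedure)
# Re-attach the original Diagnosis column (relies on it already being in order within each ID)
out <- data.frame(data1, data[2])
# Restore the column order: ID, Diagnosis, Procedure
out <- out[, c(1, 3, 2)]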
out
ID Diagnosis Procedure
1 1 2005-09-09 2007-08-08
2 1 2006-05-09 2008-04-09
3 2 2007-07-02 2007-08-01
4 2 2007-07-02 2007-08-10
5 2 2008-05-08 2009-08-05
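For comparison, here is a minimal base R sketch of the same idea that sorts both date columns within each ID, so it does not rely on Diagnosis already being in order. The helper `sort_within` is hypothetical and not part of the answer above. Sorting the two columns independently does not guarantee in general that every procedure follows its diagnosis. It does hold for this sample data.

```r
dat <- data.frame(
  ID = c(1, 1, 2, 2, 2),
  Diagnosis = as.Date(c("2005-09-09", "2006-05-09", "2007-07-02",
                        "2007-07-02", "2008-05-08")),
  Procedure = as.Date(c("2008-04-09", "2007-08-08", "2007-08-01",
                        "2009-08-05", "2007-08-10"))
)

# Hypothetical helper: sort the dates in x separately within each group g.
# ave() works on the numeric day counts, which are then turned back into Dates.
sort_within <- function(x, g) {
  as.Date(ave(as.numeric(x), g, FUN = sort), origin = "1970-01-01")
}

out <- dat
out$Diagnosis <- sort_within(dat$Diagnosis, dat$ID)
out$Procedure <- sort_within(dat$Procedure, dat$ID)

# Illustrative check of the requirement from the question
ok <- all(out$Diagnosis <= out$Procedure)
print(out)
```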