Arranging column data in R

Time: 2018-09-03 04:33:38

Tags: r

My dataset looks like this:

ID     Diagnosis date    Procedure date
1      2005-09-09        2008-04-09
1      2006-05-09        2007-08-08
2      2007-07-02        2007-08-01
2      2007-07-02        2009-08-05
2      2008-05-08        2007-08-10

I would like to rearrange the data as follows:

ID      Diagnosis date     Procedure date
1       2005-09-09         2007-08-08
        2006-05-09         2008-04-09
2       2007-07-02         2007-08-01
                           2007-08-10
        2008-05-08         2009-08-05

Basically, the procedure date should come after the diagnosis date.

2 answers:

Answer 0 (score: 1)

How about this solution? Some sample data:

dat <- read.table(header=TRUE, stringsAsFactors=FALSE, text='
ID     Diagnosis         Procedure
1      2005-09-09        2008-04-09
1      2006-05-09        2007-08-08
2      2007-07-02        2007-08-01
2      2007-07-02        2009-08-05
2      2008-05-08        2007-08-10')

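Convert them to Date objects. (Because the strings are in a consistent, zero-padded YYYY-MM-DD format, this would also work without converting them, since such strings sort in chronological order. Making them "proper Date objects" is simply my habit.)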

dat$Diagnosis <- as.Date(dat$Diagnosis)
dat$Procedure <- as.Date(dat$Procedure)
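The parenthetical remark about well-formatted strings can be checked with a small sketch. The vectors `diag_chr` and `proc_chr` are illustrative names, not part of the original answer, and the sketch assumes every string is zero-padded YYYY-MM-DD:

```r
# Illustrative vectors (hypothetical names) holding zero-padded ISO date strings.
diag_chr <- c("2005-09-09", "2006-05-09", "2008-05-08")
proc_chr <- c("2008-04-09", "2007-08-08", "2007-08-10")

# Compare the raw strings, then compare the same values as Date objects.
chr_cmp  <- diag_chr < proc_chr
date_cmp <- as.Date(diag_chr) < as.Date(proc_chr)

# Both comparisons should agree element by element.
same_result <- identical(chr_cmp, date_cmp)
print(same_result)
```

Converting to Date is still the safer habit, since it no longer depends on every string being padded the same way.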

min returns the single smallest value across all of its arguments. pmin returns the element-wise (parallel) minimum across vectors:

min(c(1,1,3,4), c(2,2,4,3))
# [1] 1
pmin(c(1,1,3,4), c(2,2,4,3))
# [1] 1 1 3 3

We can use this to compare the two columns:

tmp1 <- pmin(dat$Diagnosis, dat$Procedure)
tmp2 <- pmax(dat$Diagnosis, dat$Procedure)

and store the results back in place:

dat$Diagnosis <- tmp1
dat$Procedure <- tmp2
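To see what this does on the sample data, here is a self-contained sketch that repeats the steps above and then checks the result. The names `before`, `changed` and `ok` are illustrative additions, not part of the original answer:

```r
dat <- read.table(header = TRUE, stringsAsFactors = FALSE, text = '
ID     Diagnosis         Procedure
1      2005-09-09        2008-04-09
1      2006-05-09        2007-08-08
2      2007-07-02        2007-08-01
2      2007-07-02        2009-08-05
2      2008-05-08        2007-08-10')
dat$Diagnosis <- as.Date(dat$Diagnosis)
dat$Procedure <- as.Date(dat$Procedure)

before <- dat  # keep a copy to see which rows change

# Row by row, put the earlier date in Diagnosis and the later one in Procedure.
tmp1 <- pmin(dat$Diagnosis, dat$Procedure)
tmp2 <- pmax(dat$Diagnosis, dat$Procedure)
dat$Diagnosis <- tmp1
dat$Procedure <- tmp2

# Every procedure date is now on or after its diagnosis date.
ok <- all(dat$Procedure >= dat$Diagnosis)

# Rows whose two dates were swapped.
changed <- which(dat$Diagnosis != before$Diagnosis)
print(dat)
```

On this sample only the last row changes: its two dates are swapped within the row. Note that this works row by row and does not move dates between rows of the same ID.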

Answer 1 (score: 1)

Hopefully the following code solves it:

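library(dplyr)
data <- data.frame(ID =  c(1,1,2,2,2), Diagnosis = c("2005-09-09","2006-05-09","2007-07-02","2007-07-02","2008-05-08"),
                   Procedure =  c("2008-04-09","2007-08-08","2007-08-01","2009-08-05","2007-08-10"))

# Convert both columns to Date objects
data$Diagnosis <- as.Date(data$Diagnosis)
data$Procedure <- as.Date(data$Procedure)

# Drop the Diagnosis column, then sort the remaining rows by ID and Procedure
data1 <- data[,-2] %>%
  group_by(ID) %>%
  arrange( ID,Procedure)
# Attach the original Diagnosis column (still in its original row order)
out <- data.frame(data1,data[2])
# Restore the column order: ID, Diagnosis, Procedure
out <- out[,c(1,3,2)]
out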

  ID  Diagnosis  Procedure
1  1 2005-09-09 2007-08-08
2  1 2006-05-09 2008-04-09
3  2 2007-07-02 2007-08-01
4  2 2007-07-02 2007-08-10
5  2 2008-05-08 2009-08-05
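The same idea (sorting the Procedure dates within each ID while keeping the Diagnosis dates in order) could also be sketched in base R without dplyr. This is a minimal sketch, not the answerer's code; the names `dat` and `ok` are illustrative. It assumes that sorting both columns within each ID is what is wanted, which does not by itself guarantee that every procedure date follows its diagnosis date, so the final check is worth keeping:

```r
dat <- data.frame(
  ID = c(1, 1, 2, 2, 2),
  Diagnosis = as.Date(c("2005-09-09", "2006-05-09", "2007-07-02",
                        "2007-07-02", "2008-05-08")),
  Procedure = as.Date(c("2008-04-09", "2007-08-08", "2007-08-01",
                        "2009-08-05", "2007-08-10"))
)

# Sort rows by ID and Diagnosis so that each ID forms one contiguous block.
dat <- dat[order(dat$ID, dat$Diagnosis), ]

# Lay the Procedure dates over those blocks in ascending order within each ID.
dat$Procedure <- dat$Procedure[order(dat$ID, dat$Procedure)]
rownames(dat) <- NULL

# Check that every procedure date is on or after its diagnosis date.
ok <- all(dat$Procedure >= dat$Diagnosis)
print(dat)
```

For the sample data this yields the same table as the output shown above.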