Remove rows after a certain date based on a condition in R

Time: 2016-02-04 11:37:19

Tags: r date conditional-statements

I have seen similar questions, but none of them apply the filter to specific rows of a data.table or data.frame; they apply it to the whole matrix instead.
Subset a dataframe between 2 dates
How to select some rows with specific date from a data frame in R

I have a dataset in which patients are either diagnosed with OA or not:

library(data.table)

dt <- data.table(ID = seq(1,10,1), OA = c(1,0,0,1,0,0,0,1,1,0),
                 oa.date = as.Date(c("01/01/2006", "01/01/2001", "01/01/2001", "02/03/2005","01/01/2001","01/01/2001","01/01/2001","05/06/2010", "01/01/2011", "01/01/2001"), "%d/%m/%Y"),
                 stop.date = as.Date(c("01/01/2006", "31/12/2007", "31/12/2008", "02/03/2005", "31/12/2011", "31/12/2011", "31/12/2011", "05/06/2010", "01/01/2011", "31/12/2011"), "%d/%m/%Y"))
# Patients without OA have no diagnosis date
dt$oa.date[dt$OA==0] <- NA

> dt
    ID OA    oa.date  stop.date
 1:  1  1 2006-01-01 2006-01-01
 2:  2  0       <NA> 2007-12-31
 3:  3  0       <NA> 2008-12-31
 4:  4  1 2005-03-02 2005-03-02
 5:  5  0       <NA> 2011-12-31
 6:  6  0       <NA> 2011-12-31
 7:  7  0       <NA> 2011-12-31
 8:  8  1 2010-06-05 2010-06-05
 9:  9  1 2011-01-01 2011-01-01
10: 10  0       <NA> 2011-12-31

What I would like to do is remove the patients who were diagnosed with OA (OA==1) before start:

start <- as.Date("01/01/2009", "%d/%m/%Y")

So I would like my final data to be:

> dt
     ID OA    oa.date  stop.date
 1:  2  0       <NA> 2007-12-31
 2:  3  0       <NA> 2008-12-31
 3:  5  0       <NA> 2011-12-31
 4:  6  0       <NA> 2011-12-31
 5:  7  0       <NA> 2011-12-31
 6:  8  1 2010-06-05 2010-06-05
 7:  9  1 2011-01-01 2011-01-01
 8: 10  0       <NA> 2011-12-31

My attempt was:

  dt[dt$OA==1] <- dt[!(oa.date < start)]  

I have also tried a loop, but to no avail.

Any help is much appreciated.

2 answers:

Answer 0: (score: 3)

This should be straightforward:

> dt[!(OA & oa.date < start)]
#   ID OA    oa.date  stop.date
#1:  2  0       <NA> 2007-12-31
#2:  3  0       <NA> 2008-12-31
#3:  5  0       <NA> 2011-12-31
#4:  6  0       <NA> 2011-12-31
#5:  7  0       <NA> 2011-12-31
#6:  8  1 2010-06-05 2010-06-05
#7:  9  1 2011-01-01 2011-01-01
#8: 10  0       <NA> 2011-12-31

The OA column is binary (1/0), so it is coerced to logical (TRUE/FALSE) in the i-expression.
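
To see why the rows with a missing oa.date survive the filter, it may help to evaluate the condition step by step on plain vectors. The following is only a minimal base R sketch using three made-up values (one early OA diagnosis, one non-OA patient, one late OA diagnosis); the variable `keep` is a name introduced here for illustration.

```r
# Three illustrative patients: OA before start, no OA, OA after start
OA      <- c(1, 0, 1)
oa.date <- as.Date(c("2006-01-01", NA, "2010-06-05"))
start   <- as.Date("2009-01-01")

as.logical(OA)         # TRUE FALSE TRUE  (1/0 coerced to logical)
oa.date < start        # TRUE    NA FALSE (comparison with NA gives NA)
OA & oa.date < start   # TRUE FALSE FALSE (FALSE & NA is FALSE, not NA)

# Negating gives the rows to keep; no NA is left in the result
keep <- !(OA & oa.date < start)
keep                   # FALSE TRUE TRUE
```

Because `FALSE & NA` evaluates to `FALSE`, the non-OA rows never produce an NA in the condition, so they are kept rather than dropped.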

Answer 1: (score: 0)

You could try

dt <- dt[dt$OA == 0 | (dt$OA == 1 & !(dt$oa.date < start)), ]
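
Since this condition is written with `dt$` references, it should also work on a plain data.frame. The sketch below rebuilds only the ID, OA and oa.date columns from the question in base R (the names `df`, `keep` and `result` are introduced here for illustration) and checks which IDs remain.

```r
# Same IDs, OA flags and diagnosis dates as in the question,
# with oa.date already set to NA for patients without OA
df <- data.frame(
  ID = 1:10,
  OA = c(1, 0, 0, 1, 0, 0, 0, 1, 1, 0),
  oa.date = as.Date(c("01/01/2006", NA, NA, "02/03/2005", NA, NA, NA,
                      "05/06/2010", "01/01/2011", NA),
                    format = "%d/%m/%Y")
)
start <- as.Date("01/01/2009", format = "%d/%m/%Y")

# Keep non-OA patients, plus OA patients diagnosed on or after start
keep <- df$OA == 0 | (df$OA == 1 & !(df$oa.date < start))
result <- df[keep, ]

result$ID   # 2 3 5 6 7 8 9 10
```

For the non-OA rows the condition is `TRUE | (FALSE & NA)`, which is `TRUE`, so `keep` contains no NA values and the data.frame subset does not introduce NA rows.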