Defining a variable by logical subsetting on a time interval in data.table

Time: 2015-03-19 17:01:36

Tags: r data.table

My data.table looks like this:

    id event state      time
 1:  A     0  NULL 0.8998250
 2:  A     1  NULL 1.1459127
 3:  A     0  NULL 1.1879722
 4:  A     2  NULL 1.5158930
 5:  A     0  NULL 2.4703966
 6:  B     0  NULL 0.8895393
 7:  B     1  NULL 1.5823427
 8:  B     2  NULL 2.2228495
 9:  B     0  NULL 3.2171193
10:  B     0  NULL 3.8728251
11:  C     1  NULL 0.7085305
12:  C     0  NULL 1.2525965
13:  C     2  NULL 1.8467385
14:  C     0  NULL 2.1358983
15:  C     0  NULL 2.2830119

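I want to set the variable state to 1 for the rows between event 1 and event 2. Each of the two events occurs exactly once per id, and event=1 always occurs before event=2.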

The following code generates the data.table above:

library(data.table)

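# Define the variables and build the data.table
id <- rep(LETTERS[1:3],each=5)
set.seed(123)
# Per id: event 1 lands in the first two rows, event 2 in the last three
event <- c(sample(c(0,1),2,FALSE),sample(c(0,0,2),3,FALSE),
           sample(c(0,1),2,FALSE),sample(c(0,0,2),3,FALSE),
           sample(c(0,1),2,FALSE),sample(c(0,0,2),3,FALSE))
state <- "NULL"  # placeholder string, not R's NULL object
# Cumulative sums of uniform draws give increasing times within each id
time <- c(apply(matrix(runif(3*5),5,3),2,cumsum))
DT <- data.table(id,event,state,time)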
DT
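
Because the table is built with sample() and runif(), the exact rows can differ between R versions (the default sample() algorithm changed in R 3.6.0). A minimal sketch that instead hard-codes the values printed above, so the rest of the thread can be followed exactly; the values are copied from the printout, nothing else is assumed:

```r
library(data.table)

# Hard-coded copy of the table printed in the question (no RNG involved)
DT <- data.table(
  id    = rep(LETTERS[1:3], each = 5),
  event = c(0, 1, 0, 2, 0,
            0, 1, 2, 0, 0,
            1, 0, 2, 0, 0),
  state = "NULL",  # placeholder string, recycled to all 15 rows
  time  = c(0.8998250, 1.1459127, 1.1879722, 1.5158930, 2.4703966,
            0.8895393, 1.5823427, 2.2228495, 3.2171193, 3.8728251,
            0.7085305, 1.2525965, 1.8467385, 2.1358983, 2.2830119)
)
DT
```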

I tried the following code to assign the value 1 to state for the rows between the time points where event==1 and event==2:

DT[time>=time[event==1] & time<=time[event==2],state:="1",by=id]

But this produces the following output:

    id event state      time
 1:  A     0  NULL 0.8998250
 2:  A     1  NULL 1.1459127
 3:  A     0     1 1.1879722
 4:  A     2     1 1.5158930
 5:  A     0  NULL 2.4703966
 6:  B     0     1 0.8895393
 7:  B     1  NULL 1.5823427
 8:  B     2     1 2.2228495
 9:  B     0  NULL 3.2171193
10:  B     0  NULL 3.8728251
11:  C     1  NULL 0.7085305
12:  C     0     1 1.2525965
13:  C     2  NULL 1.8467385
14:  C     0     1 2.1358983
15:  C     0  NULL 2.2830119

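state=1 is clearly placed in the wrong rows of the data.table. I can't figure out what data.table is doing here. Can you see why data.table behaves this way, and is there a good solution to my problem?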

2 Answers:

Answer 0 (score: 2)

You're almost there. Try this:

DT[,state:= ifelse(time>=time[event==1] & time<=time[event==2],"1",state),by=id]

#    id event state      time
# 1:  A     0  NULL 0.8998250
# 2:  A     1     1 1.1459127
# 3:  A     0     1 1.1879722
# 4:  A     2     1 1.5158930
# 5:  A     0  NULL 2.4703966
# 6:  B     0  NULL 0.8895393
# 7:  B     1     1 1.5823427
# 8:  B     2     1 2.2228495
# 9:  B     0  NULL 3.2171193
#10:  B     0  NULL 3.8728251
#11:  C     1     1 0.7085305
#12:  C     0     1 1.2525965
#13:  C     2     1 1.8467385
#14:  C     0  NULL 2.1358983
#15:  C     0  NULL 2.2830119
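
As for why the original attempt misplaces the 1s: in DT[i, j, by] the i expression is evaluated once on the whole table, before any grouping, so by=id has no effect on it. There, time[event==1] has length 3 (one value per id) and is recycled along the 15 rows, so each row is compared against the event times of whichever id the recycling happens to land on. Putting the condition in j makes it run once per id. A small sketch with the printed values hard-coded; the names t1, t2, whole_table and per_group are only illustrative:

```r
library(data.table)

# Hard-coded copy of the table printed in the question
DT <- data.table(
  id    = rep(LETTERS[1:3], each = 5),
  event = c(0, 1, 0, 2, 0,  0, 1, 2, 0, 0,  1, 0, 2, 0, 0),
  state = "NULL",
  time  = c(0.8998250, 1.1459127, 1.1879722, 1.5158930, 2.4703966,
            0.8895393, 1.5823427, 2.2228495, 3.2171193, 3.8728251,
            0.7085305, 1.2525965, 1.8467385, 2.1358983, 2.2830119)
)

# Without grouping there is one start and one end time per id: length-3 vectors
t1 <- DT[event == 1, time]
t2 <- DT[event == 2, time]

# What the i expression in the question computes: the length-3 vectors are
# recycled along the 15 rows, ignoring id
whole_table <- DT$time >= t1 & DT$time <= t2
which(whole_table)   # rows 3, 4, 6, 8, 12, 14: the misplaced 1s

# What was intended: the same condition evaluated separately within each id
per_group <- DT[, time >= time[event == 1] & time <= time[event == 2], by = id]$V1
which(per_group)     # rows 2, 3, 4, 7, 8, 11, 12, 13
```

Since 15 is a multiple of 3, the recycling happens silently, without a length warning.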

Answer 1 (score: 2)

Instead of using ifelse, we can use .I to extract the row indices and then assign '1' to state for those rows.

DT[DT[,.I[time>=time[event==1] & time<=time[event==2]], by=id]$V1,
   state:='1'][]
#    id event state      time
# 1:  A     0  NULL 0.8998250
# 2:  A     1     1 1.1459127
# 3:  A     0     1 1.1879722
# 4:  A     2     1 1.5158930
# 5:  A     0  NULL 2.4703966
# 6:  B     0  NULL 0.8895393
# 7:  B     1     1 1.5823427
# 8:  B     2     1 2.2228495
# 9:  B     0  NULL 3.2171193
#10:  B     0  NULL 3.8728251
#11:  C     1     1 0.7085305
#12:  C     0     1 1.2525965
#13:  C     2     1 1.8467385
#14:  C     0  NULL 2.1358983
#15:  C     0  NULL 2.2830119
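
Both answers work because the interval condition is evaluated per id. A hedged variant of the same idea, not taken from either answer: data.table's between() expresses the closed interval directly, and fifelse() (available in recent data.table releases) is a type-strict replacement for ifelse. The table is hard-coded from the printed values so the sketch is self-contained:

```r
library(data.table)

# Hard-coded copy of the table printed in the question
DT <- data.table(
  id    = rep(LETTERS[1:3], each = 5),
  event = c(0, 1, 0, 2, 0,  0, 1, 2, 0, 0,  1, 0, 2, 0, 0),
  state = "NULL",
  time  = c(0.8998250, 1.1459127, 1.1879722, 1.5158930, 2.4703966,
            0.8895393, 1.5823427, 2.2228495, 3.2171193, 3.8728251,
            0.7085305, 1.2525965, 1.8467385, 2.1358983, 2.2830119)
)

# Within each id, time[event == 1] and time[event == 2] are single values,
# so between() tests the closed interval [start, end] for that id only.
# fifelse() requires "yes" and "no" to share a type, hence "1" as a string.
DT[, state := fifelse(between(time, time[event == 1], time[event == 2]),
                      "1", state),
   by = id]
DT[]
```

This relies on each id having exactly one event==1 row and one event==2 row, as stated in the question.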