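How do I create a dummy that depends on all previous values of another variable, where the number of previous values is arbitrary?
My data looks like this: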
library(data.table)
dt <- data.table(from = as.Date(c("20020101", "20030101", "20040101", "20050101",
"20010101", "20020101", "20030101", "20040101", "20050101"), "%Y%m%d"),
to = as.Date(c("20031231", "20041231", "20051231", "20061231",
"20021231", "20031231", "20041231", "20051231", "20061231"), "%Y%m%d"),
id = as.factor(c(1, 1, 1, 1, 2, 2, 2, 2, 2)),
cond = c(F, F, T, F, F, T, T, T, F))
> dt
from to id cond
1: 2002-01-01 2003-12-31 1 FALSE
2: 2003-01-01 2004-12-31 1 FALSE
3: 2004-01-01 2005-12-31 1 TRUE
4: 2005-01-01 2006-12-31 1 FALSE
5: 2001-01-01 2002-12-31 2 FALSE
6: 2002-01-01 2003-12-31 2 TRUE
7: 2003-01-01 2004-12-31 2 TRUE
8: 2004-01-01 2005-12-31 2 TRUE
9: 2005-01-01 2006-12-31 2 FALSE
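What I need to do is create a dummy such that dum = 1 if cond == TRUE in any period s <= t, and dum = 0 if cond == FALSE in all periods s <= t, where t is the current period, i.e.:
from to id cond dum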
1: 2002-01-01 2003-12-31 1 FALSE 0
2: 2003-01-01 2004-12-31 1 FALSE 0
3: 2004-01-01 2005-12-31 1 TRUE 1
4: 2005-01-01 2006-12-31 1 FALSE 1
5: 2001-01-01 2002-12-31 2 FALSE 0
6: 2002-01-01 2003-12-31 2 TRUE 1
7: 2003-01-01 2004-12-31 2 TRUE 1
8: 2004-01-01 2005-12-31 2 TRUE 1
9: 2005-01-01 2006-12-31 2 FALSE 1
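The number of periods N is arbitrary and can differ for each id.
I tried using lags, i.e. creating N lags of cond for each id, where N depends on the number of periods that id is alive. However, because individuals are not all alive for the same number of periods, this approach gets too messy.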
Here is the code I tried to develop:
dt <- dt[1:8, ]
dum <- c()
# Iterate through all unique IDs
for(i in unique(dt$id)){
# Subset the data
dt.tmp <- dt[id == i, ]
N <- nrow(dt.tmp)-1
nm <- paste("lag.cond", 1:N, sep = "")
# iterate through all periods and lag cond
for(j in 1:N){
dt.tmp[, (nm[j]) := shift(.SD, n = j), by = id, .SDcols = "cond"]
}
# If any of the lags are == TRUE => set dum to 1
dt.tmp[, dum := ifelse(cond | lag.cond1 | lag.cond2 | lag.cond3, 1, 0)]
dt.tmp[is.na(dum), dum := 0]
dum <- append(dum, dt.tmp$dum)
}
dt[, dum := dum]
dt
This only works when all ids are alive for the same number of periods (i.e. all ids are alive for 4 periods).
Answer 0 (score: 2)
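Assuming that a single instance of cond == TRUE triggers dum = 1, you can use cummax, a lesser-known relative of cumsum, which returns the cumulative maximum of a variable: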
dt[, dum := cummax(as.integer(cond)), by="id"]
Or, using the functional form of :=
dt[, `:=`(dum=cummax(as.integer(cond))), by="id"]
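As a quick illustration of why this works, here is a minimal sketch on a toy vector (the values are made up for illustration and are not taken from the question's data). Once a 1 appears, the running maximum stays at 1, which is the "any previous TRUE" rule. The cumsum formulation is shown only for comparison:

```r
# Toy input (hypothetical values): one TRUE in the third position
cond <- c(FALSE, FALSE, TRUE, FALSE)

# Running maximum of the 0/1 coding: switches to 1 at the first TRUE and stays there
running_max <- cummax(as.integer(cond))

# Same rule written with cumsum: "has at least one TRUE been seen so far?"
via_cumsum <- as.integer(cumsum(cond) > 0)

# Both formulations agree element by element
stopifnot(identical(running_max, via_cumsum))
print(running_max)
```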
As @Frank points out, you don't need the as.integer call, because cummax coerces the logical cond to integer. This shortens your code to
dt[, dum := cummax(cond), by="id"]
though the explicit as.integer in the first version arguably adds readability.
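For completeness, here is a self-contained sketch that checks the short version against the desired output from the question, using all nine rows so the two ids have different numbers of periods. It assumes the rows should be in chronological order within each id, so it sorts first with setorder. That sorting step and the name expected are additions for this sketch, not part of the original answer.

```r
library(data.table)

# Rebuild the relevant columns of the question's data (ids 1 and 2 have 4 and 5 periods)
dt <- data.table(
  from = as.Date(c("2002-01-01", "2003-01-01", "2004-01-01", "2005-01-01",
                   "2001-01-01", "2002-01-01", "2003-01-01", "2004-01-01",
                   "2005-01-01")),
  id   = factor(c(1, 1, 1, 1, 2, 2, 2, 2, 2)),
  cond = c(FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE)
)

# Assumption: cummax relies on row order, so sort by id and start date first
setorder(dt, id, from)

# Running maximum of cond within each id; the logical is coerced to integer
dt[, dum := cummax(cond), by = "id"]

# Desired dum column as listed in the question
expected <- c(0L, 0L, 1L, 1L, 0L, 1L, 1L, 1L, 1L)
stopifnot(identical(dt$dum, expected))
print(dt)
```

Because the grouping is done with by, no per-id lag columns are needed, and the number of periods per id does not matter.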