I have a simple problem, but for some reason I can't work out a solution.
Here is a sample of the dataset:
dt = data.table(A=rep(c(1:2), each = 5), B = c(1,1,2,2,3,1,2,3,3,1), C =c("a","b","b","b","b","b","a","b","a","a"))
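Basically, I want a counter variable that repeats its previous value only when a condition is met, and increments otherwise. The condition is that, compared with the previous row, the value in A is the same, the value in B is different, and the value in C is the same. This is the desired output: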
dt = data.table(A=rep(c(1:2), each = 5), B = c(1,1,2,2,3,1,2,3,3,1), C =c("a","b","b","b","b","b","a","b","a","a"), counter = c(1,2,2,3,3,4,5,6,7,7))
As you can see, the counter repeats its value only when all three conditions hold.
Thanks!
Answer 0: (score: 1)
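Use cumsum on the negated logical condition: the counter goes up by one on every row where the condition is not met.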
n <- nrow(dt)
dt$D <- c(1L, !c(dt$A[-n] == dt$A[-1] & dt$B[-n] != dt$B[-1] & dt$C[-n] == dt$C[-1]))
dt$D <- cumsum(dt$D)
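To check the cumsum approach against the desired output, here is a minimal base R sketch. The helper name run_counter is hypothetical (it is not part of the answer above), and a plain data.frame stands in for the data.table so that no package is needed.

```r
# Hypothetical helper, not from the original answer: returns a counter that
# repeats its previous value only when A is unchanged, B has changed and
# C is unchanged relative to the previous row.
run_counter <- function(A, B, C) {
  n <- length(A)
  if (n == 0L) return(integer(0))
  same_A <- A[-1] == A[-n]   # row i compared with row i - 1
  diff_B <- B[-1] != B[-n]
  same_C <- C[-1] == C[-n]
  repeat_value <- same_A & diff_B & same_C
  # The first row always starts a new value; every later row increments
  # the counter unless the condition says to repeat.
  cumsum(c(TRUE, !repeat_value))
}

df <- data.frame(
  A = rep(1:2, each = 5),
  B = c(1, 1, 2, 2, 3, 1, 2, 3, 3, 1),
  C = c("a", "b", "b", "b", "b", "b", "a", "b", "a", "a")
)
df$counter <- run_counter(df$A, df$B, df$C)
print(df)
```

The resulting counter column can be compared directly with the counter column in the desired output from the question.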
Answer 1: (score: 0)
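Here is an option using shift from data.table. The columns selected with .SDcols (A and C, by position) are compared for equality with their shifted values, and B is compared for inequality: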
dt[, counter := cumsum(c(TRUE, !Reduce(`&`, c(Map(`==`, .SD,
shift(.SD)), list(B != shift(B))))[-1])), .SDcols = c(1, 3)]
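For only three columns, the same idea can be written out explicitly, which may be easier to read. This is a sketch rather than the answerer's code: the column name counter2 is an assumption chosen for illustration, and `%in% TRUE` is used so that the NA produced by shift() on the first row counts as "condition not met".

```r
library(data.table)

dt <- data.table(
  A = rep(1:2, each = 5),
  B = c(1, 1, 2, 2, 3, 1, 2, 3, 3, 1),
  C = c("a", "b", "b", "b", "b", "b", "a", "b", "a", "a")
)

# shift() lags each column by one row, filling the first row with NA.
# `%in% TRUE` maps that NA (and FALSE) to FALSE, so the first row and every
# row that fails the condition add 1 to the running total.
dt[, counter2 := cumsum(!((A == shift(A) & B != shift(B) & C == shift(C)) %in% TRUE))]
print(dt)
```

The Map/Reduce form above scales better when many columns need the equality check; the explicit form is mainly useful when the set of columns is small and fixed.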