Question

I have a question about data.table in R. I have a dataset like this:

data <- data.table(a=c(1:7,12,32,13),b=c(1,5,6,7,8,3,2,5,1,4))

     a b
 1:  1 1
 2:  2 5
 3:  3 6
 4:  4 7
 5:  5 8
 6:  6 3
 7:  7 2
 8: 12 5
 9: 32 1
10: 13 4

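Now I want to create a third column, c, that compares the value of a in each row with all previous values of b and checks whether any of those b values is greater than a. For example, in row 5, a = 5 and the previous values of b are 1, 5, 6, 7. Since 6 and 7 are greater than 5, c should be 1; if no previous b were greater, c would be 0. The result should look like this: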

     a b  c
 1:  1 1 NA
 2:  2 5  0
 3:  3 6  1
 4:  4 7  1
 5:  5 8  1
 6:  6 3  1
 7:  7 2  1
 8: 12 5  0
 9: 32 1  0
10: 13 4  0

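I tried a for loop, but it takes a very long time. I also tried shift, but I can't use shift to reference multiple previous rows. Does anyone have any suggestions?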
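For reference, a literal loop version of the rule might look like the sketch below. The asker's actual loop was not shown, so this is a hypothetical reconstruction; the helper name `slow_flag` is made up for illustration. It uses only base R and treats "greater than" as strict.

```r
# Hypothetical helper: a literal, loop-based version of the rule in the question.
# For row i, the result is 1 if any earlier b is strictly greater than a[i],
# 0 otherwise, and NA for the first row (it has no earlier rows).
# Every row rescans all earlier b values, so the work grows quadratically.
slow_flag <- function(a, b) {
  n <- length(a)
  out <- rep(NA_integer_, n)
  for (i in seq_len(n)[-1]) {
    out[i] <- as.integer(any(b[seq_len(i - 1)] > a[i]))
  }
  out
}

a <- c(1:7, 12, 32, 13)
b <- c(1, 5, 6, 7, 8, 3, 2, 5, 1, 4)
print(slow_flag(a, b))
# expected values: NA 0 1 1 1 1 1 0 0 0
```

On the question's data.table it could be used as `data[, c := slow_flag(a, b)]`.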

Answer 1

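library(data.table)
data <- data.table(a=c(1:7,12,32,13),b=c(1,5,6,7,8,3,2,5,1,4))
# Some earlier b exceeds a exactly when the running maximum of the earlier b values does;
# shift() lags cummax(b) by one row so each row sees only previous rows (NA for row 1).
data[, c := as.integer(a < shift(cummax(b)))]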
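To see why the one-liner works, the same computation can be unrolled in base R. The idea, as a sketch: some earlier `b` exceeds `a[i]` exactly when the largest earlier `b` does, so a running maximum (`cummax`) lagged by one row replaces rescanning all previous rows. The variable names are illustrative, and `c(NA, head(x, -1))` is used here as a base R stand-in for `shift(x)` with its default lag of one and `NA` fill.

```r
a <- c(1:7, 12, 32, 13)
b <- c(1, 5, 6, 7, 8, 3, 2, 5, 1, 4)

# Running maximum of b up to and including each row.
run_max <- cummax(b)                 # 1 5 6 7 8 8 8 8 8 8
# Lag it by one row so row i only sees rows 1..i-1; the first row gets NA.
prev_max <- c(NA, head(run_max, -1)) # NA 1 5 6 7 8 8 8 8 8

# "Some earlier b is greater than a[i]" is equivalent to
# "the largest earlier b is greater than a[i]".
c_flag <- as.integer(a < prev_max)
print(c_flag)                        # NA 0 1 1 1 1 1 0 0 0

# Ties: here a[2] equals the earlier maximum (5), so the strict
# comparison gives 0 and the non-strict one gives 1.
a2 <- c(1, 5)
b2 <- c(5, 0)
prev2 <- c(NA, head(cummax(b2), -1))
print(as.integer(a2 < prev2))        # NA 0
print(as.integer(a2 <= prev2))       # NA 1
```

The tie case is why the operator matters: the question asks whether an earlier b is *greater than* a, which corresponds to `<`, not `<=`.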

Answer 2

Here is a base R solution (see the dplyr solution below):

data$c <- NA_integer_
# For each row x >= 2, check whether any earlier b is greater than a[x] and store 1/0
data$c[2:nrow(data)] <- sapply(2:nrow(data), function(x) as.integer(any(data$a[x] < data$b[1:(x-1)])))

##      a b  c
##  1:  1 1 NA
##  2:  2 5  0
##  3:  3 6  1
##  4:  4 7  1
##  5:  5 8  1
##  6:  6 3  1
##  7:  7 2  1
##  8: 12 5  0
##  9: 32 1  0
## 10: 13 4  0
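The `sapply` version rescans `b[1:(x-1)]` for every row, so its cost grows quadratically with the number of rows, which is likely why the loop in the question was slow. As a hedged sanity check (random data, not from the question), the row-by-row result can be compared with the running-maximum approach used in the other solutions; `vapply` is swapped in here only to pin the return type to integer.

```r
set.seed(1)
n <- 200
a <- sample(1:50, n, replace = TRUE)
b <- sample(1:50, n, replace = TRUE)

# Row-by-row version, as in the base R answer: rescans b[1:(x-1)] for each row.
by_row <- c(NA_integer_,
            vapply(2:n, function(x) as.integer(any(a[x] < b[1:(x - 1)])), integer(1)))

# Running-maximum version: a single pass over b, lagged by one row.
by_max <- as.integer(a < c(NA, head(cummax(b), -1)))

print(identical(by_row, by_max))  # TRUE
```

Both versions use the strict `<`, so they agree even when sampled values tie.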

Edit

Here is a simpler solution using dplyr:
library(dplyr)

### Compare 'a' with the cumulative max of the previous b values and set c to 1/0.
data %>% mutate(c = ifelse(a < lag(cummax(b)), 1, 0))

##     a b  c
## 1   1 1 NA
## 2   2 5  0
## 3   3 6  1
## 4   4 7  1
## 5   5 8  1
## 6   6 3  1
## 7   7 2  1
## 8  12 5  0
## 9  32 1  0
## 10 13 4  0

### Using data.table's 'shift' with dplyr (requires library(data.table))
data %>% mutate(c = ifelse(a < shift(cummax(b)), 1, 0))

How to reference multiple previous rows in an R data.table
