R data.table: count rows until a value is reached

Time: 2015-11-24 22:24:11

Tags: r data.table

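I want to add a new column to a data.table that gives, for each row, the number of rows down until a value lower than the current value (Temp) is reached.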

library(data.table)
set.seed(123)
DT <- data.table( Temp = runif(10,0,20) )

This is what I would like it to look like:

set.seed(123)
DT <- data.table(
        Temp = runif(10,0,20),
        Day_Below_Temp = c("5","1","3","2","1","NA","3","1","1","NA")
)

3 answers:

Answer 0 (score: 4)

Using the non-equi joins newly implemented in the current development version, this can be done straightforwardly as follows:

require(data.table) # v1.9.7+
DT[, row := .I] # add row numbers
DT[DT, x.row-i.row, on = .(row > row, Temp < Temp), mult="first"]
# [1]  5  1  3  2  1 NA  3  1  1 NA

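The row numbers are necessary because we need to find rows below the current one, i.e. rows with a larger row index, so the index has to be one of the join conditions. We perform a self-join: for each row of the inner DT (the one passed as i), we find the first matching row index in the outer DT (x), according to the conditions given to the on argument. We then subtract the two row indices to get the distance from the current row. x.row refers to the row index in the outer DT, and i.row to the row index in the inner DT.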
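To make the join conditions concrete, here is a minimal base R sketch of what the self-join computes for each row. It is not how data.table implements the join; the function name first_lower_offset is hypothetical, and the Temp values are copied (rounded) from the seed-123 example used in this thread.

```r
# Hypothetical helper, for illustration only: mirrors the join conditions
# row > row and Temp < Temp, keeping the first match per row.
first_lower_offset <- function(temp) {
  n <- length(temp)
  vapply(seq_len(n), function(i) {
    later <- seq_len(n)[seq_len(n) > i]        # rows below row i (row > row)
    hits  <- later[temp[later] < temp[i]]      # those with a lower Temp
    if (length(hits) == 0L) NA_integer_ else hits[1L] - i  # first match, as an offset
  }, integer(1))
}

# Temp values as printed for set.seed(123); runif(10, 0, 20)
temp <- c(5.751550, 15.766103, 8.179538, 17.660348, 18.809346,
          0.911130, 10.562110, 17.848381, 11.028700, 9.132295)
first_lower_offset(temp)
# [1]  5  1  3  2  1 NA  3  1  1 NA
```

This scans all later rows for every row, so it is only meant to show the logic, not to be fast.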

To get the development version, see the data.table installation instructions.

On 1e5 rows:

set.seed(123)
DT <- data.table(Temp = runif(1e5L, 0L, 20L))

DT[, row := .I]
system.time({
    ans = DT[DT, x.row-i.row, on = .(row > row, Temp < Temp), mult="first", verbose=TRUE]
})
# Non-equi join operators detected ... 
#   forder took ... 0.001 secs
#   Generating non-equi group ids ... done in 0.452 secs
#   Recomputing forder with non-equi ids ... done in 0.001 secs
#   Found 623 non-equi group(s) ...
# Starting bmerge ...done in 8.118 secs
# Detected that j uses these columns: x.row,i.row 
#    user  system elapsed 
#   8.492   0.038   8.577 

head(ans)
# [1]  5  1  3  2  1 12
tail(ans)
# [1]  2  1  1  2  1 NA

Answer 1 (score: 2)

Here is a dplyr approach:

library(dplyr)
set.seed(123)
dt <- data.frame( Temp = runif(10,0,20) )
dt %>% mutate(Day_Below_Temp = 
                 sapply(1:length(Temp), function(x) min(which(.$Temp[x:length(.$Temp)] < .$Temp[x]))-1))

        Temp Day_Below_Temp
1   5.751550              5
2  15.766103              1
3   8.179538              3
4  17.660348              2
5  18.809346              1
6   0.911130            Inf
7  10.562110              3
8  17.848381              1
9  11.028700              1
10  9.132295            Inf
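The Inf entries in the output above, where the question asked for NA, come from calling min() on an empty which() result, which also raises a warning. A small sketch of the difference, assuming NA is the preferred marker for "no lower value found":

```r
# No later value is lower, so which() returns an empty integer vector
no_match <- which(c(FALSE, FALSE))

suppressWarnings(min(no_match))  # Inf (min of an empty vector, normally with a warning)
no_match[1]                      # NA  (indexing past the end of an empty vector)
```

Taking the first element with [1] instead of min() therefore yields NA for rows that have no lower value below them.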

Answer 2 (score: 1)

This does the job, though it is not very fast:

DT[, rowN := .I]

DT[, Day_Below_Temp := which(DT$Temp[rowN:nrow(DT)] < Temp)[1] - 1, 
   by = rowN
   ][, rowN := NULL]
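If speed matters, one alternative that none of the answers use is a single pass with a stack of "still waiting" row indices (the classic next-smaller-element technique). The sketch below is a hedged illustration in base R: the function name rows_until_lower is hypothetical, it assumes Temp contains no NA values, and it has not been benchmarked against the approaches above.

```r
# Hypothetical helper: for each element, the number of steps down to the
# first strictly smaller value, or NA if there is none.
# Assumes x has no missing values.
rows_until_lower <- function(x) {
  n <- length(x)
  out <- rep(NA_integer_, n)
  stack <- integer(n)   # indices that have not yet seen a smaller value
  top <- 0L             # current stack height
  for (i in seq_len(n)) {
    # x[i] resolves every waiting index whose value is larger than x[i]
    while (top > 0L && x[stack[top]] > x[i]) {
      out[stack[top]] <- i - stack[top]
      top <- top - 1L
    }
    top <- top + 1L
    stack[top] <- i
  }
  out
}

# Temp values as printed for set.seed(123); runif(10, 0, 20)
temp <- c(5.751550, 15.766103, 8.179538, 17.660348, 18.809346,
          0.911130, 10.562110, 17.848381, 11.028700, 9.132295)
rows_until_lower(temp)
# [1]  5  1  3  2  1 NA  3  1  1 NA
```

Each index is pushed and popped at most once, so the work grows linearly with the number of rows. With a data.table it could be applied as DT[, Day_Below_Temp := rows_until_lower(Temp)].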