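I want to add a new column to a data.table that shows, for each row, how many rows down you have to go until you reach a value lower than the current value (Temp).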
library(data.table)
set.seed(123)
DT <- data.table( Temp = runif(10,0,20) )
This is what I would like it to look like:
set.seed(123)
DT <- data.table(
Temp = runif(10,0,20),
Day_Below_Temp = c("5","1","3","2","1","NA","3","1","1","NA")
)
Answer 0 (score: 4)
Using the newly implemented non-equi joins in the current development version, this can be done in a straightforward way:
require(data.table) # v1.9.7+
DT[, row := .I] # add row numbers
DT[DT, x.row-i.row, on = .(row > row, Temp < Temp), mult="first"]
# [1] 5 1 3 2 1 NA 3 1 1 NA
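The row number is needed because we are looking for rows that come after the current row, so the row index has to be one of the join conditions. We perform a self-join: for each row of the inner DT (the one in the i position), we find the first row of the outer DT (the x position) that satisfies the conditions given to the on argument, i.e. a larger row index and a lower Temp. Subtracting the two row indices then gives the offset from the current row. x.row refers to the row index of the outer DT and i.row to that of the inner DT.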
To get the devel version, see the installation instructions here.
On 1e5 rows:
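As a cross-check of what the self-join computes, the two join conditions can be written out as an explicit scan in base R. This is only a sketch for small inputs, not part of the answer: the Temp values are copied from the printed output further down (rounded to six decimals), and the name offset_by_scan is made up for illustration.

```r
# Temp values as printed in the dplyr answer (set.seed(123); runif(10, 0, 20))
temp <- c(5.751550, 15.766103, 8.179538, 17.660348, 18.809346,
          0.911130, 10.562110, 17.848381, 11.028700, 9.132295)
row <- seq_along(temp)

offset_by_scan <- vapply(row, function(i) {
  # the same two conditions as on = .(row > row, Temp < Temp)
  hits <- which(row > row[i] & temp < temp[i])
  # like mult = "first": keep only the first match, NA when there is none
  if (length(hits) > 0L) hits[1L] - row[i] else NA_integer_
}, integer(1))

offset_by_scan
# [1]  5  1  3  2  1 NA  3  1  1 NA
```

The scan compares every row against all rows, so it is quadratic and only useful for checking results on small data.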
set.seed(123)
DT <- data.table(Temp = runif(1e5L, 0L, 20L))
DT[, row := .I]
system.time({
ans = DT[DT, x.row-i.row, on = .(row > row, Temp < Temp), mult="first", verbose=TRUE]
})
# Non-equi join operators detected ...
# forder took ... 0.001 secs
# Generating non-equi group ids ... done in 0.452 secs
# Recomputing forder with non-equi ids ... done in 0.001 secs
# Found 623 non-equi group(s) ...
# Starting bmerge ...done in 8.118 secs
# Detected that j uses these columns: x.row,i.row
# user system elapsed
# 8.492 0.038 8.577
head(ans)
# [1] 5 1 3 2 1 12
tail(ans)
# [1] 2 1 1 2 1 NA
Answer 1 (score: 2)
Here is a dplyr approach:
library(dplyr)
set.seed(123)
dt <- data.frame( Temp = runif(10,0,20) )
dt %>% mutate(Day_Below_Temp =
sapply(1:length(Temp), function(x) min(which(.$Temp[x:length(.$Temp)] < .$Temp[x]))-1))
Temp Day_Below_Temp
1 5.751550 5
2 15.766103 1
3 8.179538 3
4 17.660348 2
5 18.809346 1
6 0.911130 Inf
7 10.562110 3
8 17.848381 1
9 11.028700 1
10 9.132295 Inf
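The Inf entries in the output above come from calling min() on an empty which() result, which also raises a warning. If NA is preferred, as in the desired output in the question, one possible variation is to take the first element of the which() result instead. The sketch below uses base R only; the names no_match and day_below are made up for illustration, and the Temp values are copied from the printed output above.

```r
no_match <- which(c(3, 4, 5) < 3)   # integer(0): nothing is below 3

suppressWarnings(min(no_match))     # Inf (with a warning if not suppressed)
no_match[1L]                        # NA

temp <- c(5.751550, 15.766103, 8.179538, 17.660348, 18.809346,
          0.911130, 10.562110, 17.848381, 11.028700, 9.132295)
n <- length(temp)

# temp[i:n] starts at the current row, so the first hit at position k
# is k - 1 rows further down; no hit gives NA - 1L, which is NA
day_below <- vapply(seq_len(n),
                    function(i) which(temp[i:n] < temp[i])[1L] - 1L,
                    integer(1))
day_below
# [1]  5  1  3  2  1 NA  3  1  1 NA
```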
Answer 2 (score: 1)
This does the job, although it is not very fast:
DT[, rowN := .I]
DT[, Day_Below_Temp := which(DT$Temp[rowN:nrow(DT)] < Temp)[1] - 1,
by = rowN
][, rowN := NULL]
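If speed matters, a different technique that none of the answers above use is a single pass with a stack of row indices that are still waiting for a lower value (often called a monotonic stack). The sketch below is plain base R; the function name next_lower_offset is made up for illustration, and it assumes Temp contains no NA values.

```r
next_lower_offset <- function(x) {
  n <- length(x)
  out <- rep(NA_integer_, n)
  stack <- integer(n)   # row indices still waiting for a lower value
  top <- 0L             # number of indices currently on the stack
  for (i in seq_len(n)) {
    # the current value resolves every waiting row with a higher value
    while (top > 0L && x[i] < x[stack[top]]) {
      out[stack[top]] <- i - stack[top]
      top <- top - 1L
    }
    top <- top + 1L
    stack[top] <- i
  }
  out                   # rows left on the stack keep NA
}

temp <- c(5.751550, 15.766103, 8.179538, 17.660348, 18.809346,
          0.911130, 10.562110, 17.848381, 11.028700, 9.132295)
next_lower_offset(temp)
# [1]  5  1  3  2  1 NA  3  1  1 NA
```

Each row index is pushed and popped at most once, so the work grows linearly with the number of rows. With a data.table it could be applied as DT[, Day_Below_Temp := next_lower_offset(Temp)].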