How do I find the last value before test.day for each (loc.x, loc.y) pair?
library(data.table)
dt <- data.table(
loc.x = as.integer(c(1, 1, 3, 1, 3, 1)),
loc.y = as.integer(c(1, 2, 1, 2, 1, 2)),
time = as.IDate(c("2015-03-11", "2015-05-10", "2015-09-27",
"2015-11-25", "2014-09-13", "2015-08-19")),
value = letters[1:6]
)
setkey(dt, loc.x, loc.y, time)
test.day <- as.IDate("2015-10-01")
Desired output:
   loc.x loc.y value
1:     1     1     a
2:     1     2     f
3:     3     1     c
Answer 0 (score: 6)
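You can first subset the rows where time < test.day (this should be very efficient because it is not done by group) and then select the last value within each group. For that you can use either tail(value, 1L) or, as suggested by Floo0, value[.N], which gives: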
dt[time < test.day, tail(value, 1L), by = .(loc.x, loc.y)]
#    loc.x loc.y V1
# 1:     1     1  a
# 2:     1     2  f
# 3:     3     1  c
or
dt[time < test.day, value[.N], by = .(loc.x, loc.y)]
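Note that this works only because the data is sorted by setkey(dt, loc.x, loc.y, time), so within each (loc.x, loc.y) group the rows are in ascending time order and the last row is the most recent one.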
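If the table cannot be keyed, one way to drop the dependence on row order is to pick the row with the largest time explicitly. The following is only a sketch under that assumption: dt2, naive and res are hypothetical names introduced here, and which.max() is swapped in for value[.N]; it is not part of the original answer.

```r
library(data.table)

# Same data as in the question, but deliberately left unkeyed,
# so the rows stay in their original (unsorted) order.
dt2 <- data.table(
  loc.x = as.integer(c(1, 1, 3, 1, 3, 1)),
  loc.y = as.integer(c(1, 2, 1, 2, 1, 2)),
  time  = as.IDate(c("2015-03-11", "2015-05-10", "2015-09-27",
                     "2015-11-25", "2014-09-13", "2015-08-19")),
  value = letters[1:6]
)
test.day <- as.IDate("2015-10-01")

# Relying on row order: for group (3, 1) the last row is the 2014 one,
# so this returns "e" instead of "c".
naive <- dt2[time < test.day, value[.N], by = .(loc.x, loc.y)]

# Order-independent: take the value at the latest time within each group.
res <- dt2[time < test.day,
           .(value = value[which.max(time)]),
           by = .(loc.x, loc.y)]
```

Here res$value is "a", "f", "c" regardless of how the rows are ordered, whereas naive$V1 is "a", "f", "e" on the unsorted table.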
Answer 1 (score: 6)
Another option is to use the last function:
dt[, last(value[time < test.day]), by = .(loc.x, loc.y)]
which gives:
   loc.x loc.y V1
1:     1     1  a
2:     1     2  f
3:     3     1  c
Answer 2 (score: 5)
Here is another option: create a lookup table and then use a rolling join.
indx <- data.table(unique(dt[ ,.(loc.x, loc.y)]), time = test.day)
dt[indx, roll = TRUE, on = names(indx)]
#    loc.x loc.y       time value
# 1:     1     1 2015-10-01     a
# 2:     1     2 2015-10-01     f
# 3:     3     1 2015-10-01     c
Or a very similar option suggested by @eddi:
dt[dt[, .(time = test.day), by = .(loc.x, loc.y)], roll = TRUE, on = c("loc.x", "loc.y", "time")]
Or a one-liner that is less efficient, because it calls [.data.table once per group:
dt[,
.SD[data.table(test.day), value, roll = TRUE, on = c(time = "test.day")],
by = .(loc.x, loc.y)
]
#    loc.x loc.y V1
# 1:     1     1  a
# 2:     1     2  f
# 3:     3     1  c
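One thing to keep in mind with the rolling-join variants: roll = TRUE also accepts an exact match on time, so a row dated exactly test.day would be returned, whereas the time < test.day approaches exclude it. The sketch below illustrates the difference on a small made-up table (dt3, indx_prev and the value "z" are hypothetical, not from the question). Shifting the lookup date back by one day is one possible workaround, assuming the dates are whole days.

```r
library(data.table)

# Hypothetical data: one group with a row dated exactly on test.day.
dt3 <- data.table(
  loc.x = c(1L, 1L),
  loc.y = c(1L, 1L),
  time  = as.IDate(c("2015-03-11", "2015-10-01")),
  value = c("a", "z")
)
setkey(dt3, loc.x, loc.y, time)
test.day <- as.IDate("2015-10-01")

# Strictly before test.day: the row dated 2015-10-01 is excluded.
strict <- dt3[time < test.day, value[.N], by = .(loc.x, loc.y)]

# Rolling join: the exact match on 2015-10-01 wins.
indx <- data.table(unique(dt3[, .(loc.x, loc.y)]), time = test.day)
rolled <- dt3[indx, roll = TRUE, on = names(indx)]

# Possible workaround: look up the day before test.day instead.
indx_prev <- data.table(unique(dt3[, .(loc.x, loc.y)]), time = test.day - 1L)
rolled_strict <- dt3[indx_prev, roll = TRUE, on = names(indx_prev)]
```

Here strict$V1 and rolled_strict$value are both "a", while rolled$value is "z". With the data from the question no row falls exactly on test.day, so all approaches agree there.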