I have the following data:
library(data.table)
set.seed(42)
t <- data.table(time=1:1000, value=runif(100,0,1))
p <- data.table(id=1:10, cut=sample(1:100,5))
vals <- 1:5
> head(t)
time value
1: 1 0.9148060
2: 2 0.9370754
3: 3 0.2861395
4: 4 0.8304476
5: 5 0.6417455
6: 6 0.5190959
> head(p)
id cut
1: 1 63
2: 2 22
3: 3 99
4: 4 38
5: 5 91
6: 6 63
> vals
[1] 1 2 3 4 5
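Here t provides values associated with time points, and p gives each person a cutoff value of time.
I want to get, for each person, the number of time units needed to accumulate each value in vals.
My current approach uses a for loop: for each person, it computes a temporary vector of cumulative sums over the time points after that person's cutoff. Then I use findInterval() to get the position at which the cumsum reaches each level in vals.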
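out <- matrix(NA, nrow = nrow(p), ncol = length(vals))
colnames(out) <- vals
for (i in seq_len(nrow(p))) {
  # cumulative sum of the values after person i's cutoff
  temp <- cumsum(t$value[t$time > p$cut[i]])
  temp <- temp[!is.na(temp)]
  # number of time units for which the cumulative sum is still <= each level
  out[i, ] <- findInterval(vals, temp)
}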
which should produce
1 2 3 4 5
[1,] 1 4 5 9 12
[2,] 1 2 5 6 7
[3,] 1 2 4 5 7
[4,] 1 3 5 7 8
[5,] 2 3 5 7 8
[6,] 1 4 5 9 12
[7,] 1 2 5 6 7
[8,] 1 2 4 5 7
[9,] 1 3 5 7 8
[10,] 2 3 5 7 8
This is of course very inefficient and does not do justice to the power of R. Is there a way to speed this up?
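For reference, a minimal sketch of what findInterval() returns when applied to a cumulative sum. The numbers are made-up toy values, not the data above, and `steps` is just an illustrative name.

```r
# Toy values (hypothetical, not the question's data)
value <- c(0.5, 0.7, 0.2, 0.9, 0.4)
temp <- cumsum(value)            # running total: 0.5 1.2 1.4 2.3 2.7

# For each level, count how many running totals are <= that level
steps <- findInterval(c(1, 2), temp)
print(steps)                     # 1 3
```

So a result of 3 for level 2 means the running total is still at or below 2 after three time units and exceeds it on the fourth.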
Answer 0 (score: 2)
I would do
# precompute cumsum on full table
t[, cs := cumsum(value)]
# compute once per unique cut value, not per id
cuts = unique(p[, .(t_cut = cut)])
# look up value at cut time
cuts[t, on=.(t_cut = time), v_cut := i.cs]
# look up time at every cut value combo
cutres = cuts[, .(pt = vals + v_cut), by=t_cut][, .(
  t_cut,
  v = vals,
  t_plus = t[.SD, on=.(cs = pt), roll=TRUE, x.time] - t_cut
)]
which gives
t_cut v t_plus
1: 63 1 1
2: 63 2 4
3: 63 3 5
4: 63 4 9
5: 63 5 12
6: 22 1 1
7: 22 2 2
8: 22 3 5
9: 22 4 6
10: 22 5 7
11: 99 1 1
12: 99 2 2
13: 99 3 4
14: 99 4 5
15: 99 5 7
16: 38 1 1
17: 38 2 3
18: 38 3 5
19: 38 4 7
20: 38 5 8
21: 91 1 2
22: 91 2 3
23: 91 3 5
24: 91 4 7
25: 91 5 8
t_cut v t_plus
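The idea behind the lookup (one cumulative sum over the whole table, shifted by the running total at each cutoff) can also be sketched in base R. This is not the rolling join used above: it swaps in findInterval() for the join, uses made-up toy values, and assumes, as in the question, that time is simply the row number and that all values are non-negative. The names `cuts`, `pt`, `vec_out` and `loop_out` are illustrative only.

```r
# Toy data (hypothetical); multiples of 0.25 keep the sums exact
value <- c(0.5, 0.25, 0.75, 0.5, 1, 0.25, 0.5, 0.75, 0.25, 1, 0.5, 0.75)
cuts  <- c(2, 5, 2)   # one cutoff per person
vals  <- 1:2          # levels to accumulate

# One cumulative sum over the whole series
cs <- cumsum(value)

# Target running totals: level + running total at the cutoff (rows = persons)
pt <- outer(cs[cuts], vals, "+")

# Last position whose running total is <= the target, minus the cutoff
vec_out <- matrix(findInterval(pt, cs), nrow = length(cuts)) - cuts

# Same result from the per-person loop in the question, for comparison
loop_out <- do.call(rbind, lapply(cuts, function(ct) {
  findInterval(vals, cumsum(value[seq_along(value) > ct]))
}))

print(vec_out)
print(all(vec_out == loop_out))
```

With real floating-point data the two approaches could in principle differ where a running total lands exactly on a level, because the sums are accumulated in a different order.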
If you want to map it back to id and put it into an id x vals table...
cutres[p, on=.(t_cut = cut), allow.cartesian=TRUE,
dcast(.SD, id ~ v, value.var = "t_plus")]
id 1 2 3 4 5
1: 1 1 4 5 9 12
2: 2 1 2 5 6 7
3: 3 1 2 4 5 7
4: 4 1 3 5 7 8
5: 5 2 3 5 7 8
6: 6 1 4 5 9 12
7: 7 1 2 5 6 7
8: 8 1 2 4 5 7
9: 9 1 3 5 7 8
10: 10 2 3 5 7 8
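The mapping step (compute once per unique cut, then expand back to one row per id) can be illustrated without a join using base R's match(). This is only a sketch of the idea, not the data.table code above. The two result rows are copied from the output for cuts 63 and 22 at levels 1 and 2; the three ids and the names `ucut`, `res_by_cut` and `p_cut` are hypothetical.

```r
# Results per unique cut (rows) and level (columns), taken from the output above
ucut <- c(63, 22)
res_by_cut <- rbind(c(1, 4), c(1, 2))
colnames(res_by_cut) <- c("1", "2")

# Hypothetical persons: ids 1 and 3 share the cut 63
p_cut <- c(63, 22, 63)

# match() gives, for each person, the row of their cut in ucut
out <- cbind(id = seq_along(p_cut),
             res_by_cut[match(p_cut, ucut), , drop = FALSE])
print(out)
```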
(Alternatively, the key part can be written as t_plus = t[.SD, on=.(cs = pt), roll=TRUE, which=TRUE] - t_cut, since t$time is just the row number.)
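A small sketch of why the two forms agree, on a made-up toy table rather than the data above. It assumes the data.table package is installed; `tt`, `lookup`, `by_time` and `by_row` are illustrative names. With roll=TRUE each lookup value is matched to the last row whose cs is at or below it, so returning x.time and returning the row number via which=TRUE coincide whenever time equals the row number.

```r
library(data.table)

# Toy table (hypothetical values); time is the row number
tt <- data.table(time = 1:6, value = c(0.5, 0.25, 0.75, 0.5, 1, 0.25))
tt[, cs := cumsum(value)]          # cs: 0.5 0.75 1.5 2 3 3.25

# Target running totals to look up
lookup <- data.table(pt = c(1, 2.6))

# Rolling join: last row with cs <= pt, reported as time ...
by_time <- tt[lookup, on = .(cs = pt), roll = TRUE, x.time]
# ... or as the matching row number in tt
by_row  <- tt[lookup, on = .(cs = pt), roll = TRUE, which = TRUE]

print(by_time)
print(by_row)
```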