R data.table conditional (min/max) aggregation

Time: 2015-07-24 16:06:01

Tags: r data.table

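I'm new to R, and I have a question about how to do conditional aggregation with data.table (or another method) while still accessing the table columns by reference. There is an answer to a similar question here, but on my data it takes a long time and uses a lot of memory. Here is some toy data: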

library(data.table)

t <- data.table(User=c(1,1,1,1,1,2,2,2,2,3,3,3,3,3,3),
  Obs=c(1,2,3,4,5,1,2,3,4,1,2,3,4,5,6),
  Flag=c(0,1,0,1,0,0,1,0,0,1,0,0,0,1,0))

It looks like this:

    User Obs Flag
1:     1   1    0
2:     1   2    1
3:     1   3    0
4:     1   4    1
5:     1   5    0
6:     2   1    0
7:     2   2    1
8:     2   3    0
9:     2   4    0
10:    3   1    1
11:    3   2    0
12:    3   3    0
13:    3   4    0
14:    3   5    1
15:    3   6    0

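What I want, for each row, is the largest Obs value for that user, no greater than the current row's Obs, at which Flag is 1. The output should look like this: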

    User Obs Flag min.max
1:     1   1    0     NA
2:     1   2    1      2
3:     1   3    0      2
4:     1   4    1      4
5:     1   5    0      4
6:     2   1    0     NA
7:     2   2    1      2
8:     2   3    0      2
9:     2   4    0      2
10:    3   1    1      1
11:    3   2    0      1
12:    3   3    0      1
13:    3   4    0      1
14:    3   5    1      5
15:    3   6    0      5

Any help is greatly appreciated!

1 answer:

Answer 0: (score: 3)

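# Start a new group at every 0 -> 1 transition of Flag (and at every new User),
# then assign that group's flagged Obs to all of its rows by reference.
t[, max := Obs[Flag == 1], by = .(User, cumsum(diff(c(0, Flag)) == 1))]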
t
#    User Obs Flag max
# 1:    1   1    0  NA
# 2:    1   2    1   2
# 3:    1   3    0   2
# 4:    1   4    1   4
# 5:    1   5    0   4
# 6:    2   1    0  NA
# 7:    2   2    1   2
# 8:    2   3    0   2
# 9:    2   4    0   2
#10:    3   1    1   1
#11:    3   2    0   1
#12:    3   3    0   1
#13:    3   4    0   1
#14:    3   5    1   5
#15:    3   6    0   5
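
The one-liner works because each group consists of one flagged row followed by the unflagged rows after it. As a hedged variant (a sketch, not the answerer's code), the same idea can be written with an explicit per-user run counter. The names `dt` and `run` are introduced here for illustration only, and the sketch assumes rows are already sorted by Obs within each User, so the most recent flagged Obs is also the largest one so far.

```r
library(data.table)

# Toy data from the question; named `dt` (hypothetical) to avoid masking base::t.
dt <- data.table(
  User = c(1,1,1,1,1,2,2,2,2,3,3,3,3,3,3),
  Obs  = c(1,2,3,4,5,1,2,3,4,1,2,3,4,5,6),
  Flag = c(0,1,0,1,0,0,1,0,0,1,0,0,0,1,0)
)

# Step 1: per user, count how many flagged rows have been seen so far.
# Every flagged row starts a new run, even when two flagged rows are adjacent.
dt[, run := cumsum(Flag == 1), by = User]

# Step 2: within each (User, run) group, take the Obs of the flagged row.
# Indexing with [1L] yields NA for run 0, which has no flagged row.
dt[, min.max := Obs[Flag == 1][1L], by = .(User, run)]

# Drop the helper column, again by reference.
dt[, run := NULL]

print(dt)
```

The difference from the `diff()`-based key is in how adjacent flagged rows are handled: `diff(c(0, Flag)) == 1` is only true on a 0 -> 1 transition, so two flagged rows in a row would fall into the same group and `Obs[Flag == 1]` would return more than one value for it. Counting flagged rows with `cumsum(Flag == 1)` gives each one its own group.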