How to combine a sliding-window comparison in the j argument with grouping by ID?

Posted: 2019-03-31 18:36:28

Tags: r data.table

Here is a sample of my data.table (called `sub`):

    time  fly         mm genotype Genotype
 1:    1 test 0.68000000  ctrl1_f   loxP_f
 2:    2 test 1.53394915  ctrl1_f   loxP_f
 3:    3 test 1.40431478  ctrl1_f   loxP_f
 4:    4 test 0.29154759  ctrl1_f   loxP_f
 5:    5 test 0.67416615  ctrl1_f   loxP_f
 6:    6 test 0.09848858  ctrl1_f   loxP_f
 7:    7 test 2.46099573  ctrl1_f   loxP_f
 8:    8 test 1.71143215  ctrl1_f   loxP_f
 9:    9 test 3.75767215  ctrl1_f   loxP_f
10:   10 test 5.99067609  ctrl1_f   loxP_f
11:   11 test 5.48714862  ctrl1_f   loxP_f
12:   12 test 0.00000000  ctrl1_f   loxP_f
13:   13 test 0.32015621  ctrl1_f   loxP_f
14:   14 test 0.75960516  ctrl1_f   loxP_f
15:   15 test 0.31953091  ctrl1_f   loxP_f
16:   16 test 0.67007462  ctrl1_f   loxP_f
17:   17 test 1.55467038  ctrl1_f   loxP_f
18:   18 test 1.13564959  ctrl1_f   loxP_f
19:   19 test 0.39051248  ctrl1_f   loxP_f
20:   20 test 1.31061054  ctrl1_f   loxP_f
21:   21 test 2.57007782  ctrl1_f   loxP_f
22:   22 test 1.78339564  ctrl1_f   loxP_f
23:   23 test 1.70484603  ctrl1_f   loxP_f
24:   24 test 0.63198101  ctrl1_f   loxP_f
25:   25 test 0.00000000  ctrl1_f   loxP_f
26:   26 real 1.74183811  ctrl1_f   loxP_f
27:   27 real 1.01000000  ctrl1_f   loxP_f
28:   28 real 0.85052925  ctrl1_f   loxP_f
29:   29 real 0.50000000  ctrl1_f   loxP_f
30:   30 real 0.56885851  ctrl1_f   loxP_f
31:   31 real 0.25000000  ctrl1_f   loxP_f
32:   32 real 0.46270941  ctrl1_f   loxP_f
33:   33 real 0.71000000  ctrl1_f   loxP_f
34:   34 real 0.30000000  ctrl1_f   loxP_f
35:   35 real 0.29410882  ctrl1_f   loxP_f
36:   36 real 0.65122961  ctrl1_f   loxP_f
37:   37 real 0.56435804  ctrl1_f   loxP_f
38:   38 real 1.37277092  ctrl1_f   loxP_f
39:   39 real 5.59322805  ctrl1_f   loxP_f
40:   40 real 3.15634282  ctrl1_f   loxP_f
41:   41 real 4.09078232  ctrl1_f   loxP_f
42:   42 real 2.02022276  ctrl1_f   loxP_f
43:   43 real 1.32196823  ctrl1_f   loxP_f
44:   44 real 1.98909527  ctrl1_f   loxP_f
45:   45 real 2.45985772  ctrl1_f   loxP_f
46:   46 real 3.61203544  ctrl1_f   loxP_f
47:   47 real 7.97250902  ctrl1_f   loxP_f
48:   48 real 3.05949342  ctrl1_f   loxP_f
49:   49 real 2.41754007  ctrl1_f   loxP_f
50:   50 real 1.27882759  ctrl1_f   loxP_f

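My goal is to compare consecutive entries in the `mm` column and find the places where a 0 is followed by a non-zero value, e.g. row 12 in the output above, and again row 25. The result should be saved in a new column.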

This works when I do it without grouping:

sub[, initiate := lapply(1:(nrow(.SD) - 1), function(x) mm[x] == 0 && mm[x + 1] != 0)]

and produces the following data.table:

    time  fly         mm genotype Genotype initiate
 1:    1 test 0.68000000  ctrl1_f   loxP_f    FALSE
 2:    2 test 1.53394915  ctrl1_f   loxP_f    FALSE
 3:    3 test 1.40431478  ctrl1_f   loxP_f    FALSE
 4:    4 test 0.29154759  ctrl1_f   loxP_f    FALSE
 5:    5 test 0.67416615  ctrl1_f   loxP_f    FALSE
 6:    6 test 0.09848858  ctrl1_f   loxP_f    FALSE
 7:    7 test 2.46099573  ctrl1_f   loxP_f    FALSE
 8:    8 test 1.71143215  ctrl1_f   loxP_f    FALSE
 9:    9 test 3.75767215  ctrl1_f   loxP_f    FALSE
10:   10 test 5.99067609  ctrl1_f   loxP_f    FALSE
11:   11 test 5.48714862  ctrl1_f   loxP_f    FALSE
12:   12 test 0.00000000  ctrl1_f   loxP_f     TRUE
13:   13 test 0.32015621  ctrl1_f   loxP_f    FALSE
14:   14 test 0.75960516  ctrl1_f   loxP_f    FALSE
15:   15 test 0.31953091  ctrl1_f   loxP_f    FALSE
16:   16 test 0.67007462  ctrl1_f   loxP_f    FALSE
17:   17 test 1.55467038  ctrl1_f   loxP_f    FALSE
18:   18 test 1.13564959  ctrl1_f   loxP_f    FALSE
19:   19 test 0.39051248  ctrl1_f   loxP_f    FALSE
20:   20 test 1.31061054  ctrl1_f   loxP_f    FALSE
21:   21 test 2.57007782  ctrl1_f   loxP_f    FALSE
22:   22 test 1.78339564  ctrl1_f   loxP_f    FALSE
23:   23 test 1.70484603  ctrl1_f   loxP_f    FALSE
24:   24 test 0.63198101  ctrl1_f   loxP_f    FALSE
25:   25 test 0.00000000  ctrl1_f   loxP_f     TRUE
26:   26 real 1.74183811  ctrl1_f   loxP_f    FALSE
27:   27 real 1.01000000  ctrl1_f   loxP_f    FALSE
28:   28 real 0.85052925  ctrl1_f   loxP_f    FALSE
29:   29 real 0.50000000  ctrl1_f   loxP_f    FALSE
30:   30 real 0.56885851  ctrl1_f   loxP_f    FALSE
31:   31 real 0.25000000  ctrl1_f   loxP_f    FALSE
32:   32 real 0.46270941  ctrl1_f   loxP_f    FALSE
33:   33 real 0.71000000  ctrl1_f   loxP_f    FALSE
34:   34 real 0.30000000  ctrl1_f   loxP_f    FALSE
35:   35 real 0.29410882  ctrl1_f   loxP_f    FALSE
36:   36 real 0.65122961  ctrl1_f   loxP_f    FALSE
37:   37 real 0.56435804  ctrl1_f   loxP_f    FALSE
38:   38 real 1.37277092  ctrl1_f   loxP_f    FALSE
39:   39 real 5.59322805  ctrl1_f   loxP_f    FALSE
40:   40 real 3.15634282  ctrl1_f   loxP_f    FALSE
41:   41 real 4.09078232  ctrl1_f   loxP_f    FALSE
42:   42 real 2.02022276  ctrl1_f   loxP_f    FALSE
43:   43 real 1.32196823  ctrl1_f   loxP_f    FALSE
44:   44 real 1.98909527  ctrl1_f   loxP_f    FALSE
45:   45 real 2.45985772  ctrl1_f   loxP_f    FALSE
46:   46 real 3.61203544  ctrl1_f   loxP_f    FALSE
47:   47 real 7.97250902  ctrl1_f   loxP_f    FALSE
48:   48 real 3.05949342  ctrl1_f   loxP_f    FALSE
49:   49 real 2.41754007  ctrl1_f   loxP_f    FALSE
50:   50 real 1.27882759  ctrl1_f   loxP_f    FALSE

This correctly flags rows 12 and 25 as initiations, i.e. as the rows where a 0 is followed by a non-zero value.
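For reference, the same ungrouped test can also be written without `lapply()`, by comparing `mm` with a copy of itself shifted up by one row. This is a minimal sketch assuming data.table's `shift()` function; the `toy` table and its values are hypothetical, not the question's `sub`. `fill = 0` makes the last row compare against 0, so it can never be flagged.

```r
library(data.table)

# Hypothetical toy data (not the question's `sub`): zeros at rows 2, 4, 5 and 7
toy <- data.table(mm = c(1.5, 0, 0.3, 0, 0, 2.1, 0))

# A row "initiates" when mm is 0 and the next mm is non-zero.
# shift(type = "lead") pulls the next row's value up to the current row;
# fill = 0 means the last row is compared against 0 and is always FALSE.
toy[, initiate := mm == 0 & shift(mm, type = "lead", fill = 0) != 0]
toy
```

Because the comparison is vectorised, it returns a logical vector with exactly one value per row, which is what `:=` expects.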

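What I can't solve is this: I want to do the same thing grouped by the `fly` column, i.e. treat the `test` and `real` entries separately. When I try that, I don't get the right answer: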

sub[, initiate2 := lapply(1:(nrow(.SD) - 1), function(x) mm[x] == 0 && mm[x + 1] != 0), by = fly]

sub

    time  fly         mm genotype Genotype initiate initiate2
 1:    1 test 0.68000000  ctrl1_f   loxP_f    FALSE     FALSE
 2:    2 test 1.53394915  ctrl1_f   loxP_f    FALSE     FALSE
 3:    3 test 1.40431478  ctrl1_f   loxP_f    FALSE     FALSE
 4:    4 test 0.29154759  ctrl1_f   loxP_f    FALSE     FALSE
 5:    5 test 0.67416615  ctrl1_f   loxP_f    FALSE     FALSE
 6:    6 test 0.09848858  ctrl1_f   loxP_f    FALSE     FALSE
 7:    7 test 2.46099573  ctrl1_f   loxP_f    FALSE     FALSE
 8:    8 test 1.71143215  ctrl1_f   loxP_f    FALSE     FALSE
 9:    9 test 3.75767215  ctrl1_f   loxP_f    FALSE     FALSE
10:   10 test 5.99067609  ctrl1_f   loxP_f    FALSE     FALSE
11:   11 test 5.48714862  ctrl1_f   loxP_f    FALSE     FALSE
12:   12 test 0.00000000  ctrl1_f   loxP_f     TRUE     FALSE
13:   13 test 0.32015621  ctrl1_f   loxP_f    FALSE     FALSE
14:   14 test 0.75960516  ctrl1_f   loxP_f    FALSE     FALSE
15:   15 test 0.31953091  ctrl1_f   loxP_f    FALSE     FALSE
16:   16 test 0.67007462  ctrl1_f   loxP_f    FALSE     FALSE
17:   17 test 1.55467038  ctrl1_f   loxP_f    FALSE     FALSE
18:   18 test 1.13564959  ctrl1_f   loxP_f    FALSE     FALSE
19:   19 test 0.39051248  ctrl1_f   loxP_f    FALSE     FALSE
20:   20 test 1.31061054  ctrl1_f   loxP_f    FALSE     FALSE
21:   21 test 2.57007782  ctrl1_f   loxP_f    FALSE     FALSE
22:   22 test 1.78339564  ctrl1_f   loxP_f    FALSE     FALSE
23:   23 test 1.70484603  ctrl1_f   loxP_f    FALSE     FALSE
24:   24 test 0.63198101  ctrl1_f   loxP_f    FALSE     FALSE
25:   25 test 0.00000000  ctrl1_f   loxP_f     TRUE     FALSE
26:   26 real 1.74183811  ctrl1_f   loxP_f    FALSE     FALSE
27:   27 real 1.01000000  ctrl1_f   loxP_f    FALSE     FALSE
28:   28 real 0.85052925  ctrl1_f   loxP_f    FALSE     FALSE
29:   29 real 0.50000000  ctrl1_f   loxP_f    FALSE     FALSE
30:   30 real 0.56885851  ctrl1_f   loxP_f    FALSE     FALSE
31:   31 real 0.25000000  ctrl1_f   loxP_f    FALSE     FALSE
32:   32 real 0.46270941  ctrl1_f   loxP_f    FALSE     FALSE
33:   33 real 0.71000000  ctrl1_f   loxP_f    FALSE     FALSE
34:   34 real 0.30000000  ctrl1_f   loxP_f    FALSE     FALSE
35:   35 real 0.29410882  ctrl1_f   loxP_f    FALSE     FALSE
36:   36 real 0.65122961  ctrl1_f   loxP_f    FALSE     FALSE
37:   37 real 0.56435804  ctrl1_f   loxP_f    FALSE     FALSE
38:   38 real 1.37277092  ctrl1_f   loxP_f    FALSE     FALSE
39:   39 real 5.59322805  ctrl1_f   loxP_f    FALSE     FALSE
40:   40 real 3.15634282  ctrl1_f   loxP_f    FALSE     FALSE
41:   41 real 4.09078232  ctrl1_f   loxP_f    FALSE     FALSE
42:   42 real 2.02022276  ctrl1_f   loxP_f    FALSE     FALSE
43:   43 real 1.32196823  ctrl1_f   loxP_f    FALSE     FALSE
44:   44 real 1.98909527  ctrl1_f   loxP_f    FALSE     FALSE
45:   45 real 2.45985772  ctrl1_f   loxP_f    FALSE     FALSE
46:   46 real 3.61203544  ctrl1_f   loxP_f    FALSE     FALSE
47:   47 real 7.97250902  ctrl1_f   loxP_f    FALSE     FALSE
48:   48 real 3.05949342  ctrl1_f   loxP_f    FALSE     FALSE
49:   49 real 2.41754007  ctrl1_f   loxP_f    FALSE     FALSE
50:   50 real 1.27882759  ctrl1_f   loxP_f    FALSE     FALSE

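`initiate2` gives FALSE for both row 12 and row 25, although it should be TRUE for row 12 and FALSE for row 25 (row 25 is the last `test` row, so within its group it has no successor). Why does this happen, and how can I fix it? I'm new to data.table, so I'm probably missing something obvious.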
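One plausible cause, offered as a hedged reading rather than a confirmed diagnosis: `lapply()` returns a list, not a logical vector, and with `by = fly` each group produces only 24 list elements for its 25 rows, so the result is not assigned row by row; the all-FALSE column is consistent with that. A minimal sketch that keeps the question's comparison but returns a plain logical vector of exactly `.N` values per group, using `vapply()` plus a trailing `FALSE` for each group's last row. The `dt` table below is hypothetical, shaped like `sub`.

```r
library(data.table)

# Hypothetical two-group data shaped like the question's `sub`:
# in group "test" the 0 at row 2 is followed by a non-zero,
# and the group ends in a 0 (like row 25 in the question).
dt <- data.table(fly = rep(c("test", "real"), each = 4),
                 mm  = c(1.2, 0, 0.5, 0,
                         1.7, 0.9, 0, 0))

dt[, initiate2 := {
  # Compare each row with the next one *within the group*; .N is the group size.
  # vapply() returns a logical vector instead of a list.
  flags <- vapply(seq_len(.N - 1L),
                  function(x) mm[x] == 0 && mm[x + 1L] != 0,
                  logical(1))
  # The group's last row has no successor in the group, so it is FALSE.
  c(flags, FALSE)
}, by = fly]
dt
```

Row 4 ends group `test` with a 0 and stays FALSE even though the next row in the whole table is non-zero, which mirrors the behaviour wanted for row 25.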

1 answer:

Answer 0 (score: 2)

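You can use `rle`: replace its `values` according to the condition "a 0 followed by a non-0", implemented here as a zero run of length one (`values == 0 & lengths == 1`), and then call `inverse.rle`: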

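DT[, initiate := {
  r <- rle(mm)                                 # run-length encode mm within each id
  r$values <- r$values == 0 & r$lengths == 1   # TRUE for a zero forming a run of length 1
  inverse.rle(r)                               # expand back to one value per row
}, by = id][]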
#    mm id initiate
# 1:  0  a     TRUE
# 2:  1  a    FALSE
# 3:  1  a    FALSE
# 4:  2  a    FALSE
# 5:  0  a     TRUE
# 6:  2  a    FALSE
# 7:  2  a    FALSE
# 8:  1  a    FALSE
# 9:  1  a    FALSE
#10:  0  a     TRUE
#11:  0  b    FALSE
#12:  0  b    FALSE
#13:  2  b    FALSE
#14:  1  b    FALSE
#15:  2  b    FALSE
#16:  1  b    FALSE
#17:  2  b    FALSE
#18:  2  b    FALSE
#19:  1  b    FALSE
#20:  2  b    FALSE
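Note that the `rle` condition flags zeros that form a run of length one, which is not quite the same as "a 0 followed by a non-0 in the same group": row 10 above is flagged although it is the last row of group `a`, and row 12 is not flagged although its 0 is followed by a 2. If the stricter definition from the question is what you need, here is a sketch using data.table's `shift()` within each group. The values are typed in from the `DT` printout under "Data" so the block does not depend on the random-number generator; the column name `initiate_strict` is hypothetical.

```r
library(data.table)

# The 20 values printed for DT under "Data", typed in directly
DT <- data.table(mm = c(0, 1, 1, 2, 0, 2, 2, 1, 1, 0,
                        0, 0, 2, 1, 2, 1, 2, 2, 1, 2),
                 id = rep(letters[1:2], each = 10))

# TRUE where mm is 0 and the next row *of the same id* is non-zero;
# fill = 0 keeps each group's last row FALSE.
DT[, initiate_strict := mm == 0 & shift(mm, type = "lead", fill = 0) != 0,
   by = id][]
```

With this definition rows 1, 5 and 12 are flagged, while row 10 (a trailing 0 in group `a`) is not.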

Data

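library(data.table)
# The values printed below come from R's pre-3.6.0 sampler; on R >= 3.6.0,
# uncomment the next line to reproduce them (it switches to the old sampler).
# suppressWarnings(RNGversion("3.5.0"))
set.seed(1)
DT <- data.table(mm = sample(0:2, 20, TRUE),
                 id = rep(letters[1:2], each = 10))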
DT
#    mm id
# 1:  0  a
# 2:  1  a
# 3:  1  a
# 4:  2  a
# 5:  0  a
# 6:  2  a
# 7:  2  a
# 8:  1  a
# 9:  1  a
#10:  0  a
#11:  0  b
#12:  0  b
#13:  2  b
#14:  1  b
#15:  2  b
#16:  1  b
#17:  2  b
#18:  2  b
#19:  1  b
#20:  2  b