Here is a sample of my data.table (called "sub"):
time fly mm genotype Genotype
1: 1 test 0.68000000 ctrl1_f loxP_f
2: 2 test 1.53394915 ctrl1_f loxP_f
3: 3 test 1.40431478 ctrl1_f loxP_f
4: 4 test 0.29154759 ctrl1_f loxP_f
5: 5 test 0.67416615 ctrl1_f loxP_f
6: 6 test 0.09848858 ctrl1_f loxP_f
7: 7 test 2.46099573 ctrl1_f loxP_f
8: 8 test 1.71143215 ctrl1_f loxP_f
9: 9 test 3.75767215 ctrl1_f loxP_f
10: 10 test 5.99067609 ctrl1_f loxP_f
11: 11 test 5.48714862 ctrl1_f loxP_f
12: 12 test 0.00000000 ctrl1_f loxP_f
13: 13 test 0.32015621 ctrl1_f loxP_f
14: 14 test 0.75960516 ctrl1_f loxP_f
15: 15 test 0.31953091 ctrl1_f loxP_f
16: 16 test 0.67007462 ctrl1_f loxP_f
17: 17 test 1.55467038 ctrl1_f loxP_f
18: 18 test 1.13564959 ctrl1_f loxP_f
19: 19 test 0.39051248 ctrl1_f loxP_f
20: 20 test 1.31061054 ctrl1_f loxP_f
21: 21 test 2.57007782 ctrl1_f loxP_f
22: 22 test 1.78339564 ctrl1_f loxP_f
23: 23 test 1.70484603 ctrl1_f loxP_f
24: 24 test 0.63198101 ctrl1_f loxP_f
25: 25 test 0.00000000 ctrl1_f loxP_f
26: 26 real 1.74183811 ctrl1_f loxP_f
27: 27 real 1.01000000 ctrl1_f loxP_f
28: 28 real 0.85052925 ctrl1_f loxP_f
29: 29 real 0.50000000 ctrl1_f loxP_f
30: 30 real 0.56885851 ctrl1_f loxP_f
31: 31 real 0.25000000 ctrl1_f loxP_f
32: 32 real 0.46270941 ctrl1_f loxP_f
33: 33 real 0.71000000 ctrl1_f loxP_f
34: 34 real 0.30000000 ctrl1_f loxP_f
35: 35 real 0.29410882 ctrl1_f loxP_f
36: 36 real 0.65122961 ctrl1_f loxP_f
37: 37 real 0.56435804 ctrl1_f loxP_f
38: 38 real 1.37277092 ctrl1_f loxP_f
39: 39 real 5.59322805 ctrl1_f loxP_f
40: 40 real 3.15634282 ctrl1_f loxP_f
41: 41 real 4.09078232 ctrl1_f loxP_f
42: 42 real 2.02022276 ctrl1_f loxP_f
43: 43 real 1.32196823 ctrl1_f loxP_f
44: 44 real 1.98909527 ctrl1_f loxP_f
45: 45 real 2.45985772 ctrl1_f loxP_f
46: 46 real 3.61203544 ctrl1_f loxP_f
47: 47 real 7.97250902 ctrl1_f loxP_f
48: 48 real 3.05949342 ctrl1_f loxP_f
49: 49 real 2.41754007 ctrl1_f loxP_f
50: 50 real 1.27882759 ctrl1_f loxP_f
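My goal is to compare consecutive entries in the "mm" column and find the places where a 0 is followed by a non-zero value, for example in row 12 and again in row 25 of the table above. The result should be stored in a new column.
When I do it like this, it works: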
sub[, initiate := lapply(1:(nrow(.SD) - 1), function(x) mm[x] == 0 && mm[x + 1] != 0)]
and it produces the following data.table:
time fly mm genotype Genotype initiate
1: 1 test 0.68000000 ctrl1_f loxP_f FALSE
2: 2 test 1.53394915 ctrl1_f loxP_f FALSE
3: 3 test 1.40431478 ctrl1_f loxP_f FALSE
4: 4 test 0.29154759 ctrl1_f loxP_f FALSE
5: 5 test 0.67416615 ctrl1_f loxP_f FALSE
6: 6 test 0.09848858 ctrl1_f loxP_f FALSE
7: 7 test 2.46099573 ctrl1_f loxP_f FALSE
8: 8 test 1.71143215 ctrl1_f loxP_f FALSE
9: 9 test 3.75767215 ctrl1_f loxP_f FALSE
10: 10 test 5.99067609 ctrl1_f loxP_f FALSE
11: 11 test 5.48714862 ctrl1_f loxP_f FALSE
12: 12 test 0.00000000 ctrl1_f loxP_f TRUE
13: 13 test 0.32015621 ctrl1_f loxP_f FALSE
14: 14 test 0.75960516 ctrl1_f loxP_f FALSE
15: 15 test 0.31953091 ctrl1_f loxP_f FALSE
16: 16 test 0.67007462 ctrl1_f loxP_f FALSE
17: 17 test 1.55467038 ctrl1_f loxP_f FALSE
18: 18 test 1.13564959 ctrl1_f loxP_f FALSE
19: 19 test 0.39051248 ctrl1_f loxP_f FALSE
20: 20 test 1.31061054 ctrl1_f loxP_f FALSE
21: 21 test 2.57007782 ctrl1_f loxP_f FALSE
22: 22 test 1.78339564 ctrl1_f loxP_f FALSE
23: 23 test 1.70484603 ctrl1_f loxP_f FALSE
24: 24 test 0.63198101 ctrl1_f loxP_f FALSE
25: 25 test 0.00000000 ctrl1_f loxP_f TRUE
26: 26 real 1.74183811 ctrl1_f loxP_f FALSE
27: 27 real 1.01000000 ctrl1_f loxP_f FALSE
28: 28 real 0.85052925 ctrl1_f loxP_f FALSE
29: 29 real 0.50000000 ctrl1_f loxP_f FALSE
30: 30 real 0.56885851 ctrl1_f loxP_f FALSE
31: 31 real 0.25000000 ctrl1_f loxP_f FALSE
32: 32 real 0.46270941 ctrl1_f loxP_f FALSE
33: 33 real 0.71000000 ctrl1_f loxP_f FALSE
34: 34 real 0.30000000 ctrl1_f loxP_f FALSE
35: 35 real 0.29410882 ctrl1_f loxP_f FALSE
36: 36 real 0.65122961 ctrl1_f loxP_f FALSE
37: 37 real 0.56435804 ctrl1_f loxP_f FALSE
38: 38 real 1.37277092 ctrl1_f loxP_f FALSE
39: 39 real 5.59322805 ctrl1_f loxP_f FALSE
40: 40 real 3.15634282 ctrl1_f loxP_f FALSE
41: 41 real 4.09078232 ctrl1_f loxP_f FALSE
42: 42 real 2.02022276 ctrl1_f loxP_f FALSE
43: 43 real 1.32196823 ctrl1_f loxP_f FALSE
44: 44 real 1.98909527 ctrl1_f loxP_f FALSE
45: 45 real 2.45985772 ctrl1_f loxP_f FALSE
46: 46 real 3.61203544 ctrl1_f loxP_f FALSE
47: 47 real 7.97250902 ctrl1_f loxP_f FALSE
48: 48 real 3.05949342 ctrl1_f loxP_f FALSE
49: 49 real 2.41754007 ctrl1_f loxP_f FALSE
50: 50 real 1.27882759 ctrl1_f loxP_f FALSE
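This correctly marks rows 12 and 25 as "initiate", i.e. as rows where a 0 is followed by a non-zero value.
The problem I cannot solve is this: I want to do the same thing by the "fly" column, i.e. handle the "test" and "real" entries as separate groups. When I do it like this, it does not give me the correct answer: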
sub[, initiate2 := lapply(1:(nrow(.SD) - 1), function(x) mm[x] == 0 && mm[x + 1] != 0), by = fly]
sub
time fly mm genotype Genotype initiate initiate2
1: 1 test 0.68000000 ctrl1_f loxP_f FALSE FALSE
2: 2 test 1.53394915 ctrl1_f loxP_f FALSE FALSE
3: 3 test 1.40431478 ctrl1_f loxP_f FALSE FALSE
4: 4 test 0.29154759 ctrl1_f loxP_f FALSE FALSE
5: 5 test 0.67416615 ctrl1_f loxP_f FALSE FALSE
6: 6 test 0.09848858 ctrl1_f loxP_f FALSE FALSE
7: 7 test 2.46099573 ctrl1_f loxP_f FALSE FALSE
8: 8 test 1.71143215 ctrl1_f loxP_f FALSE FALSE
9: 9 test 3.75767215 ctrl1_f loxP_f FALSE FALSE
10: 10 test 5.99067609 ctrl1_f loxP_f FALSE FALSE
11: 11 test 5.48714862 ctrl1_f loxP_f FALSE FALSE
12: 12 test 0.00000000 ctrl1_f loxP_f TRUE FALSE
13: 13 test 0.32015621 ctrl1_f loxP_f FALSE FALSE
14: 14 test 0.75960516 ctrl1_f loxP_f FALSE FALSE
15: 15 test 0.31953091 ctrl1_f loxP_f FALSE FALSE
16: 16 test 0.67007462 ctrl1_f loxP_f FALSE FALSE
17: 17 test 1.55467038 ctrl1_f loxP_f FALSE FALSE
18: 18 test 1.13564959 ctrl1_f loxP_f FALSE FALSE
19: 19 test 0.39051248 ctrl1_f loxP_f FALSE FALSE
20: 20 test 1.31061054 ctrl1_f loxP_f FALSE FALSE
21: 21 test 2.57007782 ctrl1_f loxP_f FALSE FALSE
22: 22 test 1.78339564 ctrl1_f loxP_f FALSE FALSE
23: 23 test 1.70484603 ctrl1_f loxP_f FALSE FALSE
24: 24 test 0.63198101 ctrl1_f loxP_f FALSE FALSE
25: 25 test 0.00000000 ctrl1_f loxP_f TRUE FALSE
26: 26 real 1.74183811 ctrl1_f loxP_f FALSE FALSE
27: 27 real 1.01000000 ctrl1_f loxP_f FALSE FALSE
28: 28 real 0.85052925 ctrl1_f loxP_f FALSE FALSE
29: 29 real 0.50000000 ctrl1_f loxP_f FALSE FALSE
30: 30 real 0.56885851 ctrl1_f loxP_f FALSE FALSE
31: 31 real 0.25000000 ctrl1_f loxP_f FALSE FALSE
32: 32 real 0.46270941 ctrl1_f loxP_f FALSE FALSE
33: 33 real 0.71000000 ctrl1_f loxP_f FALSE FALSE
34: 34 real 0.30000000 ctrl1_f loxP_f FALSE FALSE
35: 35 real 0.29410882 ctrl1_f loxP_f FALSE FALSE
36: 36 real 0.65122961 ctrl1_f loxP_f FALSE FALSE
37: 37 real 0.56435804 ctrl1_f loxP_f FALSE FALSE
38: 38 real 1.37277092 ctrl1_f loxP_f FALSE FALSE
39: 39 real 5.59322805 ctrl1_f loxP_f FALSE FALSE
40: 40 real 3.15634282 ctrl1_f loxP_f FALSE FALSE
41: 41 real 4.09078232 ctrl1_f loxP_f FALSE FALSE
42: 42 real 2.02022276 ctrl1_f loxP_f FALSE FALSE
43: 43 real 1.32196823 ctrl1_f loxP_f FALSE FALSE
44: 44 real 1.98909527 ctrl1_f loxP_f FALSE FALSE
45: 45 real 2.45985772 ctrl1_f loxP_f FALSE FALSE
46: 46 real 3.61203544 ctrl1_f loxP_f FALSE FALSE
47: 47 real 7.97250902 ctrl1_f loxP_f FALSE FALSE
48: 48 real 3.05949342 ctrl1_f loxP_f FALSE FALSE
49: 49 real 2.41754007 ctrl1_f loxP_f FALSE FALSE
50: 50 real 1.27882759 ctrl1_f loxP_f FALSE FALSE
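initiate2 gives me FALSE for both row 12 and row 25, although it should give TRUE for row 12 and FALSE for row 25 (row 25 is the last row of the "test" group, so nothing follows it within that group). Why does this happen, and how can I fix it? I am new to data.table, so I may be missing something obvious.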
Answer 0 (score: 2)
You can use rle, replace its values according to the condition "0 followed by non-0", i.e. values == 0 & lengths == 1, and then call inverse.rle:
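DT[, initiate := {
  r <- rle(mm)                                # runs of equal consecutive values
  r$values <- r$values == 0 & r$lengths == 1  # TRUE only for a run that is a single 0
  inverse.rle(r)                              # expand back to one value per row
}, by = id][]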
# mm id initiate
# 1: 0 a TRUE
# 2: 1 a FALSE
# 3: 1 a FALSE
# 4: 2 a FALSE
# 5: 0 a TRUE
# 6: 2 a FALSE
# 7: 2 a FALSE
# 8: 1 a FALSE
# 9: 1 a FALSE
#10: 0 a TRUE
#11: 0 b FALSE
#12: 0 b FALSE
#13: 2 b FALSE
#14: 1 b FALSE
#15: 2 b FALSE
#16: 1 b FALSE
#17: 2 b FALSE
#18: 2 b FALSE
#19: 1 b FALSE
#20: 2 b FALSE
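To make the rle step easier to follow, here is a minimal base R sketch on a small made-up vector (the vector `x` and the name `flag` are only illustrative, not part of the data above). It shows that the condition values == 0 & lengths == 1 flags a zero that stands alone, while a run of two or more zeros is left as FALSE:

```r
# Hypothetical toy vector: one isolated 0, then a run of two 0s
x <- c(0, 1, 1, 2, 0, 0, 2)

r <- rle(x)
r$values   # 0 1 2 0 2  (one entry per run)
r$lengths  # 1 2 1 2 1  (length of each run)

# Keep TRUE only for runs that are a single 0
r$values <- r$values == 0 & r$lengths == 1

# Expand the per-run flags back to one flag per element
flag <- inverse.rle(r)
flag
```

Here only the first element is flagged. The zeros at positions 5 and 6 form a run of length 2, so neither is flagged, even though the second of them is followed by a non-zero value.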
Data
library(data.table)
set.seed(1)
DT <- data.table(mm = sample(0:2, 20, TRUE),
id = rep(letters[1:2], each = 10))
DT
# mm id
# 1: 0 a
# 2: 1 a
# 3: 1 a
# 4: 2 a
# 5: 0 a
# 6: 2 a
# 7: 2 a
# 8: 1 a
# 9: 1 a
#10: 0 a
#11: 0 b
#12: 0 b
#13: 2 b
#14: 1 b
#15: 2 b
#16: 1 b
#17: 2 b
#18: 2 b
#19: 1 b
#20: 2 b
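As a possible alternative, the condition "0 followed by a non-zero value within the same group" can be written directly with data.table's shift(). This is only a sketch on made-up data (the table and its values are hypothetical). It differs from the rle approach above in two cases: the last 0 of a longer run of zeros is flagged, and a 0 in the last row of a group is not flagged because nothing follows it. The lapply() call in the question returns a list rather than a logical vector, which is probably why the grouped assignment did not yield one value per row; a vectorized expression avoids that.

```r
library(data.table)

# Hypothetical toy data: two groups, with a run of zeros and a trailing zero
DT2 <- data.table(
  id = rep(c("a", "b"), each = 5),
  mm = c(1, 0, 2, 0, 0,
         0, 3, 0, 0, 1)
)

# shift(type = "lead") returns the next value within each group;
# fill = 0 makes the last row of every group compare as "not followed by non-zero"
DT2[, initiate := mm == 0 & shift(mm, type = "lead", fill = 0) != 0, by = id]
DT2[]
```

In group "a" only the second row is flagged (the trailing zeros are followed by nothing non-zero). In group "b" the first row and the fourth row are flagged, the fourth being the last 0 of a run of two.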