R data.table: apply an incrementing value to specific row indices

Time: 2018-04-12 07:24:14

Tags: r data.table row difference

My data looks like this:

Time  |    State   |   Event
01    |    0       |        
02    |    0       |        
03    |    0       |        
04    |    2       |   A_start
05    |    2       |          
06    |    2       |          
07    |    2       |          
08    |    2       |          
09    |    1       |   A_end  
10    |    1       |          
11    |    1       |          
12    |    1       |          
13    |    1       |          
14    |    2       |   B_start
15    |    2       |          
16    |    2       |          
17    |    2       |          
18    |    2       |          
19    |    0       |   B_end  
20    |    0       |          
21    |    0       |          
22    |    0       |          
23    |    0       |          
24    |    2       |   A_start
25    |    2       |          
26    |    2       |          
27    |    2       |          
28    |    2       |          
29    |    2       |          
30    |    2       |          
31    |    1       |   A_end  
32    |    1       |          
33    |    1       |          
34    |    1       |          
35    |    1       |          
36    |    1       |          
37    |    2       |   B_start
38    |    2       |          
39    |    2       |          
40    |    2       |          

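The cycle can repeat with any number of 0s, 1s, and 2s. Sometimes the 0s, 1s, or 2s may be missing entirely. I want the difference in the Time column between each A_start and the A_end that immediately follows it. Likewise, I want the difference in Time between each B_start and the B_end that immediately follows it.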

To do this, I thought it would help to create a "Group" for each cycle, as follows:

Time  |    State   |   Event     |   Group
01    |    0       |             |
02    |    0       |             |
03    |    0       |             |
04    |    2       |   A_start   |   1
05    |    2       |             |
06    |    2       |             |
07    |    2       |             |
08    |    2       |             |
09    |    1       |   A_end     |   1
10    |    1       |             |
11    |    1       |             |
12    |    1       |             |
13    |    1       |             |
14    |    2       |   B_start   |   1
15    |    2       |             |
16    |    2       |             |
17    |    2       |             |
18    |    2       |             |
19    |    0       |   B_end     |   1
20    |    0       |             |
21    |    0       |             |
22    |    0       |             |
23    |    0       |             |
24    |    2       |   A_start   |   2
25    |    2       |             |
26    |    2       |             |
27    |    2       |             |
28    |    2       |             |
29    |    2       |             |
30    |    2       |             |
31    |    1       |   A_end     |   2
32    |    1       |             |
33    |    1       |             |
34    |    1       |             |
35    |    1       |             |
36    |    1       |             |
37    |    2       |   B_start   |   2
38    |    2       |             |
39    |    2       |             |
40    |    2       |             |

However, because values are sometimes missing from the State column, this does not work well.

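The correct cycle sequence is 0 -> 2 -> 1 -> 2 -> 0. Sometimes a cycle may miss a 2, like this: 0 -> 1 -> 2 -> 0. Various combinations of the cycle 0 -> 2 -> 1 -> 2 -> 0 are possible (44 in total). How should I go about this?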

1 Answer:

Answer 0: (score: 1)

Here is a base R solution:

# Identify the rows where State differs from the previous row
timeWithChanges <- which(diff(dat$State) != 0) + 1

# Arrange those times into an m x 2 matrix, one row per (start, end) pair
startEnd <- matrix(dat$Time[timeWithChanges], ncol=2, byrow=TRUE)

# Calculate each time difference and label the pairs alternately as A, B
# (rep_len also copes with an odd number of pairs)
data.frame(AB=rep_len(c("A", "B"), nrow(startEnd)),
    TimeDiff=startEnd[,2] - startEnd[,1])

Please let me know whether this works for you.
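The approach above relies on the State changes alternating strictly between a start and an end. Since the question says states can be missing, one possible alternative is to pair each start with the next matching end through the Event column instead. The following is only a minimal base R sketch under some assumptions: the helper name `event_durations` is hypothetical, and it assumes an `Event` column like the one in the question (the `dat` used in this answer does not include it), with empty cells stored as "" or NA.

```r
# Hypothetical helper: pair every "<label>_start" with the first
# "<label>_end" that occurs strictly later in time.
event_durations <- function(time, event, label) {
  time   <- as.numeric(time)
  starts <- time[!is.na(event) & event == paste0(label, "_start")]
  ends   <- time[!is.na(event) & event == paste0(label, "_end")]

  # For each start, look up the first end after it (NA if there is none)
  next_end <- vapply(starts, function(s) {
    later <- ends[ends > s]
    if (length(later) > 0) later[1] else NA_real_
  }, numeric(1))

  data.frame(Label    = rep(label, length(starts)),
             Start    = starts,
             End      = next_end,
             TimeDiff = next_end - starts)
}

# Rebuild the Time and Event columns from the question's table (40 rows)
Time  <- 1:40
Event <- rep("", 40)
Event[c(4, 24)]  <- "A_start"
Event[c(9, 31)]  <- "A_end"
Event[c(14, 37)] <- "B_start"
Event[19]        <- "B_end"

a <- event_durations(Time, Event, "A")  # TimeDiff: 5, 7
b <- event_durations(Time, Event, "B")  # TimeDiff: 5, NA (last B_start has no B_end)
```

A start with no later end gets NA rather than an error, which is one way to handle the unfinished B cycle at the end of the question's data. If an end is missing in the middle of the data, this sketch pairs the start with the next end of the same label, which may or may not be what is wanted.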

Data:

dat <- read.table(text="Time  |    State
01    |    0
02    |    0
03    |    0
04    |    2
05    |    2
06    |    2
07    |    2
08    |    2
09    |    1
10    |    1
11    |    1
12    |    1
13    |    1
14    |    2
15    |    2
16    |    2
17    |    2
18    |    2
19    |    0
20    |    0
21    |    0
22    |    0
23    |    0
24    |    2
25    |    2
26    |    2
27    |    2
28    |    2
29    |    2
30    |    2
31    |    1
32    |    1
33    |    1
34    |    1
35    |    1
36    |    1
37    |    2
38    |    2
39    |    2
40    |    2
41    |    0", sep="|", header=TRUE)
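
As a quick check of what the steps in this answer produce on this data, the State column can be rebuilt compactly with rep() and the intermediate values inspected. This is only an illustrative sketch; the run lengths passed to rep() are read off the table above.

```r
# Rebuild the sample data compactly (41 rows, same runs as the table above)
dat <- data.frame(
  Time  = 1:41,
  State = rep(c(0, 2, 1, 2, 0, 2, 1, 2, 0),
              times = c(3, 5, 5, 5, 5, 7, 6, 4, 1))
)

# Rows where State differs from the previous row
timeWithChanges <- which(diff(dat$State) != 0) + 1
changeTimes <- dat$Time[timeWithChanges]
changeTimes
# 4  9 14 19 24 31 37 41

# Pair consecutive change points as (start, end) and take the differences
startEnd <- matrix(changeTimes, ncol = 2, byrow = TRUE)
timeDiff <- startEnd[, 2] - startEnd[, 1]
timeDiff
# 5 5 7 4
```

Note that the extra row at Time 41 is what closes the last run. With only the 40 rows from the question there would be seven change points, and matrix() would recycle values with a warning instead of producing clean pairs.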