Consecutive counting based on multiple conditions in data.table?

Asked: 2017-03-21 13:47:55

Tags: r data.table

My data looks like this:

     order val1 val2 win
 1:     1  8.5  6.0  NA
 2:     2  7.0  5.0  NA
 3:     3  6.0  5.0  NA
 4:     4  6.0  5.0  NA
 5:     5  6.0  5.0  NA
 6:     6  8.0  7.0  NA
 7:     7  5.0  4.0  NA
 8:     8  5.0  4.0  NA
 9:     9  5.0  3.0  NA
10:    10  7.0  2.0  NA
11:    11  4.0  3.0  NA
12:    12  4.0  3.0  NA
13:    13  3.0  2.5  NA
14:    14  6.0  5.0  NA
15:    15  3.0  2.5   1
16:    16  2.0  1.0  NA
17:    17  5.0  3.5  NA
18:    18  3.0  2.7  NA
19:    19  2.5  1.7  NA
........ etc ..........

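What I am struggling with is creating a new column that starts counting once win == 1 has occurred, beginning with the following row. In addition, a row is only counted if its val1 is lower than the val2 of the previous row. Counting continues as long as that holds; a row that fails the test is skipped, and counting goes on until a total of seven rows has been counted. Like this: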

     order val1 val2 win cond_win
14:    14  6.0  5.0  NA       NA
15:    15  3.0  2.5   1       NA
16:    16  2.0  1.0  NA        1
17:    17  5.0  3.5  NA       NA
18:    18  3.0  2.7  NA        2
19:    19  2.5  1.7  NA        3
20:    20  1.5  1.3  NA        4
21:    21  1.2  0.5   1        5
22:    22  6.0  5.5  NA       NA
23:    23  5.0  4.5  NA        6
24:    24  4.0  3.5  NA        7
25:    25  3.0  2.5  NA       NA
26:    26  2.0  1.5  NA       NA

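This is where I want it to reset and start looking for the next win == 1 again. I am currently struggling with the skipping part.

A loop is the route I was trying to take, but it would be too slow.

Is there an elegant solution in data.table that might be faster?

Here is some data, along with what I have come up with so far.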

library(data.table)

# val1, val2 and win are shorter than 50, so repeat them explicitly to 50 rows
DT <- data.table(order=seq(1,50,1),
             val1=rep_len(c(8.5,7,6,6,6,8,5,5,5,7,4,4,3,6,3,2,5,3,2.5,1.5,1.2,6,5,4,3,2),50),
             val2=rep_len(c(6,5,5,5,5,7,4,4,3,2,3,3,2.5,5,2.5,1,3.5,2.7,1.7,1.3,0.5,5.5,4.5,3.5,2.5,1.5),50),
             win=rep_len(c(NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,1,NA,NA,NA,NA,NA,1,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,1,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,1),50))

# find the first
DT[win==1 & val1 < shift(val2,1),cond_win:=1]

# attempt at looping
for(i in 1:7){
 DT[shift(cond_win,1)==i & val1 < shift(val2,1),cond_win:=i+1]
}
DT

2 answers:

Answer 0: (score: 1)

The OP has specified several conditions that the row count has to satisfy:

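  1. Start counting only after a row with win == 1.
  2. Count only rows where val1 is lower than the val2 of the previous row.
  3. Continue counting valid rows until seven have been counted.
  4. Start looking again once seven has been reached.
  5. Once counting has started, an intermediate appearance of win == 1 does not restart the count (see row 21 in the expected output).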
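The lagged comparison and the running count from the list above can be sketched on a few rows in base R. This is only an illustration: the values are copied from rows 15 to 19 of the question's data, and the names `prev_val2`, `cond_val` and `cond_win` are chosen here for the example. Because the toy vectors start at row 15, that row has no predecessor and is treated as not counted, which matches the expected output, where the win == 1 row itself is not counted.

```r
# Toy values copied from rows 15-19 of the question's data.
val1 <- c(3, 2, 5, 3, 2.5)
val2 <- c(2.5, 1, 3.5, 2.7, 1.7)

# val2 of the previous row; the first row has no predecessor.
prev_val2 <- c(NA, head(val2, -1))

# TRUE where val1 is below the previous row's val2 (first row counts as FALSE).
cond_val <- !is.na(prev_val2) & val1 < prev_val2

# Running count of TRUE rows; rows that fail the comparison become NA.
cond_win <- ifelse(cond_val, cumsum(cond_val), NA)
print(cond_win)  # values: NA, 1, NA, 2, 3
```

A skipped row (the third one here) leaves the running count unchanged, so the next valid row simply continues with the next number.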

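    The difficulty here lies in the irregularity of the win == 1 and val1[i] < val2[i-1] patterns and in the interdependency of the conditions, so the problem cannot be fully vectorised. We still need to loop over the appearances of win == 1:
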
    # find all appearances of win == 1, remember row number of next row,
    # ensure start is a valid row number (no overrun in case last row has win == 1)
    start <- DT[win == 1 & order < .N, order + 1]
    
    DT[, cond_val := val1 < shift(val2, fill = FALSE) ]
    DT[order >= first(start), cond_win := cumsum(cond_val)]
    
    # implied loop over all appearances of win == 1
    dummy <- lapply(start, function(start) {
      if (DT[start, cond_win > 7]) {
        # restart count from this row
        DT[order >= start, cond_win := cumsum(cond_val)]
      }
    })
    
    # rows which don't satisfy the conditions become NA
    DT[!cond_win %between% c(1,7) | !cond_val, cond_win := NA]
    DT
    

    The result is as follows:

        order val1 val2 win cond_val cond_win
     1:     1  8.5  6.0  NA    FALSE       NA
     2:     2  7.0  5.0  NA    FALSE       NA
     3:     3  6.0  5.0  NA    FALSE       NA
     4:     4  6.0  5.0  NA    FALSE       NA
     5:     5  6.0  5.0  NA    FALSE       NA
     6:     6  8.0  7.0  NA    FALSE       NA
     7:     7  5.0  4.0  NA     TRUE       NA
     8:     8  5.0  4.0  NA    FALSE       NA
     9:     9  5.0  3.0  NA    FALSE       NA
    10:    10  7.0  2.0  NA    FALSE       NA
    11:    11  4.0  3.0  NA    FALSE       NA
    12:    12  4.0  3.0  NA    FALSE       NA
    13:    13  3.0  2.5  NA    FALSE       NA
    14:    14  6.0  5.0  NA    FALSE       NA
    15:    15  3.0  2.5   1     TRUE       NA
    16:    16  2.0  1.0  NA     TRUE        1
    17:    17  5.0  3.5  NA    FALSE       NA
    18:    18  3.0  2.7  NA     TRUE        2
    19:    19  2.5  1.7  NA     TRUE        3
    20:    20  1.5  1.3  NA     TRUE        4
    21:    21  1.2  0.5   1     TRUE        5
    22:    22  6.0  5.5  NA    FALSE       NA
    23:    23  5.0  4.5  NA     TRUE        6
    24:    24  4.0  3.5  NA     TRUE        7
    25:    25  3.0  2.5  NA     TRUE       NA
    26:    26  2.0  1.5  NA     TRUE       NA
    27:    27  8.5  6.0  NA    FALSE       NA
    28:    28  7.0  5.0  NA    FALSE       NA
    29:    29  6.0  5.0  NA    FALSE       NA
    30:    30  6.0  5.0  NA    FALSE       NA
    31:    31  6.0  5.0  NA    FALSE       NA
    32:    32  8.0  7.0  NA    FALSE       NA
    33:    33  5.0  4.0   1     TRUE       NA
    34:    34  5.0  4.0  NA    FALSE       NA
    35:    35  5.0  3.0  NA    FALSE       NA
    36:    36  7.0  2.0  NA    FALSE       NA
    37:    37  4.0  3.0  NA    FALSE       NA
    38:    38  4.0  3.0  NA    FALSE       NA
    39:    39  3.0  2.5  NA    FALSE       NA
    40:    40  6.0  5.0  NA    FALSE       NA
    41:    41  3.0  2.5  NA     TRUE        1
    42:    42  2.0  1.0  NA     TRUE        2
    43:    43  5.0  3.5  NA    FALSE       NA
    44:    44  3.0  2.7   1     TRUE        3
    45:    45  2.5  1.7  NA     TRUE        4
    46:    46  1.5  1.3  NA     TRUE        5
    47:    47  1.2  0.5  NA     TRUE        6
    48:    48  6.0  5.5  NA    FALSE       NA
    49:    49  5.0  4.5  NA     TRUE        7
    50:    50  4.0  3.5  NA     TRUE       NA
        order val1 val2 win cond_val cond_win
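As a cross-check of the table above, the same rules can be written as an explicit row-by-row loop in base R. This is a minimal sketch of one reading of the rules, not the answer's method: `count_after_win` is a hypothetical helper, it assumes val1 and val2 contain no NA, and it assumes that a win == 1 on the row that completes a count does not start a new one. The input vectors reproduce the question's 50 rows, with win == 1 at rows 15, 21, 33 and 44.

```r
# Hypothetical helper (not from the original answer): explicit-loop version of the rules.
count_after_win <- function(val1, val2, win, limit = 7L) {
  n <- length(val1)
  cond_win <- rep(NA_integer_, n)
  counting <- FALSE
  count <- 0L
  for (i in seq_len(n)) {
    if (counting) {
      # While a count is running, a further win == 1 is ignored.
      if (i > 1L && val1[i] < val2[i - 1L]) {
        count <- count + 1L
        cond_win[i] <- count
        if (count == limit) counting <- FALSE  # stop and wait for the next win
      }
    } else if (!is.na(win[i]) && win[i] == 1) {
      # Start a fresh count from the next row.
      counting <- TRUE
      count <- 0L
    }
  }
  cond_win
}

# The question's data, repeated explicitly to 50 rows.
val1 <- rep_len(c(8.5, 7, 6, 6, 6, 8, 5, 5, 5, 7, 4, 4, 3, 6, 3, 2, 5, 3,
                  2.5, 1.5, 1.2, 6, 5, 4, 3, 2), 50)
val2 <- rep_len(c(6, 5, 5, 5, 5, 7, 4, 4, 3, 2, 3, 3, 2.5, 5, 2.5, 1, 3.5,
                  2.7, 1.7, 1.3, 0.5, 5.5, 4.5, 3.5, 2.5, 1.5), 50)
win <- rep(NA_real_, 50)
win[c(15, 21, 33, 44)] <- 1

res <- count_after_win(val1, val2, win)
print(which(!is.na(res)))  # rows that received a count
```

On this data the loop numbers rows 16, 18, 19, 20, 21, 23, 24 and rows 41, 42, 44, 45, 46, 47, 49 from 1 to 7, in agreement with the cond_win column above. With a data.table the helper could be applied as `DT[, cond_win := count_after_win(val1, val2, win)]`; a plain R loop like this may be slow on large tables.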
    

Answer 1: (score: 0)

# data generation
exempleData <- data.frame(order = 1:50 ,val1 = runif(50),val2 = runif(50))
# place the single win in rows 1..49 so that at least one row follows it
exempleData$win <- NA; exempleData$win[sample(1:49,1)] <- 1
exempleData$cond_win <- NA
# select rows under conditions and assign value one to cond_win
exempleData$cond_win[((which(exempleData$win == 1)+1):length(exempleData$val1))][
  exempleData$val2[((which(exempleData$win == 1)+1):length(exempleData$val2))-1]>
    exempleData$val1[((which(exempleData$win == 1)+1):length(exempleData$val1))]
  ] <- 1
# transform 1 to count in cond_win
exempleData$cond_win[!is.na(exempleData$cond_win) ] <- cumsum(exempleData$cond_win[!is.na(exempleData$cond_win) ]) 
# Remove count greater than 7 
exempleData$cond_win[exempleData$cond_win>7] <- NA
# Here we are!
exempleData