How to do a grouped ifelse calculation based on the first row of each group in an R data.table

Time: 2019-07-09 11:05:17

Tags: r data.table

I have the following data.table:

dt <- fread("
        PERIOD | EI_1 | EI_2 | EI_3 | EO_3 | GROUP
           0   |  1   |  1.5 | 1.75 |      |   A  
           1   |      |  1.4 |      |      |   A
           2   |      |  1.3 |      |      |   A
           3   |      |  1.2 |      |      |   A
           4   |      |  1.1 |      |      |   A
           0   |  0   |  0.5 | 0.75 |      |   B
           1   |      |  0.4 |      |      |   B
           2   |      |  0.3 |      |      |   B
           3   |      |  0.2 |      |      |   B  
           4   |      |  0.1 |      |      |   B
        ", 
        sep = "|",
        colClasses = c("EO_3" = "numeric"))

I want to do some calculations that depend on lagged values, defined by the following function:

calc_EO_3 <- function(PERIOD, EI_1, EI_2, EI_3){
  ifelse(
    PERIOD == 0,
    EI_3,
    ifelse(
      PERIOD <= 2,
      shift(EI_2, type="lag"),
      ifelse(
        EI_1[1] == 1,
        0.2 * shift(EI_2, type="lag"),
        20 * shift(EI_2, type="lag")
      )
    )
  )
}

Applied by group, it should return the following data.table:

dt[, EO_3 := calc_EO_3(PERIOD, EI_1, EI_2, EI_3), by = GROUP][]



    PERIOD EI_1 EI_2 EI_3 EO_3 GROUP
 1:      0    1  1.5 1.75 1.75     A
 2:      1   NA  1.4   NA 1.50     A
 3:      2   NA  1.3   NA 1.40     A
 4:      3   NA  1.2   NA 0.26     A
 5:      4   NA  1.1   NA 0.24     A
 6:      0    0  0.5 0.75 0.75     B
 7:      1   NA  0.4   NA 0.50     B
 8:      2   NA  0.3   NA 0.40     B
 9:      3   NA  0.2   NA 6.00     B
10:      4   NA  0.1   NA 4.00     B

But instead, I get the following:

    PERIOD EI_1 EI_2 EI_3 EO_3 GROUP
 1:      0    1  1.5 1.75 1.75     A
 2:      1   NA  1.4   NA 1.50     A
 3:      2   NA  1.3   NA 1.40     A
 4:      3   NA  1.2   NA   NA     A
 5:      4   NA  1.1   NA   NA     A
 6:      0    0  0.5 0.75 0.75     B
 7:      1   NA  0.4   NA 0.50     B
 8:      2   NA  0.3   NA 0.40     B
 9:      3   NA  0.2   NA   NA     B
10:      4   NA  0.1   NA   NA     B

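The problem is that the function not only checks whether EI_1[1] == 1, it also seems to make the calculation take place in a subset filtered by that condition.

How can I make the function check the condition on the first row of a group, and then do the calculation for the whole group based on that condition?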

3 answers:

Answer 0: (score: 2)

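You can modify the code by repeating the first-row test once per row of the group, e.g. rep(EI_1[1L] == 1, length(PERIOD)). The special symbol .N is only set up for the j expression itself, so inside a separately defined function it is safer to use length(PERIOD) for the group size:

calc_EO_3 <- function(PERIOD, EI_1, EI_2, EI_3){
    ifelse(
        PERIOD == 0,
        EI_3,
        ifelse(
            PERIOD <= 2,
            shift(EI_2, type="lag"),
            ifelse(
                # this is the change: one copy of the first-row test per row of the group
                rep(EI_1[1L] == 1, length(PERIOD)),
                0.2 * shift(EI_2, type="lag"),
                20 * shift(EI_2, type="lag")
            )
        )
    )
}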

dt[, EO_3 := calc_EO_3(PERIOD, EI_1, EI_2, EI_3), by = GROUP][]

Output:

    PERIOD EI_1 EI_2 EI_3 EO_3 GROUP
 1:      0    1  1.5 1.75 1.75     A
 2:      1   NA  1.4   NA 1.50     A
 3:      2   NA  1.3   NA 1.40     A
 4:      3   NA  1.2   NA 0.26     A
 5:      4   NA  1.1   NA 0.24     A
 6:      0    0  0.5 0.75 0.75     B
 7:      1   NA  0.4   NA 0.50     B
 8:      2   NA  0.3   NA 0.40     B
 9:      3   NA  0.2   NA 6.00     B
10:      4   NA  0.1   NA 4.00     B
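Why the original version returned NA can be seen in a small base R sketch. ifelse returns a result with the same length as its test, so a length-one test such as EI_1[1] == 1 yields a single value (here the first lagged value, which is NA), and the enclosing ifelse then recycles it. The vector x and the helper name lagged below are illustrative assumptions, not part of the original question.

```r
# Illustrative values mirroring EI_2 of group A (an assumption for this sketch)
x <- c(1.5, 1.4, 1.3, 1.2, 1.1)
lagged <- c(NA, head(x, -1))   # base R equivalent of a one-step lag
flag <- TRUE                   # stands in for EI_1[1] == 1

# Length-one test: the result has length one and takes only the first element of `yes`
scalar_result <- ifelse(flag, 0.2 * lagged, 20 * lagged)

# Test repeated to full length: one result per row
vector_result <- ifelse(rep(flag, length(x)), 0.2 * lagged, 20 * lagged)

print(scalar_result)
print(vector_result)
```

Since the first lagged value is NA, the length-one result is NA, which matches the NA values seen for PERIOD > 2 in the question.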

Alternatively,

dt[, EO_3 := 20 * shift(EI_2), by=.(GROUP)][
    GROUP %in% dt[EI_1==1L, GROUP], EO_3 := 0.2 * shift(EI_2), by=.(GROUP)][
        PERIOD <= 2L, EO_3 := shift(EI_2, fill=EI_3[1L]), by=.(GROUP)]

Note that fifelse is being developed in the Rdatatable GitHub repository.
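As a sketch of how that could look, assuming a data.table version that already exports fifelse: the group-level decision is made once with a plain if on the first row, and fifelse handles only the row-wise conditions. The table is rebuilt with data.table() here so the example is self-contained; the names prev and mult are illustrative.

```r
library(data.table)

# Same data as in the question, built directly instead of via fread
dt <- data.table(
  PERIOD = rep(0:4, 2),
  EI_1   = c(1, NA, NA, NA, NA, 0, NA, NA, NA, NA),
  EI_2   = c(1.5, 1.4, 1.3, 1.2, 1.1, 0.5, 0.4, 0.3, 0.2, 0.1),
  EI_3   = c(1.75, NA, NA, NA, NA, 0.75, NA, NA, NA, NA),
  GROUP  = rep(c("A", "B"), each = 5)
)

dt[, EO_3 := {
  prev <- shift(EI_2)                       # lagged EI_2 within the group
  mult <- if (EI_1[1L] == 1) 0.2 else 20    # scalar decision from the first row
  fifelse(PERIOD == 0, EI_3,
          fifelse(PERIOD <= 2, prev, mult * prev))
}, by = GROUP]

print(dt)
```

Keeping the first-row check outside the vectorised call avoids the length mismatch altogether, because a scalar multiplier is recycled over the whole group.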

Answer 1: (score: 2)

Similar to @chinsoon's "Alternatively, ..." answer:

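dt[, `:=`(
  # lagged EI_2, with the group's first EI_3 filling the first row
  EO_3 = shift(EI_2, fill=first(EI_3)),
  # multiplier for the later periods, chosen from the first row of the group
  mult = if (first(EI_1) == 1) 0.2 else 20
), by=.(GROUP)]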

dt[PERIOD > 2, EO_3 := EO_3 * mult ]
dt[, mult := NULL]

    PERIOD EI_1 EI_2 EI_3 EO_3 GROUP
 1:      0    1  1.5 1.75 1.75     A
 2:      1   NA  1.4   NA 1.50     A
 3:      2   NA  1.3   NA 1.40     A
 4:      3   NA  1.2   NA 0.26     A
 5:      4   NA  1.1   NA 0.24     A
 6:      0    0  0.5 0.75 0.75     B
 7:      1   NA  0.4   NA 0.50     B
 8:      2   NA  0.3   NA 0.40     B
 9:      3   NA  0.2   NA 6.00     B
10:      4   NA  0.1   NA 4.00     B

Answer 2: (score: 0)

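You can do this with nested ifelse calls on a plain data frame. The innermost test has to use the first EI_1 of each group rather than the row's own EI_1 (which is NA after period 0); otherwise every row with PERIOD > 2 gets the 0.2 factor and group B comes out wrong. With that in place, this gives the desired output:

library(dplyr)
df <- as.data.frame(dt)

# first EI_1 of each group, repeated for every row of that group
first_EI_1 <- ave(df$EI_1, df$GROUP, FUN = function(x) x[1])
# previous EI_2; the cross-group value on PERIOD 0 rows is never used
prev_EI_2 <- dplyr::lag(df$EI_2, 1)

df$EO_3 <- ifelse(df$PERIOD == 0, df$EI_3,
           ifelse(df$PERIOD <= 2, prev_EI_2,
           ifelse(first_EI_1 == 1, 0.2 * prev_EI_2, 20 * prev_EI_2)))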
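The same first-row logic can also be written group-wise with dplyr, which may read more naturally than indexing columns by hand. This is only a sketch: the data frame is rebuilt so the example runs on its own, and the temporary columns prev and mult are illustrative names.

```r
library(dplyr)

# Same data as in the question, as a plain data frame
df <- data.frame(
  PERIOD = rep(0:4, 2),
  EI_1   = c(1, NA, NA, NA, NA, 0, NA, NA, NA, NA),
  EI_2   = c(1.5, 1.4, 1.3, 1.2, 1.1, 0.5, 0.4, 0.3, 0.2, 0.1),
  EI_3   = c(1.75, NA, NA, NA, NA, 0.75, NA, NA, NA, NA),
  GROUP  = rep(c("A", "B"), each = 5)
)

res <- df %>%
  group_by(GROUP) %>%
  mutate(
    prev = lag(EI_2),                           # lag within each group
    mult = if (first(EI_1) == 1) 0.2 else 20,   # scalar per group, recycled
    EO_3 = case_when(
      PERIOD == 0 ~ EI_3,
      PERIOD <= 2 ~ prev,
      TRUE        ~ mult * prev
    )
  ) %>%
  ungroup() %>%
  select(-prev, -mult)

print(res)
```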