I have the following data.table:
dt <- fread("
PERIOD | EI_1 | EI_2 | EI_3 | EO_3 | GROUP
0 | 1 | 1.5 | 1.75 | | A
1 | | 1.4 | | | A
2 | | 1.3 | | | A
3 | | 1.2 | | | A
4 | | 1.1 | | | A
0 | 0 | 0.5 | 0.75 | | B
1 | | 0.4 | | | B
2 | | 0.3 | | | B
3 | | 0.2 | | | B
4 | | 0.1 | | | B
",
sep = "|",
colClasses = c("EO_3" = "numeric"))
I want to run a calculation that depends on lagged values, defined by the following function:
calc_EO_3 <- function(PERIOD, EI_1, EI_2, EI_3){
  ifelse(
    PERIOD == 0,
    EI_3,
    ifelse(
      PERIOD <= 2,
      shift(EI_2, type="lag"),
      ifelse(
        EI_1[1] == 1,
        0.2 * shift(EI_2, type="lag"),
        20 * shift(EI_2, type="lag")
      )
    )
  )
}
This should return the following data.table:
dt[, EO_3 := calc_EO_3(PERIOD, EI_1, EI_2, EI_3), by = GROUP][]
PERIOD EI_1 EI_2 EI_3 EO_3 GROUP
1: 0 1 1.5 1.75 1.75 A
2: 1 NA 1.4 NA 1.50 A
3: 2 NA 1.3 NA 1.40 A
4: 3 NA 1.2 NA 0.26 A
5: 4 NA 1.1 NA 0.24 A
6: 0 0 0.5 0.75 0.75 B
7: 1 NA 0.4 NA 0.50 B
8: 2 NA 0.3 NA 0.40 B
9: 3 NA 0.2 NA 6.00 B
10: 4 NA 0.1 NA 4.00 B
Instead, however, I get the following:
PERIOD EI_1 EI_2 EI_3 EO_3 GROUP
1: 0 1 1.5 1.75 1.75 A
2: 1 NA 1.4 NA 1.50 A
3: 2 NA 1.3 NA 1.40 A
4: 3 NA 1.2 NA NA A
5: 4 NA 1.1 NA NA A
6: 0 0 0.5 0.75 0.75 B
7: 1 NA 0.4 NA 0.50 B
8: 2 NA 0.3 NA 0.40 B
9: 3 NA 0.2 NA NA B
10: 4 NA 0.1 NA NA B
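The problem is that the function does not merely check EI_1[1] == 1; the calculation also appears to be carried out only on the subset selected by that condition, so every row with PERIOD > 2 ends up as NA.
How can I make the function check the condition on the first row of a group and then apply the resulting calculation to the whole group?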
Answer 0 (score: 2)
You can modify the code by using rep(EI_1[1L]==1, .N):
calc_EO_3 <- function(PERIOD, EI_1, EI_2, EI_3){
  ifelse(
    PERIOD == 0,
    EI_3,
    ifelse(
      PERIOD <= 2,
      shift(EI_2, type="lag"),
      ifelse(
        rep(EI_1[1]==1, .N), #this is the change
        0.2 * shift(EI_2, type="lag"),
        20 * shift(EI_2, type="lag")
      )
    )
  )
}
dt[, EO_3 := calc_EO_3(PERIOD, EI_1, EI_2, EI_3), by = GROUP][]
Output:
PERIOD EI_1 EI_2 EI_3 EO_3 GROUP
1: 0 1 1.5 1.75 1.75 A
2: 1 NA 1.4 NA 1.50 A
3: 2 NA 1.3 NA 1.40 A
4: 3 NA 1.2 NA 0.26 A
5: 4 NA 1.1 NA 0.24 A
6: 0 0 0.5 0.75 0.75 B
7: 1 NA 0.4 NA 0.50 B
8: 2 NA 0.3 NA 0.40 B
9: 3 NA 0.2 NA 6.00 B
10: 4 NA 0.1 NA 4.00 B
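The NA values in the original version come from the fact that ifelse returns a result with the same length as its test. A length-one test such as EI_1[1] == 1 therefore keeps only the first element of the chosen branch, which here is the NA produced by the lag. A minimal base R sketch of that behaviour, where the vector x is a made-up stand-in for shift(EI_2, type="lag") in group A:

```r
# x is a hypothetical stand-in for shift(EI_2, type = "lag") within group A
x <- c(NA, 1.5, 1.4, 1.3, 1.2)

# Length-one test: the result also has length one (the first element of 0.2 * x)
scalar_test <- ifelse(TRUE, 0.2 * x, 20 * x)

# Test repeated once per row: the result covers every row of the group
vector_test <- ifelse(rep(TRUE, length(x)), 0.2 * x, 20 * x)

print(scalar_test)
print(vector_test)
```

When the outer ifelse recycles that single NA, every row with PERIOD > 2 becomes NA. Because the condition is a single value per group, choosing the factor with a plain if (...) 0.2 else 20 and multiplying the lagged column by it should work just as well.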
Alternatively,
dt[, EO_3 := 20 * shift(EI_2), by=.(GROUP)][
GROUP %in% dt[EI_1==1L, GROUP], EO_3 := 0.2 * shift(EI_2), by=.(GROUP)][
PERIOD <= 2L, EO_3 := shift(EI_2, fill=EI_3[1L]), by=.(GROUP)]
Note that fifelse is under development in the Rdatatable GitHub repository.
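For reference, a sketch of how the same function might look with fifelse, assuming a data.table version that provides it. The name calc_EO_3_fifelse and the hand-built sample table are illustrative and not from the original post. The per-group factor is chosen with a plain if/else, since fifelse expects yes and no to have length one or the length of the test.

```r
library(data.table)

# Hand-built copy of the sample data (illustrative, not read with fread)
dt <- data.table(
  PERIOD = rep(0:4, times = 2),
  EI_1   = c(1, NA, NA, NA, NA, 0, NA, NA, NA, NA),
  EI_2   = c(1.5, 1.4, 1.3, 1.2, 1.1, 0.5, 0.4, 0.3, 0.2, 0.1),
  EI_3   = c(1.75, NA, NA, NA, NA, 0.75, NA, NA, NA, NA),
  GROUP  = rep(c("A", "B"), each = 5)
)

# Hypothetical fifelse-based variant of calc_EO_3
calc_EO_3_fifelse <- function(PERIOD, EI_1, EI_2, EI_3) {
  lagged <- shift(EI_2, type = "lag")
  # One scalar per group, decided by the group's first row.
  # Assumption: a missing first EI_1 falls through to the factor 20.
  mult <- if (isTRUE(EI_1[1L] == 1)) 0.2 else 20
  fifelse(PERIOD == 0, EI_3,
          fifelse(PERIOD <= 2, lagged, mult * lagged))
}

dt[, EO_3 := calc_EO_3_fifelse(PERIOD, EI_1, EI_2, EI_3), by = GROUP]
print(dt)
```

Note that fifelse also requires yes and no to be of the same type, so this sketch relies on EI_2 and EI_3 both being numeric.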
Answer 1 (score: 2)
Similar to @chinsoon's "Alternatively, ..." answer:
dt[, `:=`(
EO_3 = shift(EI_2, fill=first(EI_3)),
mult = 2*10 ^ if (first(EI_1) == 1) -1 else 1
), by=.(GROUP)]
dt[PERIOD > 2, EO_3 := EO_3 * mult ]
dt[, mult := NULL]
PERIOD EI_1 EI_2 EI_3 EO_3 GROUP
1: 0 1 1.5 1.75 1.75 A
2: 1 NA 1.4 NA 1.50 A
3: 2 NA 1.3 NA 1.40 A
4: 3 NA 1.2 NA 0.26 A
5: 4 NA 1.1 NA 0.24 A
6: 0 0 0.5 0.75 0.75 B
7: 1 NA 0.4 NA 0.50 B
8: 2 NA 0.3 NA 0.40 B
9: 3 NA 0.2 NA 6.00 B
10: 4 NA 0.1 NA 4.00 B
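The mult expression is compact: the if/else supplies the exponent, so it evaluates to 2 * 10^-1 = 0.2 when the group's first EI_1 equals 1 and to 2 * 10^1 = 20 otherwise. A small base R sketch that isolates this, where mult_for is a hypothetical helper wrapping the same expression:

```r
# Hypothetical helper wrapping the multiplier expression from the answer.
# The if/else is the exponent of 10, and the result is then doubled.
mult_for <- function(first_EI_1) 2*10 ^ if (first_EI_1 == 1) -1 else 1

mult_group_a <- mult_for(1)  # first EI_1 is 1: 2 * 10^-1
mult_group_b <- mult_for(0)  # first EI_1 is 0: 2 * 10^1

print(c(A = mult_group_a, B = mult_group_b))
```

Writing the factor as if (first(EI_1) == 1) 0.2 else 20 would give the same values and may be easier to read.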
Answer 2 (score: 0)
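You can also do this with nested ifelse conditions on a plain data frame (dplyr is loaded only for lag). The factor has to be taken from the first EI_1 of each group, because EI_1 is NA on every other row. This gives you the desired output:
library(dplyr)
df <- as.data.frame(dt)
# First EI_1 of each group, repeated on every row of that group
first_EI_1 <- ave(df$EI_1, df$GROUP, FUN = function(x) x[1])
# Lag over the whole frame; safe here because each group's first row is PERIOD 0, which uses EI_3
prev_EI_2 <- lag(df$EI_2, 1)
df$EO_3 <- ifelse(df$PERIOD == 0, df$EI_3,
           ifelse(df$PERIOD <= 2, prev_EI_2,
           ifelse(first_EI_1 == 1, 0.2 * prev_EI_2, 20 * prev_EI_2)))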