规则表避免嵌套ifelse语句

时间:2019-12-27 10:04:49

标签: r

想法是一种易于管理的方法,用于从某些表定义规则:

library(data.table)

a <- data.table(rule = c("rule1", "rule2", "rule3"),
                bool = c(T,T,F))

a
#     rule  bool
# 1: rule1  TRUE
# 2: rule2  TRUE
# 3: rule3 FALSE

ifelse(a[rule == "rule1", bool] & a[rule == "rule2", bool] & a[rule == "rule3", bool], 1,
       ifelse(a[rule == "rule1", bool] & a[rule == "rule2", bool], 2,
              ifelse(a[rule == "rule2", bool] & a[rule == "rule3", bool], 3, 4)))
# [1] 2

很明显,随着我不断添加规则,这不是很可持续或难以理解。 这里ifelse的替代品是什么?

2 个答案:

答案 0 :(得分:2)

这是一种方法,基于@tmfmnk的评论:

library(data.table)
library(dplyr)

a <- data.table(rule = c("rule1", "rule2", "rule3"),
                bool = c(F,T,T))

case_when(a[(bool), identical(rule, c("rule1", "rule2", "rule3"))] ~ 1,
          a[(bool), identical(rule, c("rule1", "rule2"))] ~ 2,
          a[(bool), identical(rule, c("rule2", "rule3"))] ~ 3,
          TRUE ~ 4)

答案 1 :(得分:2)

这是一个非常有趣的问题,特别是因为条件并不总是涉及a的所有行,即rule1rule2rule3

我试图找到一种通用解决方案,该解决方案可以扩展为任意数量的条件以及a中的其他行。

主要思想是用ifelse()替换嵌套case_when()data.table语句中的条件,然后可以将它们与a结合使用:

library(data.table)
b <- fread(
"rule1, rule2, rule3, result
TRUE,  TRUE,  TRUE,  1
TRUE,  TRUE,  NA,    2
NA,    TRUE,  TRUE,  3
NA,    NA,    NA,    4"
)

例如,如果2rule1均为rule2,而TRUE的值无关紧要,则第2行中的条件指定返回rule3可以作为通配符忽略。

重要的是要注意条件的顺序很重要:首先,必须检查没有任何通配符的条件。然后,使用一个通配符的条件,依此类推。最后,如果未找到其他匹配项,则将应用默认值(所有通配符)。必须始终在最后一行中提供默认值。
因此,最专门的条件排在第一位,最一般的条件排在最后。

OP已经以长格式提供了测试数据a

    rule  bool
1: rule1  TRUE
2: rule2  TRUE
3: rule3 FALSE

因此,条件b也被重整为长格式:

lb <- melt(b[, id := .I], c("id", "result"), variable.name = "rule", value.name = "bool", na.rm = TRUE)[
  , nr := .N, by = id][]

lb
   id result  rule bool nr
1:  1      1 rule1 TRUE  3
2:  2      2 rule1 TRUE  2
3:  1      1 rule2 TRUE  3
4:  2      2 rule2 TRUE  2
5:  3      3 rule2 TRUE  2
6:  1      1 rule3 TRUE  3
7:  3      3 rule3 TRUE  2

在重塑之前,已添加行id,该行指示条件的顺序。长格式省略了通配符,因为联接不需要它们。重塑后,每nr的剩余行数id将被追加,即非通配符条目的数量。

现在,条件已经过测试:

answer <- lb[a, on = .(rule, bool), nomatch = 0L][
  , result[nr == .N], by = .(nr, id)][
    order(-nr, id), first(V1)]
if (length(answer) == 0L) answer <- b[id == max(id), result] # default
answer

这分为四个步骤:

    alb上将
  1. rulebool连接(内部连接)
  2. 由于要连接的数据采用长格式,因此通过检查每个id的条件数来消除不完整的条件(nr子句中包含by =只是为了方便起见,下一步是必需的),
  3. 其余行被排序为从最特殊的条件中选择第一个result
  4. 如果上述操作未返回answer,则返回默认值。

对于给定的a,以上代码返回

answer
[1] 2

更多测试用例

要验证上述代码是否正常运行,需要进行更彻底的测试

test <- CJ(rule1 = c(TRUE, FALSE), rule2 = c(TRUE, FALSE), rule3 = c(TRUE, FALSE), sorted = FALSE)
test
   rule1 rule2 rule3
1:  TRUE  TRUE  TRUE
2:  TRUE  TRUE FALSE
3:  TRUE FALSE  TRUE
4:  TRUE FALSE FALSE
5: FALSE  TRUE  TRUE
6: FALSE  TRUE FALSE
7: FALSE FALSE  TRUE
8: FALSE FALSE FALSE

每行代表a的一个版本,该版本通过以下方式转换为OP的长格式:

a <- melt(test[i], measure.vars = patterns("^rule"), variable.name = "rule", value.name = "bool")

通过遍历i,可以测试TRUE / FALSE值的所有可能组合。此外,还会打印一些中间结果,以帮助您了解其工作原理:

library(magrittr) # piping used here to improve readability
test <- CJ(rule1 = c(TRUE, FALSE), rule2 = c(TRUE, FALSE), rule3 = c(TRUE, FALSE), sorted = FALSE)
for (i in seq(nrow(test))) {
  cat("test case", i, "\n")
  a <- melt(test[i], measure.vars = patterns("^rule"), variable.name = "rule", value.name = "bool") %T>% 
    print()
  lb[a, on = .(rule, bool), nomatch = 0L][, result[nr == .N], keyby = .(nr, id)] %>% 
    unique() %>%
    print() # intermediate result printed for illustration
  answer <- lb[a, on = .(rule, bool), nomatch = 0L][
    , result[nr == .N], by = .(nr, id)][
      order(-nr, id), first(V1)]
  if (length(answer) == 0L) answer <- b[id == max(id), result] # default from b
  cat("answer = ", answer, "\n\n")
}
test case 1 
    rule bool
1: rule1 TRUE
2: rule2 TRUE
3: rule3 TRUE
   nr id V1
1:  2  2  2
2:  2  3  3
3:  3  1  1
answer =  1 

test case 2 
    rule  bool
1: rule1  TRUE
2: rule2  TRUE
3: rule3 FALSE
   nr id V1
1:  2  2  2
answer =  2 

test case 3 
    rule  bool
1: rule1  TRUE
2: rule2 FALSE
3: rule3  TRUE
Empty data.table (0 rows and 3 cols): nr,id,V1
answer =  4 

test case 4 
    rule  bool
1: rule1  TRUE
2: rule2 FALSE
3: rule3 FALSE
Empty data.table (0 rows and 3 cols): nr,id,V1
answer =  4 

test case 5 
    rule  bool
1: rule1 FALSE
2: rule2  TRUE
3: rule3  TRUE
   nr id V1
1:  2  3  3
answer =  3 

test case 6 
    rule  bool
1: rule1 FALSE
2: rule2  TRUE
3: rule3 FALSE
Empty data.table (0 rows and 3 cols): nr,id,V1
answer =  4 

test case 7 
    rule  bool
1: rule1 FALSE
2: rule2 FALSE
3: rule3  TRUE
Empty data.table (0 rows and 3 cols): nr,id,V1
answer =  4 

test case 8 
    rule  bool
1: rule1 FALSE
2: rule2 FALSE
3: rule3 FALSE
Empty data.table (0 rows and 3 cols): nr,id,V1
answer =  4

从答案中可以看出,既定条件都得到满足。

测试案例1值得仔细研究。在这里,条件id 1、2和3可能适用,但条件1优先于其他条件,因为它是最专门的。

扩展

这表明该解决方案可以扩展为a中的更多规则以及b中的更多条件。

这是一个包含7个条件和4个规则列的示例。

b4 <- fread(
  "rule1, rule2, rule3, rule4, result
TRUE,  TRUE,  TRUE,  TRUE,  1
TRUE,  TRUE,  NA,    NA,    2
NA,    TRUE,  TRUE,  NA,    3
NA,    FALSE, NA,    NA,    5
TRUE,  FALSE, NA,    NA,    6
FALSE, FALSE, NA, FALSE,    7
NA,    NA,    NA,    NA,    4"
)

测试代码已简化,可以更紧凑地查看16个文本案例:

lb <- melt(b4[, id := .I], c("id", "result"), variable.name = "rule", value.name = "bool", na.rm = TRUE)[, nr := .N, by = id][]
test <- CJ(rule1 = c(TRUE, FALSE), rule2 = c(TRUE, FALSE), rule3 = c(TRUE, FALSE), rule4 = c(TRUE, FALSE), sorted = FALSE)
sapply(
  seq(nrow(test)), 
  function(i) {
    a <- melt(test[i], measure.vars = patterns("^rule"), variable.name = "rule", value.name = "bool")
    answer <- lb[a, on = .(rule, bool), nomatch = 0L][, result[nr == .N], by = .(nr, id)][order(-nr, id), first(V1)]
    if (length(answer) == 0L) answer <- b4[id == max(id), result] # default from b
    return(answer)
  }
) %>% 
  cbind(test, .) %>% 
  setnames(".", "result") %>% 
  print()

它返回测试用例表,即a的不同情况,以宽格式显示,并附加结果:

    rule1 rule2 rule3 rule4 result
 1:  TRUE  TRUE  TRUE  TRUE      1
 2:  TRUE  TRUE  TRUE FALSE      2
 3:  TRUE  TRUE FALSE  TRUE      2
 4:  TRUE  TRUE FALSE FALSE      2
 5:  TRUE FALSE  TRUE  TRUE      6
 6:  TRUE FALSE  TRUE FALSE      6
 7:  TRUE FALSE FALSE  TRUE      6
 8:  TRUE FALSE FALSE FALSE      6
 9: FALSE  TRUE  TRUE  TRUE      3
10: FALSE  TRUE  TRUE FALSE      3
11: FALSE  TRUE FALSE  TRUE      4
12: FALSE  TRUE FALSE FALSE      4
13: FALSE FALSE  TRUE  TRUE      5
14: FALSE FALSE  TRUE FALSE      7
15: FALSE FALSE FALSE  TRUE      5
16: FALSE FALSE FALSE FALSE      7