想法是一种易于管理的方法,用于从某些表定义规则:
library(data.table)
a <- data.table(rule = c("rule1", "rule2", "rule3"),
bool = c(T,T,F))
a
# rule bool
# 1: rule1 TRUE
# 2: rule2 TRUE
# 3: rule3 FALSE
ifelse(a[rule == "rule1", bool] & a[rule == "rule2", bool] & a[rule == "rule3", bool], 1,
ifelse(a[rule == "rule1", bool] & a[rule == "rule2", bool], 2,
ifelse(a[rule == "rule2", bool] & a[rule == "rule3", bool], 3, 4)))
# [1] 2
很明显,随着我不断添加规则,这不是很可持续或难以理解。
这里ifelse
的替代品是什么?
答案 0 :(得分:2)
这是一种方法,基于@tmfmnk的评论:
library(data.table)
library(dplyr)
a <- data.table(rule = c("rule1", "rule2", "rule3"),
bool = c(F,T,T))
case_when(a[(bool), identical(rule, c("rule1", "rule2", "rule3"))] ~ 1,
a[(bool), identical(rule, c("rule1", "rule2"))] ~ 2,
a[(bool), identical(rule, c("rule2", "rule3"))] ~ 3,
TRUE ~ 4)
答案 1 :(得分:2)
这是一个非常有趣的问题,特别是因为条件并不总是涉及a
的所有行,即rule1
,rule2
和rule3
。
我试图找到一种通用解决方案,该解决方案可以扩展为任意数量的条件以及a
中的其他行。
主要思想是用ifelse()
替换嵌套case_when()
或data.table
语句中的条件,然后可以将它们与a
结合使用:
library(data.table)
b <- fread(
"rule1, rule2, rule3, result
TRUE, TRUE, TRUE, 1
TRUE, TRUE, NA, 2
NA, TRUE, TRUE, 3
NA, NA, NA, 4"
)
例如,如果2
和rule1
均为rule2
,而TRUE
的值无关紧要,则第2行中的条件指定返回rule3
可以作为通配符忽略。
重要的是要注意条件的顺序很重要:首先,必须检查没有任何通配符的条件。然后,使用一个通配符的条件,依此类推。最后,如果未找到其他匹配项,则将应用默认值(所有通配符)。必须始终在最后一行中提供默认值。
因此,最专门的条件排在第一位,最一般的条件排在最后。
OP已经以长格式提供了测试数据a
:
rule bool 1: rule1 TRUE 2: rule2 TRUE 3: rule3 FALSE
因此,条件b
也被重整为长格式:
lb <- melt(b[, id := .I], c("id", "result"), variable.name = "rule", value.name = "bool", na.rm = TRUE)[
, nr := .N, by = id][]
lb
id result rule bool nr 1: 1 1 rule1 TRUE 3 2: 2 2 rule1 TRUE 2 3: 1 1 rule2 TRUE 3 4: 2 2 rule2 TRUE 2 5: 3 3 rule2 TRUE 2 6: 1 1 rule3 TRUE 3 7: 3 3 rule3 TRUE 2
在重塑之前,已添加行id
,该行指示条件的顺序。长格式省略了通配符,因为联接不需要它们。重塑后,每nr
的剩余行数id
将被追加,即非通配符条目的数量。
现在,条件已经过测试:
answer <- lb[a, on = .(rule, bool), nomatch = 0L][
, result[nr == .N], by = .(nr, id)][
order(-nr, id), first(V1)]
if (length(answer) == 0L) answer <- b[id == max(id), result] # default
answer
这分为四个步骤:
a
和lb
上将rule
和bool
连接(内部连接)id
的条件数来消除不完整的条件(nr
子句中包含by =
只是为了方便起见,下一步是必需的),result
,answer
,则返回默认值。对于给定的a
,以上代码返回
answer
[1] 2
要验证上述代码是否正常运行,需要进行更彻底的测试
test <- CJ(rule1 = c(TRUE, FALSE), rule2 = c(TRUE, FALSE), rule3 = c(TRUE, FALSE), sorted = FALSE)
test
rule1 rule2 rule3 1: TRUE TRUE TRUE 2: TRUE TRUE FALSE 3: TRUE FALSE TRUE 4: TRUE FALSE FALSE 5: FALSE TRUE TRUE 6: FALSE TRUE FALSE 7: FALSE FALSE TRUE 8: FALSE FALSE FALSE
每行代表a
的一个版本,该版本通过以下方式转换为OP的长格式:
a <- melt(test[i], measure.vars = patterns("^rule"), variable.name = "rule", value.name = "bool")
通过遍历i
,可以测试TRUE
/ FALSE
值的所有可能组合。此外,还会打印一些中间结果,以帮助您了解其工作原理:
library(magrittr) # piping used here to improve readability
test <- CJ(rule1 = c(TRUE, FALSE), rule2 = c(TRUE, FALSE), rule3 = c(TRUE, FALSE), sorted = FALSE)
for (i in seq(nrow(test))) {
cat("test case", i, "\n")
a <- melt(test[i], measure.vars = patterns("^rule"), variable.name = "rule", value.name = "bool") %T>%
print()
lb[a, on = .(rule, bool), nomatch = 0L][, result[nr == .N], keyby = .(nr, id)] %>%
unique() %>%
print() # intermediate result printed for illustration
answer <- lb[a, on = .(rule, bool), nomatch = 0L][
, result[nr == .N], by = .(nr, id)][
order(-nr, id), first(V1)]
if (length(answer) == 0L) answer <- b[id == max(id), result] # default from b
cat("answer = ", answer, "\n\n")
}
test case 1 rule bool 1: rule1 TRUE 2: rule2 TRUE 3: rule3 TRUE nr id V1 1: 2 2 2 2: 2 3 3 3: 3 1 1 answer = 1 test case 2 rule bool 1: rule1 TRUE 2: rule2 TRUE 3: rule3 FALSE nr id V1 1: 2 2 2 answer = 2 test case 3 rule bool 1: rule1 TRUE 2: rule2 FALSE 3: rule3 TRUE Empty data.table (0 rows and 3 cols): nr,id,V1 answer = 4 test case 4 rule bool 1: rule1 TRUE 2: rule2 FALSE 3: rule3 FALSE Empty data.table (0 rows and 3 cols): nr,id,V1 answer = 4 test case 5 rule bool 1: rule1 FALSE 2: rule2 TRUE 3: rule3 TRUE nr id V1 1: 2 3 3 answer = 3 test case 6 rule bool 1: rule1 FALSE 2: rule2 TRUE 3: rule3 FALSE Empty data.table (0 rows and 3 cols): nr,id,V1 answer = 4 test case 7 rule bool 1: rule1 FALSE 2: rule2 FALSE 3: rule3 TRUE Empty data.table (0 rows and 3 cols): nr,id,V1 answer = 4 test case 8 rule bool 1: rule1 FALSE 2: rule2 FALSE 3: rule3 FALSE Empty data.table (0 rows and 3 cols): nr,id,V1 answer = 4
从答案中可以看出,既定条件都得到满足。
测试案例1值得仔细研究。在这里,条件id
1、2和3可能适用,但条件1优先于其他条件,因为它是最专门的。
这表明该解决方案可以扩展为a
中的更多规则以及b
中的更多条件。
这是一个包含7个条件和4个规则列的示例。
b4 <- fread(
"rule1, rule2, rule3, rule4, result
TRUE, TRUE, TRUE, TRUE, 1
TRUE, TRUE, NA, NA, 2
NA, TRUE, TRUE, NA, 3
NA, FALSE, NA, NA, 5
TRUE, FALSE, NA, NA, 6
FALSE, FALSE, NA, FALSE, 7
NA, NA, NA, NA, 4"
)
测试代码已简化,可以更紧凑地查看16个文本案例:
lb <- melt(b4[, id := .I], c("id", "result"), variable.name = "rule", value.name = "bool", na.rm = TRUE)[, nr := .N, by = id][]
test <- CJ(rule1 = c(TRUE, FALSE), rule2 = c(TRUE, FALSE), rule3 = c(TRUE, FALSE), rule4 = c(TRUE, FALSE), sorted = FALSE)
sapply(
seq(nrow(test)),
function(i) {
a <- melt(test[i], measure.vars = patterns("^rule"), variable.name = "rule", value.name = "bool")
answer <- lb[a, on = .(rule, bool), nomatch = 0L][, result[nr == .N], by = .(nr, id)][order(-nr, id), first(V1)]
if (length(answer) == 0L) answer <- b4[id == max(id), result] # default from b
return(answer)
}
) %>%
cbind(test, .) %>%
setnames(".", "result") %>%
print()
它返回测试用例表,即a
的不同情况,以宽格式显示,并附加结果:
rule1 rule2 rule3 rule4 result 1: TRUE TRUE TRUE TRUE 1 2: TRUE TRUE TRUE FALSE 2 3: TRUE TRUE FALSE TRUE 2 4: TRUE TRUE FALSE FALSE 2 5: TRUE FALSE TRUE TRUE 6 6: TRUE FALSE TRUE FALSE 6 7: TRUE FALSE FALSE TRUE 6 8: TRUE FALSE FALSE FALSE 6 9: FALSE TRUE TRUE TRUE 3 10: FALSE TRUE TRUE FALSE 3 11: FALSE TRUE FALSE TRUE 4 12: FALSE TRUE FALSE FALSE 4 13: FALSE FALSE TRUE TRUE 5 14: FALSE FALSE TRUE FALSE 7 15: FALSE FALSE FALSE TRUE 5 16: FALSE FALSE FALSE FALSE 7