Question

Description

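The ifelse() function filters the values of a vector through a series of tests, each of which triggers a different action when its result is TRUE. For example, let xx be a data.frame, as follows: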

xx <- data.frame(a=c(1,2,1,3), b=1:4)
xx

  a b
1 1 1
2 2 2
3 1 3
4 3 4

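Suppose you want to create a new column c from column b, depending on the values in column a, in the following way:

For each row,

if the value in column a is 1, the value in column c is the same as the value in column b.
if the value in column a is 2, the value in column c is 100 times the value in column b.
in any other case, the value in column c is the negative of the value in column b.

Using ifelse(), a solution could be: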

xx$c <- ifelse(xx$a==1, xx$b, 
               ifelse(xx$a==2, xx$b*100,
                      -xx$b))
xx

  a b   c
1 1 1   1
2 2 2 200
3 1 3   3
4 3 4  -4

Problem 1

An aesthetic problem arises when the number of tests increases, for example to four:

xx$c <- ifelse(xx$a==1, xx$b, 
           ifelse(xx$a==2, xx$b*100,
                  ifelse(xx$a==3, ...,
                         ifelse(xx$a==4, ...,
                                ...))))

I found a partial solution to this problem on this page. It consists of definitions of the functions if.else_(), i_(), and e_(), as follows:

library(lazyeval)
i_ <- function(if_stat, then) {
    if_stat <- lazyeval::expr_text(if_stat)
    then    <- lazyeval::expr_text(then)
    sprintf("ifelse(%s, %s, ", if_stat, then)
}

e_ <- function(else_ret) {
    else_ret <- lazyeval::expr_text(else_ret)
    else_ret
}

if.else_ <- function(...) {
    args <- list(...)

    for (i in 1:(length(args) - 1) ) {
        if (substr(args[[i]], 1, 6) != "ifelse") {
            stop("All but the last argument, need to be if.then_ functions.", call. = FALSE)
        }
    }
    if (substr(args[[length(args)]], 1, 6) == "ifelse"){
        stop("Last argument needs to be an else_ function.", call. = FALSE)
    }
    args$final <- paste(rep(')', length(args) - 1), collapse = '')
    eval_string <- do.call('paste', args)
    eval(parse(text = eval_string))
}

In this way, the problem given in the Description can be rewritten as follows:

xx <- data.frame(a=c(1,2,1,3), b=1:4)
xx$c <- if.else_(
    i_(xx$a==1, xx$b),
    i_(xx$a==2, xx$b*100),
    e_(-xx$b)
) 
xx

  a b   c
1 1 1   1
2 2 2 200
3 1 3   3
4 3 4  -4

The code for four tests is then simply:

xx$c <- if.else_(
    i_(xx$a==1, xx$b),
    i_(xx$a==2, xx$b*100),
    i_(xx$a==3, ...), # dots meaning actions for xx$a==3
    i_(xx$a==4, ...), # dots meaning actions for xx$a==4
    e_(...)           # dots meaning actions for any other case
)

Problem 2 & Question

The given code apparently solves the problem. I then wrote the following test function:

test.ie <- function() {
    dd <- data.frame(a=c(1,2,1,3), b=1:4)
    if.else_(
        i_(dd$a==1, dd$b),
        i_(dd$a==2, dd$b*100),
        e_(-dd$b)
    ) # it should give c(1, 200, 3, -4)
}

When I tried to test it:

 test.ie()

it produced the following error message:

Error in ifelse(dd$a == 1, dd$b, ifelse(dd$a == 2, dd$b * 100, -dd$b)) : 
  object 'dd' not found

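Question

Since the if.else_() syntax constructor is not meant to be run only from the console, is there a way for it to "know" the variables of the function that calls it?

Note

A similar question was posted in "Best way to replace a lengthy ifelse structure in R". However, the solutions given there focus on building a new column of a table from constant output values (in the "then" or "else" slots of the ifelse() function), whereas my case addresses a syntactic problem in which the "then" or "else" slots can even be expressions in terms of other data.frame elements or variables.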

Answer 1

I think you can achieve this by using dplyr::case_when inside dplyr::mutate.

library(dplyr)

df <- tibble(a=c(1,2,1,3), b=1:4)

df %>% 
  mutate(
    foo = case_when(
      .$a == 1 ~ .$b,
      .$a == 2 ~ .$b * 100L,
      TRUE   ~ .$b * -1L
    )
  )

#> # A tibble: 4 x 3
#>       a     b   foo
#>   <dbl> <int> <int>
#> 1     1     1     1
#> 2     2     2   200
#> 3     1     3     3
#> 4     3     4    -4

In the upcoming dplyr 0.6.0, you won't need the clumsy .$ workaround; you can simply use:

df %>% 
  mutate(
    foo = case_when(
      a == 1 ~ b,
      a == 2 ~ b * 100L,
      TRUE   ~ b * -1L
    )
  )

Answer 2

Taking MrFlick's suggestion into account, I recoded the if.else_() function as follows:

if.else_ <- function(...) {
    args <- list(...)

    for (i in 1:(length(args) - 1) ) {
        if (substr(args[[i]], 1, 6) != "ifelse") {
            stop("All but the last argument, need to be if.then_ functions.", call. = FALSE)
        }
    }
    if (substr(args[[length(args)]], 1, 6) == "ifelse"){
        stop("Last argument needs to be an else_ function.", call. = FALSE)
    }
    args$final <- paste(rep(')', length(args) - 1), collapse = '')
    eval_string <- do.call('paste', args)
    eval(parse(text = eval_string), envir = parent.frame())
}

Now the test.ie() function runs correctly:

test.ie()

[1] 1 200 3 -4
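
To illustrate why the added envir = parent.frame() argument matters, here is a minimal base R sketch. The helper names eval_here(), eval_in_caller(), and caller() are hypothetical and exist only for this illustration; they are not part of the original code.

```r
# Hypothetical helpers, for illustration only.
eval_here <- function(txt) {
    # eval()'s default environment is the frame that calls eval(),
    # i.e. eval_here() itself, so the caller's locals are not visible.
    eval(parse(text = txt))
}

eval_in_caller <- function(txt) {
    # parent.frame() is evaluated inside eval_in_caller(), so it refers
    # to the frame of whichever function called eval_in_caller().
    eval(parse(text = txt), envir = parent.frame())
}

caller <- function(use_parent) {
    local_value <- 42  # exists only inside caller()
    if (use_parent) eval_in_caller("local_value") else eval_here("local_value")
}

result_ok  <- caller(TRUE)   # finds local_value in caller()'s frame
result_bad <- tryCatch(caller(FALSE), error = function(e) conditionMessage(e))
```

Here result_ok holds 42, while result_bad holds the "object not found" error message, which mirrors the failure seen with the first version of if.else_() when it was called from test.ie().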

Answer 3

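With all due respect to the OP's remarkable effort to improve nested ifelse(), I prefer a different approach, which I believe is easy to write, concise, maintainable, and fast: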

xx <- data.frame(a=c(1L,2L,1L,3L), b=1:4)

library(data.table)
# coerce to data.table, and set the default first
setDT(xx)[, c:= -b]
xx[a == 1L, c := b]        # 1st special case
xx[a == 2L, c := 100L*b]   # 2nd special case, note use of integer 100L
# xx[a == 3L, c := ...]    # other cases
# xx[a == 4L, c := ...]
#...

xx
#   a b   c
#1: 1 1   1
#2: 2 2 200
#3: 1 3   3
#4: 3 4  -4

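Note that in the second special case b is multiplied by the integer constant 100L. This ensures that all right-hand sides are of integer type and avoids a type conversion to double.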
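
The type difference can be checked directly in base R. This is a small sketch, independent of data.table; the variable names are chosen only for illustration.

```r
b <- 1:4                           # integer vector, like column b

int_type <- typeof(100L * b)       # integer * integer stays "integer"
dbl_type <- typeof(100 * b)        # double * integer is promoted to "double"

int_type  # "integer"
dbl_type  # "double"
```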

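Edit 2: This can also be written more concisely (but still maintainably) as a one-liner:

setDT(xx)[, c:= -b][a == 1L, c := b][a == 2L, c := 100L*b][]

data.table chaining works here because c is updated in place, so each subsequent expression acts on all rows, even if the previous expression was a selective update of a subset of rows.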

Edit 1: This approach can also be implemented in base R:

xx
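
The base R code that belongs here appears to be incomplete in this copy. A minimal sketch of how the same idea (set the default first, then overwrite the special cases) might look in base R is given below; it is an assumption about the approach, not necessarily the answerer's exact code.

```r
xx <- data.frame(a = c(1L, 2L, 1L, 3L), b = 1:4)

# set the default first
xx$c <- -xx$b

# then overwrite the special cases using logical row selection
xx$c[xx$a == 1L] <- xx$b[xx$a == 1L]          # 1st special case
xx$c[xx$a == 2L] <- 100L * xx$b[xx$a == 2L]   # 2nd special case, integer 100L

xx
#   a b   c
# 1 1 1   1
# 2 2 2 200
# 3 1 3   3
# 4 3 4  -4
```

Further cases follow the same pattern, one assignment per value of a.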
