dplyr mutate/transmute: drop only the columns used in the formula

Question

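Let's say my data frame has columns A, B, C, D, E.

I would like to produce a data frame with columns A, B, C, X, where X = D * E.

Obviously I can use %>% mutate(X = D * E) %>% select(-D, -E), but for more elaborate situations, is there a way to do it in a single command? Something like transmute(), but discarding only the columns that are mentioned.

It's silly, but I keep wishing for that bit of conciseness.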

Answer 1

If you want to combine the two operations, you can use NULL inside mutate to specify which columns should be removed:

df %>% mutate( X=D*E, D=NULL, E=NULL )

Unfortunately, you still have to mention each variable twice, so it is arguably only slightly more concise.
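
As a quick check of the NULL approach, here is a minimal sketch on a made-up data frame (the values in df are purely illustrative and are not from the question). It relies on mutate evaluating its arguments in order, so D and E are still available when X is computed.

```r
library(dplyr)

# Hypothetical sample data with the columns named in the question
df <- data.frame(A = 1:3, B = 4:6, C = 7:9,
                 D = c(2, 3, 4), E = c(10, 20, 30))

# X is computed first; assigning NULL afterwards drops D and E
out <- df %>% mutate(X = D * E, D = NULL, E = NULL)

names(out)
# [1] "A" "B" "C" "X"
```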

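UPDATE: I really like this question, because it is essentially asking for a mutator that has some characteristics of both mutate and transmute. Such a mutator would need to parse the provided expressions to identify which symbols are used by the computation, and then remove the corresponding columns from the result.

To implement such a mutator, we will need a few tools. First, let's define a function that retrieves the abstract syntax tree (AST) of an expression.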

library( tidyverse )

## Recursively constructs the abstract syntax tree (AST) of the provided expression
getAST <- function( ee ) { as.list(ee) %>% map_if(is.call, getAST) }

Here is an example of getAST in action:

z <- quote( a*log10(x)+b )   ## Captures the expression a*log10(x)+b
getAST( z ) %>% str
# List of 3
#  $ : symbol +
#  $ :List of 3
#   ..$ : symbol *
#   ..$ : symbol a
#   ..$ :List of 2
#   .. ..$ : symbol log10
#   .. ..$ : symbol x
#  $ : symbol b

Retrieving the list of symbols used by an expression then simply requires flattening this tree and deparsing its elements.

## Retrieves all symbols (as strings) used in a given expression
getSyms <- function( ee ) { getAST(ee) %>% unlist %>% map_chr(deparse) }
getSyms(z)
# [1] "+"     "*"     "a"     "log10" "x"     "b"
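
Note that the result above also contains operators and function names ("+", "*", "log10"), so a column that happened to be called log10 would be matched as well. If that matters, base R's all.vars() may be worth considering as an alternative, since it returns only variable names. The sketch below is an aside and not part of the answer's implementation.

```r
# Same expression as in the example above
z <- quote(a * log10(x) + b)

# all.vars() skips function names and operators
vars <- all.vars(z)
vars
# [1] "a" "x" "b"
```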

We are now ready to implement our new mutator, which computes new columns (akin to mutate) and removes the variables used in the computation (akin to transmute):

## A new mutator that removes all variables used by the computations
transmutate <- function( .data, ... )
{
    ## Capture the provided expressions and retrieve their symbols
    vSyms <- enquos(...) %>% map( ~getSyms(rlang::get_expr(.x)) )

    ## Identify symbols that are in common with the provided dataset
    ## These columns are to be removed
    vToRemove <- intersect( colnames(.data), unlist(vSyms) )

    ## Pass on the expressions to mutate to do the work
    ## Remove the identified columns from the result
    mutate( .data, ... ) %>% select( -one_of(vToRemove) )
}

Let's take the new function for a spin:

## Expected output should include new columns X, Y
##    removed columns vs, drat, wt, mpg, and cyl
##    and everything else the same
## (Note that in the classical tidyverse spirit, rownames are not preserved)
transmutate( mtcars, X = ifelse( vs, drat, wt ), Y = mpg*cyl )
#     disp  hp  qsec am gear carb     X     Y
# 1  160.0 110 16.46  1    4    4 2.620 126.0
# 2  160.0 110 17.02  1    4    4 2.875 126.0
# 3  108.0  93 18.61  1    4    1 3.850  91.2
# 4  258.0 110 19.44  0    3    1 3.080 128.4
# ...
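
For comparison, the same idea can be sketched more compactly with base R's all.vars(), which returns only the variable names in an expression. The function name transmutate2 and the sample data are hypothetical, and any_of() assumes a reasonably recent dplyr (1.0.0 or later); treat this as a variant under those assumptions rather than a replacement for the function above.

```r
library(dplyr)

# Hypothetical variant: drop every column referenced by the expressions
transmutate2 <- function(.data, ...) {
  # Capture the expressions without evaluating them
  exprs <- rlang::enexprs(...)
  # Collect the variable names (no function names or operators)
  used <- unique(unlist(lapply(exprs, all.vars)))
  # Only names that are actual columns are candidates for removal
  to_remove <- intersect(names(.data), used)
  mutate(.data, ...) %>% select(-any_of(to_remove))
}

# Made-up data with the columns named in the question
df <- data.frame(A = 1:3, B = 4:6, C = 7:9,
                 D = c(2, 3, 4), E = c(10, 20, 30))

out <- transmutate2(df, X = D * E)
names(out)
# [1] "A" "B" "C" "X"
```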

Answer 2

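We need to specify the columns of interest in transmute, because it returns only the columns passed to it.

df %>% transmute(A, B, C, X = D*E)

If there are many columns, one way to avoid typing them one by one is to convert the names to symbols and then splice them in with !!!:

df %>% 
  transmute(!!! rlang::syms(names(.)[1:3]), X = D*E)

Or, if we don't know the indices of the columns of interest but only the names of the columns to drop:

df %>% 
    transmute(!!! rlang::syms(setdiff(names(.), c('D', 'E'))), X = D*E)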
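
To try these calls, a small data frame is needed. The one below is made up for illustration (its values are an assumption, not the answer's data), and the sketch simply confirms that selecting by index and selecting by exclusion give the same result here.

```r
library(dplyr)

# Hypothetical sample data with the columns named in the question
df <- data.frame(A = 1:3, B = 4:6, C = 7:9,
                 D = c(2, 3, 4), E = c(10, 20, 30))

# Keep the first three columns by position
by_index <- df %>%
  transmute(!!!rlang::syms(names(.)[1:3]), X = D * E)

# Keep everything except the columns named for removal
by_name <- df %>%
  transmute(!!!rlang::syms(setdiff(names(.), c("D", "E"))), X = D * E)

names(by_name)
# [1] "A" "B" "C" "X"
```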

Answer 3

An experimental argument has now been added to mutate that lets you do this in one go:

df %>% mutate(X = D * E, .keep = "unused")

You can also specify where the new variable should be placed among the other variables. See https://rdrr.io/github/tidyverse/dplyr/man/mutate.html
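
A minimal sketch of both features, assuming dplyr 1.0.0 or later (where the .keep and .after arguments of mutate are available) and a made-up data frame:

```r
library(dplyr)

# Hypothetical sample data with the columns named in the question
df <- data.frame(A = 1:3, B = 4:6, C = 7:9,
                 D = c(2, 3, 4), E = c(10, 20, 30))

# Drop the columns used to compute X (D and E)
out <- df %>% mutate(X = D * E, .keep = "unused")
names(out)
# [1] "A" "B" "C" "X"

# Additionally place the new column right after A
out2 <- df %>% mutate(X = D * E, .keep = "unused", .after = A)
names(out2)
```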
