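Using `dplyr::across` with functions that take more than one argument

Time: 2021-02-23 08:39:52

Tags: r dplyr

I would like to know whether there is a way to use dplyr::across with a function that needs more than one argument and, if not, how to do the following in dplyr/tidyverse.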

library(dplyr)

# create a dataframe
df <-
  structure(list(
    x1_estimate = c(
      0.185050288587259, 0.151839113724119,
      0.134106347795535, 0.16816621423223
    ), x2_estimate = c(
      0.210983518279099,
      0.337090844267208, 0.324663150698154, 0.254871197876221
    ), x3_estimate = c(
      0.122881208643618,
      0.0707293652735489, 0.0981291893590288, -0.0214831044826657
    ),
    x1_se = c(
      0.00986950954467025, 0.00625871919316588, 0.0445182168165812,
      0.0244314083271791
    ), x2_se = c(
      0.00954593822897476, 0.00669845532512913,
      0.0478789857255503, 0.0237263111649421
    ), x3_se = c(
      0.017952784431167,
      0.0122226237123911, 0.0836135673502282, 0.041558861509543
    )
  ), row.names = c(NA, -4L), class = c("tbl_df", "tbl", "data.frame"))

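For a function that takes only one argument

For example, suppose we only want to compute the variance, which needs a single argument (the standard error):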

df %>% mutate(across(contains("_se"), ~ (.^2), .names = "{.col}_var"))
#> # A tibble: 4 x 9
#>   x1_estimate x2_estimate x3_estimate   x1_se   x2_se  x3_se x1_se_var x2_se_var
#>         <dbl>       <dbl>       <dbl>   <dbl>   <dbl>  <dbl>     <dbl>     <dbl>
#> 1       0.185       0.211      0.123  0.00987 0.00955 0.0180 0.0000974 0.0000911
#> 2       0.152       0.337      0.0707 0.00626 0.00670 0.0122 0.0000392 0.0000449
#> 3       0.134       0.325      0.0981 0.0445  0.0479  0.0836 0.00198   0.00229  
#> 4       0.168       0.255     -0.0215 0.0244  0.0237  0.0416 0.000597  0.000563 
#> # … with 1 more variable: x3_se_var <dbl>

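For a function that takes more than one argument

Now suppose we want to compute a confidence interval, which needs both the estimate and its standard error: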

x1_conf.low = x1_estimate - 1.96 * x1_se
x2_conf.low = x2_estimate - 1.96 * x2_se
x3_conf.low = x3_estimate - 1.96 * x3_se
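To make the target concrete, here is a minimal sketch that writes the first formula out by hand. The tibble `df_small` and its values are made up for illustration and are not the data above; the aim is only to show the column that a general solution should produce for every `estimate`/`se` pair.

```r
library(dplyr)

# Hypothetical two-row example using the same naming scheme as the question
df_small <- tibble(
  x1_estimate = c(0.2, 0.1),
  x1_se       = c(0.01, 0.02)
)

# Lower bound of an approximate 95% interval, written out for one pair
res_manual <- df_small %>%
  mutate(x1_conf.low = x1_estimate - 1.96 * x1_se)

res_manual$x1_conf.low
```

Writing this once per variable works, but it does not scale to many columns, which is why a column-wise approach is wanted.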

I know this does not work, but it illustrates what I am trying to do:

df %>%
  mutate(
    across(matches("_se|_estimate"),
      ~ (contains("_estimate") - 1.96 * contains("_se")),
      .names = "{.col}_conf.low"
    )
  )
#> Error: Problem with `mutate()` input `..1`.
#> x `contains()` must be used within a *selecting* function.
#> ℹ See <https://tidyselect.r-lib.org/reference/faq-selection-context.html>.
#> ℹ Input `..1` is `(function (.cols = everything(), .fns = NULL, ..., .names = NULL) ...`.

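If this is not possible, I would also be happy to see another solution that achieves the same result with dplyr/tidyverse.

2 answers:

Answer 0: (score: 2)

You can use across as follows: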

library(dplyr)
df %>%
  mutate(across(contains("_estimate"), .names = "{.col}_conf.low") - 
          1.96 * across(contains("_se")))
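As a hedged variant of the same idea: in dplyr 1.1.0 and later, calling `across()` without `.fns` is deprecated and `pick()` is the suggested way to grab a set of columns. The sketch below assumes dplyr >= 1.1.0 and uses a small made-up tibble (`df_small` is hypothetical). Like the code above, it subtracts by position, so the two selections must return their columns in the same order.

```r
library(dplyr)

# Hypothetical data with the same naming scheme as the question
df_small <- tibble(
  x1_estimate = c(0.2, 0.1), x2_estimate = c(0.3, 0.4),
  x1_se       = c(0.01, 0.02), x2_se     = c(0.03, 0.04)
)

res_pick <- df_small %>%
  mutate(
    # pick() returns the selected columns as a tibble; the arithmetic is
    # element-wise and keeps the column names of the left-hand operand.
    rename_with(
      pick(ends_with("_estimate")) - 1.96 * pick(ends_with("_se")),
      ~ paste0(.x, "_conf.low")
    )
  )

names(res_pick)
```

Because the result passed to `mutate()` is an unnamed data frame, its columns are added to the output under their own names.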

In base R, you can do:

# Anchor the patterns so that only the intended suffixes match
estimate_cols <- grep('_estimate$', names(df), value = TRUE)
se_cols <- grep('_se$', names(df), value = TRUE)

df[paste0(estimate_cols, '_conf.low')] <- df[estimate_cols] - 1.96 * df[se_cols]

Answer 1: (score: 1)

We can also use a single across:

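library(dplyr)
library(stringr) # provides str_replace()
df %>%
  mutate(across(ends_with('_estimate'),
    # look up the matching *_se column by name for the current column
    ~ . - 1.96 * get(str_replace(cur_column(), 'estimate', 'se')),
    .names = '{.col}_conf.low'
  ))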
# A tibble: 4 x 9
#  x1_estimate x2_estimate x3_estimate   x1_se   x2_se  x3_se x1_estimate_conf.low x2_estimate_conf.low x3_estimate_conf.low
#        <dbl>       <dbl>       <dbl>   <dbl>   <dbl>  <dbl>                <dbl>                <dbl>                <dbl>
#1       0.185       0.211      0.123  0.00987 0.00955 0.0180               0.166                 0.192               0.0877
#2       0.152       0.337      0.0707 0.00626 0.00670 0.0122               0.140                 0.324               0.0468
#3       0.134       0.325      0.0981 0.0445  0.0479  0.0836               0.0469                0.231              -0.0658
#4       0.168       0.255     -0.0215 0.0244  0.0237  0.0416               0.120                 0.208              -0.103
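Another tidyverse route, offered as a sketch and not as part of either answer, is to reshape the data so that `estimate` and `se` become ordinary columns. The confidence bounds are then a plain `mutate()`. The example assumes tidyr >= 1.0.0 for `pivot_longer()` with the `.value` sentinel, and uses a small made-up tibble; the names `df_small`, `row`, and `term` are illustrative.

```r
library(dplyr)
library(tidyr)

# Hypothetical data with the same naming scheme as the question
df_small <- tibble(
  x1_estimate = c(0.2, 0.1), x2_estimate = c(0.3, 0.4),
  x1_se       = c(0.01, 0.02), x2_se     = c(0.03, 0.04)
)

res_long <- df_small %>%
  mutate(row = row_number()) %>%
  # "x1_estimate" splits into term = "x1" and a value column named "estimate"
  pivot_longer(-row, names_to = c("term", ".value"), names_sep = "_") %>%
  mutate(
    conf.low  = estimate - 1.96 * se,
    conf.high = estimate + 1.96 * se
  )

# Optionally return to the wide layout, e.g. x1_conf.low, x2_conf.low, ...
res_wide <- res_long %>%
  pivot_wider(
    names_from  = term,
    values_from = c(estimate, se, conf.low, conf.high),
    names_glue  = "{term}_{.value}"
  )
```

The long layout pairs each estimate with its standard error by name, not by column position, at the cost of one reshape in each direction.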