dplyr使用case_when突变新的动态变量

时间:2018-08-02 02:02:42

标签: r dynamic dplyr mutate

我知道类似的问题herehere,但是我无法为自己的具体情况找到正确的解决方案。我发现的一些解决方案使用mutate_等,但据我了解,这些解决方案现在已过时。我是dplyr的动态用法的新手。

我有一个数据框,其中包含一些带有两个不同前缀(阿尔法和贝塔)的变量:

df <- data.frame(alpha.num = c(1, 3, 5, 7),
             alpha.char = c("a", "c", "e", "g"),
             beta.num = c(2, 4, 6, 8),
             beta.char = c("b", "d", "f", "h"),
             which.to.use = c("alpha", "alpha", "beta", "beta"))

我想创建带有前缀“选择”的新变量。它们是“ alpha”或“ beta”列的副本,具体取决于“ that.to.use”列中该行的命名方式。所需的输出将是:

desired.df <- data.frame(alpha.num = c(1, 3, 5, 7),
                     alpha.char = c("a", "c", "e", "g"),
                     beta.num = c(2, 4, 6, 8),
                     beta.char = c("b", "d", "f", "h"),
                     which.to.use = c("alpha", "alpha", "beta", "beta"),
                     chosen.num = c(1, 3, 6, 8),
                     chosen.char = c("a", "c", "f", "h"))

我的尝试失败:

varnames <- c("num", "char")
df %<>%
  mutate(as.name(paste0("chosen.", varnames)) := case_when(
    which.to.use == "alpha" ~ paste0("alpha.", varnames),
    which.to.use == "beta" ~ pasteo("beta.", varnames)
  ))

我更喜欢纯dplyr解决方案,甚至更好的是可以包含在更长的修改df的管道中(即无需停止创建“变量名”)。感谢您的帮助。

4 个答案:

答案 0 :(得分:5)

使用一些有趣的rlang东西和purrr

library(rlang)
library(purrr)
library(dplyr)

df <- data.frame(alpha.num = c(1, 3, 5, 7),
                 alpha.char = c("a", "c", "e", "g"),
                 beta.num = c(2, 4, 6, 8),
                 beta.char = c("b", "d", "f", "h"),
                 which.to.use = c("alpha", "alpha", "beta", "beta"),
                 stringsAsFactors = F)

c("num", "char") %>% 
    map(~ mutate(df, !!sym(paste0("chosen.", .x)) := 
      case_when(
          which.to.use == "alpha" ~ !!sym(paste0("alpha.", .x)),
          which.to.use == "beta" ~ !!sym(paste0("beta.", .x))
                ))) %>% 
    reduce(full_join)

结果:

  alpha.num alpha.char beta.num beta.char which.to.use chosen.num chosen.char
1         1          a        2         b        alpha          1           a
2         3          c        4         d        alpha          3           c
3         5          e        6         f         beta          6           f
4         7          g        8         h         beta          8           h

没有reduce(full_join)

c("num", "char") %>% 
  map_dfc(~ mutate(df, !!sym(paste0("chosen.", .x)) := 
                 case_when(
                   which.to.use == "alpha" ~ !!sym(paste0("alpha.", .x)),
                   which.to.use == "beta" ~ !!sym(paste0("beta.", .x))
                 ))) %>% 
  select(-ends_with("1"))



alpha.num alpha.char beta.num beta.char which.to.use chosen.num chosen.char
1         1          a        2         b        alpha          1           a
2         3          c        4         d        alpha          3           c
3         5          e        6         f         beta          6           f
4         7          g        8         h         beta          8           h

说明:
(注意:我完全或什至种类都没有rlang。也许其他人可以给出更好的解释;)。)

当我们需要paste0的裸名才能知道它是在指变量名时,单独使用mutate会产生一个字符串。

如果将paste0包裹在sym中,它将得出一个裸名:

> x <- varrnames[1]
> sym(paste0("alpha.", x))
  alpha.num

但是mutate不知道要进行评估,而是将其作为符号来读取:

> typeof(sym(paste0("alpha.", x)))
[1] "symbol"

!!”运算符将评估sym函数。比较:

> expr(mutate(df, var = sym(paste0("alpha.", x))))
mutate(df, var = sym(paste0("alpha.", x)))

> expr(mutate(df, var = !!sym(paste0("alpha.", x))))
mutate(df, var = alpha.num)

因此,使用!!sym,我们可以使用dplyr使用粘贴动态调用变量名。

答案 1 :(得分:2)

使用applymargin = 1的基本R方法,其中我们根据which.to.use列中的值为每一行选择列,并从该行的相应列中获取值。

df[c("chosen.num", "chosen.char")] <- 
          t(apply(df, 1, function(x) x[grepl(x["which.to.use"], names(df))]))

df
#  alpha.num alpha.char beta.num beta.char which.to.use chosen.num chosen.char
#1         1          a        2         b        alpha          1           a
#2         3          c        4         d        alpha          3           c
#3         5          e        6         f         beta          6           f
#4         7          g        8         h         beta          8           h

答案 2 :(得分:2)

这是一种nest()/map()策略,应该很快。它停留在tidyverse,但不进入rlang土地。

library(tidyverse)

df %>% 
    nest(-which.to.use) %>%
    mutate(new_data = map2(data, which.to.use,
                       ~ select(..1, matches(..2)) %>%
                           rename_all(funs(gsub(".*\\.", "choosen.", .) )))) %>%
    unnest()

  which.to.use alpha.num alpha.char beta.num beta.char choosen.num choosen.char
1        alpha         1          a        2         b           1            a
2        alpha         3          c        4         d           3            c
3         beta         5          e        6         f           6            f
4         beta         7          g        8         h           8            h

它捕获所有列,而不仅仅是numchar,而不是which.to.use。但这似乎是您(我)想要的IRL。如果您只想提取特定的变量,则可以在调用select(matches('(var1|var2|etc'))之前添加nest()行。

编辑: 我最初的建议是使用select()删除不需要的列,将导致进行join以便稍后再将它们带回。相反,如果您调整nest参数,则只能在某些列上实现。

我在此处添加了新的bool列,但在“选择”选项中将忽略它们:

new_df <- data.frame(alpha.num = c(1, 3, 5, 7),
                 alpha.char = c("a", "c", "e", "g"),
                 alpha.bool = FALSE,
                 beta.num = c(2, 4, 6, 8),
                 beta.char = c("b", "d", "f", "h"),
                 beta.bool = TRUE,
                 which.to.use = c("alpha", "alpha", "beta", "beta"),
                 stringsAsFactors = FALSE)

new_df %>% 
    nest(matches("num|char")) %>% # only columns that match this pattern get nested, allows you to save others
    mutate(new_data = map2(data, which.to.use,
                           ~ select(..1, matches(..2)) %>%
                               rename_all(funs(gsub(".*\\.", "choosen.", .) )))) %>%
    unnest()

  alpha.bool beta.bool which.to.use alpha.num alpha.char beta.num beta.char choosen.num choosen.char
1      FALSE      TRUE        alpha         1          a        2         b           1            a
2      FALSE      TRUE        alpha         3          c        4         d           3            c
3      FALSE      TRUE         beta         5          e        6         f           6            f
4      FALSE      TRUE         beta         7          g        8         h           8            h

答案 3 :(得分:0)

您也可以尝试 $mail = new Mail(); $mail->protocol = $this->config->get('config_mail_protocol'); $mail->parameter = $this->config->get('config_mail_parameter'); $mail->smtp_hostname = $this->config->get('config_mail_smtp_hostname'); $mail->smtp_username = $this->config->get('config_mail_smtp_username'); $mail->smtp_password = html_entity_decode($this->config->get('config_mail_smtp_password'), ENT_QUOTES, 'UTF-8'); $mail->smtp_port = $this->config->get('config_mail_smtp_port'); $mail->smtp_timeout = $this->config->get('config_mail_smtp_timeout'); $mail->setTo($this->config->get('config_email')); $mail->setFrom($this->config->get('config_email')); $mail->setReplyTo('example@domain.com'); $mail->setSender(html_entity_decode($this->config->get('config_name'), ENT_QUOTES, 'UTF-8')); $mail->setSubject(html_entity_decode($this->language->get('text_new_customer'), ENT_QUOTES, 'UTF-8') . '-' . date('d-m-y')); //$mail->sethtml($message); $mail->setText($message_admin); $mail->send(); / gather方法

spread