R, dplyr: loop over a subset of columns by name and apply mutate?

Asked: 2017-12-04 17:18:50

Tags: r dplyr

My current setup uses R 3.4.2 and tidyverse 1.1.1.

My goal is to transform data in the manner of this answer, but in a scalable way, so that I can easily change the set of variables I apply the transformation to.

To make this concrete, take the following data:

library(tidyverse)

df = tibble(
  id = seq(1,8),
  hair.colour = c("red", "blonde", "brown", "black", "red", "blonde", "brown", "black"),
  eye.colour = c("blue", "brown", "blue", "brown", "blue", "brown", "blue", "brown"),
  gender = c("male", "male", "male", "male", "female", "female", "female",
             "female"))

Code like this works as desired:

df2 = df %>%
  mutate(value = 1,
         hair.colour = paste("hair.colour", hair.colour, sep = ".")) %>%
  spread(hair.colour, value, fill = 0)

Naively trying to wrap it in a loop, e.g.

factors = c("hair.colour", "eye.colour", "gender")
for (factor in factors) {
    df = df %>%
        mutate(value = 1, factor = paste(toString(factor), factor, sep = ".")) %>%
        spread(factor, value, fill = 0)
}

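does not work. I imagine there is a clever way to do this with quo(), !!, etc., but I am new to R and my searches have not turned up anything I could use.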

Does anyone have suggestions, either within the tidyverse (especially one that lets me reuse the same code as in the second block) or outside it?

2 answers:

Answer 0: (score: 0)

You can do this:

factors = c("hair.colour", "eye.colour", "gender")
for (factor in factors) {
  df = df %>%
    mutate(value = 1, x = paste(factor,.[[factor]], sep = ".")) %>%
    select_(paste0("-",factor)) %>%
    spread(x, "value", fill = 0)
}

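. is a shortcut for the left-hand side of the pipe, so typing .[[factor]] is the same as writing df[[factor]]. In other words, I paste your factor string together with the values of the corresponding column.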

select_ is a variant of select that uses standard evaluation (basically, you feed it strings); dplyr and tidyr functions usually have such a variant. More: ?select_

Result:

# # A tibble: 8 x 9
#      id hair.colour.black hair.colour.blonde hair.colour.brown hair.colour.red eye.colour.blue eye.colour.brown gender.female gender.male
# * <int>             <dbl>              <dbl>             <dbl>           <dbl>           <dbl>            <dbl>         <dbl>       <dbl>
# 1     1                 0                  0                 0               1               1                0             0           1
# 2     2                 0                  1                 0               0               0                1             0           1
# 3     3                 0                  0                 1               0               1                0             0           1
# 4     4                 1                  0                 0               0               0                1             0           1
# 5     5                 0                  0                 0               1               1                0             1           0
# 6     6                 0                  1                 0               0               0                1             1           0
# 7     7                 0                  0                 1               0               1                0             1           0
# 8     8                 1                  0                 0               0               0                1             1           0
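
The question also asked about quo() and !!. The following is only a sketch of how the loop might look with tidy evaluation, staying close to the code in the question's second block. It assumes a dplyr version that provides the .data pronoun and the := operator (0.7 or later) and that the rlang package is installed; the loop variable is named f here (my choice, not from the question) so that it does not mask base::factor. Because mutate overwrites the column in place, no select_ call is needed.

```r
library(dplyr)
library(tidyr)
library(tibble)

df <- tibble(
  id = seq(1, 8),
  hair.colour = c("red", "blonde", "brown", "black", "red", "blonde", "brown", "black"),
  eye.colour = c("blue", "brown", "blue", "brown", "blue", "brown", "blue", "brown"),
  gender = c("male", "male", "male", "male", "female", "female", "female", "female"))

factors <- c("hair.colour", "eye.colour", "gender")

for (f in factors) {
  df <- df %>%
    # !!f := sets the name of the column being written from the string in f;
    # .data[[f]] reads the column whose name is stored in f
    mutate(value = 1, !!f := paste(f, .data[[f]], sep = ".")) %>%
    # rlang::sym() turns the string into a symbol that spread() accepts as key
    spread(!!rlang::sym(f), value, fill = 0)
}

df
```

To change which variables are expanded, only the factors vector needs to be edited.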

Answer 1: (score: 0)

As @aosmith pointed out, select_ is deprecated, and you may want a more flexible solution. You could try:

df %>% 
  # make data long
  gather(key = key, value = value, -id) %>% 
  # unite columns
  unite(col = new_key, key, value, sep = ".") %>% 
  # add column with 1 for indication when back to wide
  mutate(new_value = 1,
         # this is only needed if you want to keep the order of the variables:
         new_key = factor(new_key, levels = unique(new_key))) %>% 
  # transform back to wide, fill NAs with 0
  spread(key = new_key, value = new_value, fill = 0)
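
The pipeline above reshapes every column except id. If only a named subset should be expanded, as the question asks, the same idea might be written with pivot_longer() and pivot_wider(), the newer counterparts of gather() and spread(). This is a sketch under assumptions: it needs tidyr 1.1.0 or later (for all_of() and a scalar values_fill), and the names factors and wide are illustrative, not from the original answer.

```r
library(dplyr)
library(tidyr)
library(tibble)

df <- tibble(
  id = seq(1, 8),
  hair.colour = c("red", "blonde", "brown", "black", "red", "blonde", "brown", "black"),
  eye.colour = c("blue", "brown", "blue", "brown", "blue", "brown", "blue", "brown"),
  gender = c("male", "male", "male", "male", "female", "female", "female", "female"))

# only these columns are expanded; gender is left untouched
factors <- c("hair.colour", "eye.colour")

wide <- df %>%
  # make the selected columns long
  pivot_longer(cols = all_of(factors), names_to = "key", values_to = "value") %>%
  # build names such as "hair.colour.red"
  unite(col = "new_key", key, value, sep = ".") %>%
  # indicator that becomes the cell value once the data is wide again
  mutate(new_value = 1) %>%
  # back to wide, filling missing combinations with 0
  pivot_wider(names_from = new_key, values_from = new_value, values_fill = 0)

wide
```

Changing the set of variables then only means editing the factors vector.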