Question

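I'm confused about how mutate works in tidyverse / dplyr. Below is a reproducible example: one approach uses mutate, the other uses a loop. I expected both to give the same result, but they don't, and I can't see why. Any help would be appreciated.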

library(tidyverse)
d <- data.frame(x = c('a,a,b,b,b','a,a','a,b,b,b,c,c,c'))
# Approach 1 (mutate)
d %>% 
  mutate(y = paste(unique(str_split(x, ',')[[1]]), collapse = ','))
d
# Approach 2 (loop)
for (i in 1:nrow(d))
{
  d$y[i] <- paste(unique(str_split(d$x[i], ',')[[1]]), collapse = ',')
}
d

I expected the output of both approaches to be identical, but it is not.

Answer 1

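The problem is that [[1]] extracts only the first element of the list, so unique is applied to that one element and the same result is reused for every row. Instead, we need to loop over the list (the output of str_split).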

library(tidyverse) 
d %>%
     mutate(y = str_split(x, ',') %>%  # output is a list
                   map_chr(~ unique(.x) %>% # loop with map, get the unique elements 
                    toString)) # paste the strings together
#             x       y
#1     a,a,b,b,b    a, b
#2           a,a       a
#3 a,b,b,b,c,c,c a, b, c

The for loop does not have this problem, because str_split(d$x[i], ',') splits only one element at a time.

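To understand this better: str_split (like strsplit in base R) is vectorized. It can take multiple strings and split them into a list of vectors, and the length of that list equals the length of the input vector.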

str_split(d$x, ',') # list of length 3
#[[1]]
#[1] "a" "a" "b" "b" "b"

#[[2]]
#[1] "a" "a"

#[[3]]
#[1] "a" "b" "b" "b" "c" "c" "c"

Extracting the first element with [[1]]:

str_split(d$x, ',')[[1]]
#[1] "a" "a" "b" "b" "b"
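
To make the consequence concrete, here is a minimal sketch of what Approach 1 effectively computes. It uses base R's strsplit in place of str_split so that it runs without any packages. The names pieces and first_only are illustrative only, and stringsAsFactors = FALSE is set on the assumption that the code may run on an R version older than 4.0.

```r
# Same data as in the question; keep x as character on any R version
d <- data.frame(x = c('a,a,b,b,b', 'a,a', 'a,b,b,b,c,c,c'),
                stringsAsFactors = FALSE)

# Inside mutate, x refers to the whole column, so the split returns a list of length 3
pieces <- strsplit(d$x, ',')

# [[1]] keeps only the first row's pieces, so the result is a single string
first_only <- paste(unique(pieces[[1]]), collapse = ',')

# Assigning a length-1 value to a column recycles it across all rows
d$y <- first_only
d
```

Every row ends up with the value computed from the first row, which is why the mutate version disagrees with the loop.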

In the for loop, we split each element separately and extract the single element of the resulting (length 1) list:

str_split(d$x[1], ',')[[1]]
#[1] "a" "a" "b" "b" "b"
str_split(d$x[2], ',')[[1]]
#[1] "a" "a"

That is why we need to loop over the list and take the unique values of each element.
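
As a rough base R counterpart to the map_chr version, the same per-element loop can be sketched with strsplit and vapply. This is only an illustration, not the approach used in the answer's code: the name y is arbitrary, and collapse = ',' is chosen here to match the loop's output (toString would insert ", " instead).

```r
# Same data as in the question; keep x as character on any R version
d <- data.frame(x = c('a,a,b,b,b', 'a,a', 'a,b,b,b,c,c,c'),
                stringsAsFactors = FALSE)

# Split every string, then collapse the unique pieces of each list element
y <- vapply(strsplit(d$x, ','),
            function(v) paste(unique(v), collapse = ','),
            character(1))  # each element must yield exactly one string

d$y <- y
d
```

Because vapply visits each list element in turn, each row gets its own result, in the same way as the for loop in the question.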

Why doesn't mutate work the way I expect?
