Question

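I'm trying to find the correlations among all variables within each level of a grouping variable. Specifically, I'm trying to use purrr to replace a loop I've been using. I'm a bit stuck, partly because I want to apply two functions to each piece of the data. For example: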

## load packages
library(corrr)
library(dplyr)
library(purrr)

Without any groups, this works fine (and it is the basis of what I want to do):

iris %>%
  select(-Species) %>%
  correlate() %>%
  stretch()

But when I try to do this by group, I hit a roadblock:

iris %>%
  group_by(Species) %>%
  correlate() %>%
  stretch()

Error in stats::cor(x = x, y = y, use = use, method = method) : 'x' must be numeric

So my thought was to use purrr... this seems like exactly the place to use it?

iris %>%
  split(.$Species) %>%
  map_dbl(~correlate) ## then how do i incorporate `stretch()`

Error: Can't coerce element 1 from a closure to a double

Clearly this is wrong, but I'm not sure how I should apply map_* here...

Here is the loop I want to replace. It does give the correct output, but I'd rather not use it; it is less flexible than a purrr approach:

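## named species_levels so it does not clash with the Species column inside filter()
species_levels <- unique(iris$Species)
df <- c()
for(i in seq_along(species_levels)){
  u <- iris %>%
    filter(Species == species_levels[i]) %>%
    select(-Species) %>%
    correlate() %>%
    stretch() %>%
    mutate(Species = species_levels[i])

  df <- rbind(df, u)
}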

df

# A tibble: 48 x 4
              x            y         r Species
          <chr>        <chr>     <dbl>  <fctr>
 1 Sepal.Length Sepal.Length        NA  setosa
 2 Sepal.Length  Sepal.Width 0.7425467  setosa
 3 Sepal.Length Petal.Length 0.2671758  setosa
 4 Sepal.Length  Petal.Width 0.2780984  setosa
 5  Sepal.Width Sepal.Length 0.7425467  setosa
 6  Sepal.Width  Sepal.Width        NA  setosa
 7  Sepal.Width Petal.Length 0.1777000  setosa
 8  Sepal.Width  Petal.Width 0.2327520  setosa
 9 Petal.Length Sepal.Length 0.2671758  setosa
10 Petal.Length  Sepal.Width 0.1777000  setosa

In summary, could someone outline how to use purrr when I need to apply two functions? In other words, how do I replace the loop above?

Answer 1

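You need group_by %>% do, which is a more flexible alternative to the summarise syntax. Within do, you can use . to access each subgroup and apply correlate and stretch to it just as you would to a normal data frame: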

library(corrr)
library(dplyr)

iris %>% group_by(Species) %>% do(
    select(., -Species) %>% correlate() %>% stretch()
)

# A tibble: 48 x 4
# Groups:   Species [3]
#   Species            x            y         r
#    <fctr>        <chr>        <chr>     <dbl>
# 1  setosa Sepal.Length Sepal.Length        NA
# 2  setosa Sepal.Length  Sepal.Width 0.7425467
# 3  setosa Sepal.Length Petal.Length 0.2671758
# 4  setosa Sepal.Length  Petal.Width 0.2780984
# 5  setosa  Sepal.Width Sepal.Length 0.7425467
# 6  setosa  Sepal.Width  Sepal.Width        NA
# 7  setosa  Sepal.Width Petal.Length 0.1777000
# 8  setosa  Sepal.Width  Petal.Width 0.2327520
# 9  setosa Petal.Length Sepal.Length 0.2671758
#10  setosa Petal.Length  Sepal.Width 0.1777000
# ... with 38 more rows
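
As a hedged alternative to do(), here is a minimal sketch using dplyr's group_modify(). It assumes a dplyr version that provides group_modify() (it is not part of the original answer), and the name by_species is made up for illustration. group_modify() hands each group's rows to the function as .x, with the grouping column already dropped, so no select(-Species) is needed:

```r
library(corrr)
library(dplyr)

# Assumption: dplyr provides group_modify(); by_species is an illustrative name.
# For each Species, .x holds only the four numeric columns, so correlate()
# sees numeric input and stretch() reshapes the result to long format.
by_species <- iris %>%
  group_by(Species) %>%
  group_modify(~ .x %>% correlate() %>% stretch()) %>%
  ungroup()

by_species
```

The result should have the same columns as the do() output (Species, x, y, r), with 16 variable pairs for each of the three species.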

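With purrr, you can first nest the data under each group and then map over the nested data. Note that compose(stretch, correlate) applies its functions from right to left, so correlate runs first and stretch runs on its result: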

library(purrr)
library(tidyr)
library(dplyr)

iris %>% 
    group_by(Species) %>% nest() %>% 
    mutate(data = map(data, compose(stretch, correlate))) %>% 
    unnest(data)

# A tibble: 48 x 4
#   Species            x            y         r
#    <fctr>        <chr>        <chr>     <dbl>
# 1  setosa Sepal.Length Sepal.Length        NA
# 2  setosa Sepal.Length  Sepal.Width 0.7425467
# 3  setosa Sepal.Length Petal.Length 0.2671758
# 4  setosa Sepal.Length  Petal.Width 0.2780984
# 5  setosa  Sepal.Width Sepal.Length 0.7425467
# 6  setosa  Sepal.Width  Sepal.Width        NA
# 7  setosa  Sepal.Width Petal.Length 0.1777000
# 8  setosa  Sepal.Width  Petal.Width 0.2327520
# 9  setosa Petal.Length Sepal.Length 0.2671758
#10  setosa Petal.Length  Sepal.Width 0.1777000
# ... with 38 more rows
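
The split() attempt from the question can also be made to work. The original call failed because ~correlate returns the function object itself (a closure) rather than calling it, and map_dbl() expects a single number per element while correlate() returns a whole data frame. A minimal sketch, assuming map() plus bind_rows() is acceptable in place of map_dbl() (cor_by_species is an illustrative name): chain both functions inside one lambda, then stack the per-species results.

```r
library(corrr)
library(dplyr)
library(purrr)

# cor_by_species is an illustrative name, not from the original post.
# split() gives a named list with one data frame per species; the lambda
# applies both functions to each piece; bind_rows() stacks the pieces and
# stores the list names in a new Species column.
cor_by_species <- iris %>%
  split(.$Species) %>%
  map(~ .x %>% select(-Species) %>% correlate() %>% stretch()) %>%
  bind_rows(.id = "Species")

cor_by_species
```

In this version Species comes back as a character column rather than a factor, because it is rebuilt from the list names.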
