Nesting several groups of columns in a data frame

Asked: 2018-11-16 19:15:29

Tags: r tidyverse tidyr

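The concept of nesting several columns into a list column is very powerful. However, I'm not sure whether the nest function from {tidyr} can nest more than one set of columns into several list columns within the same pipeline. For example, assume I have the following data frame: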

df <- as.data.frame(replicate(6, runif(10) * 100))

colnames(df) <- c(
    paste0("a", 1:2), # a1, a2
    paste0("b", 1:4) # b1, b2, b3, b4
)

df
          a1        a2        b1       b2        b3        b4
1  20.807348 69.339482 91.837151 99.76813  3.394350 33.780049
2  64.667733 20.676381 80.523369 38.42774 85.635208 60.111491
3  55.352501 55.699571  4.812923 38.65333 98.869203 80.345576
4  45.194094 16.511696 83.834651 51.48698  7.191081 16.697210
5  66.401642 89.041055 26.965636 67.90061 90.622428 59.552935
6  35.750100 55.997766 49.768556 68.45900 67.523080 58.993232
7  21.392823  5.335281 56.348328 35.68331 51.029617 66.290035
8   8.851236 19.486580 14.199370 22.49754 14.617592 18.236406
9  70.475652  6.229997 43.169364 12.63378 21.415589  2.163004
10 47.837613 37.641530 38.001288 71.15896 71.000568  2.135611

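I would like to nest the "a" columns in one list column and the "b" columns in a second list column, because I want to perform different computations on them.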

Nesting the "a" columns works:

library(tidyr)
nest(df, a1, a2, .key = "a")

          b1       b2        b3        b4                   a
1  91.837151 99.76813  3.394350 33.780049  20.80735, 69.33948
2  80.523369 38.42774 85.635208 60.111491  64.66773, 20.67638
3   4.812923 38.65333 98.869203 80.345576  55.35250, 55.69957
4  83.834651 51.48698  7.191081 16.697210  45.19409, 16.51170
5  26.965636 67.90061 90.622428 59.552935  66.40164, 89.04105
6  49.768556 68.45900 67.523080 58.993232  35.75010, 55.99777
7  56.348328 35.68331 51.029617 66.290035 21.392823, 5.335281
8  14.199370 22.49754 14.617592 18.236406 8.851236, 19.486580
9  43.169364 12.63378 21.415589  2.163004 70.475652, 6.229997
10 38.001288 71.15896 71.000568  2.135611  47.83761, 37.64153

But it is not possible to nest the "b" columns after nesting the "a" columns:

nest(df, a1, a2, .key = "a") %>%
    nest(b1, b2, b3, b4, .key = "b")
Error in grouped_df_impl(data, unname(vars), drop) : 
  Column `a` can't be used as a grouping variable because it's a list

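This makes sense on reading the error message: nest() here treats every column that is not being nested as a grouping variable, and the list column a cannot be used as one.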

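My workaround is:

  • Nest the "a" columns

  • Perform the required computations on the "a" list column

  • Unnest the "a" list column

  • Nest the "b" columns

  • Perform the required computations on the "b" list column

  • Unnest the "b" list column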
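
The steps above can be sketched as a single pipeline. This is only an illustration and makes several assumptions: it uses the `new_col = c(...)` form of nest() from tidyr 1.0.0 or later rather than the `.key` form shown in the question, the seed is fixed purely for reproducibility, and the two computations (multiplying by 4, dividing by 2) are placeholders, not the asker's real calculations.

```r
library(dplyr)
library(purrr)
library(tidyr)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
df <- as.data.frame(replicate(6, runif(10) * 100))
colnames(df) <- c(paste0("a", 1:2), paste0("b", 1:4))

res <- df %>%
    nest(a = c(a1, a2)) %>%            # nest the "a" columns
    mutate(a = map(a, ~ .x * 4)) %>%   # placeholder computation on "a"
    unnest(a) %>%                      # unnest the "a" list column
    nest(b = c(b1, b2, b3, b4)) %>%    # nest the "b" columns
    mutate(b = map(b, ~ .x / 2)) %>%   # placeholder computation on "b"
    unnest(b)                          # unnest the "b" list column
```

Each nest() call leaves the remaining columns as plain vectors, so the second nest() never sees a list column among its grouping variables.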

Is there a more direct way to achieve this? Any help is much appreciated.

1 answer:

Answer 0: (score: 4)

We can do this with map:

library(tidyverse)
out <- list('a', 'b') %>% 
           map(~ df %>% 
            select(matches(.x)) %>% 
            nest(names(.), .key = !! rlang::sym(.x))) %>%
            bind_cols
out
# A tibble: 1 x 2
#  a                     b                    
#  <list>                <list>               
#1 <data.frame [10 × 2]> <data.frame [10 × 4]>


out %>%
   unnest
# A tibble: 10 x 6
#      a1    a2    b1    b2    b3    b4
#   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 20.8  69.3  91.8   99.8  3.39 33.8 
# 2 64.7  20.7  80.5   38.4 85.6  60.1 
# 3 55.4  55.7   4.81  38.7 98.9  80.3 
# 4 45.2  16.5  83.8   51.5  7.19 16.7 
# 5 66.4  89.0  27.0   67.9 90.6  59.6 
# 6 35.8  56.0  49.8   68.5 67.5  59.0 
# 7 21.4   5.34 56.3   35.7 51.0  66.3 
# 8  8.85 19.5  14.2   22.5 14.6  18.2 
# 9 70.5   6.23 43.2   12.6 21.4   2.16
#10 47.8  37.6  38.0   71.2 71.0   2.14
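
For readers on a newer tidyr, the same two list columns can likely be built without map. The sketch below assumes tidyr 1.0.0 or later, where nest() accepts several `new_col = c(...)` specifications in one call. That interface postdates this answer and is not the one the code above uses. The fixed seed is only for reproducibility.

```r
library(tidyr)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
df <- as.data.frame(replicate(6, runif(10) * 100))
colnames(df) <- c(paste0("a", 1:2), paste0("b", 1:4))

# Nest both groups of columns in a single call (tidyr >= 1.0.0 syntax)
out <- nest(df, a = c(a1, a2), b = c(b1, b2, b3, b4))

# Unnest both list columns together to recover the original columns
back <- unnest(out, c(a, b))
```

Because no columns are left outside the two groups, `out` holds only the list columns `a` and `b`.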

We can then perform separate computations on the "a" and "b" list columns:

out %>%
    mutate(a = map(a, `*`, 4)) %>% 
    unnest
# A tibble: 10 x 6
#      a1    a2    b1    b2    b3    b4
#   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1  83.2 277.  91.8   99.8  3.39 33.8 
# 2 259.   82.7 80.5   38.4 85.6  60.1 
# 3 221.  223.   4.81  38.7 98.9  80.3 
# 4 181.   66.0 83.8   51.5  7.19 16.7 
# 5 266.  356.  27.0   67.9 90.6  59.6 
# 6 143.  224.  49.8   68.5 67.5  59.0 
# 7  85.6  21.3 56.3   35.7 51.0  66.3 
# 8  35.4  77.9 14.2   22.5 14.6  18.2 
# 9 282.   24.9 43.2   12.6 21.4   2.16
#10 191.  151.  38.0   71.2 71.0   2.14

That said, you can also select the columns of interest with mutate_at, without any nest/unnest:

df %>% 
    mutate_at(vars(matches('^a\\d+')), funs(.*4))
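
The mutate_at call above relies on funs(), which has been deprecated since dplyr 0.8.0. As a rough equivalent, assuming dplyr 1.0.0 or later, the same column-wise computation can be written with across(). The fixed seed and the `res` name are illustrative only.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
df <- as.data.frame(replicate(6, runif(10) * 100))
colnames(df) <- c(paste0("a", 1:2), paste0("b", 1:4))

# Multiply only the "a" columns by 4; the "b" columns are left untouched
res <- df %>%
    mutate(across(matches("^a\\d+"), ~ .x * 4))
```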