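The idea of nesting several columns into a list column is very powerful. However, I'm not sure whether the nest() function from {tidyr} can nest more than one set of columns into several list columns within the same pipeline. For example, suppose I have the following data frame: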
df <- as.data.frame(replicate(6, runif(10) * 100))
colnames(df) <- c(
  paste0("a", 1:2), # a1, a2
  paste0("b", 1:4)  # b1, b2, b3, b4
)
df
a1 a2 b1 b2 b3 b4
1 20.807348 69.339482 91.837151 99.76813 3.394350 33.780049
2 64.667733 20.676381 80.523369 38.42774 85.635208 60.111491
3 55.352501 55.699571 4.812923 38.65333 98.869203 80.345576
4 45.194094 16.511696 83.834651 51.48698 7.191081 16.697210
5 66.401642 89.041055 26.965636 67.90061 90.622428 59.552935
6 35.750100 55.997766 49.768556 68.45900 67.523080 58.993232
7 21.392823 5.335281 56.348328 35.68331 51.029617 66.290035
8 8.851236 19.486580 14.199370 22.49754 14.617592 18.236406
9 70.475652 6.229997 43.169364 12.63378 21.415589 2.163004
10 47.837613 37.641530 38.001288 71.15896 71.000568 2.135611
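I would like to nest the "a" columns into one list column and the "b" columns into a second list column, because I want to perform different computations on them.
Nesting the "a" columns works: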
library(tidyr)
nest(df, a1, a2, .key = "a")
b1 b2 b3 b4 a
1 91.837151 99.76813 3.394350 33.780049 20.80735, 69.33948
2 80.523369 38.42774 85.635208 60.111491 64.66773, 20.67638
3 4.812923 38.65333 98.869203 80.345576 55.35250, 55.69957
4 83.834651 51.48698 7.191081 16.697210 45.19409, 16.51170
5 26.965636 67.90061 90.622428 59.552935 66.40164, 89.04105
6 49.768556 68.45900 67.523080 58.993232 35.75010, 55.99777
7 56.348328 35.68331 51.029617 66.290035 21.392823, 5.335281
8 14.199370 22.49754 14.617592 18.236406 8.851236, 19.486580
9 43.169364 12.63378 21.415589 2.163004 70.475652, 6.229997
10 38.001288 71.15896 71.000568 2.135611 47.83761, 37.64153
But it is not possible to nest the "b" columns after nesting the "a" columns:
nest(df, a1, a2, .key = "a") %>%
  nest(b1, b2, b3, b4, .key = "b")
Error in grouped_df_impl(data, unname(vars), drop) :
Column `a` can't be used as a grouping variable because it's a list
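This makes sense given the error message: the second nest() call tries to group by every remaining column, including the list column a.
My workaround is:
1. Nest the "a" columns
2. Perform the desired computations on the "a" list column
3. Unnest the "a" list column
4. Nest the "b" columns
5. Perform the desired computations on the "b" list column
6. Unnest the "b" list column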
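The six steps above might look roughly like the sketch below. It assumes tidyr 1.0.0 or later (where nest() takes name = columns pairs) together with dplyr and purrr; the multiplication by 4 and the division by 2 are placeholder computations chosen only for illustration, and set.seed(1) is added so the run is repeatable.

```r
library(tidyr)
library(dplyr)
library(purrr)

set.seed(1)  # assumption: fixed seed, not part of the original example
df <- as.data.frame(replicate(6, runif(10) * 100))
colnames(df) <- c(paste0("a", 1:2), paste0("b", 1:4))

res <- df %>%
  nest(a = c(a1, a2)) %>%               # 1. nest the "a" columns
  mutate(a = map(a, ~ .x * 4)) %>%      # 2. placeholder computation on "a"
  unnest(a) %>%                         # 3. unnest "a"
  nest(b = c(b1, b2, b3, b4)) %>%       # 4. nest the "b" columns
  mutate(b = map(b, ~ .x / 2)) %>%      # 5. placeholder computation on "b"
  unnest(b)                             # 6. unnest "b"
```

Each nest() call here groups by all remaining columns, so the intermediate tables have one single-row list element per original row.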
Is there a more straightforward way to achieve this? Many thanks for your help.
Answer 0 (score: 4)
We can do this with map():
library(tidyverse)
out <- list('a', 'b') %>%
  map(~ df %>%
        select(matches(.x)) %>%
        nest(names(.), .key = !! rlang::sym(.x))) %>%
  bind_cols()
out
# A tibble: 1 x 2
# a b
# <list> <list>
#1 <data.frame [10 × 2]> <data.frame [10 × 4]>
out %>%
  unnest()
# A tibble: 10 x 6
# a1 a2 b1 b2 b3 b4
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 20.8 69.3 91.8 99.8 3.39 33.8
# 2 64.7 20.7 80.5 38.4 85.6 60.1
# 3 55.4 55.7 4.81 38.7 98.9 80.3
# 4 45.2 16.5 83.8 51.5 7.19 16.7
# 5 66.4 89.0 27.0 67.9 90.6 59.6
# 6 35.8 56.0 49.8 68.5 67.5 59.0
# 7 21.4 5.34 56.3 35.7 51.0 66.3
# 8 8.85 19.5 14.2 22.5 14.6 18.2
# 9 70.5 6.23 43.2 12.6 21.4 2.16
#10 47.8 37.6 38.0 71.2 71.0 2.14
We can then perform separate computations on the "a" and "b" list columns:
out %>%
  mutate(a = map(a, `*`, 4)) %>%
  unnest()
# A tibble: 10 x 6
# a1 a2 b1 b2 b3 b4
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 83.2 277. 91.8 99.8 3.39 33.8
# 2 259. 82.7 80.5 38.4 85.6 60.1
# 3 221. 223. 4.81 38.7 98.9 80.3
# 4 181. 66.0 83.8 51.5 7.19 16.7
# 5 266. 356. 27.0 67.9 90.6 59.6
# 6 143. 224. 49.8 68.5 67.5 59.0
# 7 85.6 21.3 56.3 35.7 51.0 66.3
# 8 35.4 77.9 14.2 22.5 14.6 18.2
# 9 282. 24.9 43.2 12.6 21.4 2.16
#10 191. 151. 38.0 71.2 71.0 2.14
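The nest() calls above use the interface from tidyr versions before 1.0.0. As a hedged alternative, assuming tidyr 1.0.0 or later, nest() accepts several name = columns pairs, so both list columns can be created in a single call. The sketch below is one way this might look; starts_with() as the column selector, the rounding applied to "b", and set.seed(42) are illustrative choices, not part of the original answer.

```r
library(tidyr)
library(dplyr)
library(purrr)

set.seed(42)  # assumption: fixed seed so the run is repeatable
df <- as.data.frame(replicate(6, runif(10) * 100))
colnames(df) <- c(paste0("a", 1:2), paste0("b", 1:4))

# Nest both sets of columns in one call (tidyr >= 1.0.0 syntax)
nested <- df %>%
  nest(a = starts_with("a"), b = starts_with("b"))

# Apply a different computation to each list column, then unnest both
result <- nested %>%
  mutate(
    a = map(a, ~ .x * 4),      # same computation as in the answer
    b = map(b, ~ round(.x))    # placeholder computation for "b"
  ) %>%
  unnest(c(a, b))
```

Because every column is nested, nested has a single row whose two cells each hold a 10-row table, and unnest(c(a, b)) restores the original 10 x 6 shape.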
That said, we can also use mutate_at() to select the columns of interest, without any nest/unnest:
df %>%
  mutate_at(vars(matches('^a\\d+')), ~ . * 4)  # formula shorthand; funs() is deprecated
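For readers on dplyr 1.0.0 or later, the same column-wise computation can probably be written with across(), which supersedes the mutate_at() family. This is a minimal sketch under that version assumption; set.seed(42) and the name scaled are added only for illustration.

```r
library(dplyr)

set.seed(42)  # assumption: fixed seed so the run is repeatable
df <- as.data.frame(replicate(6, runif(10) * 100))
colnames(df) <- c(paste0("a", 1:2), paste0("b", 1:4))

# Multiply only the "a" columns by 4; the "b" columns are left untouched
scaled <- df %>%
  mutate(across(matches("^a\\d+"), ~ .x * 4))
```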