Question

I have a list like this:

list=list(
  df1=read.table(text = "a  b   c
11  14  20
                 17 15  12
                 6  19  17
                 ",header=T),

  df2=read.table(text = "a  b   c
6   19  12
                 9  7   19

                 ",header=T),
  df3=read.table(text = "a  d   f
12  20  15
                 12 10  8
                 7  8   7

                 ",header=T),
  df4=read.table(text = "g  f   e   z
5   12  11  5
16  17  20  16
19  9   11  20

                 ",header=T),
  df5=read.table(text = "g  f   e   z
15  13  9   18
                 12 12  17  16
                 15 9   12  11
                 15 20  19  15

                 ",header=T),
  df6=read.table(text = "a  d   f
11  7   16
                 11 12  11

                 ",header=T)
)

My list contains different data frames. Based on their column names, there are three types of data frame in the list.

type1:df1 and df2
type2:df3 and df6
type3:df4 and df5

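I want to rbind the data frames that have the same column names and save the results in a new list. For example, df1 and df2, df3 and df6, and df4 and df5 have the same column names. I need code that automatically identifies and rbinds the data frames with the same column names.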

I would like the following list as the result:

> new list
$df1.df2
  a  b  c
1 11 14 20
2 17 15 12
3  6 19 17
4  6 19 12
5  9  7 19

$df3.df6
   a  d  f
1 12 20 15
2 12 10  8
3  7  8  7
4 11  7 16
5 11 12 11

$df4.df5
   g  f  e  z
1  5 12 11  5
2 16 17 20 16
3 19  9 11 20
4 15 13  9 18
5 12 12 17 16
6 15  9 12 11
7 15 20 19 15

The names of the data frames in the new list can be anything.

Answer 1

Because I don't like naming a variable list, I've named your data l.

lapply(
  split(l, sapply(l, function(a) paste(colnames(a), collapse = "_"))),
  dplyr::bind_rows)
# $a_b_c
#    a  b  c
# 1 11 14 20
# 2 17 15 12
# 3  6 19 17
# 4  6 19 12
# 5  9  7 19
# $a_d_f
#    a  d  f
# 1 12 20 15
# 2 12 10  8
# 3  7  8  7
# 4 11  7 16
# 5 11 12 11
# $g_f_e_z
#    g  f  e  z
# 1  5 12 11  5
# 2 16 17 20 16
# 3 19  9 11 20
# 4 15 13  9 18
# 5 12 12 17 16
# 6 15  9 12 11
# 7 15 20 19 15

I usually prefer by(data, INDICES, FUN) to lapply(split(data, INDICES), FUN), but for some reason it kept complaining here, hence the version above.

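The choice to collapse the column names with _ is arbitrary; the goal is just a simple "hash". It is not hard to think of cases where this would treat two frames as alike when they are not, though that is perhaps unlikely to be a problem.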
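
The collision risk mentioned above can be made concrete. The following is a minimal base R sketch; the frames x and y, the helper names naive_key and key_of, and the newline separator are all assumptions for illustration and not part of the original answer.

```r
# Two hypothetical frames whose column names differ,
# but which collapse to the same "_"-joined key.
x <- data.frame(a_b = 1, c = 2)
y <- data.frame(a = 1, b_c = 2)

# The key used in the answer: column names joined with "_".
naive_key <- function(df) paste(colnames(df), collapse = "_")
naive_key(x) == naive_key(y)   # TRUE: both keys are "a_b_c"

# One possible workaround (an assumption): join the names with a character
# that is very unlikely to occur inside a column name, such as a newline.
key_of <- function(df) paste(colnames(df), collapse = "\n")
key_of(x) == key_of(y)         # FALSE: the keys now differ
```

Note that either key still depends on column order, so frames with the same columns in a different order would land in different groups.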

I should also note that I'm using dplyr::bind_rows but nothing else from dplyr. This could easily be converted to something using purrr:: or other tidy grouping tools.
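
If avoiding the dplyr dependency is preferable, the same grouping can probably be sketched in base R with do.call(rbind, ...). This is only an illustration: l here is a shortened stand-in for the question's list (df1, df2, df3 and df6 only), and the names key, groups and res, as well as naming each result after its source frames (df1.df2 style, as in the desired output), are assumptions rather than part of the original answer.

```r
# A shortened stand-in for the list from the question.
l <- list(
  df1 = data.frame(a = c(11, 17, 6), b = c(14, 15, 19), c = c(20, 12, 17)),
  df2 = data.frame(a = c(6, 9),      b = c(19, 7),      c = c(12, 19)),
  df3 = data.frame(a = c(12, 12, 7), d = c(20, 10, 8),  f = c(15, 8, 7)),
  df6 = data.frame(a = c(11, 11),    d = c(7, 12),      f = c(16, 11))
)

# Group the frames by their collapsed column names, as in the answer above.
key <- vapply(l, function(d) paste(colnames(d), collapse = "_"), character(1))
groups <- split(l, key)

# rbind each group; unname() avoids row names such as "df1.1", "df1.2".
res <- lapply(groups, function(g) do.call(rbind, unname(g)))

# Name each result after the frames it was built from, e.g. "df1.df2".
names(res) <- vapply(groups, function(g) paste(names(g), collapse = "."),
                     character(1), USE.NAMES = FALSE)

res
```

Unlike dplyr::bind_rows, rbind stops with an error if the frames in a group do not share the same column names, which is acceptable here because the grouping key already guarantees that.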

Answer 2

We could do the following, where dfls is the list of data frames from the question:

library(tidyverse)
library(janitor)

bind_rows(dfls) %>% 
  mutate(code= apply(apply(., 2, function(x){
               ifelse(is.na(x), 1, 2)}), 1, paste, collapse="")) %>% 
  nest(.,-code, .key="code") %>% 
  mutate(filtered = map(code, janitor::remove_empty_cols)) %>% 
  pull(filtered) -> out

glimpse(out)

# List of 3
#  $ :Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 5 obs. of  3 variables:
#   ..$ a: int [1:5] 11 17 6 6 9
#   ..$ b: int [1:5] 14 15 19 19 7
#   ..$ c: int [1:5] 20 12 17 12 19
#  $ :Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 5 obs. of  3 variables:
#   ..$ a: int [1:5] 12 12 7 11 11
#   ..$ d: int [1:5] 20 10 8 7 12
#   ..$ f: int [1:5] 15 8 7 16 11
#  $ :Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 7 obs. of  4 variables:
#   ..$ f: int [1:7] 12 17 9 13 12 9 20
#   ..$ g: int [1:7] 5 16 19 15 12 15 15
#   ..$ e: int [1:7] 11 20 11 9 17 12 19
#   ..$ z: int [1:7] 5 16 20 18 16 11 15

How to bind data frames with the same column names in a list
