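Merge two data frames in R based on a column and a list column

Time: 2019-02-17 14:02:32

Tags: r merge

Is there a way to merge two data frames in R based on a column stored as a list, so that the other columns are summed? Here is some sample data: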

df1 <- structure(list(id = c("1", "2"), 
                      band = list(c("c1", "c2", "c3"), "c4"), 
                      samples = list(c(32, 2, 61), 20), 
                      time = list(c(307, 2, 238), 74)), 
                 .Names = c("id", "band", "samples", "time"), 
                 row.names = 0:1, class = "data.frame")

df2 <- structure(list(id = c("1", "3"), 
                      band = list(c("c1", "c4"), "c1"), 
                      samples = list(c(1, 2), 2), 
                      time = list(c(4, 2), 7)), 
                 .Names = c("id", "band", "samples", "time"), 
                 row.names = 0:1, class = "data.frame")

I want to merge df1 and df2 based on the id and band columns. Unfortunately, band is a list column, so I need to sum the samples and time columns for each element of band within each id.

1 Answer:

Answer 0: (score: 1)

One solution is to use unnest from the tidyr package together with bind_rows, group_by and summarize from dplyr.

library(tidyr)
library(dplyr)

unnest expands the list columns into one row per element:

# tidyr >= 1.0.0 expects the list columns to be named via `cols`
df1_unnest <- df1 %>% 
  unnest(cols = c(band, samples, time))

df1_unnest
#   id band samples time
# 1  1   c1      32  307
# 2  1   c2       2    2
# 3  1   c3      61  238
# 4  2   c4      20   74

# tidyr >= 1.0.0 expects the list columns to be named via `cols`
df2_unnest <- df2 %>% 
  unnest(cols = c(band, samples, time))
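
To make clear what unnest does with the three parallel list columns, here is a minimal base R sketch of the same expansion. The helper name unnest_base and the result name df1_long are hypothetical and used only for illustration; the sketch assumes that band, samples and time have the same length in every row, as they do in the sample data.

```r
# Sample data from the question
df1 <- structure(list(id = c("1", "2"),
                      band = list(c("c1", "c2", "c3"), "c4"),
                      samples = list(c(32, 2, 61), 20),
                      time = list(c(307, 2, 238), 74)),
                 .Names = c("id", "band", "samples", "time"),
                 row.names = 0:1, class = "data.frame")

# Hypothetical helper: expand parallel list columns using base R only
unnest_base <- function(df) {
  n <- lengths(df$band)               # number of list elements per row
  data.frame(
    id      = rep(df$id, times = n),  # repeat each id once per element
    band    = unlist(df$band),
    samples = unlist(df$samples),
    time    = unlist(df$time),
    stringsAsFactors = FALSE
  )
}

df1_long <- unnest_base(df1)
```

Each id is repeated once per element of its band list, which is why id 1 ends up with three rows and id 2 with one.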

bind_rows combines the two new data.frames:

new_df <- bind_rows(df1_unnest, df2_unnest)

new_df
#   id band samples time
# 1  1   c1      32  307
# 2  1   c2       2    2
# 3  1   c3      61  238
# 4  2   c4      20   74
# 5  1   c1       1    4
# 6  1   c4       2    2
# 7  3   c1       2    7

Then group_by and summarize_all sum the values of rows that share the same id and band, which here affects only id 1, band c1:

new_df <- new_df %>% 
  group_by(id, band) %>% 
  summarize_all(sum)

new_df
# A tibble: 6 x 4
# Groups:   id [?]
#   id    band  samples  time
#   <chr> <chr>   <dbl> <dbl>
# 1 1     c1         33   311
# 2 1     c2          2     2
# 3 1     c3         61   238
# 4 1     c4          2     2
# 5 2     c4         20    74
# 6 3     c1          2     7
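
In dplyr 1.0.0 and later, summarize_all is superseded by across(). A minimal sketch of the same aggregation in that style follows, assuming dplyr >= 1.0.0 is installed. The names combined and summed are illustrative only, and the input rows are typed in by hand to match the combined, unnested data shown earlier.

```r
library(dplyr)

# Same rows as the combined, unnested data shown earlier
combined <- data.frame(
  id      = c("1", "1", "1", "2", "1", "1", "3"),
  band    = c("c1", "c2", "c3", "c4", "c1", "c4", "c1"),
  samples = c(32, 2, 61, 20, 1, 2, 2),
  time    = c(307, 2, 238, 74, 4, 2, 7),
  stringsAsFactors = FALSE
)

# Sum samples and time within each id/band pair, then drop the grouping
summed <- combined %>%
  group_by(id, band) %>%
  summarise(across(c(samples, time), sum), .groups = "drop")
```

Naming the columns inside across() makes it explicit which columns are summed, and .groups = "drop" returns an ungrouped tibble.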

If you need list columns again, you can regroup by id and collect the values into lists:

new_df_list <- new_df %>%
  group_by(id) %>% 
  summarize_all(list)

print.data.frame(new_df_list)
#   id           band      samples           time
# 1  1 c1, c2, c3, c4 33, 2, 61, 2 311, 2, 238, 2
# 2  2             c4           20             74
# 3  3             c1            2              7