Is there any way to merge two data frames in R on a list column and get the sums of the other columns? Here is some sample data:
df1 <- structure(list(id = c("1", "2"),
band = list(c("c1", "c2", "c3"), "c4"),
samples = list(c(32, 2, 61), 20),
time = list(c(307, 2, 238), 74)),
.Names = c("id", "band", "samples", "time"),
row.names = 0:1, class = "data.frame")
df2 <- structure(list(id = c("1", "3"),
band = list(c("c1", "c4"), "c1"),
samples = list(c(1, 2), 2),
time = list(c(4, 2), 7)),
.Names = c("id", "band", "samples", "time"),
row.names = 0:1, class = "data.frame")
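I want to merge df1 and df2 on the id and band columns. Unfortunately, band is a list column, so the samples and time columns have to be summed per element of the band list. I am expecting the following: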
Answer 0 (score: 1)
One solution is to use unnest from the tidyr package and bind_rows together with group_by and summarize from dplyr.
library(tidyr)
library(dplyr)
unnest expands the list columns into one row per element:
# tidyr >= 1.0.0 expects the columns to unnest to be named via cols
df1_unnest <- df1 %>%
  unnest(cols = c(band, samples, time))
df1_unnest
# id band samples time
# 1 1 c1 32 307
# 2 1 c2 2 2
# 3 1 c3 61 238
# 4 2 c4 20 74
df2_unnest <- df2 %>%
  unnest(cols = c(band, samples, time))
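Unnesting several list columns at once pairs their elements up position by position, so it relies on band, samples and time holding the same number of elements in every row. A minimal base R sketch of that check follows. The helper name same_lengths and its cols default are hypothetical (they are not part of tidyr), and df1_check is just a copy of df1.

```r
# Copy of df1 from the question, rebuilt so the sketch runs on its own
df1_check <- data.frame(id = c("1", "2"), stringsAsFactors = FALSE)
df1_check$band    <- list(c("c1", "c2", "c3"), "c4")
df1_check$samples <- list(c(32, 2, 61), 20)
df1_check$time    <- list(c(307, 2, 238), 74)

# Hypothetical helper (not part of tidyr): TRUE when, in every row,
# the chosen list columns all hold the same number of elements
same_lengths <- function(df, cols = c("band", "samples", "time")) {
  # One column of element counts per list column, one row per data frame row
  len <- do.call(cbind, lapply(df[cols], lengths))
  # Compare every column of counts against the first one
  all(len == len[, 1])
}

lengths_ok <- same_lengths(df1_check)
```

If the check returns FALSE, the mismatched rows are probably worth inspecting before calling unnest.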
bind_rows stacks the two new data frames:
new_df <- bind_rows(df1_unnest, df2_unnest)
new_df
# id band samples time
# 1 1 c1 32 307
# 2 1 c2 2 2
# 3 1 c3 61 238
# 4 2 c4 20 74
# 5 1 c1 1 4
# 6 1 c4 2 2
# 7 3 c1 2 7
Then group_by and summarize_all sum the values within each id and band combination. Here only ID 1, band c1 occurs in both data frames:
new_df <- new_df %>%
  group_by(id, band) %>%
  summarize_all(sum)
new_df
# A tibble: 6 x 4
# Groups: id [?]
# id band samples time
# <chr> <chr> <dbl> <dbl>
# 1 1 c1 33 311
# 2 1 c2 2 2
# 3 1 c3 61 238
# 4 1 c4 2 2
# 5 2 c4 20 74
# 6 3 c1 2 7
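On dplyr 1.0.0 or later, the same grouped sum can also be written with across(), since the summarize_all family is superseded there. The following is only a sketch under that version assumption. The names long_df and summed are made up for illustration, and the rows are typed in from the bind_rows output shown earlier so that the block runs on its own.

```r
library(dplyr)

# Same long-format rows as the bind_rows result, typed in directly
long_df <- tibble(
  id      = c("1", "1", "1", "2", "1", "1", "3"),
  band    = c("c1", "c2", "c3", "c4", "c1", "c4", "c1"),
  samples = c(32, 2, 61, 20, 1, 2, 2),
  time    = c(307, 2, 238, 74, 4, 2, 7)
)

# Sum samples and time within each id/band pair (assumes dplyr >= 1.0.0)
summed <- long_df %>%
  group_by(id, band) %>%
  summarise(across(c(samples, time), sum), .groups = "drop")
```

Passing .groups = "drop" returns an ungrouped tibble, so any later step has to state its own grouping explicitly.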
If you need the list columns back, you can collapse the rows per id:
new_df_list <- new_df %>%
  group_by(id) %>%
  summarize_all(list)
print.data.frame(new_df_list)
# id band samples time
# 1 1 c1, c2, c3, c4 33, 2, 61, 2 311, 2, 238, 2
# 2 2 c4 20 74
# 3 3 c1 2 7
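For reuse, the steps of this answer could be wrapped in a single function. This is a rough sketch rather than a tested utility: merge_band_frames is a hypothetical name, it assumes tidyr >= 1.0.0 and dplyr >= 1.0.0, and it assumes every input has the columns id, band, samples and time as in the question.

```r
library(dplyr)
library(tidyr)

# Sample data from the question, rebuilt as tibbles with list columns
df1 <- tibble(id = c("1", "2"),
              band = list(c("c1", "c2", "c3"), "c4"),
              samples = list(c(32, 2, 61), 20),
              time = list(c(307, 2, 238), 74))
df2 <- tibble(id = c("1", "3"),
              band = list(c("c1", "c4"), "c1"),
              samples = list(c(1, 2), 2),
              time = list(c(4, 2), 7))

# Hypothetical helper: unnest each input, stack them, sum per id/band,
# then collapse back to one row per id with list columns
merge_band_frames <- function(...) {
  long <- lapply(list(...), function(d) {
    unnest(d, cols = c(band, samples, time))
  })
  bind_rows(long) %>%
    group_by(id, band) %>%
    summarise(across(c(samples, time), sum), .groups = "drop") %>%
    group_by(id) %>%
    summarise(across(c(band, samples, time), list))
}

merged <- merge_band_frames(df1, df2)
```

Because the function takes its inputs through ..., it should accept more than two data frames with the same layout.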