I have the following data frames:
df1 <- tibble::as_tibble(list(a = c(1,2,3), d = c(10,11,12), id = c("a","b","c")))
df2 <- tibble::as_tibble(list(a = c(4,5,6), e = c(13,14,15), id = c("a","b","c")))
df3 <- tibble::as_tibble(list(a = c(7,8,9), f = c(16,17,18), id = c("a","b","c")))
I want to merge these data frames together. Because the column name `a` is used in all of them, the merge needs to use the `suffix` argument.
The result I want is:
| id | a.df1 | d | a.df2 | e | a.df3 | f |
|----|-------|----|-------|----|-------|----|
| a | 1 | 10 | 4 | 13 | 7 | 16 |
| b | 2 | 11 | 5 | 14 | 8 | 17 |
| c | 3 | 12 | 6 | 15 | 9 | 18 |
Here is the code I tried:
test_list <- list(df1, df2, df3)
names(test_list) <- c("df1", "df2", "df3")
seq_along(temp) %>%
purrr::reduce(
~merge(
temp[[.x]],
temp[[.y]],
suffix = c(names(test_list[.x]), names(test_list[.y])))
)
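But this fails with the error
Error in temp[[.x]] : invalid subscript type 'list'
Why can't I subset to the data frames inside the merge call?
Also, is there a better way to combine a list of multiple data frames that share column names?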
Answer 0 (score: 3)
library(tidyverse)
df1 <- tibble::as_tibble(list(a = c(1,2,3), d = c(10,11,12), id = c("a","b","c")))
df2 <- tibble::as_tibble(list(a = c(4,5,6), e = c(13,14,15), id = c("a","b","c")))
df3 <- tibble::as_tibble(list(a = c(7,8,9), f = c(16,17,18), id = c("a","b","c")))
# create your list and the names
test_list <- list(df1, df2, df3)
names(test_list) <- c("df1", "df2", "df3")
# spot overlapping columns
test_list %>%
map_df(names) %>%
gather() %>%
count(value) %>%
filter(n > 1 & value != "id") %>%
pull(value) -> overlaps
map2(test_list, names(test_list), ~{names(.x)[names(.x) %in% overlaps] = paste0(names(.x)[names(.x) %in% overlaps],".",.y); .x}) %>%
reduce(function(x,y) left_join(x,y, by="id")) %>%
select(id, everything())
# # A tibble: 3 x 7
# id a.df1 d a.df2 e a.df3 f
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 a 1 10 4 13 7 16
# 2 b 2 11 5 14 8 17
# 3 c 3 12 6 15 9 18
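Given the list and its names, we use `map2` to rename the overlapping columns in each element (here, column `a`) by appending the element's list name.
Then we use `reduce` to join the data frames sequentially on `id`, and `select` to arrange the columns.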
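As for why the code in the question fails: `reduce` passes the result of the previous step as the first argument of the next call. After the first merge, `.x` is therefore a data frame rather than an index, and `temp[[.x]]` cannot subset by it. The following is a minimal sketch of that behavior, using base R's `Reduce` (which folds over its input in the same way) in place of `purrr::reduce`. The small data frames and the `seen` vector are made up for illustration.

```r
# Illustrative data, not the data from the question
test_list <- list(
  df1 = data.frame(id = c("a", "b", "c"), a = c(1, 2, 3)),
  df2 = data.frame(id = c("a", "b", "c"), a = c(4, 5, 6)),
  df3 = data.frame(id = c("a", "b", "c"), a = c(7, 8, 9))
)

# Record the class of the accumulator each time the function is called
seen <- character(0)

res <- tryCatch(
  Reduce(function(acc, i) {
    seen <<- c(seen, class(acc)[1])
    # Works on the first call (acc is the index 1), fails on the second
    # call because acc is now the merged data frame from the first call
    merge(test_list[[acc]], test_list[[i]], by = "id")
  }, seq_along(test_list)),
  error = function(e) conditionMessage(e)
)

print(seen)  # classes of the accumulator on the first and second call
print(res)   # the error message instead of a merged data frame
```

Reducing over the list of data frames itself, as in the answer above, avoids the problem because the accumulator and the next element then have the same type.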
Answer 1 (score: 1)
How does this look?
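# merge pairwise on the key; the first merge renames the clashing column to a.x / a.y
t <- merge(df1, df2, by = "id")
df <- merge(t, df3, by = "id")
# the merged result has seven columns: id, a.x, d, a.y, e, a, f
names(df) <- c("id", "a.df1", "d", "a.df2", "e", "a.df3", "f")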
Or am I guessing right that you actually have more columns and don't want to merge and rename everything by hand like this?
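If there are more data frames or more columns, the pairwise `merge` idea can be generalized. This is a minimal sketch, assuming every data frame is keyed by `id` and stored in a named list: rename the shared non-key columns first, then fold `merge` over the list with base `Reduce`. The names `overlaps`, `renamed` and `result` are illustrative only.

```r
df1 <- data.frame(a = c(1, 2, 3), d = c(10, 11, 12), id = c("a", "b", "c"))
df2 <- data.frame(a = c(4, 5, 6), e = c(13, 14, 15), id = c("a", "b", "c"))
df3 <- data.frame(a = c(7, 8, 9), f = c(16, 17, 18), id = c("a", "b", "c"))
test_list <- list(df1 = df1, df2 = df2, df3 = df3)

# Non-key column names that occur in more than one data frame
all_names <- unlist(lapply(test_list, names), use.names = FALSE)
counts <- table(all_names)
overlaps <- setdiff(names(counts)[as.vector(counts) > 1], "id")

# Append ".<list name>" to each overlapping column
renamed <- Map(function(df, nm) {
  hit <- names(df) %in% overlaps
  names(df)[hit] <- paste0(names(df)[hit], ".", nm)
  df
}, test_list, names(test_list))

# Fold merge() over the list; the key column comes first in the result
result <- Reduce(function(x, y) merge(x, y, by = "id"), renamed)
print(result)
```

Renaming before merging sidesteps the `suffixes` argument entirely, so the column names no longer depend on the order in which the pairwise merges happen.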
Answer 2 (score: 0)
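The function `eat` from my package safejoin supports this: if you give it a list of data.frames as the second input, it joins them recursively to the first input. We can rename all the "a" columns and then use it.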
# devtools::install_github("moodymudskipper/safejoin")
library(tidyverse) # for imap(), lst(), rename_at()
library(safejoin)
dfs <- imap(lst(df1,df2,df3), ~rename_at(.x, "a",paste, .y, sep="."), .y) %>%
unname()
eat(dfs[[1]], dfs[-1], .by = "id")
# # A tibble: 3 x 7
# id a.df1 d a.df2 e a.df3 f
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 a 1 10 4 13 7 16
# 2 b 2 11 5 14 8 17
# 3 c 3 12 6 15 9 18