Merging 2 data frames by row and column overlap

Time: 2018-09-21 18:13:19

Tags: r dataframe merge

I would like to merge 2 data frames additively, so that

      taxonomy A B C
1          rat 0 1 2
2          dog 1 2 3
3          cat 2 3 0

      taxonomy A D C
1          rat 0 1 9
2        Horse 0 2 6
3          cat 2 0 2

produces

      taxonomy A B  C D
1          rat 0 1 11 1
2        Horse 0 0  6 2
3          cat 4 3  2 0
4          dog 1 2  3 0

I have tried aggregate, merge, apply, ddply ... all without success. This will be done on 2 data frames with a few hundred rows and several columns.

3 Answers:

Answer 0: (score: 3)

Using bind_rows from dplyr:

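library(dplyr)

# Stack both data frames; a column missing from one of them (B or D) is filled with NA
bind_rows(df1, df2) %>%
  group_by(taxonomy) %>%
  # Sum every remaining column within each taxonomy, ignoring the NA fill
  summarize_all(sum, na.rm = TRUE)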

Output:

# A tibble: 4 x 5
  taxonomy     A     B     C     D
  <chr>    <int> <int> <int> <int>
1 cat          4     3     2     0
2 dog          1     2     3     0
3 Horse        0     0     6     2
4 rat          0     1    11     1
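
For comparison, the same stack-then-sum idea can be sketched in base R, which may help if dplyr is not available. This is only a minimal sketch under a few assumptions: the key column is called taxonomy, every other column is numeric, and a column missing from one data frame should count as 0. The helper pad_columns and the names all_cols, stacked and result are hypothetical and are not part of the answer above.

```r
# Sample data, rebuilt from the tables in the question
df1 <- data.frame(taxonomy = c("rat", "dog", "cat"),
                  A = 0:2, B = 1:3, C = c(2L, 3L, 0L),
                  stringsAsFactors = FALSE)
df2 <- data.frame(taxonomy = c("rat", "Horse", "cat"),
                  A = c(0L, 0L, 2L), D = c(1L, 2L, 0L), C = c(9L, 6L, 2L),
                  stringsAsFactors = FALSE)

# Hypothetical helper: add a zero-filled column for every name in `cols`
# that `df` lacks, then return the columns in the order given by `cols`
pad_columns <- function(df, cols) {
  for (col in setdiff(cols, names(df))) {
    df[[col]] <- 0L
  }
  df[cols]
}

# Union of the column names of both data frames: taxonomy, A, B, C, D
all_cols <- union(names(df1), names(df2))

# Stack the padded data frames, then sum every value column per taxonomy
stacked <- rbind(pad_columns(df1, all_cols), pad_columns(df2, all_cols))
result <- aggregate(. ~ taxonomy, data = stacked, FUN = sum)
print(result)
```

Padding with 0 up front plays the role that the NA fill plus na.rm = TRUE plays in the dplyr version. The row order of result follows the sorted taxonomy values, so it can differ from the order in the question.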

Data:

df1 <- structure(list(taxonomy = c("rat", "dog", "cat"), A = 0:2, B = 1:3, 
    C = c(2L, 3L, 0L)), .Names = c("taxonomy", "A", "B", "C"), class = "data.frame", row.names = c("1", 
"2", "3"))

df2 <- structure(list(taxonomy = c("rat", "Horse", "cat"), A = c(0L, 
0L, 2L), D = c(1L, 2L, 0L), C = c(9L, 6L, 2L)), .Names = c("taxonomy", 
"A", "D", "C"), class = "data.frame", row.names = c("1", "2", 
"3"))

Answer 1: (score: 2)

A data.table equivalent of @avid_useR's answer:

library(data.table)
# fill = TRUE pads columns missing from either table with NA; na.rm = TRUE skips them when summing
rbindlist(list(df1, df2), fill = TRUE)[, lapply(.SD, sum, na.rm = TRUE), by = taxonomy]
#   taxonomy A B  C D
#1:      rat 0 1 11 1
#2:      dog 1 2  3 0
#3:      cat 4 3  2 0
#4:    Horse 0 0  6 2

Answer 2: (score: 1)

You could do...

> library(reshape2)
> dcast(rbind(melt(DF1), melt(DF2)), taxonomy ~ variable, fun.aggregate = sum)
Using taxonomy as id variables
Using taxonomy as id variables
  taxonomy A B  C D
1      cat 4 3  2 0
2      dog 1 2  3 0
3    Horse 0 0  6 2
4      rat 0 1 11 1

This sorts the rows and columns alphabetically, though I suppose that could be avoided by using a factor.
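
The factor idea can be sketched as follows. To keep the example free of package dependencies, it swaps in base R: a hand-written reshape to long format stands in for melt, and xtabs stands in for dcast. The helper to_long and the names row_order and col_order are hypothetical. The assumption, not verified here against reshape2 itself, is that dcast would respect factor levels in the same way.

```r
# Sample data, rebuilt from the tables in the question
DF1 <- data.frame(taxonomy = c("rat", "dog", "cat"),
                  A = 0:2, B = 1:3, C = c(2L, 3L, 0L),
                  stringsAsFactors = FALSE)
DF2 <- data.frame(taxonomy = c("rat", "Horse", "cat"),
                  A = c(0L, 0L, 2L), D = c(1L, 2L, 0L), C = c(9L, 6L, 2L),
                  stringsAsFactors = FALSE)

# Hypothetical helper: one row per (taxonomy, variable) pair, similar in
# spirit to a long-format melt keyed on taxonomy
to_long <- function(df) {
  value_cols <- setdiff(names(df), "taxonomy")
  data.frame(
    taxonomy = rep(df$taxonomy, times = length(value_cols)),
    variable = rep(value_cols, each = nrow(df)),
    value    = unlist(df[value_cols], use.names = FALSE),
    stringsAsFactors = FALSE
  )
}

long <- rbind(to_long(DF1), to_long(DF2))

# Record the order of first appearance, then fix it as the factor levels
row_order <- unique(long$taxonomy)   # rat, dog, cat, Horse
col_order <- unique(long$variable)   # A, B, C, D
long$taxonomy <- factor(long$taxonomy, levels = row_order)
long$variable <- factor(long$variable, levels = col_order)

# Sum value for each taxonomy/variable cell; rows and columns follow the
# factor levels rather than alphabetical order, and empty cells become 0
wide <- xtabs(value ~ taxonomy + variable, data = long)
result <- as.data.frame.matrix(wide)
print(result)
```

Here the taxonomy values end up as row names rather than as a column. If a regular column is needed, something like cbind(taxonomy = rownames(result), result) should restore it.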

Data:

DF1 = structure(list(taxonomy = c("rat", "dog", "cat"), A = 0:2, B = 1:3, 
    C = c(2L, 3L, 0L)), .Names = c("taxonomy", "A", "B", "C"), row.names = c(NA, 
-3L), class = "data.frame")
DF2 = structure(list(taxonomy = c("rat", "Horse", "cat"), A = c(0L, 
0L, 2L), D = c(1L, 2L, 0L), C = c(9L, 6L, 2L)), .Names = c("taxonomy", 
"A", "D", "C"), row.names = c(NA, -3L), class = "data.frame")