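Merge data frames df1, df2, df3 in R on a specified column

Asked: 2013-12-04 03:47:06

Tags: r join merge outer-join

In R, I have df1, df2, and df3, which represent lightning storms. Each df has two columns, "city" and "injuries".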

df1 = data.frame(city=c("atlanta", "new york"), injuries=c(5,8))
df2 = data.frame(city=c("chicago", "new york"), injuries=c(2,3))
df3 = data.frame(city=c("los angeles", "atlanta"), injuries=c(1,7))

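I want to merge all 3 data frames on the city column, in a kind of outer join, so that every city shows up in the merged data frame and the injury counts are laid out as follows: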

combined.df

city         df1.freq   df2.freq   df3.freq
atlanta      5          0          7
new york     8          3          0
chicago      0          2          0
los angeles  0          0          1

4 Answers:

Answer 0 (score: 4)

This generalizes to any number of data.frames:

library(functional)
Reduce(Curry(merge, by = "city", all = TRUE), list(df1, df2, df3))
#          city injuries.x injuries.y injuries
# 1     atlanta          5         NA        7
# 2    new york          8          3       NA
# 3     chicago         NA          2       NA
# 4 los angeles         NA         NA        1
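
To get from this result to the layout asked for in the question (columns named df1.freq, df2.freq, df3.freq and 0 instead of NA), one option is to rename the injuries column before merging and fill the gaps afterwards. This is only a sketch: the names dfs and renamed are made up for illustration, and it assumes a missing city really means zero injuries. Renaming first also avoids the injuries.x / injuries.y suffix clashes that appear once more than three data frames are merged.

```r
df1 <- data.frame(city = c("atlanta", "new york"), injuries = c(5, 8))
df2 <- data.frame(city = c("chicago", "new york"), injuries = c(2, 3))
df3 <- data.frame(city = c("los angeles", "atlanta"), injuries = c(1, 7))

# Hypothetical helper list; the list names become the column prefixes
dfs <- list(df1 = df1, df2 = df2, df3 = df3)

# Rename "injuries" to "<name>.freq" in each data frame before merging
renamed <- Map(function(d, nm) setNames(d, c("city", paste0(nm, ".freq"))),
               dfs, names(dfs))

# Full outer join on city across all data frames
combined.df <- Reduce(function(a, b) merge(a, b, by = "city", all = TRUE),
                      renamed)

# Assumption: a city missing from a data frame means 0 injuries
combined.df[is.na(combined.df)] <- 0
```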

However, repeated merges can be slow. Another approach is to stack the data.frames into one long data.frame:

df.long <- do.call(rbind, Map(transform, list(df1, df2, df3),
                                         name = c("df1", "df2", "df3")))
#          city injuries name
# 1     atlanta        5  df1
# 2    new york        8  df1
# 3     chicago        2  df2
# 4    new york        3  df2
# 5 los angeles        1  df3
# 6     atlanta        7  df3

Then use xtabs to reshape the data, for example:

xtabs(injuries ~ city + name, df.long)
#              name
# city          df1 df2 df3
#   atlanta       5   0   7
#   new york      8   3   0
#   chicago       0   2   0
#   los angeles   0   0   1

(The reshape function could also be useful for the last step, but I am not very familiar with it.)
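
Note that xtabs returns a table object, not a data.frame. If a data.frame with a city column is needed, as in the question, something along these lines should work. It is a sketch using base R only; the names tab and wide are illustrative.

```r
df1 <- data.frame(city = c("atlanta", "new york"), injuries = c(5, 8))
df2 <- data.frame(city = c("chicago", "new york"), injuries = c(2, 3))
df3 <- data.frame(city = c("los angeles", "atlanta"), injuries = c(1, 7))

# Stack the data frames, tagging each row with its source
df.long <- do.call(rbind, Map(transform, list(df1, df2, df3),
                              name = c("df1", "df2", "df3")))

# Cross-tabulate: one row per city, one column per source, 0 where absent
tab <- xtabs(injuries ~ city + name, df.long)

# Drop the table class so the counts are treated as a plain matrix,
# then move the row names into a regular "city" column
wide <- as.data.frame(unclass(tab))
wide <- cbind(city = rownames(wide), wide)
rownames(wide) <- NULL
```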

Answer 1 (score: 2)

merge is your friend. Type ?merge for details.

> merge(merge(df1, df2, by = "city", all = TRUE), df3, by = "city", all = TRUE)
         city injuries.x injuries.y injuries
1     atlanta          5         NA        7
2     chicago         NA          2       NA
3 los angeles         NA         NA        1
4    new york          8          3       NA

Edit: Although I like @flodel's solution, this one is more straightforward and may be easier to understand:

 Reduce(function(d1, d2) merge(d1, d2, all = TRUE, by = "city"), list(df1, df2, df3))

Answer 2 (score: 1)

An alternative to @flodel's version, using the base R reshape function:

dat <- list(df1,df2,df3)
intm <- data.frame(do.call(rbind,dat),val=rep(seq_along(dat),sapply(dat,nrow)))
reshape(intm, idvar="city", timevar="val", direction="wide")

#         city injuries.1 injuries.2 injuries.3
#1     atlanta          5         NA          7
#2    new york          8          3         NA
#3     chicago         NA          2         NA
#5 los angeles         NA         NA          1
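
A small variation, offered as a sketch rather than part of the original answer: if the list is named, the names can serve as the timevar, so the wide columns come out as injuries.df1, injuries.df2, injuries.df3 instead of being numbered. Resetting the row names also removes the gap (1, 2, 3, 5) visible above. The name wide is illustrative.

```r
df1 <- data.frame(city = c("atlanta", "new york"), injuries = c(5, 8))
df2 <- data.frame(city = c("chicago", "new york"), injuries = c(2, 3))
df3 <- data.frame(city = c("los angeles", "atlanta"), injuries = c(1, 7))

# Name the list elements so the names can label the wide columns
dat <- list(df1 = df1, df2 = df2, df3 = df3)

# Stack the data frames and record which one each row came from
intm <- data.frame(do.call(rbind, dat),
                   val = rep(names(dat), sapply(dat, nrow)))

# One row per city, one injuries.<name> column per source data frame
wide <- reshape(intm, idvar = "city", timevar = "val", direction = "wide")

# Replace the leftover row names with a clean 1..n sequence
rownames(wide) <- NULL
```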

Answer 3 (score: 1)

Here is my solution using reshape::cast (thanks, @thelatemail!). Add an id variable to each data frame, rbind them, then cast to wide format:

df1$id <- 'df1.freq'
df2$id <- 'df2.freq'
df3$id <- 'df3.freq'

rb <- rbind(df1,df2,df3)
library(reshape)
cast(rb, city ~ id, value='injuries')

Result:

         city df1.freq df2.freq df3.freq
1     atlanta        5       NA        7
2    new york        8        3       NA
3     chicago       NA        2       NA
4 los angeles       NA       NA        1