In R, I have df1, df2, and df3, each representing a lightning storm. Each df has two columns, "city" and "injuries".
df1 = data.frame(city=c("atlanta", "new york"), injuries=c(5,8))
df2 = data.frame(city=c("chicago", "new york"), injuries=c(2,3))
df3 = data.frame(city=c("los angeles", "atlanta"), injuries=c(1,7))
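I want to merge all three data frames with a kind of outer join on the city column, so that every city appears in the combined data frame and the injury counts are summarized as follows: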
combined.df
       city df1.freq df2.freq df3.freq
    atlanta        5        0        7
   new york        8        3        0
    chicago        0        2        0
los angeles        0        0        1
Answer 0 (score: 4)
This generalizes to any number of data.frames:
library(functional)
Reduce(Curry(merge, by = "city", all = TRUE), list(df1, df2, df3))
#          city injuries.x injuries.y injuries
# 1     atlanta          5         NA        7
# 2    new york          8          3       NA
# 3     chicago         NA          2       NA
# 4 los angeles         NA         NA        1
However, repeated merges can be slow. Another approach is to stack the data.frames into one long data.frame:
df.long <- do.call(rbind, Map(transform, list(df1, df2, df3),
name = c("df1", "df2", "df3")))
#          city injuries name
# 1     atlanta        5  df1
# 2    new york        8  df1
# 3     chicago        2  df2
# 4    new york        3  df2
# 5 los angeles        1  df3
# 6     atlanta        7  df3
Then use xtabs to reshape the data, for example:
xtabs(injuries ~ city + name, df.long)
#              name
# city          df1 df2 df3
#   atlanta       5   0   7
#   new york      8   3   0
#   chicago       0   2   0
#   los angeles   0   0   1
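If a plain data frame is wanted rather than a contingency table, the xtabs result can be converted. This is only a sketch in base R, rebuilding df.long as above; the names tab and wide and the ".freq" suffix are chosen here for illustration and are not part of the original answer:

```r
df1 <- data.frame(city = c("atlanta", "new york"), injuries = c(5, 8))
df2 <- data.frame(city = c("chicago", "new york"), injuries = c(2, 3))
df3 <- data.frame(city = c("los angeles", "atlanta"), injuries = c(1, 7))

# Stack the frames, tagging each row with the data frame it came from
df.long <- do.call(rbind, Map(transform, list(df1, df2, df3),
                              name = c("df1", "df2", "df3")))

tab <- xtabs(injuries ~ city + name, df.long)

# as.data.frame.matrix keeps the wide layout; plain as.data.frame()
# would turn the table back into long form
wide <- as.data.frame.matrix(tab)

# Illustrative renaming to match the column names asked for in the question
names(wide) <- paste0(names(wide), ".freq")

# Move the city names from the row names into a regular column
wide <- data.frame(city = rownames(wide), wide, row.names = NULL)
print(wide)
```

The row order may differ from the output shown above depending on whether city is stored as a factor or as a character vector, so it is safer to look cities up by name than by position.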
(The reshape function might also be useful for the last step, but I am not very familiar with it.)
Answer 1 (score: 2)
merge is your friend. Type ?merge for details.
> merge(merge(df1, df2, by = "city", all = TRUE), df3, by = "city", all = TRUE)
         city injuries.x injuries.y injuries
1     atlanta          5         NA        7
2     chicago         NA          2       NA
3 los angeles         NA         NA        1
4    new york          8          3       NA
Edit: although I like @flodel's solution, here is a more straightforward one that may be easier to understand:
Reduce(function(d1, d2) merge(d1, d2, all = TRUE, by = "city"), list(df1, df2, df3))
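The merged result uses NA and injuries.x / injuries.y column names, whereas the question asks for zeros and df1.freq-style names. One possible way to get there, as a sketch in base R only: rename the injuries column in each data frame before merging, then replace the NAs. The helper names dfs and renamed are made up for this example, and treating a missing city as 0 injuries is an assumption taken from the desired output in the question:

```r
df1 <- data.frame(city = c("atlanta", "new york"), injuries = c(5, 8))
df2 <- data.frame(city = c("chicago", "new york"), injuries = c(2, 3))
df3 <- data.frame(city = c("los angeles", "atlanta"), injuries = c(1, 7))

# A named list, so each name can be reused as a column prefix
dfs <- list(df1 = df1, df2 = df2, df3 = df3)

# Rename each injuries column to "<name>.freq" before merging, so the
# merged columns do not collide and pick up .x/.y suffixes
renamed <- Map(function(d, nm) setNames(d, c("city", paste0(nm, ".freq"))),
               dfs, names(dfs))

# Full outer join of all data frames on city
combined.df <- Reduce(function(d1, d2) merge(d1, d2, by = "city", all = TRUE),
                      renamed)

# Assumption: a city absent from a storm means 0 injuries, not "unknown"
combined.df[is.na(combined.df)] <- 0
print(combined.df)
```

Note that merge sorts the rows by city by default, so the row order differs from the one shown in the question.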
Answer 2 (score: 1)
An alternative to @flodel's version, using the base R reshape function:
dat <- list(df1,df2,df3)
intm <- data.frame(do.call(rbind,dat),val=rep(seq_along(dat),sapply(dat,nrow)))
reshape(intm, idvar="city", timevar="val", direction="wide")
#         city injuries.1 injuries.2 injuries.3
#1     atlanta          5         NA          7
#2    new york          8          3         NA
#3     chicago         NA          2         NA
#5 los angeles         NA         NA          1
Answer 3 (score: 1)
Here is my solution using reshape::cast (thanks, @thelatemail!). Add an id variable to each data frame, bind them together, then cast to wide format:
df1$id <- 'df1.freq'
df2$id <- 'df2.freq'
df3$id <- 'df3.freq'
rb <- rbind(df1,df2,df3)
library(reshape)
cast(rb, city ~ id, value='injuries')
Result:
         city df1.freq df2.freq df3.freq
1     atlanta        5       NA        7
2    new york        8        3       NA
3     chicago       NA        2       NA
4 los angeles       NA       NA        1