I am trying to merge two data frames. I have read several posts but cannot find a way to get the desired output.
dfA:
Name Surname C
Ja Men T
Ale Bu T
Ge Men
dfB:
Name Surname C Ex
Ge Men T hello
Je Di T hello
Desired output:
Merge:
Name Surname C
Ja Men T
Ale Bu T
Ge Men T
Je Di T
That is, fill in the columns of dfA using the values available in dfB, and ignore the columns of dfB that do not exist in dfA.
I tried:
merge(dfA,dfB, by=c("Name", "Surname", "Caracter"), all.x = T)
and other merge combinations. I also tried dplyr but could not get a satisfactory result.
Any help would be appreciated.
Thanks in advance.
Data:
dfA <- data.frame(
name=c("Ja", "Ale", "Ge"),
surname=c("Men", "Bu", "Men"),
C= c("T", "T", NA))
dfB <- data.frame(
name=c("Ge", "Je"),
surname=c("Men","Di"),
C= c("T","T"),
X = c("hello","hello"))
Using dput():
# based on dput(dfA)
dfA <- structure(list(name = structure(c(3L, 1L, 2L), .Label = c("Ale",
"Ge", "Ja"), class = "factor"), surname = structure(c(2L, 1L,
2L), .Label = c("Bu", "Men"), class = "factor"), C = structure(c(1L,
1L, NA), .Label = "T", class = "factor")), .Names = c("name",
"surname", "C"), row.names = c(NA, -3L), class = "data.frame")
# based on dput(dfB)
dfB <- structure(list(name = structure(1L, .Label = "Ge", class = "factor"),
surname = structure(1L, .Label = "Men", class = "factor"),
C = "T", X = structure(1L, .Label = "hello", class = "factor")),
.Names = c("name", "surname", "C", "X"),
row.names = c(NA, -1L), class = "data.frame")
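Answer 0 (score: 0)
Assuming the input is as shown at the end of the question, we perform a left join of dfA with dfB. Note that coalesce returns its first non-null argument, and NA is treated as SQL null: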
library(sqldf)
sqldf("select A.Name, A.Surname, coalesce(A.C, B.C) C
from dfA A left join dfB B on A.Name = B.Name and A.Surname = B.Surname")
giving:
name surname C
1 Ja Men T
2 Ale Bu T
3 Ge Men T
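Because this is a left join, only the rows of dfA are kept, so the Je Di row from the desired output does not appear. As a rough sketch of the same coalesce idea in base R, extended to a full join: the data frames are rebuilt from the question with stringsAsFactors = FALSE (an assumption made here to avoid factor-level issues), and the .A/.B suffixes and the names m and result are illustrative only.

```r
dfA <- data.frame(
  name = c("Ja", "Ale", "Ge"),
  surname = c("Men", "Bu", "Men"),
  C = c("T", "T", NA),
  stringsAsFactors = FALSE)  # assumption: plain character columns
dfB <- data.frame(
  name = c("Ge", "Je"),
  surname = c("Men", "Di"),
  C = c("T", "T"),
  X = c("hello", "hello"),
  stringsAsFactors = FALSE)

# Keep only the columns of dfB that also exist in dfA, then full-join on the keys.
# The non-key column C appears twice, as C.A and C.B.
m <- merge(dfA, dfB[names(dfA)], by = c("name", "surname"),
           all = TRUE, suffixes = c(".A", ".B"))

# Coalesce: take dfA's value where it is present, otherwise dfB's.
m$C <- ifelse(is.na(m$C.A), m$C.B, m$C.A)
result <- m[c("name", "surname", "C")]
print(result)
```

Note that merge() sorts the result by the key columns by default, so the row order may differ from the desired output even though the contents match.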
Answer 1 (score: 0)
We can use safe_full_join from my package safejoin, and resolve the column conflicts with dplyr::coalesce:
# devtools::install_github("moodymudskipper/safejoin")
library(safejoin)
library(dplyr)
safe_full_join(dfA, dfB[names(dfA)], by=c("name","surname"), conflict = coalesce, check="")
# name surname C
# 1 Ja Men T
# 2 Ale Bu T
# 3 Ge Men T
# 4 Je Di T
check = "" is used to suppress the warning that would otherwise be shown because we are joining factor columns with different levels.
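If installing a package from GitHub is not an option, the same full join plus coalesce can probably be approximated with dplyr alone. This is a sketch rather than an equivalent of safe_full_join: it assumes the data frames from the question built with stringsAsFactors = FALSE (so no factor-level warning arises), and the .A/.B suffixes and the name result are chosen here for illustration.

```r
library(dplyr)

dfA <- data.frame(
  name = c("Ja", "Ale", "Ge"),
  surname = c("Men", "Bu", "Men"),
  C = c("T", "T", NA),
  stringsAsFactors = FALSE)  # assumption: plain character columns
dfB <- data.frame(
  name = c("Ge", "Je"),
  surname = c("Men", "Di"),
  C = c("T", "T"),
  X = c("hello", "hello"),
  stringsAsFactors = FALSE)

result <- dfA %>%
  # Drop dfB columns that dfA lacks (here X), then full-join on the keys.
  full_join(dfB[names(dfA)], by = c("name", "surname"),
            suffix = c(".A", ".B")) %>%
  # Resolve the conflicting C columns: first non-NA value wins.
  mutate(C = coalesce(C.A, C.B)) %>%
  select(name, surname, C)

print(result)
```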