Merge two data frames using only the columns of one data frame and ignoring the others in R

Time: 2015-03-29 14:22:02

Tags: r dataframe

I am trying to merge two data frames. I have been reading various posts, but I can't find a way to get the desired output.

dfA:
Name Surname C
Ja   Men     T
Ale  Bu      T
Ge   Men     NA

dfB:
Name Surname C  Ex
Ge   Men     T  hello
Je   Di      T  hello

Desired output:

Merge:
Name Surname C
Ja   Men     T
Ale  Bu      T
Ge   Men     T
Je   Di      T

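That is, fill in the columns of dfA with the values available in dfB (also keeping rows of dfB, such as Je Di, that dfA lacks), and ignore the columns of dfB that do not exist in dfA.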

I tried:

merge(dfA,dfB, by=c("Name", "Surname", "Caracter"), all.x = T)

and other combinations of merge(). I also tried dplyr but could not get a satisfactory result.
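A short sketch of why merge() on its own does not give the desired result, assuming the column names from the data at the end of the question (name, surname, C) and character rather than factor columns. Joining on C as well as the name columns treats "Ge Men NA" and "Ge Men T" as different keys, and dfB's extra column is carried along:

```r
# Sample data from the question, with character columns.
dfA <- data.frame(name = c("Ja", "Ale", "Ge"),
                  surname = c("Men", "Bu", "Men"),
                  C = c("T", "T", NA),
                  stringsAsFactors = FALSE)
dfB <- data.frame(name = c("Ge", "Je"),
                  surname = c("Men", "Di"),
                  C = c("T", "T"),
                  X = c("hello", "hello"),
                  stringsAsFactors = FALSE)

# The 'by' names must match the real column names; "Name", "Surname" and
# "Caracter" do not exist in dfA, so merge() would stop with an error.
# Even with correct names, including C in the key means the NA row for
# Ge/Men never matches dfB's Ge/Men/T row.
attempt <- merge(dfA, dfB, by = c("name", "surname", "C"), all = TRUE)
print(attempt)
# Ge Men appears twice (once with C = NA), and the X column is kept.
```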

Any help would be appreciated.

Thanks in advance.

Data:

dfA <- data.frame(
  name=c("Ja", "Ale", "Ge"),
  surname=c("Men", "Bu", "Men"), 
  C= c("T", "T", NA))

dfB <- data.frame(
  name=c("Ge", "Je"),
  surname=c("Men","Di"), 
  C= c("T","T"),
  X = c("hello","hello"))

Using dput():

# based on dput(dfA)
dfA <- structure(list(name = structure(c(3L, 1L, 2L), .Label = c("Ale", 
"Ge", "Ja"), class = "factor"), surname = structure(c(2L, 1L, 
2L), .Label = c("Bu", "Men"), class = "factor"), C = structure(c(1L, 
1L, NA), .Label = "T", class = "factor")), .Names = c("name", 
"surname", "C"), row.names = c(NA, -3L), class = "data.frame")

# based on dput(dfB)
dfB <- structure(list(name = structure(1L, .Label = "Ge", class = "factor"), 
    surname = structure(1L, .Label = "Men", class = "factor"), 
    C = "T", X = structure(1L, .Label = "hello", class = "factor")), 
    .Names = c("name", "surname", "C", "X"), 
    row.names = c(NA, -1L), class = "data.frame")
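The dput() output shows that most of the text columns are stored as factors, and the two tables carry different level sets. A sketch of converting factor columns to character before any join, using a hypothetical helper factors_to_character() (not from the question or the answers):

```r
# Hypothetical helper: turn every factor column of a data frame into character.
factors_to_character <- function(df) {
  is_fac <- vapply(df, is.factor, logical(1))
  df[is_fac] <- lapply(df[is_fac], as.character)
  df
}

# Factor versions of the question's data.
dfA <- data.frame(name = c("Ja", "Ale", "Ge"),
                  surname = c("Men", "Bu", "Men"),
                  C = c("T", "T", NA),
                  stringsAsFactors = TRUE)
dfB <- data.frame(name = c("Ge", "Je"),
                  surname = c("Men", "Di"),
                  C = c("T", "T"),
                  X = c("hello", "hello"),
                  stringsAsFactors = TRUE)

# The key columns have different levels in the two tables.
print(levels(dfA$name))  # "Ale" "Ge" "Ja"
print(levels(dfB$name))  # "Ge" "Je"

dfA <- factors_to_character(dfA)
dfB <- factors_to_character(dfB)
str(dfA)
```

After the conversion the join columns are plain character vectors, so there are no factor levels left to reconcile.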

2 answers:

Answer 0 (score: 0)

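Assuming the input is as shown by the dput() output at the end of the question, we perform a left join of dfA and dfB. Note that coalesce returns its first non-null argument, and NA is treated as an SQL null: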

library(sqldf)
sqldf("select A.Name, A.Surname, coalesce(A.C, B.C) C
       from dfA A left join dfB B on A.Name = B.Name and A.Surname = B.Surname")

giving:

  name surname C
1   Ja     Men T
2  Ale      Bu T
3   Ge     Men T
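For comparison, a base-R sketch of the same left join plus coalesce, assuming character columns and the one-row dfB from the dput() output. Here merge(all.x = TRUE) plays the role of LEFT JOIN and an is.na() fallback plays the role of coalesce; the ".B" suffix is an illustrative choice:

```r
dfA <- data.frame(name = c("Ja", "Ale", "Ge"),
                  surname = c("Men", "Bu", "Men"),
                  C = c("T", "T", NA),
                  stringsAsFactors = FALSE)
dfB <- data.frame(name = "Ge", surname = "Men", C = "T", X = "hello",
                  stringsAsFactors = FALSE)

# LEFT JOIN: keep every row of dfA; dfB's C arrives as C.B (NA when unmatched).
m <- merge(dfA, dfB[c("name", "surname", "C")],
           by = c("name", "surname"), all.x = TRUE, suffixes = c("", ".B"))

# coalesce(A.C, B.C): take dfA's value unless it is NA, else dfB's.
m$C <- ifelse(is.na(m$C), m$C.B, m$C)
m$C.B <- NULL
print(m)
```

Unlike the SQL version, merge() sorts the rows by the key columns.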

Answer 1 (score: 0)

We can use safe_full_join from my package safejoin and resolve the column conflict with dplyr::coalesce:

# devtools::install_github("moodymudskipper/safejoin")
library(safejoin)
library(dplyr)
safe_full_join(dfA, dfB[names(dfA)], by=c("name","surname"), conflict = coalesce, check="") 
#   name surname C
# 1   Ja     Men T
# 2  Ale      Bu T
# 3   Ge     Men T
# 4   Je      Di T

check = "" is used so that no warnings are shown, since we are joining factor columns with different levels.
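A base-R sketch of the same full join with coalesce-style conflict resolution, for readers who do not want an extra package. It assumes character columns (so there are no factor levels to reconcile) and uses merge(all = TRUE) with temporary ".A"/".B" suffixes; the suffixes, the keys variable and the loop are illustrative choices, not part of safejoin:

```r
dfA <- data.frame(name = c("Ja", "Ale", "Ge"),
                  surname = c("Men", "Bu", "Men"),
                  C = c("T", "T", NA),
                  stringsAsFactors = FALSE)
dfB <- data.frame(name = c("Ge", "Je"),
                  surname = c("Men", "Di"),
                  C = c("T", "T"),
                  X = c("hello", "hello"),
                  stringsAsFactors = FALSE)

keys <- c("name", "surname")

# Full outer join on the keys, taking only dfA's columns from dfB.
m <- merge(dfA, dfB[names(dfA)], by = keys, all = TRUE,
           suffixes = c(".A", ".B"))

# For each shared non-key column, prefer dfA's value and fall back to dfB's.
for (col in setdiff(names(dfA), keys)) {
  a <- m[[paste0(col, ".A")]]
  b <- m[[paste0(col, ".B")]]
  m[[col]] <- ifelse(is.na(a), b, a)
}

# Keep only dfA's columns, in dfA's order.
result <- m[names(dfA)]
print(result)
```

Because merge() sorts by the key columns, the rows come out as Ale, Ge, Ja, Je rather than in the order of the desired output.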