Question

This is proving surprisingly difficult. I have tried variations of full_join, bind_cols, and merge, but cannot get any of them to work quite right.

I have:

> (t1 <- data.frame(x = letters[10:3], stringsAsFactors = FALSE))
  x
1 j
2 i
3 h
4 g
5 f
6 e
7 d
8 c

and:

> (t2 <- data.frame(y = letters[1:4], stringsAsFactors = FALSE))
  y
1 a
2 b
3 c
4 d

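I think what I'm looking for is some kind of full_join that keeps both columns while performing a set operation, because I want to return the following: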

> data.frame(
+   x = c(letters[10:3], NA, NA),
+   y = c(NA, NA, NA, NA, NA, NA, letters[4:1])
+ )
      x    y
1     j <NA>
2     i <NA>
3     h <NA>
4     g <NA>
5     f <NA>
6     e <NA>
7     d    d
8     c    c
9  <NA>    b
10 <NA>    a

So it is similar to a full_join, but it keeps both columns and fills in NA where the values differ. For example, this gives me only one column:

> full_join(t1, t2, by = c("x" = "y"))
   x
1  j
2  i
3  h
4  g
5  f
6  e
7  d
8  c
9  a
10 b

Answer 1

It's a bit hacky, but this works:

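library(dplyr)

# mutate() duplicates each key under the other table's column name, so each
# left_join has a shared column and the final full_join matches on x and y.
full_join(
  left_join(t1, t2 %>% mutate(x = y)),
  left_join(t2, t1 %>% mutate(y = x))
)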

      x    y
1     j <NA>
2     i <NA>
3     h <NA>
4     g <NA>
5     f <NA>
6     e <NA>
7     d    d
8     c    c
9  <NA>    a
10 <NA>    b
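
For what it's worth, newer dplyr releases may make the nested joins unnecessary. The following is a minimal sketch, assuming dplyr 1.0.0 or later (where the mutating joins accept a `keep` argument) and the `t1` and `t2` from the question; it is not part of the original answer.

```r
library(dplyr)

# Same inputs as in the question.
t1 <- data.frame(x = letters[10:3], stringsAsFactors = FALSE)
t2 <- data.frame(y = letters[1:4], stringsAsFactors = FALSE)

# keep = TRUE (assumes dplyr >= 1.0.0) retains the join key from both
# inputs instead of collapsing them into a single column.
res <- full_join(t1, t2, by = c("x" = "y"), keep = TRUE)
res
```

Rows that exist in only one table should get NA in the other table's column, which is the behaviour the question asks for.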

Answer 2

You can also use union and match:

inds <- union(t1$x, t2$y)
data.frame(x = t1$x[match(inds, t1$x)], y = t2$y[match(inds, t2$y)])

#      x    y
#1     j <NA>
#2     i <NA>
#3     h <NA>
#4     g <NA>
#5     f <NA>
#6     e <NA>
#7     d    d
#8     c    c
#9  <NA>    a
#10 <NA>    b
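
If you need this more than once, the same union/match idea can be wrapped in a small function. This is only a sketch: the name `align_by_value` and its arguments are hypothetical, not something from the answer above. Note that `union` drops duplicates, so a value repeated within one input would appear only once in the result.

```r
# Hypothetical helper wrapping the union/match approach (base R only).
align_by_value <- function(a, b) {
  keys <- union(a, b)            # every distinct value, a's order first
  data.frame(
    x = a[match(keys, a)],       # NA where a key is absent from a
    y = b[match(keys, b)],       # NA where a key is absent from b
    stringsAsFactors = FALSE
  )
}

t1 <- data.frame(x = letters[10:3], stringsAsFactors = FALSE)
t2 <- data.frame(y = letters[1:4], stringsAsFactors = FALSE)

out <- align_by_value(t1$x, t2$y)
out
```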

Merge columns, aligning by value and filling in NA where values don't match
