Question

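I am trying to compare several vectors to see where they share matching values. I would like to combine the vectors into a table in which each row holds either the same value (for matches) or NA (for non-matches).

For example: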

list1 <- c("a", "b", "c", "d")
list2 <- c("a", "c", "d")
list3 <- c("a", "b", "c", "e", "f")

should become:

a  a  a
b  NA b
c  c  c
d  d  NA
NA NA e
NA NA f

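I tried merge, join, {{1}} from dplyr, and cbind to build a data frame from the vectors, but each of them either returns a single column or fails to line up the values across all rows.

What is the best way to get this result in R?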

Answer 1

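You can use unlist and unique to collect every possible value, then look each value up in every vector. Where there is no match, match returns NA, which is what you want: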

list1 <- c("a", "b", "c", "d")
list2 <- c("a", "c", "d")
list3 <- c("a", "b", "c", "e", "f")
list_of_lists <- list(
  list1 = list1,
  list2 = list2,
  list3 = list3
)

all_values <- unique(unlist(list_of_lists))

fleshed_out <- vapply(
  list_of_lists,
  FUN.VALUE = all_values,
  FUN       = function(x) {
    x[match(all_values, x)]
  }
)

fleshed_out
#    list1 list2 list3
# [1,] "a"   "a"   "a"
# [2,] "b"   NA    "b"
# [3,] "c"   "c"   "c"
# [4,] "d"   "d"   NA
# [5,] NA    NA    "e"
# [6,] NA    NA    "f"
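
The vapply call above returns a character matrix. If a data frame is wanted instead, the same match step can be applied with lapply and the result passed to as.data.frame. The following is a minimal sketch under that assumption; the names vectors, aligned and result are illustrative and do not come from the answer.

```r
vectors <- list(
  list1 = c("a", "b", "c", "d"),
  list2 = c("a", "c", "d"),
  list3 = c("a", "b", "c", "e", "f")
)

# Every distinct value, in order of first appearance
all_values <- unique(unlist(vectors))

# match() gives the position of each value in x, or NA when it is absent;
# indexing x with an NA position yields NA, so gaps are filled automatically
positions <- match(all_values, vectors$list2)

# Apply the same lookup to every vector, then bind the columns
aligned <- lapply(vectors, function(x) x[match(all_values, x)])
result  <- as.data.frame(aligned, stringsAsFactors = FALSE)
```

Because all_values keeps first-appearance order, the rows come out in that order rather than sorted.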

Answer 2

A base R solution:

df1 = data.frame(col = list1, list1)
df2 = data.frame(col = list2, list2)
df3 = data.frame(col = list3, list3)

Reduce(function(x, y) merge(x, y, all=TRUE), list(df1, df2, df3))

#   col list1 list2 list3
# 1   a     a     a     a
# 2   b     b  <NA>     b
# 3   c     c     c     c
# 4   d     d     d  <NA>
# 5   e  <NA>  <NA>     e
# 6   f  <NA>  <NA>     f

Result:

> Reduce(function(x, y) merge(x, y, all=TRUE), list(df1, df2, df3))[,-1]
  list1 list2 list3
1     a     a     a
2     b  <NA>     b
3     c     c     c
4     d     d  <NA>
5  <NA>  <NA>     e
6  <NA>  <NA>     f
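
If there are more than a few vectors, writing out df1, df2, df3 by hand gets repetitive. One possible generalization, sketched here with hypothetical names (vectors, frames, merged, result), builds the two-column data frames from a named list with Map and then merges them the same way:

```r
vectors <- list(
  list1 = c("a", "b", "c", "d"),
  list2 = c("a", "c", "d"),
  list3 = c("a", "b", "c", "e", "f")
)

# One data frame per vector: a shared key column "col" plus a value
# column named after the vector
frames <- Map(
  function(values, name) {
    setNames(
      data.frame(values, values, stringsAsFactors = FALSE),
      c("col", name)
    )
  },
  vectors, names(vectors)
)

# Full outer join on the key, then drop the key column
merged <- Reduce(function(x, y) merge(x, y, by = "col", all = TRUE), frames)
result <- merged[, -1]
```

Note that merge sorts the rows by the key column by default, so the output is in sorted order rather than in order of first appearance.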

Or with dplyr + purrr:

library(dplyr)
library(purrr)

list(list1, list2, list3) %>%
  map(~ data.frame(col = ., ., stringsAsFactors = FALSE)) %>%
  reduce(full_join, by = "col") %>%
  select(-col) %>%
  setNames(paste0("list", 1:3))

Data:

list1 <- c("a", "b", "c", "d")
list2 <- c("a", "c", "d")
list3 <- c("a", "b", "c", "e", "f")

Join vectors into a data frame by matching values