Matching data frames in R

Time: 2018-11-16 02:17:41

Tags: r

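I have two datasets, x and y. Essentially, I want R to scan the first two columns of dataset x and of dataset y, and whenever both strings from a row of x are found in the first two columns of a row of y, return that record together with its associated third column.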

Example x dataset:

speciesA    speciesB
species22   species11
species33   species44
species44   species44
...

Example y dataset:

speciesA    speciesB    dist
species11   species22   9
species33   species44   14
species55   species33   5
...

Desired output:

speciesA    speciesB    dist
species11   species22   9
species33   species44   14

3 answers:

Answer 0: (score: 0)

# Inner join: keep only rows whose (speciesA, speciesB) pair occurs in both x and y
output <- merge(x = x, y = y, by = c('speciesA', 'speciesB'), all.x = FALSE, all.y = FALSE)
output <- output[, c('speciesA', 'speciesB', 'dist')] # column order
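
Note that `merge` compares the key columns position by position, so the first row of x in the question (species22, species11) would not match the y row (species11, species22), although the desired output includes it. Below is a minimal sketch of one way to make the match order-insensitive, assuming each pair is unordered: both columns are put into a canonical order with `pmin`/`pmax` before merging. The helper name `canonical_pair` is hypothetical, and the data frames are rebuilt from the question's example.

```r
# Example data rebuilt from the question
x <- data.frame(speciesA = c("species22", "species33", "species44"),
                speciesB = c("species11", "species44", "species44"),
                stringsAsFactors = FALSE)
y <- data.frame(speciesA = c("species11", "species33", "species55"),
                speciesB = c("species22", "species44", "species33"),
                dist = c(9, 14, 5),
                stringsAsFactors = FALSE)

# Hypothetical helper: reorder each pair so the alphabetically smaller
# name is always in speciesA and the larger one in speciesB
canonical_pair <- function(df) {
  first  <- pmin(df$speciesA, df$speciesB)
  second <- pmax(df$speciesA, df$speciesB)
  df$speciesA <- first
  df$speciesB <- second
  df
}

# Inner join on the reordered pairs
output <- merge(canonical_pair(x), canonical_pair(y),
                by = c("speciesA", "speciesB"))
output
```

Because both key columns are rewritten into sorted order, the result lists each pair alphabetically rather than in y's original order; for the two matching rows in this example the two orders happen to coincide.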

Answer 1: (score: 0)

The dplyr library has a nice workflow for joins:

library(dplyr)

x <- data.frame(speciesA = c("species11", "species33", "species44"),
                speciesB = c("species22", "species44", "species44"))

y <- data.frame(speciesA = c("species11", "species33", "species55"),
                speciesB = c("species22", "species44", "species33"),
                dist = c(9, 14, 5))

output <- inner_join(x, y)

This produces:

> output
   speciesA  speciesB dist
1 species11 species22    9
2 species33 species44   14
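
If only the matching rows of y are wanted, a filtering join is an alternative worth considering. This is a minimal sketch, assuming the same x and y as in this answer and that dplyr is installed. `semi_join` keeps the rows of y that have a match in x and returns only y's columns, and naming the keys in `by` avoids relying on dplyr's automatic choice of common columns. The variable name `matched` is only illustrative.

```r
library(dplyr)

# Same data as in this answer
x <- data.frame(speciesA = c("species11", "species33", "species44"),
                speciesB = c("species22", "species44", "species44"),
                stringsAsFactors = FALSE)
y <- data.frame(speciesA = c("species11", "species33", "species55"),
                speciesB = c("species22", "species44", "species33"),
                dist = c(9, 14, 5),
                stringsAsFactors = FALSE)

# Keep the rows of y whose (speciesA, speciesB) pair also appears in x
matched <- semi_join(y, x, by = c("speciesA", "speciesB"))
matched
```

Like `inner_join`, this matches the columns position by position, so a pair stored in the opposite order in x would not be found.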

Answer 2: (score: 0)

First, here is how to create a TRULY reproducible example:

x <- data.frame(spA=c('species22','species33','species44'),
                spB=c('species11','species44','species44'),
                stringsAsFactors=FALSE)
y <- data.frame(spA=c('species11','species33','species55'),
                spB=c('species22','species44','species33'),
                dist=c(9,14,5),
                stringsAsFactors=FALSE)
x
y

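Then, this function pastes the two species of each row together in alphabetical order. It is used to create a new key column in each data frame, and the two data frames are then merged on that column.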

pasteSorted <- function(spp) {
  return(paste0(sort(spp),collapse=','))
}
x$spp <- apply(x[,1:2],1,pasteSorted)
y$spp <- apply(y[,1:2],1,pasteSorted)
x
y
z <- merge(x,y,by='spp')

Finally, drop the unnecessary columns and rename the remaining ones.

z <- z[,-(1:3)]
names(z) <- c('spA','spB','dist')
z
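
Dropping columns by position depends on the exact column order that `merge` produces. Here is a sketch of the same final step selecting by name instead, assuming `merge`'s default `.x`/`.y` suffixes for the duplicated `spA` and `spB` columns. The data and the `pasteSorted` helper are repeated from this answer so the block runs on its own.

```r
x <- data.frame(spA = c("species22", "species33", "species44"),
                spB = c("species11", "species44", "species44"),
                stringsAsFactors = FALSE)
y <- data.frame(spA = c("species11", "species33", "species55"),
                spB = c("species22", "species44", "species33"),
                dist = c(9, 14, 5),
                stringsAsFactors = FALSE)

# Build an order-independent key such as "species11,species22"
pasteSorted <- function(spp) paste0(sort(spp), collapse = ",")
x$spp <- apply(x[, 1:2], 1, pasteSorted)
y$spp <- apply(y[, 1:2], 1, pasteSorted)

# After the merge the columns are: spp, spA.x, spB.x, spA.y, spB.y, dist
z <- merge(x, y, by = "spp")

# Keep y's species columns and the distance by name, then rename
z <- z[, c("spA.y", "spB.y", "dist")]
names(z) <- c("spA", "spB", "dist")
z
```

Selecting by name keeps working if extra columns are later added to either data frame, whereas a positional index such as `-(1:3)` would then silently drop or keep the wrong columns.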