R: merging horizontally on duplicates

Time: 2014-08-05 22:42:16

Tags: r merge duplicates

I am trying to merge 2 data frames and append multiple matches horizontally:

dataset1:

id email
1 email1
1 email1b
2 email2
3 email3

dataset2:

id name
1 bob
2 rob
3 kat

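I would like to use merge to combine these data frames on id. When an id has duplicate matches, as id 1 does, I would like the merge on "id" to return both results horizontally: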

id name email 
1 bob email1 email1b
2 rob email2
3 kat email3

It seems merge cannot do this; it creates multiple rows for the duplicate values. Any other ideas?
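
For reference, the behavior described above can be reproduced with a minimal sketch. It rebuilds the two example data frames with data.frame (the construction and the variable name merged are assumptions made for illustration, not part of the original post) and shows that a plain merge returns one row per matching pair:

```r
# Rebuild the example data from the question
dataset1 <- data.frame(id = c(1, 1, 2, 3),
                       email = c("email1", "email1b", "email2", "email3"))
dataset2 <- data.frame(id = c(1, 2, 3),
                       name = c("bob", "rob", "kat"))

# A plain merge repeats the dataset2 row once for every matching dataset1 row
merged <- merge(dataset2, dataset1, by = "id")

nrow(merged)           # 4 rows, not 3
sum(merged$id == 1)    # id 1 appears twice, once per email
```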

2 answers:

Answer 0: (score: 0)

You can pre-aggregate dataset1, like this:

dataset1 <- read.table(header = TRUE, text = "
id email
1 email1
1 email1b
2 email2
3 email3")

dataset2 <- read.table(header = TRUE, text = "
id name
1 bob
2 rob
3 kat")

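# Collapse all emails that share an id into one space-separated string
dataset1 <- with(dataset1, aggregate(x = email, by = list(id = id), FUN = paste, collapse = " "))
# Merge on id, then reorder the columns to id, name, x
merge(x = dataset1, y = dataset2, by = "id")[, c(1, 3, 2)]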
#   id name              x
# 1  1  bob email1 email1b
# 2  2  rob         email2
# 3  3  kat         email3

Answer 1: (score: 0)

dataset1 <- aggregate(email ~ id, dataset1, paste, collapse = " ")
merge(dataset2, dataset1, by = "id")

#   id name          email
# 1  1  bob email1 email1b
# 2  2  rob         email2
# 3  3  kat         email3

If you want fast aggregation and merging on large data sets, here is a data.table approach:

library(data.table)
# Collapse the emails per id, then key both tables on id for the join
dataset1 <- setDT(dataset1)[, list(email = paste(email, collapse = " ")), by = id]
setkey(dataset1, id)
setkey(setDT(dataset2), id)
dataset2[dataset1]

##    id name          email
## 1:  1  bob email1 email1b
## 2:  2  rob         email2
## 3:  3  kat         email3
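
Both answers above collapse the emails into a single space-separated string. If one column per email is wanted instead, a possible base R sketch is to number the emails within each id and reshape to wide format before merging. This is an alternative to the answers' approach, not taken from them; the helper column n and the names wide and result are assumptions introduced here for illustration:

```r
# Start again from the original, un-aggregated example data
dataset1 <- data.frame(id = c(1, 1, 2, 3),
                       email = c("email1", "email1b", "email2", "email3"),
                       stringsAsFactors = FALSE)
dataset2 <- data.frame(id = c(1, 2, 3),
                       name = c("bob", "rob", "kat"),
                       stringsAsFactors = FALSE)

# Number each email within its id (1, 2, 1, 1)
dataset1$n <- ave(dataset1$id, dataset1$id, FUN = seq_along)

# Spread the emails into one column per position: email.1, email.2
wide <- reshape(dataset1, idvar = "id", timevar = "n", direction = "wide")

# Merge on id; ids with fewer emails get NA in the unused columns
result <- merge(dataset2, wide, by = "id")
result
```

The number of email columns then grows with the largest number of duplicates for any id, which may or may not be desirable compared with a single collapsed string.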