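I want to merge two datasets on a partial match of an ID field, where the ID is a string of comma-separated IDs. My first dataset has a basic ID field (a single string), but the second dataset has an ID string that contains several different IDs separated by commas. For example: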
TIME GROUPID y1 y2
1 A 0 1
1 B 1 1
2 C 1 0
2 D 0 0
3 E 1 0
TIME GROUPID x1
1 A,B 4
2 B,C 2
3 E 3
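I want to merge these two datasets on TIME and GROUPID, but for GROUPID I want a partial match: an ID should match any GROUPID list that contains it. So "A" matches "A", but it also matches "A,E" or "B,A". Result: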
TIME GROUPID y1 y2 x1
1 A 0 1 4
1 B 1 1 4
2 C 1 0 2
2 D 0 0 NA
3 E 1 0 3
Many thanks in advance!
Answer (score: 4)
Try cSplit from splitstackshape. With the argument direction="long", it gives the lookup data.frame a standard layout for merging (one ID per row):
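library(splitstackshape)
# Split the comma-separated GROUPID into one row per individual ID
lkup <- cSplit(df2, "GROUPID", direction = "long")
# Left join: keep every row of df1, x1 is NA where there is no match
merge(df1, lkup, by = c("TIME", "GROUPID"), all.x = TRUE)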
# TIME GROUPID y1 y2 x1
# 1 1 A 0 1 4
# 2 1 B 1 1 4
# 3 2 C 1 0 2
# 4 2 D 0 0 NA
# 5 3 E 1 0 3
Data:
df1 <- structure(list(TIME = c(1L, 1L, 2L, 2L, 3L), GROUPID = structure(1:5, .Label = c("A",
"B", "C", "D", "E"), class = "factor"), y1 = c(0L, 1L, 1L, 0L,
1L), y2 = c(1L, 1L, 0L, 0L, 0L)), .Names = c("TIME", "GROUPID",
"y1", "y2"), class = "data.frame", row.names = c(NA, -5L))
df2 <- structure(list(TIME = 1:3, GROUPID = structure(1:3, .Label = c("A,B",
"B,C", "E"), class = "factor"), x1 = c(4L, 2L, 3L)), .Names = c("TIME",
"GROUPID", "x1"), class = "data.frame", row.names = c(NA, -3L
))
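If you would rather not add a package dependency, the same idea can be sketched in base R: split each GROUPID string with strsplit, expand df2 to one row per (TIME, single ID) pair, then do the same left merge. This is a minimal sketch under the assumption that IDs are separated by commas with optional surrounding spaces; it rebuilds the example data with character columns instead of factors, and the names ids, n and lkup are just illustrative.

```r
# Example data from the question, using character columns instead of factors
df1 <- data.frame(TIME = c(1L, 1L, 2L, 2L, 3L),
                  GROUPID = c("A", "B", "C", "D", "E"),
                  y1 = c(0L, 1L, 1L, 0L, 1L),
                  y2 = c(1L, 1L, 0L, 0L, 0L),
                  stringsAsFactors = FALSE)
df2 <- data.frame(TIME = 1:3,
                  GROUPID = c("A,B", "B,C", "E"),
                  x1 = c(4L, 2L, 3L),
                  stringsAsFactors = FALSE)

# Split each comma-separated GROUPID into its individual IDs
ids <- strsplit(df2$GROUPID, ",", fixed = TRUE)
n <- lengths(ids)  # number of IDs in each df2 row

# Long lookup table: repeat TIME and x1 once per ID, strip stray spaces
lkup <- data.frame(TIME = rep(df2$TIME, n),
                   GROUPID = trimws(unlist(ids)),
                   x1 = rep(df2$x1, n),
                   stringsAsFactors = FALSE)

# Left join on TIME and the single ID, keeping all rows of df1
res <- merge(df1, lkup, by = c("TIME", "GROUPID"), all.x = TRUE)
res
```

Like the cSplit version, this matches whole IDs rather than substrings, so "A" will not match "AB". If the same (TIME, ID) pair appears in more than one row of df2, both versions return duplicated rows of df1.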