Merging on a partial match of IDs in R

Date: 2015-12-29 19:54:04

Tags: r merge

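I want to merge two datasets based on a partial match of IDs, where the ID field may be a comma-separated string of IDs. My first dataset has a simple ID field (a single string), but in the second dataset the ID field is a string containing several different IDs separated by commas. For example: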

TIME   GROUPID  y1  y2
 1      A       0    1
 1      B       1    1
 2      C       1    0
 2      D       0    0
 3      E       1    0 

TIME  GROUPID   x1
  1     A,B     4 
  2     B,C     2    
  3      E      3    

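I want to merge the two datasets on TIME and GROUPID, but with a partial match on GROUPID: a row should match when its ID is one of the comma-separated IDs in the other dataset's GROUPID. So "A" matches "A", but it also matches "A,E" or "B,A". The desired result: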

TIME   GROUPID  y1  y2  x1
 1      A       0    1    4
 1      B       1    1    4
 2      C       1    0    2
 2      D       0    0    NA
 3      E       1    0    3
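To make the matching rule concrete: an ID should match only when it is one of the comma-separated entries, not merely a substring (so "A" should not match "AB"). A minimal base-R sketch of that test, using a hypothetical helper `id_in_list` (not part of any package) that checks a single list string and assumes R 3.2.0 or later for `trimws()`:

```r
# Hypothetical helper: TRUE if `id` is one of the comma-separated IDs in the
# single string `id_list`. Splitting on commas, instead of a substring search
# such as grepl("A", ...), avoids false hits like "A" matching "AB".
id_in_list <- function(id, id_list) {
  parts <- trimws(strsplit(as.character(id_list), ",", fixed = TRUE)[[1]])
  id %in% parts
}

id_in_list("A", "A")    # TRUE
id_in_list("A", "A,E")  # TRUE
id_in_list("A", "B,A")  # TRUE
id_in_list("A", "AB")   # FALSE
```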

Many thanks in advance!

1 Answer:

Answer 0 (score: 4)

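Try cSplit from the splitstackshape package. With the argument direction="long", it splits each comma-separated GROUPID into its own row, which turns the lookup data.frame into the ordinary one-ID-per-row layout that a standard merge expects: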

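library(splitstackshape)
# one row per individual ID in df2's comma-separated GROUPID (default sep is ",")
lkup <- cSplit(df2, "GROUPID", direction="long")
# left join: keep every row of df1; IDs with no match get NA for x1
merge(df1, lkup, by=c("TIME", "GROUPID"), all.x=TRUE)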
#   TIME GROUPID y1 y2 x1
# 1    1       A  0  1  4
# 2    1       B  1  1  4
# 3    2       C  1  0  2
# 4    2       D  0  0 NA
# 5    3       E  1  0  3
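If you would rather not add the splitstackshape dependency, the same long-format lookup can be built in base R with strsplit. This is a sketch, assuming R 3.2.0 or later (for `lengths()` and `trimws()`) and character rather than factor GROUPID columns; the data frames re-create the question's data:

```r
# Question data, with character (not factor) GROUPID columns.
df1 <- data.frame(TIME = c(1L, 1L, 2L, 2L, 3L),
                  GROUPID = c("A", "B", "C", "D", "E"),
                  y1 = c(0L, 1L, 1L, 0L, 1L),
                  y2 = c(1L, 1L, 0L, 0L, 0L),
                  stringsAsFactors = FALSE)
df2 <- data.frame(TIME = 1:3,
                  GROUPID = c("A,B", "B,C", "E"),
                  x1 = c(4L, 2L, 3L),
                  stringsAsFactors = FALSE)

# Split each comma-separated GROUPID into its individual IDs.
ids <- strsplit(df2$GROUPID, ",", fixed = TRUE)

# Repeat each row of df2 once per ID, then replace GROUPID with the single IDs.
lkup <- df2[rep(seq_len(nrow(df2)), lengths(ids)), ]
lkup$GROUPID <- trimws(unlist(ids))

# Left join on TIME and GROUPID: every row of df1 is kept, unmatched x1 is NA.
res <- merge(df1, lkup, by = c("TIME", "GROUPID"), all.x = TRUE)
res
```

Because the lookup is expanded before the join, the extra (TIME 2, B) lookup row has no partner in df1 and is dropped, while all.x = TRUE keeps D with x1 = NA.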

Data

df1 <- structure(list(TIME = c(1L, 1L, 2L, 2L, 3L), GROUPID = structure(1:5, .Label = c("A", 
"B", "C", "D", "E"), class = "factor"), y1 = c(0L, 1L, 1L, 0L, 
1L), y2 = c(1L, 1L, 0L, 0L, 0L)), .Names = c("TIME", "GROUPID", 
"y1", "y2"), class = "data.frame", row.names = c(NA, -5L))

df2 <- structure(list(TIME = 1:3, GROUPID = structure(1:3, .Label = c("A,B", 
"B,C", "E"), class = "factor"), x1 = c(4L, 2L, 3L)), .Names = c("TIME", 
"GROUPID", "x1"), class = "data.frame", row.names = c(NA, -3L
))