Merge based on IDs with missing values and strings

Posted: 2016-12-11 09:42:57

Tags: r

My df looks like this:

mydf<- structure(list(IDs = c(11L, 16L, 19L, 21L, 22L, 24L, 42L, 43L, 
51L), string1 = structure(c(1L, 8L, 7L, 2L, 4L, 9L, 6L, 3L, 5L
), .Label = c("b", "g", "hue", "hyu", "if", "jud", "ufhy", "uhgf;ffugf", 
"uhgs"), class = "factor"), IDs.1 = c(4L, 11L, 16L, 19L, 20L, 
22L, 29L, NA, NA), string2 = structure(c(2L, 3L, 8L, 7L, 4L, 
5L, 6L, 1L, 1L), .Label = c("", "a", "b", "higf;hdugd", "hyu", 
"inja", "ufhy", "uhgf;ffugf"), class = "factor")), .Names = c("IDs", 
"string1", "IDs.1", "string2"), class = "data.frame", row.names = c(NA, 
-9L))

I want to combine the two ID/string pairs into a single pair of columns, like this:

myout<- structure(list(Ids = c(4L, 11L, 16L, 19L, 20L, 21L, 22L, 24L, 
29L, 42L, 43L, 51L), string = structure(c(1L, 2L, 11L, 10L, 4L, 
3L, 6L, 12L, 8L, 9L, 5L, 7L), .Label = c("a", "b", "g", "higf;hdugd", 
"hue", "hyu", "if", "inja", "jud", "ufhy", "uhgf;ffugf", "uhgs"
), class = "factor")), .Names = c("Ids", "string"), class = "data.frame", row.names = c(NA, 
-12L))

I tried to do it with merge:
df1 <- mydf[,1:2] 
df2 <- mydf[,3:4]
df3 = merge(df1, df2, by.x=c("IDs", "string"))

but it gives me an error saying the columns are unequal.
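The error most likely comes from by.y: only by.x is supplied, so by.y falls back to the column names that df1 and df2 have in common, and they have none, so merge() reports that 'by.x' and 'by.y' specify different numbers of columns. A minimal base-R sketch of a merge that does work renames both halves to the same columns (Ids and string, as in the desired output) and does an outer join (all = TRUE) on both keys. It assumes a rebuilt mydf with plain character string columns instead of the factors in the dput output, which is a simplification for the example.

```r
# Rebuild of the question's mydf with character (not factor) string columns;
# this simplification is an assumption made for the sketch
mydf <- data.frame(
  IDs     = c(11L, 16L, 19L, 21L, 22L, 24L, 42L, 43L, 51L),
  string1 = c("b", "uhgf;ffugf", "ufhy", "g", "hyu", "uhgs", "jud", "hue", "if"),
  IDs.1   = c(4L, 11L, 16L, 19L, 20L, 22L, 29L, NA, NA),
  string2 = c("a", "b", "uhgf;ffugf", "ufhy", "higf;hdugd", "hyu", "inja", "", ""),
  stringsAsFactors = FALSE
)

# Give both halves the same column names so merge() can match them
df1 <- setNames(mydf[, 1:2], c("Ids", "string"))
df2 <- setNames(mydf[, 3:4], c("Ids", "string"))
df2 <- df2[!is.na(df2$Ids), ]  # the padding rows at the end have no ID

# Outer join on both key columns: pairs present in both halves appear once,
# and merge() sorts the result by the key columns
out <- merge(df1, df2, by = c("Ids", "string"), all = TRUE)
print(out)
```

Joining on both Ids and string only deduplicates correctly because every ID that occurs in both halves carries the same string in each.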

I also tried the approaches from "How to join (merge) data frames (inner, outer, left, right)?", but they did not solve my problem.

My input looks like this:

IDs string1        IDs  string2
11  b              4    a
16  uhgf;ffugf     11   b
19  ufhy           16   uhgf;ffugf
21  g              19   ufhy
22  hyu            20   higf;hdugd
24  uhgs           22   hyu
42  jud            29   inja
43  hue     
51  if  

and the output should look like this:

Ids string
4   a
11  b
16  uhgf;ffugf
19  ufhy
20  higf;hdugd
21  g
22  hyu
24  uhgs
29  inja
42  jud
43  hue
51  if  

For example, 11, 16, etc. appear twice, so we only need each of them once.

1 Answer

Answer 0 (score: 2)

We can rbind the two pairs of columns and then remove the NA and duplicated IDs:

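library(data.table)
# stack (IDs.1, string2) above (IDs, string1) by position, drop rows whose ID
# is NA or already seen, rename the columns, and sort by ID
setnames(rbindlist(list(mydf[3:4], mydf[1:2]), use.names = FALSE)[!is.na(IDs.1) & !duplicated(IDs.1)],
             c("Ids", "string"))[order(Ids)]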
#    Ids     string
# 1:   4          a
# 2:  11          b
# 3:  16 uhgf;ffugf
# 4:  19       ufhy
# 5:  20 higf;hdugd
# 6:  21          g
# 7:  22        hyu
# 8:  24       uhgs
# 9:  29       inja
#10:  42        jud
#11:  43        hue
#12:  51         if
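Without data.table, the same stack-then-deduplicate idea can be sketched in base R with rbind() and duplicated(). This sketch assumes a rebuilt mydf with plain character string columns (the question's dput uses factors). Keeping the first occurrence of each ID is safe only because the shared IDs carry the same string in both halves.

```r
# Rebuild of the question's mydf with character (not factor) string columns;
# this simplification is an assumption made for the sketch
mydf <- data.frame(
  IDs     = c(11L, 16L, 19L, 21L, 22L, 24L, 42L, 43L, 51L),
  string1 = c("b", "uhgf;ffugf", "ufhy", "g", "hyu", "uhgs", "jud", "hue", "if"),
  IDs.1   = c(4L, 11L, 16L, 19L, 20L, 22L, 29L, NA, NA),
  string2 = c("a", "b", "uhgf;ffugf", "ufhy", "higf;hdugd", "hyu", "inja", "", ""),
  stringsAsFactors = FALSE
)

# Stack (IDs.1, string2) on top of (IDs, string1) under common column names
stacked <- rbind(setNames(mydf[, 3:4], c("Ids", "string")),
                 setNames(mydf[, 1:2], c("Ids", "string")))

# Drop rows with a missing ID and keep only the first copy of each ID
res <- stacked[!is.na(stacked$Ids) & !duplicated(stacked$Ids), ]

# Sort by ID and renumber the rows
res <- res[order(res$Ids), ]
rownames(res) <- NULL
print(res)
```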

Another option is melt from data.table (to convert to 'long' format), which can take multiple measure patterns; then remove the duplicated 'Ids' and order by 'Ids'.

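The code for the melt option was not preserved. A sketch of what it might look like follows, assuming data.table's melt() with patterns() in measure.vars and value.name to name the two long columns Ids and string; the rebuilt mydf with character string columns is likewise an assumption. NA IDs are dropped together with the duplicates.

```r
library(data.table)

# Rebuild of the question's mydf with character (not factor) string columns;
# this simplification is an assumption made for the sketch
mydf <- data.frame(
  IDs     = c(11L, 16L, 19L, 21L, 22L, 24L, 42L, 43L, 51L),
  string1 = c("b", "uhgf;ffugf", "ufhy", "g", "hyu", "uhgs", "jud", "hue", "if"),
  IDs.1   = c(4L, 11L, 16L, 19L, 20L, 22L, 29L, NA, NA),
  string2 = c("a", "b", "uhgf;ffugf", "ufhy", "higf;hdugd", "hyu", "inja", "", ""),
  stringsAsFactors = FALSE
)

# Long format: patterns() pairs IDs with string1 and IDs.1 with string2,
# giving one 'Ids' column, one 'string' column and a 'variable' index
long <- melt(as.data.table(mydf),
             measure.vars = patterns("^IDs", "^string"),
             value.name = c("Ids", "string"))

# Remove missing and duplicated IDs, order by 'Ids', drop 'variable'
res <- long[!is.na(Ids) & !duplicated(Ids)][order(Ids), .(Ids, string)]
print(res)
```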