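R: pmatch for a more difficult task

Time: 2011-07-06 08:34:50

Tags: r match

Thanks @nullglob,

I tried running it again, but my output is different. If I have misused your code, would you mind showing me what I did wrong? Sorry, I may have misunderstood how it works. I hope you don't mind giving me some advice.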

 df1 <- data.frame(
    A=c("x01","x02","y03","z02","x04", "x33", "z03"),
    B=c("A01BB01","A02BB02","C02AA05","B04CC10","C01GX02", "yyy", "zzz"))




 df2 <- data.frame(
    X=c("a","b","c","d","e", "f"),
    Y=c("A01BB","A02","C02A","B04","C01GX", "xxx"))





with(c(df1,df2),{
   i <- pmatch(Y,B)
   iunmatched <- which(is.na(i))
   nunmatched <- length(iunmatched)
   nexcess <- length(B) - length(X)
   data.frame(A = c(A,rep(NA,nunmatched)),
              B = c(B,rep(NA,nunmatched)),
              X = c(X[i],rep(NA,nexcess),X[iunmatched]),
              Y = c(Y[i],rep(NA,nexcess),Y[iunmatched]))  })

       A  B  X  Y
    1  1  1  1  1
    2  2  2  2  2
    3  5  5  3  5
    4  6  3  4  3
    5  3  4  5  4
    6  4  6 NA NA
    7  7  7 NA NA
    8 NA NA  6  6
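
A likely cause of the integer output above, assuming an R version older than 4.0.0: there, data.frame() converts strings to factors by default, and combining a factor with NA via c() returns the factor's internal integer codes instead of its labels. The short sketch below shows that the codes reproduce column A of the output, and that passing stringsAsFactors = FALSE keeps the strings. The variable name padded is only illustrative.

```r
# A factor stores each string as an integer code into its sorted levels.
A <- factor(c("x01", "x02", "y03", "z02", "x04", "x33", "z03"))
as.integer(A)   # 1 2 5 6 3 4 7, the same numbers as column A in the output above

# Assumed fix: build the data frame with character columns.
df1 <- data.frame(
  A = c("x01", "x02", "y03", "z02", "x04", "x33", "z03"),
  B = c("A01BB01", "A02BB02", "C02AA05", "B04CC10", "C01GX02", "yyy", "zzz"),
  stringsAsFactors = FALSE)

# Padding a character column with NA now keeps the labels.
padded <- c(df1$A, NA)
is.character(padded)   # TRUE
```

From R 4.0.0 onward stringsAsFactors = FALSE is already the default, so the same code should print labels without any change.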

====================== ORIGINAL QUESTION =====

Thanks for the answer to my previous question (http://stackoverflow.com/q/6592214/602276).

Building on that answer, I would like to do a more difficult task.

df1 <- data.frame(
  A=c("x01","x02","y03","z02","x04", "x33", "z03"),
  B=c("A01BB01","A02BB02","C02AA05","B04CC10","C01GX02", "yyy", "zzz")
)

    A       B
1 x01 A01BB01
2 x02 A02BB02
3 y03 C02AA05
4 z02 B04CC10
5 x04 C01GX02
6 x33     yyy
7 z03     zzz

My df2 is modified as follows:

df2 <- data.frame(
  X=c("a","b","c","d","e", "f"),
  Y=c("A01BB","A02","C02A","B04","C01GX", "xxx")
)

  X     Y
1 a A01BB
2 b   A02
3 c  C02A
4 d   B04
5 e C01GX
6 f   xxx

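The difficulty is that df1 and df2 have different numbers of rows, so I cannot simply cbind them at the start.

Moreover, there are some mismatches between df1 and df2, and the unmatched rows should be filled with NA.

The expected output is as follows: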

    A       B    X     Y
1 x01 A01BB01    a A01BB
2 x02 A02BB02    b   A02
3 y03 C02AA05    c  C02A
4 z02 B04CC10    d   B04
5 x04 C01GX02    e C01GX
6 x33     yyy   NA    NA
7 z03     zzz   NA    NA
8  NA      NA    f   xxx

Would you mind showing me how to do this in R? Thanks a lot.

1 answer:

Answer 0 (score: 0)

This is not an elegant solution, but it seems to solve the problem:

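with(c(df1,df2),{
  # for each Y, the index of the B that it is a unique prefix of (NA if none)
  i <- pmatch(Y,B)
  # rows of df2 that found no partner in df1
  iunmatched <- which(is.na(i))
  nunmatched <- length(iunmatched)
  # how many more rows df1 has than df2
  nexcess <- length(B) - length(X)
  # df1 columns padded with NA at the end; unmatched df2 rows appended last
  data.frame(A = c(A,rep(NA,nunmatched)),
             B = c(B,rep(NA,nunmatched)),
             X = c(X[i],rep(NA,nexcess),X[iunmatched]),
             Y = c(Y[i],rep(NA,nexcess),Y[iunmatched]))
})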

The output should be:

     A       B    X     Y
1  x01 A01BB01    a A01BB
2  x02 A02BB02    b   A02
3  y03 C02AA05    c  C02A
4  z02 B04CC10    d   B04
5  x04 C01GX02    e C01GX
6  x33     yyy <NA>  <NA>
7  z03     zzz <NA>  <NA>
8 <NA>    <NA>    f   xxx
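
The code above pairs the rows correctly for this data because each matching row sits at the same position in df1 and df2: X[i] indexes X by a position in B, which is only right when the two agree. A minimal sketch that does not rely on row order is given below. It assumes character (not factor) columns named A, B, X and Y as in the question, and prefix_join is a hypothetical helper name, not part of the answer. As with the original code, a Y that is a prefix of more than one B is left unmatched, because pmatch returns NA for ambiguous partial matches.

```r
df1 <- data.frame(
  A = c("x01", "x02", "y03", "z02", "x04", "x33", "z03"),
  B = c("A01BB01", "A02BB02", "C02AA05", "B04CC10", "C01GX02", "yyy", "zzz"),
  stringsAsFactors = FALSE)
df2 <- data.frame(
  X = c("a", "b", "c", "d", "e", "f"),
  Y = c("A01BB", "A02", "C02A", "B04", "C01GX", "xxx"),
  stringsAsFactors = FALSE)

# Hypothetical helper: keep every row of df1, attach the df2 row whose Y is a
# unique prefix of B, then append the df2 rows that matched nothing.
prefix_join <- function(df1, df2) {
  i <- pmatch(df2$Y, df1$B)           # i[k]: row of df1 matched by row k of df2
  j <- match(seq_len(nrow(df1)), i)   # j[r]: row of df2 matched to row r of df1
  extra <- which(is.na(i))            # rows of df2 without a partner
  pad <- rep(NA, length(extra))
  data.frame(
    A = c(df1$A, pad),
    B = c(df1$B, pad),
    X = c(df2$X[j], df2$X[extra]),
    Y = c(df2$Y[j], df2$Y[extra]),
    stringsAsFactors = FALSE)
}

res <- prefix_join(df1, df2)
print(res)

# Shuffling df2 should not change which rows are paired.
res_shuffled <- prefix_join(df1, df2[c(6, 3, 1, 5, 2, 4), ])
```

Inverting the pmatch result with match() is what makes the pairing independent of row order: each row of df1 looks up the df2 row that pointed at it, instead of assuming it sits at the same position.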