Compare each row of a data frame with multiple rows of another data frame and get the result

Time: 2019-12-18 06:49:56

Tags: r dataframe

I have two datasets, df1 and df2.

df1
c1  match   c3      c4
AA1 AB      cat     dog
AA1 CD      dfs     abd
AA1 EF      js      hn
AA1 GH      bsk     jtd
AA2 AB      cat     mouse
AA2 CD      adb     mop
AA2 EF      powas   qwert
AA2 GH      sms     mms
AA3 AB      i       j
AA3 CD      fgh     ejk
AA3 EF      mib     loi
AA3 GH      revit   roger

df2
match   d2      result
AB      cat     friendly
AB      mouse   enemy
CD      dfs     r1
CD      adb     r1
CD      fgh     r2
CD      ejk     r3
EF      mib     some_result
GH      sms     sent
GH      mms     sent
IJ      xxx     yyy
KL      crt     zzz
KL      rrr     qqq

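I want to match df1 and df2 on the "match" column and add two new columns to df1, "result_c1" and "result_c2". result_c1 is obtained by first matching on the match column and then matching c3 in df1 against d2 in df2, taking the corresponding result from df2. result_c2 is obtained the same way, except that c4 in df1 is matched against d2 in df2. If there is no match, "no_match" should be returned. Is there an efficient way to do this?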

result
c1  match   c3      c4      result_c1   result_c2   
AA1 AB      cat     dog     friendly    no_match    
AA1 CD      dfs     adb     r1          r1          
AA1 EF      js      hn      no_match    no_match    
AA1 GH      bsk     jtd     no_match    no_match    
AA2 AB      cat     mouse   friendly    enemy       
AA2 CD      adb     mop     r1          no_match    
AA2 EF      powas   qwert   no_match    no_match    
AA2 GH      sms     mms     sent        sent        
AA3 AB      i       j       no_match    no_match    
AA3 CD      fgh     ejk     r2          r3          
AA3 EF      mib     loi     some_result no_match    
AA3 GH      revit   roger   no_match    no_match    

The data is attached below:

df1 <- data.frame(list(c1 = c("AA1", "AA1", "AA1", "AA1", "AA2", "AA2", "AA2", "AA2",
                      "AA3", "AA3", "AA3", "AA3"), match = c("AB", "CD", "EF", "GH", 
                                                             "AB", "CD", "EF", "GH", 
                                                             "AB", "CD", "EF", "GH"),
                      c3 = c("cat", "dfs", "js", "bsk", "cat", "adb", "powas", "sms", "i",
                      "fgh", "mib", "revit"), c4 = c("dog", "abd", "hn", "jtd", "mouse",
                                                     "mop", "qwert", "mms", "j", "ejk", "loi", "roger")))

df2 <- data.frame(list(match = c("AB", "AB", "CD", "CD", "CD", "CD", "EF", "GH", "GH", "IJ", "KL", "KL"), 
                       d2 = c("cat", "mouse", "dfs", "adb", "fgh", "ejk", "mib", "sms", "mms", "xxx", "crt", "rrr"),
                       result = c("friendly", "enemy", "r1", "r1", "r2", "r3", "some_result", "sent", "sent", "yyy", "zzz", "qqq")))

Thanks.

2 answers:

Answer 0 (score: 1)

One way is to use dplyr with a custom function.
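
A minimal sketch of what such an approach could look like, assuming dplyr is installed and that each (match, d2) pair occurs only once in df2. The helper name `lookup_by_join` is hypothetical, and this is not necessarily the code the answer originally used. It joins on both the match column and the value column, then replaces missing results with "no_match".

```r
library(dplyr)

# Data as posted in the question (kept as character, not factor)
df1 <- data.frame(
  c1    = rep(c("AA1", "AA2", "AA3"), each = 4),
  match = rep(c("AB", "CD", "EF", "GH"), times = 3),
  c3    = c("cat", "dfs", "js", "bsk", "cat", "adb", "powas", "sms",
            "i", "fgh", "mib", "revit"),
  c4    = c("dog", "abd", "hn", "jtd", "mouse", "mop", "qwert", "mms",
            "j", "ejk", "loi", "roger"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  match  = c("AB", "AB", "CD", "CD", "CD", "CD", "EF", "GH", "GH", "IJ", "KL", "KL"),
  d2     = c("cat", "mouse", "dfs", "adb", "fgh", "ejk", "mib", "sms", "mms",
             "xxx", "crt", "rrr"),
  result = c("friendly", "enemy", "r1", "r1", "r2", "r3", "some_result",
             "sent", "sent", "yyy", "zzz", "qqq"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: join `data` to `lookup` on (match, value_col == d2)
# and return the matched result, or "no_match" where nothing matched.
lookup_by_join <- function(data, value_col, lookup) {
  keys <- data[, c("match", value_col)]
  joined <- left_join(keys, lookup,
                      by = setNames(c("match", "d2"), c("match", value_col)))
  coalesce(joined$result, "no_match")
}

out <- mutate(df1,
              result_c1 = lookup_by_join(df1, "c3", df2),
              result_c2 = lookup_by_join(df1, "c4", df2))
print(out)
```

Because `left_join` keeps the rows of its first argument in order, the returned vector lines up with the rows of df1 as long as the lookup keys in df2 are unique. Note that with the data as posted, c4 in the second row is "abd", which has no counterpart in df2, so result_c2 is "no_match" there.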

Answer 1 (score: 1)

Here is a base R solution:

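# Build a "match value" key for every row of df1 and look it up among the
# corresponding keys of df2; rows whose key is absent from df2 get "no match".
df1$result_c1 = with(df1, ifelse(is.na(match(paste(match, c3), with(df2, paste(match, d2)))),
                                 "no match",
                                 as.character(df2$result[match(paste(match, c3), with(df2, paste(match, d2)))])))
# Same lookup, using c4 instead of c3.
df1$result_c2 = with(df1, ifelse(is.na(match(paste(match, c4), with(df2, paste(match, d2)))),
                                 "no match",
                                 as.character(df2$result[match(paste(match, c4), with(df2, paste(match, d2)))])))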

This gives:

> df1
    c1 match    c3    c4   result_c1 result_c2
1  AA1    AB   cat   dog    friendly  no match
2  AA1    CD   dfs   abd          r1  no match
3  AA1    EF    js    hn    no match  no match
4  AA1    GH   bsk   jtd    no match  no match
5  AA2    AB   cat mouse    friendly     enemy
6  AA2    CD   adb   mop          r1  no match
7  AA2    EF powas qwert    no match  no match
8  AA2    GH   sms   mms        sent      sent
9  AA3    AB     i     j    no match  no match
10 AA3    CD   fgh   ejk          r2        r3
11 AA3    EF   mib   loi some_result  no match
12 AA3    GH revit roger    no match  no match
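
The same idea can be written so that the key lookup is computed only once per column. The following is a minimal sketch, not part of the original answer: the helper name `lookup_by_key` and its `default` and `sep` parameters are hypothetical, and it assumes that no value in match, c3, c4 or d2 contains a tab character (the separator used to build the composite key). It uses "no_match" as the label, as requested in the question.

```r
# Data as posted in the question
df1 <- data.frame(
  c1    = rep(c("AA1", "AA2", "AA3"), each = 4),
  match = rep(c("AB", "CD", "EF", "GH"), times = 3),
  c3    = c("cat", "dfs", "js", "bsk", "cat", "adb", "powas", "sms",
            "i", "fgh", "mib", "revit"),
  c4    = c("dog", "abd", "hn", "jtd", "mouse", "mop", "qwert", "mms",
            "j", "ejk", "loi", "roger")
)
df2 <- data.frame(
  match  = c("AB", "AB", "CD", "CD", "CD", "CD", "EF", "GH", "GH", "IJ", "KL", "KL"),
  d2     = c("cat", "mouse", "dfs", "adb", "fgh", "ejk", "mib", "sms", "mms",
             "xxx", "crt", "rrr"),
  result = c("friendly", "enemy", "r1", "r1", "r2", "r3", "some_result",
             "sent", "sent", "yyy", "zzz", "qqq")
)

# Hypothetical helper: paste the two key parts together, find each key's
# position in the lookup table, and fill unmatched rows with `default`.
lookup_by_key <- function(keys, values, lookup, default = "no_match", sep = "\t") {
  idx <- match(paste(keys, values, sep = sep),
               paste(lookup$match, lookup$d2, sep = sep))
  res <- as.character(lookup$result[idx])  # NA where idx is NA
  res[is.na(idx)] <- default
  res
}

df1$result_c1 <- lookup_by_key(df1$match, df1$c3, df2)
df1$result_c2 <- lookup_by_key(df1$match, df1$c4, df2)
print(df1)
```

Since `paste()` and `as.character()` convert factors to strings, the helper behaves the same whether the columns are stored as character or as factor. `match()` returns the first matching position, so if df2 contained duplicate (match, d2) pairs only the first result would be used.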