Comparing two data frames in R

Time: 2016-12-11 17:05:48

Tags: r dataframe

I am new to R, and I have 2 data frames as follows:

df1
T_id U_id  U_code  score  
A_0_1 UHJKI XPOS_hp 134
B_1_3 NBVFR LKJ_mm  543
C_9_0 TRFDA NBV_lp  80
D_9_1 KOIUA TRE_po  212
E_0_1 SDFRQ QAS_np  300
E_0_1 SDKIJ JIT_mx  160
F_0_1 JKOPA TOZ_po  79

df2
T_id U_id  U_code  score
A_0_1 UHJKI XPOS_hp 150
B_1_3 NBVFR LKJ_mm  520
C_9_0 TRFDG NBJ_po  90
D_9_1 KOIUA TRE_po  250
E_0_1 SDFRQ QAS_np  300
E_0_1 SDKIJ JIT_mx  160
F_0_1 LOLPO JUZ_ic  90
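
For anyone who wants to reproduce the example, the two tables above can be rebuilt roughly as follows. This is only a sketch: `stringsAsFactors = FALSE` is an assumption of mine (it is not in the original post) so that the key columns stay plain character vectors.

```r
# Rebuild the two example data frames from the question.
# stringsAsFactors = FALSE is an assumption (not in the original post);
# it keeps T_id, U_id and U_code as character vectors rather than factors.
df1 <- data.frame(
  T_id   = c("A_0_1", "B_1_3", "C_9_0", "D_9_1", "E_0_1", "E_0_1", "F_0_1"),
  U_id   = c("UHJKI", "NBVFR", "TRFDA", "KOIUA", "SDFRQ", "SDKIJ", "JKOPA"),
  U_code = c("XPOS_hp", "LKJ_mm", "NBV_lp", "TRE_po", "QAS_np", "JIT_mx", "TOZ_po"),
  score  = c(134, 543, 80, 212, 300, 160, 79),
  stringsAsFactors = FALSE
)

df2 <- data.frame(
  T_id   = c("A_0_1", "B_1_3", "C_9_0", "D_9_1", "E_0_1", "E_0_1", "F_0_1"),
  U_id   = c("UHJKI", "NBVFR", "TRFDG", "KOIUA", "SDFRQ", "SDKIJ", "LOLPO"),
  U_code = c("XPOS_hp", "LKJ_mm", "NBJ_po", "TRE_po", "QAS_np", "JIT_mx", "JUZ_ic"),
  score  = c(150, 520, 90, 250, 300, 160, 90),
  stringsAsFactors = FALSE
)
```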

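I want to compare the scores of df1 and df2 for the entries of df1 whose T_id, U_id and U_code are exactly the same in df2, and divide them into 3 groups according to the condition (df1$score > df2$score, df1$score == df2$score, df1$score < df2$score):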

df1$score == df2$score
E_0_1 SDFRQ QAS_np  300
E_0_1 SDKIJ JIT_mx  160
df1$score > df2$score
B_1_3 NBVFR LKJ_mm  543
df1$score < df2$score
A_0_1 UHJKI XPOS_hp 150
D_9_1 KOIUA TRE_po  250 

In addition, I want to store the entries of df1 for which no match is found in df2:

No matches
C_9_0 TRFDA NBV_lp  80
F_0_1 JKOPA TOZ_po  79

I tried the following R code:

comparison=function(df1,df2)
{
df1_equal_df2=NULL
df1_greater_than_df2=NULL
df1_smaller_than_df2=NULL
no_match=NULL
if(df$T_id==df2$T_id && df1$U_id == df2$U_id && df1$U_code==df2$U_code && df1$score > df2$score)
 {
   df1_greater_than_df2=df$T_id
 }
else if(df$T_id==df2$T_id && df1$U_id == df2$U_id && df1$U_code==df2$U_code && df1$score < df2$score)
 {
   df1_smaller_than_df2=df1$id
 }
else if(df$T_id==df2$T_id && df1$U_id == df2$U_id && df1$U_code==df2$U_code && df1$score = df2$score)
  {
     df1_equal_df2=df$1
  }
else
  {
     no_match=df$1
  }

}

But the above does not work. How can I get the desired output? Please guide me.

1 Answer:

Answer 0: (score: 3)

We can do this using dplyr:

library(dplyr)
res <- df1 %>% left_join(df2, by=c("T_id","U_id","U_code")) %>%
               mutate(comp=ifelse(score.x > score.y,"df1$score > df2$score",ifelse(score.x < score.y,"df1$score < df2$score","df1$score == df2$score"))) %>%
               rename(score=score.x) %>% select(-score.y)
##   T_id  U_id  U_code score                   comp
##1 A_0_1 UHJKI XPOS_hp   134  df1$score < df2$score
##2 B_1_3 NBVFR  LKJ_mm   543  df1$score > df2$score
##3 C_9_0 TRFDA  NBV_lp    80                   <NA>
##4 D_9_1 KOIUA  TRE_po   212  df1$score < df2$score
##5 E_0_1 SDFRQ  QAS_np   300 df1$score == df2$score
##6 E_0_1 SDKIJ  JIT_mx   160 df1$score == df2$score
##7 F_0_1 JKOPA  TOZ_po    79                   <NA>

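We perform a left outer join of df1 with df2 by T_id, U_id, and U_code. This merges the two tables, with the score from df1 becoming score.x and the score from df2 becoming score.y. We then use mutate to create the column comp, which indicates whether score.x is greater than, less than, or equal to score.y. For rows of df1 that have no match in df2, score.y is NA, so comp is NA as well. Finally, we rename the score.x column to score and drop the score.y column to make the result cleaner.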

An equivalent implementation using base R is:

res <- merge(df1,df2,by=c("T_id","U_id","U_code"), all.x=TRUE)
res$comp <- ifelse(res$score.x > res$score.y,"df1$score > df2$score",ifelse(res$score.x < res$score.y,"df1$score < df2$score","df1$score == df2$score"))
res <- res[,c(1:4,6)]
colnames(res) <- sub("score.x","score",colnames(res))

This gives the same result. If you want to split this data frame by comp, use split:

split(res[,-5],res$comp)
##$`df1$score < df2$score`
##   T_id  U_id  U_code score
##1 A_0_1 UHJKI XPOS_hp   134
##4 D_9_1 KOIUA  TRE_po   212
##
##$`df1$score == df2$score`
##   T_id  U_id  U_code score
##5 E_0_1 SDFRQ  QAS_np   300
##6 E_0_1 SDKIJ  JIT_mx   160
##
##$`df1$score > df2$score`
##   T_id  U_id  U_code score
##2 B_1_3 NBVFR  LKJ_mm   543
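
Note that split() drops the rows where comp is NA, which are exactly the df1 entries without a match in df2 that the question also asked for. A minimal base-R sketch for collecting them might look like this. It assumes the merge call from the base-R version above; the name no_match is borrowed from the question's attempt, and the data is rebuilt with read.table only so that the snippet runs on its own.

```r
# Rebuild the example data so the snippet is self-contained.
df1 <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
T_id U_id U_code score
A_0_1 UHJKI XPOS_hp 134
B_1_3 NBVFR LKJ_mm 543
C_9_0 TRFDA NBV_lp 80
D_9_1 KOIUA TRE_po 212
E_0_1 SDFRQ QAS_np 300
E_0_1 SDKIJ JIT_mx 160
F_0_1 JKOPA TOZ_po 79
")
df2 <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
T_id U_id U_code score
A_0_1 UHJKI XPOS_hp 150
B_1_3 NBVFR LKJ_mm 520
C_9_0 TRFDG NBJ_po 90
D_9_1 KOIUA TRE_po 250
E_0_1 SDFRQ QAS_np 300
E_0_1 SDKIJ JIT_mx 160
F_0_1 LOLPO JUZ_ic 90
")

# Left outer join: every row of df1 is kept; unmatched rows get NA in score.y.
res <- merge(df1, df2, by = c("T_id", "U_id", "U_code"), all.x = TRUE)

# Keep only the df1 rows that found no partner in df2.
no_match <- res[is.na(res$score.y), c("T_id", "U_id", "U_code", "score.x")]
names(no_match)[names(no_match) == "score.x"] <- "score"
no_match
```

Testing score.y for NA, rather than comp, keeps the check independent of how the comparison labels are spelled. It does assume that df2 itself contains no missing scores.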
