I am new to R and I have 2 data frames as follows:
df1
T_id U_id U_code score
A_0_1 UHJKI XPOS_hp 134
B_1_3 NBVFR LKJ_mm 543
C_9_0 TRFDA NBV_lp 80
D_9_1 KOIUA TRE_po 212
E_0_1 SDFRQ QAS_np 300
E_0_1 SDKIJ JIT_mx 160
F_0_1 JKOPA TOZ_po 79
df2
T_id U_id U_code score
A_0_1 UHJKI XPOS_hp 150
B_1_3 NBVFR LKJ_mm 520
C_9_0 TRFDG NBJ_po 90
D_9_1 KOIUA TRE_po 250
E_0_1 SDFRQ QAS_np 300
E_0_1 SDKIJ JIT_mx 160
F_0_1 LOLPO JUZ_ic 90
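I want to compare the scores of df1 and df2 for the entries of df1 whose T_id, U_id and U_code are exactly the same in df2, and divide them into 3 groups according to the conditions (df1$score > df2$score, df1$score = df2$score, df1$score < df2$score):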
df1$score = df2$score
E_0_1 SDFRQ QAS_np 300
E_0_1 SDKIJ JIT_mx 160
df1$score > df2$score
B_1_3 NBVFR LKJ_mm 543
df1$score < df2$score
A_0_1 UHJKI XPOS_hp 150
D_9_1 KOIUA TRE_po 250
In addition, I want to store the entries of df1 for which no match is found in df2:
No matches
C_9_0 TRFDA NBV_lp 80
F_0_1 JKOPA TOZ_po 79
I tried the following R code:
comparison=function(df1,df2)
{
df1_equal_df2=NULL
df1_greater_than_df2=NULL
df1_smaller_than_df2=NULL
no_match=NULL
if(df$T_id==df2$T_id && df1$U_id == df2$U_id && df1$U_code==df2$U_code && df1$score > df2$score)
{
df1_greater_than_df2=df$T_id
}
else if(df$T_id==df2$T_id && df1$U_id == df2$U_id && df1$U_code==df2$U_code && df1$score < df2$score)
{
df1_smaller_than_df2=df1$id
}
else if(df$T_id==df2$T_id && df1$U_id == df2$U_id && df1$U_code==df2$U_code && df1$score = df2$score)
{
df1_equal_df2=df$1
}
else
{
no_match=df$1
}
}
But the above did not work. How can I get the desired output? Please guide me.
Answer 0: (score: 3)
We can do this using dplyr:
library(dplyr)
res <- df1 %>% left_join(df2, by=c("T_id","U_id","U_code")) %>%
mutate(comp=ifelse(score.x > score.y,"df1$score > df2$score",ifelse(score.x < score.y,"df1$score < df2$score","df1$score == df2$score"))) %>%
rename(score=score.x) %>% select(-score.y)
## T_id U_id U_code score comp
##1 A_0_1 UHJKI XPOS_hp 134 df1$score < df2$score
##2 B_1_3 NBVFR LKJ_mm 543 df1$score > df2$score
##3 C_9_0 TRFDA NBV_lp 80 <NA>
##4 D_9_1 KOIUA TRE_po 212 df1$score < df2$score
##5 E_0_1 SDFRQ QAS_np 300 df1$score == df2$score
##6 E_0_1 SDKIJ JIT_mx 160 df1$score == df2$score
##7 F_0_1 JKOPA TOZ_po 79 <NA>
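We perform a left outer join of df1 with df2 by T_id, U_id, and U_code. This merges the two tables, with the score from df1 stored in score.x and the score from df2 stored in score.y. We then use mutate to create the column comp, which indicates whether score.x is greater than, less than, or equal to score.y. Rows of df1 with no match in df2 have NA in score.y, so their comp is NA as well. Finally, we rename the score.x column to score and drop the score.y column to make the result cleaner.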
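The question also asks for the df1 entries that have no match in df2. In the result above these are the rows where comp is NA, but they can also be pulled out directly. The following is a minimal sketch, assuming dplyr is installed and that the data frames hold exactly the values shown in the question; the names `keys` and `no_match` are introduced here for illustration only.

```r
library(dplyr)

# Rebuild the question's data frames (values copied from the question).
df1 <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
T_id  U_id  U_code  score
A_0_1 UHJKI XPOS_hp 134
B_1_3 NBVFR LKJ_mm  543
C_9_0 TRFDA NBV_lp  80
D_9_1 KOIUA TRE_po  212
E_0_1 SDFRQ QAS_np  300
E_0_1 SDKIJ JIT_mx  160
F_0_1 JKOPA TOZ_po  79
")
df2 <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
T_id  U_id  U_code  score
A_0_1 UHJKI XPOS_hp 150
B_1_3 NBVFR LKJ_mm  520
C_9_0 TRFDG NBJ_po  90
D_9_1 KOIUA TRE_po  250
E_0_1 SDFRQ QAS_np  300
E_0_1 SDKIJ JIT_mx  160
F_0_1 LOLPO JUZ_ic  90
")

# The three columns that must agree for two rows to count as a match.
keys <- c("T_id", "U_id", "U_code")

# anti_join() keeps only the rows of df1 with no partner in df2 on the keys.
no_match <- anti_join(df1, df2, by = keys)
no_match
```

With the question's data this should return the two rows listed under "No matches" in the question (C_9_0 and F_0_1). Filtering the joined result with `filter(res, is.na(comp))` would be an alternative that reuses the join already computed.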
An equivalent implementation using base R is:
res <- merge(df1,df2,by=c("T_id","U_id","U_code"), all.x=TRUE)
res$comp <- ifelse(res$score.x > res$score.y,"df1$score > df2$score",ifelse(res$score.x < res$score.y,"df1$score < df2$score","df1$score == df2$score"))
res <- res[,c(1:4,6)]
colnames(res) <- sub("score.x","score",colnames(res))
This gives the same result. If you want to split the result by comp:
split(res[,-5],res$comp)
##$`df1$score < df2$score`
## T_id U_id U_code score
##1 A_0_1 UHJKI XPOS_hp 134
##4 D_9_1 KOIUA TRE_po 212
##
##$`df1$score == df2$score`
## T_id U_id U_code score
##5 E_0_1 SDFRQ QAS_np 300
##6 E_0_1 SDKIJ JIT_mx 160
##
##$`df1$score > df2$score`
## T_id U_id U_code score
##2 B_1_3 NBVFR LKJ_mm 543
This produces one data frame per group.
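Note that split drops the rows whose grouping value is NA, so the unmatched df1 entries do not appear in the list above. The following base R sketch shows one way to keep them as a fourth group, assuming the data frames hold exactly the values from the question; the label "No matches" is taken from the question, and the name `groups` is introduced here for illustration only.

```r
# Rebuild the question's data frames (values copied from the question).
df1 <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
T_id  U_id  U_code  score
A_0_1 UHJKI XPOS_hp 134
B_1_3 NBVFR LKJ_mm  543
C_9_0 TRFDA NBV_lp  80
D_9_1 KOIUA TRE_po  212
E_0_1 SDFRQ QAS_np  300
E_0_1 SDKIJ JIT_mx  160
F_0_1 JKOPA TOZ_po  79
")
df2 <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
T_id  U_id  U_code  score
A_0_1 UHJKI XPOS_hp 150
B_1_3 NBVFR LKJ_mm  520
C_9_0 TRFDG NBJ_po  90
D_9_1 KOIUA TRE_po  250
E_0_1 SDFRQ QAS_np  300
E_0_1 SDKIJ JIT_mx  160
F_0_1 LOLPO JUZ_ic  90
")

# Left outer join: every df1 row is kept; score.y is NA when df2 has no match.
res <- merge(df1, df2, by = c("T_id", "U_id", "U_code"), all.x = TRUE)

# Label each row; comparisons against NA yield NA, so unmatched rows get NA.
res$comp <- ifelse(res$score.x > res$score.y, "df1$score > df2$score",
            ifelse(res$score.x < res$score.y, "df1$score < df2$score",
                   "df1$score == df2$score"))

# Give the unmatched rows an explicit label so split() does not drop them.
res$comp[is.na(res$comp)] <- "No matches"

# One data frame per label, keeping the key columns and the df1 score.
groups <- split(res[, c("T_id", "U_id", "U_code", "score.x")], res$comp)
groups[["No matches"]]
```

Each element of `groups` can then be accessed by its label, for example `groups[["df1$score > df2$score"]]`.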