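I have two data frames, df_1 and df_2. For each key in df_1, I want to find the best Form_2 in df_2 that matches Form_1.
IF
Form_1 exists in df_2, then take the exact match, e.g. key = B, Form_1 = tablet, Form_2 = tablet.
ELSE take the shortest match, e.g. key = D, Form_1 = patch,ER and Form_2 = patch. This is the shortest entry in df_2 that matches patch,ER.
If there is more than one match for Form_1, then take all of them. E.g. key = G has two matches in df_2 Form_2.
Lastly, if there are no matches, default to NA.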
df_2 <- data.frame(Form_2 = c("suspension", "for suspension", "tablet", "tablet,tablet", "patch", "patch,IR", "tablet,ER", "Injection", "Injection,Solution", "liquid"))
df_1 <- data.frame(
  key = c("A", "B", "C", "D", "E", "F", "G", "H"),
  Form_1 = c("suspension", "tablet", "tablet,ER", "patch,ER", "tablet", "Injection,Solution", "liquid Injection", "see attachment"))
This is what my output should be:
df_out <- data.frame(
  key = c("A", "B", "C", "D", "E", "F", "G", "G", "H"),
  Form_1 = c("suspension", "tablet", "tablet,ER", "patch,ER", "tablet", "Injection,Solution", "liquid Injection", "liquid Injection", "see attachment"),
  Form_2 = c("suspension", "tablet", "tablet,ER", "patch", "tablet", "Injection,Solution", "liquid", "Injection", NA)
)
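To make the rules concrete, here is a minimal base R sketch of one way they could be read. It rests on assumptions that are not stated in the question: forms are split into tokens on commas and whitespace, a df_2 entry "matches" when it shares at least one token with Form_1, "shortest" means fewest tokens, and comparison is case-sensitive. The helper names `tokens`, `best_match` and `match_forms` are hypothetical.

```r
# Hypothetical helper: split a form string into tokens on commas and whitespace.
tokens <- function(x) {
  parts <- strsplit(x, "[,[:space:]]+")[[1]]
  parts[nzchar(parts)]
}

# Hypothetical helper: return the best Form_2 value(s) for a single Form_1.
best_match <- function(form_1, candidates) {
  # Exact match wins outright.
  if (form_1 %in% candidates) return(form_1)

  # Assumption: a candidate matches when it shares at least one token with form_1.
  wanted <- tokens(form_1)
  n_tok  <- vapply(candidates, function(cand) length(tokens(cand)), integer(1))
  shares <- vapply(candidates, function(cand) any(tokens(cand) %in% wanted), logical(1))

  # No match at all: default to NA.
  if (!any(shares)) return(NA_character_)

  # Assumption: "shortest" means fewest tokens; ties are all kept.
  hits <- candidates[shares]
  unname(hits[n_tok[shares] == min(n_tok[shares])])
}

# Hypothetical driver: one output row per (key, matched Form_2) pair.
match_forms <- function(df_1, df_2) {
  candidates <- unique(as.character(df_2$Form_2))
  rows <- lapply(seq_len(nrow(df_1)), function(i) {
    form_1 <- as.character(df_1$Form_1[i])
    data.frame(
      key    = as.character(df_1$key[i]),
      Form_1 = form_1,
      Form_2 = best_match(form_1, candidates),  # recycled when several matches
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}
```

Calling `match_forms(df_1, df_2)` on the data above gives one row per key, except G, which gets two rows. Under this reading G's matches come back in df_2 order (Injection, then liquid) rather than in the order the words appear in Form_1, so a final reordering step would be needed if row order matters.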