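I want to do string matching between the "Product" column in dataset A and the "Description" column in dataset B, and get the corresponding cost. If no match is found, I want "NA" or "No match".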
The two datasets look like this:
Dataset A:
Country Product
AUS BANLORE 5MG B/90 CP
BNG BANLORE 5MG TAB VNAM X 30
SL BANLORE CAPS 10MGX30 M/T
RIO BANLORE CAPS 10MGX30 M/T
AP CADBURY 10/20MG FT 30PCS WI
BP CADBURY 5/10MG X 10 TABS.
GUJ CADBURY 5/20MG X 10 TABS.
KEL CADBURY 5/10MG FT 30PS ML
PON CHOCO 10MG FT 30 RO
TN CHOCO 20MG FT 30
HYD CHOCO 40MG FT 14
CHN LACTO 2G 20ML LIQ
NAG LACTO 1G 10ML LIQ
NEP LACTO INJ 1000MG
ASM LACTO INJ 2000MG/20ML 10S
The second dataset is:
Dataset B:
Description Group Cost
BANLORE CAPS 10MG X 30'S Novas 6.34
BANLORE 5MG TAB VNAM X 30 Novas 4.05
BANLORE CAPS 5MG X 10'S Novas 5.29
CADBURY TAB 10MG/10MG X 7'S Cadet 7.77
CADBURY 10MG/10MG X30'S Cadet 4.03
CADBURY 5/20MG FT 7PS Cadet 1.98
CADBURY 5/20MG X 10 TABS Cadet 0.28
CHOCO 20MG FCT BLST PEPSCO 0.18
CHOCO 10MG FT 30 PEPSCO 2.62
LACTO INJ 100MG/5ML 5S star 5.17
LACTO INJ 500MG/25ML 1'S star 8.79
LACTO INJ 2000MG/20ML 10S star 6.44
My desired output looks like this:
Country Product cost
AUS BANLORE 5MG B/90 CP NO MATCH
BNG BANLORE 5MG TAB VNAM X 30 4.05
RIO BANLORE CAPS 10MGX30 M/T NO MATCH
AP CADBURY 5/20MG X 10 TABS 0.28
GUJ LACTO INJ 2000MG/20ML 10S 6.44
KEL CADBURY 5/10MG FT 30PS ML NO MATCH
I tried using grep and Levenshtein distance, but couldn't find a suitable solution.
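Since the question mentions Levenshtein distance, here is a minimal sketch of approximate matching with base R's `adist()` (utils package), which computes a generalized Levenshtein distance matrix. It uses only a small subset of the rows above. The helper name `fuzzy_cost` and the threshold `max_dist = 2` are hypothetical choices, not taken from the question, and would need tuning on the real data.

```r
# Small subset of the two datasets from the question
A <- data.frame(
  Country = c("BNG", "GUJ", "ASM", "AUS"),
  Product = c("BANLORE 5MG TAB VNAM X 30",
              "CADBURY 5/20MG X 10 TABS.",
              "LACTO INJ 2000MG/20ML 10S",
              "BANLORE 5MG B/90 CP"),
  stringsAsFactors = FALSE
)
B <- data.frame(
  Description = c("BANLORE CAPS 10MG X 30'S",
                  "BANLORE 5MG TAB VNAM X 30",
                  "CADBURY 5/20MG X 10 TABS",
                  "LACTO INJ 2000MG/20ML 10S"),
  Cost = c(6.34, 4.05, 0.28, 6.44),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each product, pick the description with the
# smallest Levenshtein distance and accept it only if distance <= max_dist.
fuzzy_cost <- function(products, descriptions, costs, max_dist = 2) {
  d <- adist(products, descriptions)            # rows = products, cols = descriptions
  best <- apply(d, 1, which.min)                # closest description per product
  best_dist <- d[cbind(seq_along(products), best)]
  ifelse(best_dist <= max_dist, as.character(costs[best]), "No match")
}

A$cost <- fuzzy_cost(A$Product, B$Description, B$Cost)
print(A)
```

Because "No match" is text, the resulting `cost` column is character. Returning `NA_real_` instead would keep it numeric. A small threshold like this tolerates a stray period but will not pair spelling variants such as `10MGX30` and `10MG X 30'S`.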
Answer:
Assuming you are looking for exact matches:
# match() returns the position of the first exact match in B$Description, or NA when there is none
A$cost <- B$Cost[match(A$Product, B$Description)]
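The `match()` one-liner leaves `NA` wherever there is no exact match. Below is a minimal sketch of the rest of the requirement, using a small subset of the rows from the question. It upper-cases, trims, and drops one trailing period before matching, so that `CADBURY 5/20MG X 10 TABS.` in A finds `CADBURY 5/20MG X 10 TABS` in B, and then labels unmatched rows "No match". The `normalize` helper and the choice of what to strip are assumptions about this data.

```r
A <- data.frame(
  Country = c("AUS", "BNG", "GUJ", "KEL"),
  Product = c("BANLORE 5MG B/90 CP",
              "BANLORE 5MG TAB VNAM X 30",
              "CADBURY 5/20MG X 10 TABS.",
              "CADBURY 5/10MG FT 30PS ML"),
  stringsAsFactors = FALSE
)
B <- data.frame(
  Description = c("BANLORE 5MG TAB VNAM X 30",
                  "CADBURY 5/20MG X 10 TABS",
                  "CHOCO 10MG FT 30"),
  Cost = c(4.05, 0.28, 2.62),
  stringsAsFactors = FALSE
)

# Hypothetical normalisation: upper-case, trim spaces, drop one trailing period
normalize <- function(x) sub("\\.$", "", trimws(toupper(x)))

# Position of the first exact match in B after normalisation, or NA if none
idx <- match(normalize(A$Product), normalize(B$Description))

# Numeric cost column (NA = no match) plus a readable label column
A$cost <- B$Cost[idx]
A$cost_label <- ifelse(is.na(idx), "No match", as.character(B$Cost[idx]))
print(A)
```

Keeping `cost` numeric and putting "No match" in a separate column avoids turning the whole cost column into text.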