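I have 2 Excel files with item names. I want to compare the items, but the only remotely similar column is the name column, and it uses different name formats, such as
KIDS-Piano as kids piano
Butter Gel 100mg as Butter-Gel-100MG
I know it can't be 100% accurate, so I will ask the person running the code to do a final verification, but how can I show the closest matching name?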
Answer 0 (score: 1)
The proper way to do this is to write a regular expression.
But the vanilla code below also solves the problem:
column_a = ["KIDS-Piano", "Butter Gel 100mg"]
column_b = ["kids piano", "Butter-Gel-100MG"]
new_column_a = []
for i in column_a:
    # convert strings into lowercase
    a = i.lower()
    # replace dashes with spaces
    a = a.replace('-', ' ')
    new_column_a.append(a)
# do the same for column b
new_column_b = []
for i in column_b:
    # convert strings into lowercase
    a = i.lower()
    # replace dashes with spaces
    a = a.replace('-', ' ')
    new_column_b.append(a)
as_not_found_in_b = []
for i in new_column_a:
    if i not in new_column_b:
        as_not_found_in_b.append(i)
bs_not_found_in_a = []
for i in new_column_b:
    if i not in new_column_a:
        bs_not_found_in_a.append(i)
# find the problematic ones and manually fix them
print(as_not_found_in_b)
print(bs_not_found_in_a)
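The code above only reports names that have no exact counterpart after normalization; it does not suggest which name is closest, which is what the question asks for. Here is a minimal sketch of one way to do that, assuming the standard-library difflib module is acceptable. It normalizes with a regular expression first, then ranks candidates with difflib.get_close_matches. The helpers normalize and closest_matches and the misspelled third sample name are hypothetical, added only for illustration, and the 0.6 cutoff is simply difflib's default rather than a tuned value.

```python
import re
from difflib import get_close_matches


def normalize(name):
    """Lowercase a name and collapse every run of non-alphanumerics into one space."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()


def closest_matches(names_a, names_b, n=3, cutoff=0.6):
    """Map each name in names_a to up to n similar original names from names_b."""
    # Remember which original names produced each normalized form.
    originals = {}
    for name in names_b:
        originals.setdefault(normalize(name), []).append(name)
    keys = list(originals)

    result = {}
    for name in names_a:
        # Candidates come back sorted from most to least similar.
        hits = get_close_matches(normalize(name), keys, n=n, cutoff=cutoff)
        result[name] = [orig for key in hits for orig in originals[key]]
    return result


# The third entry is a made-up misspelling to show a non-exact match.
column_a = ["KIDS-Piano", "Butter Gel 100mg", "Kid's Paino"]
column_b = ["kids piano", "Butter-Gel-100MG"]

for name, candidates in closest_matches(column_a, column_b).items():
    print(name, "->", candidates)
```

An empty candidate list means nothing in the other column scored above the cutoff, so those are the names to hand to the person doing the final verification. Lowering cutoff returns more (and looser) suggestions; raising it returns fewer.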