I have two CSV files, and each has a column of company names. Using Python 3 and pandas, I merged them to compare the names:
# pd.merge defaults to an inner join: only rows whose names are exactly equal are kept
compara1 = pd.merge(
    dividas_dep, funrural,
    left_on='Nome_Devedor',
    right_on='Razao_Social')
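It found seven rows where the two name columns are equal. But the company names are not always typed correctly in some of the files. For example: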
AGROPECUARIA INDIANA LTDA
AGROPECUARIA INDINA LTDA
AGROTRI AGROPECUARIA TRIANGULO LTDA
AGROTRI AGROPECUARI TRIANGULO LTDA
So the merge does not find these near-identical names, because it only matches values that are exactly equal.
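To see why an exact-match merge misses these rows while a similarity score does not, the standard-library `difflib.SequenceMatcher` can score the two example pairs above. This is a minimal sketch using only the names shown in the question; `ratio()` returns 2*M/T, where M is the number of matched characters and T is the combined length of both strings, so 1.0 means identical.

```python
from difflib import SequenceMatcher

# Name pairs from the example above: same company, one typo in each pair.
pairs = [
    ("AGROPECUARIA INDIANA LTDA", "AGROPECUARIA INDINA LTDA"),
    ("AGROTRI AGROPECUARIA TRIANGULO LTDA", "AGROTRI AGROPECUARI TRIANGULO LTDA"),
]

for a, b in pairs:
    exact = a == b  # what an equality-based merge effectively tests
    ratio = SequenceMatcher(None, a, b).ratio()  # 2*M/T, between 0 and 1
    print(f"exact={exact}  ratio={ratio:.3f}")
```

Both pairs fail the equality test but score well above 0.8, which is why a threshold around that value is a plausible starting point; the right cutoff for the full files is a judgment call.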
Then I tried difflib:
from difflib import SequenceMatcher

def similar(a, b):
    threshold = 0.8
    return (SequenceMatcher(None, a, b).ratio() > threshold)

for i, row in dividas_dep.iterrows():
    a = (row['Nome_Devedor'])
    for i, row in funrural.iterrows():
        b = (row['Razao_Social'])
        similar(a, b)
It ran for about 5 minutes but returned nothing. What is wrong?
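Two things explain this behaviour. The bool returned by `similar(a, b)` is never stored or printed, so when the code runs as a script or in a notebook cell, nothing is shown. Also, the nested loops call `ratio()` once for every combination of rows (len(dividas_dep) * len(funrural) times), on top of the overhead of `iterrows()`, which accounts for the long runtime. Below is a minimal sketch that keeps the matches instead of discarding them. It assumes the names are available as plain lists of strings, and `find_similar_pairs` is a hypothetical helper, not part of any library. It reuses one `SequenceMatcher` via `set_seq2`/`set_seq1` and checks the documented upper bounds `real_quick_ratio()` and `quick_ratio()` before the expensive `ratio()`.

```python
from difflib import SequenceMatcher


def find_similar_pairs(left_names, right_names, threshold=0.8):
    """Return (left, right, ratio) for every pair whose ratio exceeds threshold.

    Hypothetical helper: still visits every pair, but collects the results
    and lets most dissimilar pairs skip the full ratio() computation.
    """
    matches = []
    sm = SequenceMatcher(None)
    for b in right_names:
        # SequenceMatcher caches information about seq2, so set it once here.
        sm.set_seq2(b)
        for a in left_names:
            sm.set_seq1(a)
            # Both cheap checks are upper bounds on ratio(); failing either
            # means the pair cannot pass the real test.
            if sm.real_quick_ratio() <= threshold or sm.quick_ratio() <= threshold:
                continue
            r = sm.ratio()
            if r > threshold:
                matches.append((a, b, r))
    return matches


# Sample names from the question; "UNRELATED COMPANY SA" is a made-up non-match.
left = ["AGROPECUARIA INDIANA LTDA", "AGROTRI AGROPECUARIA TRIANGULO LTDA"]
right = ["AGROPECUARIA INDINA LTDA", "AGROTRI AGROPECUARI TRIANGULO LTDA",
         "UNRELATED COMPANY SA"]

for a, b, r in find_similar_pairs(left, right):
    print(f"{a!r} ~ {b!r} ({r:.2f})")
```

With the real data, `left` and `right` could be `dividas_dep['Nome_Devedor'].tolist()` and `funrural['Razao_Social'].tolist()`, assuming those columns contain no missing values. The number of pairs visited is unchanged, so this trims constant factors rather than the quadratic cost.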
Answer (score: 0)
I realize now that the comparison was working; I just was not displaying the results:
def similar(a, b):
    threshold = 0.8
    s = SequenceMatcher(None, a, b).ratio() > threshold
    print(s)  # display the result instead of silently discarding it
    return s

# Iterate over the name columns directly; iterrows() is not needed here
for a in dividas_dep['Nome_Devedor']:
    for b in funrural['Razao_Social']:
        similar(a, b)
        print(a)
        print(b)
        print("-/-")
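Printing every pair works for inspection but floods the console on real files and does not give a merged table. One alternative, sketched here as a different technique from the answer's, is to map each name in one column to its closest name in the other with the standard-library `difflib.get_close_matches`, then merge on that mapped column. `best_match_map` is a hypothetical helper. Note that `get_close_matches` keeps candidates scoring *at least* `cutoff` (>=), whereas `similar` above uses a strict `>`.

```python
from difflib import get_close_matches


def best_match_map(left_names, right_names, cutoff=0.8):
    """Map each left name to its most similar right name, or None.

    Hypothetical helper. get_close_matches scores candidates with
    SequenceMatcher ratios and ignores those scoring below `cutoff`.
    """
    candidates = list(dict.fromkeys(right_names))  # de-duplicate, keep order
    mapping = {}
    for name in dict.fromkeys(left_names):
        best = get_close_matches(name, candidates, n=1, cutoff=cutoff)
        mapping[name] = best[0] if best else None
    return mapping


# Sample names from the question; "UNRELATED COMPANY SA" is a made-up non-match.
left = ["AGROPECUARIA INDIANA LTDA", "AGROTRI AGROPECUARIA TRIANGULO LTDA"]
right = ["AGROPECUARIA INDINA LTDA", "AGROTRI AGROPECUARI TRIANGULO LTDA",
         "UNRELATED COMPANY SA"]

for name, match in best_match_map(left, right).items():
    print(f"{name!r} -> {match!r}")
```

With the real frames, and assuming the name columns hold only non-null strings, one possible use is `mapping = best_match_map(dividas_dep['Nome_Devedor'], funrural['Razao_Social'])`, then `dividas_dep['match'] = dividas_dep['Nome_Devedor'].map(mapping)` and `pd.merge(dividas_dep, funrural, left_on='match', right_on='Razao_Social')`. The `match` column name is hypothetical. Each left name still gets compared against every candidate, so the runtime stays proportional to the product of the two column sizes.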