How can I use difflib to find similar rows in a column?

Time: 2017-10-12 14:17:35

Tags: python

I have two CSV files, each with a column of company names. Using Python 3 and pandas, I merged them to compare the names:

compara1 = pd.merge(
    dividas_dep, funrural,
    left_on='Nome_Devedor',
    right_on='Razao_Social')

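The merge found seven rows where the two columns are equal. However, the company names are not always typed correctly in these files. For example: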

AGROPECUARIA INDIANA LTDA
AGROPECUARIA INDINA LTDA

AGROTRI AGROPECUARIA TRIANGULO LTDA
AGROTRI AGROPECUARI TRIANGULO LTDA

So the merge, which only matches identical strings, does not find these similar values.
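To make the problem concrete, here is a minimal sketch that compares the two example pairs above with an exact test and with SequenceMatcher. The helper name ratio and the pairs list are only illustrative and do not come from the original files.

```python
from difflib import SequenceMatcher

# Name pairs copied from the examples above
pairs = [
    ("AGROPECUARIA INDIANA LTDA", "AGROPECUARIA INDINA LTDA"),
    ("AGROTRI AGROPECUARIA TRIANGULO LTDA", "AGROTRI AGROPECUARI TRIANGULO LTDA"),
]


def ratio(a, b):
    """Similarity score between 0.0 and 1.0 from SequenceMatcher."""
    return SequenceMatcher(None, a, b).ratio()


for a, b in pairs:
    # a == b is the test an exact-match merge effectively applies
    print(a == b, round(ratio(a, b), 3))
```

Both pairs fail the equality test but score above the 0.8 threshold used later in the question, which is why a fuzzy comparison can catch them when the merge cannot.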

Then I tried difflib:

from difflib import SequenceMatcher

def similar(a, b):
    threshold = 0.8
    return (SequenceMatcher(None, a, b).ratio() > threshold)


for i, row in dividas_dep.iterrows():
    a = (row['Nome_Devedor'])
    for i, row in funrural.iterrows():
        b = (row['Razao_Social'])
        similar(a, b)

It ran for about 5 minutes but did not return anything. What is wrong?

1 answer:

Answer 0: (score: 0)

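I think the results just needed to be displayed, as I now realize. The loop called similar(a, b) but never printed or stored what it returned: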

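from difflib import SequenceMatcher


def similar(a, b):
    # True when the similarity ratio is above the threshold
    threshold = 0.8
    s = SequenceMatcher(None, a, b).ratio() > threshold
    print(s)
    return s


for i, row_dep in dividas_dep.iterrows():
    a = row_dep['Nome_Devedor']
    # Distinct names so the inner loop does not overwrite the outer i and row
    for j, row_fun in funrural.iterrows():
        b = row_fun['Razao_Social']
        similar(a, b)
        print(a)
        print(b)
        print("-/-")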
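As a possible follow-up, here is a minimal sketch that collects the matches instead of printing every comparison. It swaps the nested loops for difflib.get_close_matches, a standard-library function that keeps candidates scoring at or above cutoff (slightly different from the strict > threshold test above). The two lists and the name XYZ COMERCIO SA are made-up stand-ins; with the real data they could come from dividas_dep['Nome_Devedor'].tolist() and funrural['Razao_Social'].tolist().

```python
from difflib import get_close_matches

# Hypothetical stand-ins for the two name columns
nomes_devedor = [
    "AGROPECUARIA INDIANA LTDA",
    "AGROTRI AGROPECUARIA TRIANGULO LTDA",
    "XYZ COMERCIO SA",  # invented name with no counterpart
]
razoes_sociais = [
    "AGROPECUARIA INDINA LTDA",
    "AGROTRI AGROPECUARI TRIANGULO LTDA",
]


def best_match(name, candidates, cutoff=0.8):
    """Return the closest candidate scoring >= cutoff, or None."""
    found = get_close_matches(name, candidates, n=1, cutoff=cutoff)
    return found[0] if found else None


# Map each name from the first list to its best counterpart, if any
matches = {name: best_match(name, razoes_sociais) for name in nomes_devedor}

for name, match in matches.items():
    print(name, "->", match)
```

The resulting dictionary could then be turned into a new column and used as the merge key. This still compares every name against every candidate, so it may remain slow on large files.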