Finding csv rows by word similarity

Time: 2018-12-21 15:00:58

Tags: python algorithm python-2.7 similarity levenshtein-distance

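I have a csv file with thousands of rows. I only want to retrieve the rows that contain something similar to a particular word. In the example below, I would like to catch rows 1, 2 and 4.

Any ideas on how to achieve this?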

import csv
a='Microsoft'
f = open("testing.csv")
reader = csv.reader(f, delimiter='\n')

for row in reader:
    if a in row[0]:
        print row[0]

testing.csv

I like very much the Microsoft products
Me too, I like Micrsoft
I prefer Apple products
microfte here

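1 answer:

Answer 0: (score: 1)

The fuzzywuzzy library works well for this. Given your test data and expected result, I assume case does not matter, so I upper-cased both the word to compare and the test data: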

from fuzzywuzzy import fuzz
import csv

word = 'Microsoft'.upper()

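# Each row holds one whole line of the file in row[0]
with open('testing.csv') as f:
    reader = csv.reader(f, delimiter='\n')

    for row in reader:
        if not row:
            continue  # skip blank lines
        # Compare the target word against every space-separated token
        a = row[0].split(' ')
        if max([fuzz.ratio(word, x.upper()) for x in a]) > 80:
            print(row[0])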

Result:

$ python test.py
I like very much the Microsoft products
Me too, I like Micrsoft
microfte here
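
If installing fuzzywuzzy is not an option, roughly the same idea can be sketched with the standard library's difflib.SequenceMatcher. This is only an illustrative variant, not the code from the answer above: the names similarity and matching_lines and the threshold parameter are made up for this example, the sample lines are held in a list instead of being read from testing.csv, and the scores may differ slightly from what fuzz.ratio reports.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0-100 similarity score between two strings, ignoring case."""
    return 100 * SequenceMatcher(None, a.upper(), b.upper()).ratio()


def matching_lines(lines, word, threshold=80):
    """Keep the lines in which at least one token scores above the threshold."""
    kept = []
    for line in lines:
        tokens = line.split()
        # Skip blank lines, then test the best-scoring token of the line
        if tokens and max(similarity(word, t) for t in tokens) > threshold:
            kept.append(line)
    return kept


# Same content as testing.csv in the question (hard-coded for the sketch)
sample = [
    "I like very much the Microsoft products",
    "Me too, I like Micrsoft",
    "I prefer Apple products",
    "microfte here",
]

result = matching_lines(sample, "Microsoft")
for line in result:
    print(line)
```

With the sample data this keeps the first, second and fourth lines. The threshold of 80 is taken from the answer above and will likely need tuning on real data: "microfte" only just clears it.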