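I have a CSV file with thousands of lines. I only want to retrieve the lines that are somewhat similar to a particular word. In this case, I would like to catch lines 1, 2 and 4 of testing.csv.
Any ideas how to achieve this?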
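a = 'Microsoft'
with open("testing.csv") as f:
    # Read the file line by line; each line is one record
    for line in f:
        line = line.rstrip('\n')
        # Exact, case-sensitive substring match: only line 1 contains 'Microsoft'
        if a in line:
            print(line)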
Contents of testing.csv:
I like very much the Microsoft products
Me too, I like Micrsoft
I prefer Apple products
microfte here
Answer:
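The fuzzywuzzy library works well for this. Given your test data and expected results, I assume case doesn't matter, so I uppercased both the word being compared and the test data: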
from fuzzywuzzy import fuzz

word = 'Microsoft'.upper()
with open('testing.csv') as f:
    for line in f:
        line = line.rstrip('\n')
        words = line.split(' ')
        # Keep the line if any of its words scores above 80 (out of 100) against the target
        if max(fuzz.ratio(word, x.upper()) for x in words) > 80:
            print(line)
Result:
$ python test.py
I like very much the Microsoft products
Me too, I like Micrsoft
microfte here
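If installing fuzzywuzzy isn't an option, the same idea can be sketched with the standard library alone. As far as I know, fuzz.ratio is round(100 * difflib.SequenceMatcher(...).ratio()) when python-Levenshtein isn't installed, and that ratio is 2*M / T, where M is the number of matching characters and T is the combined length of both strings. The helper names similarity and fuzzy_filter are hypothetical, and the threshold of 80 is taken from the answer above; scores may differ slightly from fuzz.ratio when python-Levenshtein is installed.

```python
from difflib import SequenceMatcher
from typing import Iterable, Iterator


def similarity(a: str, b: str) -> float:
    """Case-insensitive score in [0, 100]: 100 * 2*M / T, where M is the number of
    matching characters found by SequenceMatcher and T = len(a) + len(b)."""
    return 100 * SequenceMatcher(None, a.upper(), b.upper()).ratio()


def fuzzy_filter(lines: Iterable[str], word: str, threshold: float = 80) -> Iterator[str]:
    """Yield each line that has at least one whitespace-separated token
    scoring above `threshold` against `word` (hypothetical helper)."""
    for line in lines:
        line = line.rstrip("\n")
        # default=0 so a blank line (no tokens) never matches
        best = max((similarity(word, token) for token in line.split()), default=0)
        if best > threshold:
            yield line


sample = [
    "I like very much the Microsoft products",
    "Me too, I like Micrsoft",
    "I prefer Apple products",
    "microfte here",
]
print(round(similarity("Microsoft", "Micrsoft")))  # 94  (M = 8, T = 17)
print(round(similarity("Microsoft", "microfte")))  # 82  (M = 7, T = 17)
for match in fuzzy_filter(sample, "Microsoft"):
    print(match)  # lines 1, 2 and 4 of the sample
```

To filter the real file, pass the open file object instead of the sample list, e.g. `with open("testing.csv") as f: print(list(fuzzy_filter(f, "Microsoft")))`. Unlike the fuzzywuzzy version, this splits on any whitespace rather than single spaces, so repeated spaces don't produce empty tokens.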