Question

I am trying to use FuzzyWuzzy to correct misspelled names in a text, but I can't get process.extract and process.extractOne to behave the way I expect.

from fuzzywuzzy import process

the_text = 'VICTOR HUGO e MARIANA VEIGA'
search_term = 'VEYGA'

the_text = the_text.split()
found_word = process.extract(search_term, the_text)

print(found_word)

This results in:

[('e', 90), ('VEIGA', 80), ('HUGO', 22), ('VICTOR', 18), ('MARIANA', 17)]

How can I get FuzzyWuzzy to pick 'VEIGA' as the best match?

Answer 1

You can try fuzz.token_set_ratio or fuzz.token_sort_ratio as the scorer. The answer to "When to use which fuzz function to compare 2 strings" gives a good explanation of the differences.

For completeness, here is some code:

from fuzzywuzzy import process
from fuzzywuzzy import fuzz

the_text = 'VICTOR HUGO e MARIANA VEIGA'
search_term = 'VEYGA'

the_text = the_text.split()
found_word = process.extract(search_term, the_text, scorer=fuzz.token_sort_ratio)

print(found_word)

Output:

[('VEIGA', 80), ('e', 33), ('HUGO', 22), ('VICTOR', 18), ('MARIANA', 17)]
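To see why the one-letter token 'e' can beat 'VEIGA', it may help to compare whole-string similarity with substring-style similarity. The sketch below is not FuzzyWuzzy's code: simple_ratio and best_substring_ratio are hypothetical helpers built on the standard library's difflib, and the assumption is that the default scorer gives weight to partial (substring) matches when the two strings differ a lot in length. Since 'e' occurs inside 'VEYGA', any substring-based score for it is perfect.

```python
from difflib import SequenceMatcher

def simple_ratio(a, b):
    """Whole-string similarity on a 0-100 scale (case-insensitive)."""
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())

def best_substring_ratio(a, b):
    """Best score of the shorter string against each same-length window of the longer one.

    Illustrative stand-in for partial matching, not the library's implementation.
    """
    shorter, longer = sorted((a.lower(), b.lower()), key=len)
    n = len(shorter)
    return max(
        simple_ratio(shorter, longer[i:i + n])
        for i in range(len(longer) - n + 1)
    )

search_term = 'VEYGA'
words = 'VICTOR HUGO e MARIANA VEIGA'.split()

# Score every token both ways.
whole = {w: simple_ratio(search_term, w) for w in words}
partial = {w: best_substring_ratio(search_term, w) for w in words}

print(whole)    # 'VEIGA' ranks first when whole strings are compared
print(partial)  # 'e' gets a perfect score because it is a substring of 'VEYGA'
```

With whole-string scoring 'VEIGA' ranks first, which is consistent with the output above when a full-string scorer such as fuzz.token_sort_ratio is passed in. Dropping very short tokens before matching would be another way to sidestep the problem.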
