I'm trying to use the Fuzzywuzzy library's `fuzz.ratio` function to get similarity scores between strings from two datasets, but I keep getting the following error:
```
  File "title_matching.py", line 29, in <module>
    match = match_title(title, all_titles_list, 75)
  File "title_matching.py", line 12, in match_title
    score = fuzz.ratio(title, title2)
  File "/usr/local/lib/python3.7/site-packages/fuzzywuzzy/utils.py", line 38, in decorator
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/fuzzywuzzy/utils.py", line 29, in decorator
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/fuzzywuzzy/utils.py", line 45, in decorator
    if len(args[0]) == 0 or len(args[1]) == 0:
TypeError: object of type 'float' has no len()
```
Here is the function where I call the library:
```python
from fuzzywuzzy import fuzz


def match_title(title, list_titles, min_score=0):
    # -1 score in case we don't get any matches
    max_score = -1
    # Returning an empty name for no match as well
    max_name = ""
    # Iterating over all names in the other list
    for title2 in list_titles:
        # Finding the fuzzy match score
        score = fuzz.ratio(title, title2)
        # Checking if we are above our threshold and have a better score
        if score > min_score and score > max_score:
            max_name = title2
            max_score = score
    return (max_name, max_score)
```
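I have checked the values of `title` and `list_titles` by printing them, and they are a string and a list of strings respectively. I don't know why this is happening or how to fix it, since the error is raised inside the library's own files.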
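Printing alone can hide this kind of problem: a float NaN prints as `nan`, which looks like ordinary text. A common source of such values is a missing cell in a pandas column, though that is an assumption about where these titles come from. A minimal sketch of a more reliable check, using a hypothetical helper `find_non_strings` and made-up sample data, reports every element whose type is not `str`:

```python
from typing import Any, Iterable, List, Tuple


def find_non_strings(items: Iterable[Any]) -> List[Tuple[int, Any]]:
    """Return (index, value) pairs for every element that is not a str."""
    return [(i, item) for i, item in enumerate(items) if not isinstance(item, str)]


# Hypothetical sample data: float("nan") stands in for a missing title.
titles = ["The Hobbit", float("nan"), "Dune"]

# Printed on its own, the NaN shows as nan, which is easy to mistake for text.
print(titles[1])

# Reporting the type makes the culprit obvious.
for index, value in find_non_strings(titles):
    print(index, repr(value), type(value).__name__)
```

Running the same check on `all_titles_list` (and on each `title`) shows whether any non-string values are being passed to `fuzz.ratio`.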
Answer (score: 1)
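In `score = fuzz.ratio(title, title2)`, either `title` or `title2` is a float rather than a string. `fuzz.ratio` calls `len()` on both arguments, which fails for a float: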
```python
from fuzzywuzzy import fuzz
print(fuzz.ratio('1', '2'))  # 0
print(fuzz.ratio(1.0, '2'))
```

```
Traceback (most recent call last):
  File "main.py", line 3, in <module>
    print(fuzz.ratio(1.0, '2'))
  File "C:\Python37\lib\site-packages\fuzzywuzzy\utils.py", line 38, in decorator
    return func(*args, **kwargs)
  File "C:\Python37\lib\site-packages\fuzzywuzzy\utils.py", line 29, in decorator
    return func(*args, **kwargs)
  File "C:\Python37\lib\site-packages\fuzzywuzzy\utils.py", line 45, in decorator
    if len(args[0]) == 0 or len(args[1]) == 0:
TypeError: object of type 'float' has no len()
```
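One way to avoid the crash is to skip non-string values before scoring them. The sketch below is a guarded variant of the question's `match_title`, not a fix confirmed by the original answer. The `scorer` parameter is a hypothetical addition so the function can be tried without fuzzywuzzy installed; when it is omitted, the function falls back to `fuzz.ratio`, so the original call `match_title(title, all_titles_list, 75)` still works.

```python
from typing import Callable, Iterable, Optional, Tuple


def match_title(
    title,
    list_titles: Iterable,
    min_score: int = 0,
    scorer: Optional[Callable[[str, str], int]] = None,  # hypothetical, for testing
) -> Tuple[str, int]:
    """Return (best_match, score), skipping any candidate that is not a str."""
    if scorer is None:
        # Imported lazily so a custom scorer works without fuzzywuzzy installed.
        from fuzzywuzzy import fuzz
        scorer = fuzz.ratio

    # -1 score and an empty name signal that nothing matched
    max_score = -1
    max_name = ""

    # A non-string query (e.g. a NaN float) cannot be scored at all
    if not isinstance(title, str):
        return (max_name, max_score)

    for title2 in list_titles:
        # Skip NaN floats, None, and other non-string values instead of crashing
        if not isinstance(title2, str):
            continue
        score = scorer(title, title2)
        if score > min_score and score > max_score:
            max_name = title2
            max_score = score
    return (max_name, max_score)
```

Skipping is usually preferable to converting with `str()`, because `str(float("nan"))` is `'nan'`, which would then be scored as if it were a real title. If the list is reused for many queries, filtering it once up front (for example `[t for t in all_titles_list if isinstance(t, str)]`) avoids repeating the check inside every call.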