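I'm trying to work out how to select the "most likely" value from 5 records. I thought the fuzzywuzzy
package might work, but I'm wondering whether it (or another package) can do this even when no search string is provided.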
I tried regular expressions but didn't find them useful at all, and then came across this fuzzywuzzy
code:
https://www.datacamp.com/community/tutorials/fuzzy-string-python
from fuzzywuzzy import process
str2Match = "apple inc"
strOptions = ["Apple Inc.","apple park","apple incorporated","iphone"]
# Score every option against str2Match; returns (option, score) tuples, best first
Ratios = process.extract(str2Match,strOptions)
# You can also select the string with the highest matching percentage
highest = process.extractOne(str2Match,strOptions)
The results are:
>> print (Ratios)
[('Apple Inc.', 100), ('apple incorporated', 90), ('apple park', 67), ('iphone', 30)]
>> print(highest)
('Apple Inc.', 100)
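For intuition about what extract/extractOne do with the query: each option is scored against str2Match and the list is sorted by score. The sketch below reproduces that idea with only the standard library, assuming difflib.SequenceMatcher as a stand-in scorer. Its numbers will not match fuzzywuzzy's, whose default scorer is a different one (fuzz.WRatio); the helper names normalize, similarity, extract and extract_one are hypothetical.

```python
import re
from difflib import SequenceMatcher


def normalize(s: str) -> str:
    # Lowercase and turn runs of non-alphanumerics into single spaces,
    # roughly like fuzzywuzzy's default preprocessing (an approximation).
    return " ".join(re.sub(r"[^0-9a-z]+", " ", s.lower()).split())


def similarity(a: str, b: str) -> int:
    # 0-100 score from difflib's ratio (stand-in for fuzzywuzzy's scorers).
    return round(100 * SequenceMatcher(None, normalize(a), normalize(b)).ratio())


def extract(query, options):
    # Score every option against the query and sort best-first.
    scored = [(opt, similarity(query, opt)) for opt in options]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def extract_one(query, options):
    # The single best-scoring option.
    return extract(query, options)[0]


str2Match = "apple inc"
strOptions = ["Apple Inc.", "apple park", "apple incorporated", "iphone"]
print(extract(str2Match, strOptions))
print(extract_one(str2Match, strOptions))  # ('Apple Inc.', 100)
```

The key point is that every score here is relative to the query, which is why a query is required.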
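The results above make sense because a search string is supplied in str2Match. What I'd like to know is whether the code can find the best (highest-scoring) value on its own, for example by automatically recognizing that 'Apple Inc'
and 'apple incorporated'
are similar enough to count as the same value. Thanks!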
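One way to do this without a search string is to compare the records with each other instead of with a query: group records whose pairwise similarity clears a threshold, treat the largest group as the "most likely" value, and return the member of that group most similar to the rest of it. The sketch below is a minimal, stdlib-only version of that idea, assuming difflib.SequenceMatcher as the scorer and a threshold of 0.65 chosen by hand; the function names and the five sample records are hypothetical. fuzzywuzzy also appears to provide process.dedupe, which removes near-duplicates above a threshold and keeps one representative per group, and may be worth checking in its documentation.

```python
import re
from difflib import SequenceMatcher


def normalize(s: str) -> str:
    # Lowercase and collapse punctuation/whitespace so "Apple Inc." ~ "apple inc".
    return " ".join(re.sub(r"[^0-9a-z]+", " ", s.lower()).split())


def similarity(a: str, b: str) -> float:
    # 0.0-1.0 similarity from difflib (stand-in for a fuzzywuzzy scorer).
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def group_similar(records, threshold=0.65):
    # Greedy single pass: a record joins the first group whose first member
    # it matches at >= threshold; otherwise it starts a new group.
    groups = []
    for rec in records:
        for group in groups:
            if similarity(rec, group[0]) >= threshold:
                group.append(rec)
                break
        else:
            groups.append([rec])
    return groups


def most_likely_value(records, threshold=0.65):
    # The largest group is the value most records agree on; within it, return
    # the member with the highest total similarity to the group (its medoid).
    # Ties go to the earliest record, because max() keeps the first maximum.
    largest = max(group_similar(records, threshold), key=len)
    return max(largest, key=lambda r: sum(similarity(r, o) for o in largest))


# Hypothetical five records built from the strings in the question.
records = ["Apple Inc.", "apple incorporated", "Apple Inc", "iphone", "apple park"]
print(group_similar(records))
print(most_likely_value(records))
```

The threshold matters a lot: with difflib's ratio, "apple park" scores about 0.63 against "apple inc", so a threshold of 0.6 would pull it into the Apple Inc group. Any threshold would need tuning on real data.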