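Fuzzy match ranking

Time: 2015-07-14 01:13:07

Tags: python fuzzy-search fuzzy-comparison

I fuzzy-matched a list of movie titles against each other and collected every comparison, together with its match score, into another list: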

>>> fuzzy_matches
[(['White Warrior (Alpha Video)'], ['White Warrior (Alpha Video)'], 100), (['White Warrior (Alpha Video)'], ['White Warrior (Digiview Entertainment)'], 63), (['White Warrior (Alpha Video)'], ['White Warrior (Platinum)'], 78), (['White Warrior (Alpha Video)'], ['White Warrior (Platinum) / David And Goliath'], 63), (['White Warrior (Alpha Video)'], ['White Warrior (Platinum) / Duel Of Champions'], 61)]...etc

I would like to add up the match scores for each title, so that I get output like this:

>>> (['White Warrior (Alpha Video)'], 248),
(['White Warrior 2 (Digiview Entertainment)'], 390),
etc...

I have tried several implementations based on slicing, but they are ugly.

(Not my exact code, but this is the kind of ugliness I mean):

for x in range(len(fuzzed)):
    for y in fuzzed(len(fuzzed)):

big_dict[fuzzy_matches[55][0][0]]=fuzzy_matches[55][2] + fuzzy_matches[56][3]...

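What would be a more efficient way to accomplish this?

1 Answer:

Answer 0: (score: 1)

You can use a dict, keyed by title, to accumulate the totals you want. If you then need a list of tuples, call list() on dict.items() (in Python 3.x) to get it.

Example: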

>>> fuzzy_matches = [(['White Warrior (Alpha Video)'], ['White Warrior (Alpha Video)'], 100), (['White Warrior (Alpha Video)'], ['White Warrior (Digiview Entertainment)'], 63), (['White Warrior (Alpha Video)'], ['White Warrior (Platinum)'], 78), (['White Warrior (Alpha Video)'], ['White Warrior (Platinum) / David And Goliath'], 63), (['White Warrior (Alpha Video)'], ['White Warrior (Platinum) / Duel Of Champions'], 61)]
>>>
>>> fuzzy_dict = {}
>>> for i in fuzzy_matches:
...     if i[0][0] not in fuzzy_dict:
...         fuzzy_dict[i[0][0]] = 0
...     fuzzy_dict[i[0][0]] += i[2]
...
>>> fuzzy_dict
{'White Warrior (Alpha Video)': 365}
>>> list(fuzzy_dict.items())
[('White Warrior (Alpha Video)', 365)]

If you are using Python 2.x, you do not need the list(...) at the end, because dict.items() already returns a list there.
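
As a variation on the same idea, here is a minimal sketch that uses collections.defaultdict so the "not in" check is unnecessary, then sorts the totals to get a ranking. The helper name total_scores and the descending sort order are assumptions made for illustration and are not part of the original answer. The sample data is the five comparisons shown in the question.

```python
from collections import defaultdict

# Sample data copied from the question: (source title, compared title, score).
fuzzy_matches = [
    (['White Warrior (Alpha Video)'], ['White Warrior (Alpha Video)'], 100),
    (['White Warrior (Alpha Video)'], ['White Warrior (Digiview Entertainment)'], 63),
    (['White Warrior (Alpha Video)'], ['White Warrior (Platinum)'], 78),
    (['White Warrior (Alpha Video)'], ['White Warrior (Platinum) / David And Goliath'], 63),
    (['White Warrior (Alpha Video)'], ['White Warrior (Platinum) / Duel Of Champions'], 61),
]


def total_scores(matches):
    """Sum the match scores per source title (hypothetical helper)."""
    totals = defaultdict(int)  # missing titles start at 0
    for source, _compared, score in matches:
        # Each title is wrapped in a one-element list, so unwrap it.
        totals[source[0]] += score
    return dict(totals)


totals = total_scores(fuzzy_matches)

# Rank titles by total score, highest first (the sort order is an assumption).
ranked = sorted(totals.items(), key=lambda pair: pair[1], reverse=True)
print(ranked)
```

Unpacking each tuple in the for loop avoids the repeated i[0][0] and i[2] indexing, and sorted() returns a list of tuples in both Python 2.x and 3.x.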