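I am trying to sort two lists of tuples by matching them against each other. The tuples contain data scraped from sports betting websites. I have written some code that matches the entries in each list and appends them to new lists. The problem I am having is finding a way to match on inexact names: for example, a name may contain extra whitespace, or a team name may be shortened, such as 'Nth Queensland Cowboys' in sportsbet_list instead of 'North Queensland Cowboys' in list_finale. Please see the lists below: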
list_finale = [[('Canterbury Bulldogs ', '3.25'), ('South Sydney Rabbitohs', '1.34')], [('Parramatta Eels ', '1.79'), ('Wests Tigers', '2.02')], [('Melbourne Storm ', '1.90'), ('Sydney Roosters', '1.90')], [('Gold Coast Titans ', '1.86'), ('Newcastle Knights', '1.94')], [('New Zealand Warriors ', '1.39'), ('North Queensland Cowboys', '2.95')], [('Cronulla Sharks ', '1.68'), ('Penrith Panthers', '2.18')], [('St. George Illawarra Dragons ', '1.45'), ('Manly Sea Eagles', '2.74')], [('Canberra Raiders ', '1.63'), ('Brisbane Broncos', '2.26')]]
sportsbet_list = [[('Cronulla Sharks', '1.64'), ('Penrith Panthers', '2.27')], [('Canterbury Bulldogs', '3.30'), ('South Sydney Rabbitohs', '1.33')], [('Melbourne Storm', '1.90'), ('Sydney Roosters', '1.90')], [('New Zealand Warriors', '1.40'), ('Nth Queensland Cowboys', '2.90')], [('St George Illawarra Dragons', '1.45'), ('Manly Sea Eagles', '2.75')], [('Gold Coast Titans', '1.85'), ('Newcastle Knights', '1.95')], [('Canberra Raiders', '1.60'), ('Brisbane Broncos', '2.30')], [('Parramatta Eels', '1.90'), ('Wests Tigers', '1.90')], [('Sydney Roosters', '1.35'), ('St George Illawarra Dragons', '3.20')], [('Melbourne Storm', '1.25'), ('New Zealand Warriors', '4.00')], [('Canterbury Bulldogs', '1.56'), ('Nth Queensland Cowboys', '2.40')], [('Penrith Panthers', '2.20'), ('South Sydney Rabbitohs', '1.67')], [('Wests Tigers', '1.67'), ('Gold Coast Titans', '2.20')], [('Brisbane Broncos', '1.70'), ('Cronulla Sharks', '2.15')], [('Manly Sea Eagles', '1.85'), ('Canberra Raiders', '1.95')], [('Newcastle Knights', '1.80'), ('Parramatta Eels', '2.00')]]
The code I currently use to sort these lists is shown below:
list_n = []
list_n1 = []
for a in sportsbet_list:
    for b in list_finale:
        if b[0][0] == a[0][0] and b[1][0] == a[1][0]:
            list_n.append(a)
            list_n1.append(b)
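This works, but only when the team names are exactly the same.
Basically, I need a matching function that treats b[0][0] == a[0][0] and b[1][0] == a[1][0] as true when the names are roughly 85% similar, or some comparable threshold.
I am very new to coding, so any suggestions or help would be appreciated.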
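Answer 0 (score: 1)
There are a couple of routes you could take. The first is a stricter matching solution, while the second is fuzzy.
The first may not fully meet your needs, but the idea is simple: normalize the names so that both lists end up as close as possible to one common form. The following example hopefully illustrates what you might do: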
team_bet_list = [('Canterbury Bulldogs ', '3.25'), ('South Sydney Rabbitohs', '1.34')]

def normalize_team(item):
    # Map long forms to the abbreviations one of the sites uses
    substitutions = {
        'north': 'nth',
        'south': 'sth',  # etc
    }
    # Strip stray whitespace, lower-case, then substitute word by word
    words = [substitutions.get(word, word) for word in item[0].strip().lower().split()]
    # Returning a new tuple; you might also want to keep the original, non-normalized name if that matters to you
    return (' '.join(words), item[1])

normalized_values = [normalize_team(pair) for pair in team_bet_list]
# Now you should be able to sort these, but it'll take some experimentation to find the best normalization approach
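For the fuzzy route, one possible sketch uses `difflib.SequenceMatcher` from the Python standard library. This choice is an assumption, not something the question or answer prescribes. The helper names `clean`, `similar`, and `match_fixtures` are hypothetical, the two sample lists are small excerpts of the data in the question, and the 0.85 threshold simply mirrors the "85%" mentioned there.

```python
from difflib import SequenceMatcher

def clean(name):
    # Lower-case, drop full stops, and collapse extra or trailing whitespace.
    return " ".join(name.lower().replace(".", "").split())

def similar(a, b, threshold=0.85):
    # ratio() is a similarity score between 0.0 and 1.0.
    return SequenceMatcher(None, clean(a), clean(b)).ratio() >= threshold

def match_fixtures(left, right, threshold=0.85):
    # Pair each fixture in `left` with the first fixture in `right`
    # whose two team names are both similar enough.
    matched_left, matched_right = [], []
    for a in left:
        for b in right:
            if similar(a[0][0], b[0][0], threshold) and similar(a[1][0], b[1][0], threshold):
                matched_left.append(a)
                matched_right.append(b)
                break
    return matched_left, matched_right

# Small excerpts of the lists from the question (assumed sample data).
list_finale_sample = [
    [('New Zealand Warriors ', '1.39'), ('North Queensland Cowboys', '2.95')],
    [('St. George Illawarra Dragons ', '1.45'), ('Manly Sea Eagles', '2.74')],
]
sportsbet_sample = [
    [('St George Illawarra Dragons', '1.45'), ('Manly Sea Eagles', '2.75')],
    [('New Zealand Warriors', '1.40'), ('Nth Queensland Cowboys', '2.90')],
    [('Melbourne Storm', '1.25'), ('New Zealand Warriors', '4.00')],
]

list_n, list_n1 = match_fixtures(sportsbet_sample, list_finale_sample)
for a, b in zip(list_n, list_n1):
    print(a, b)
```

The two result lists come back in the same order, so the entry at index i in `list_n` corresponds to the entry at index i in `list_n1`. The threshold will likely need tuning against real data: set too low, it could pair different teams with similar names.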