I have a list like this:
['Jack Matthews', 'Mick LaSalle', 'Claudia Puig', 'Lisa Rose', 'Toby', 'Gene Seymour']
How can I build a list that stores every possible combination of items from the list above, for example:
[('Jack Matthews', 'Toby'), ('Jack Matthews', 'Claudia Puig'), ('Jack Matthews', 'Lisa Rose')] # and so on
I need these tuples as input for this function:
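from math import sqrt

def euclidean_distance(preferences_dict, person_1, person_2):
    # Collect the items rated by both people.
    shared_items = {}
    for item in preferences_dict[person_1]:
        if item in preferences_dict[person_2]:
            shared_items[item] = 1
    # No items in common: nothing to compare, so None is returned.
    if not len(shared_items):
        return
    # Square root of the summed squared rating differences over shared items.
    distance = sqrt(sum([pow(preferences_dict[person_1][item] - preferences_dict[person_2][item], 2) for item in preferences_dict[person_1] if item in preferences_dict[person_2]]))
    # Map the distance to a similarity score in (0, 1].
    return 1/(1+distance)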
and this dataset:
critics={'Lisa Rose': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5,
'Just My Luck': 3.0, 'Superman Returns': 3.5, 'You, Me and Dupree': 2.5,
'The Night Listener': 3.0},
'Gene Seymour': {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5,
'Just My Luck': 1.5, 'Superman Returns': 5.0, 'The Night Listener': 3.0,
'You, Me and Dupree': 3.5},
'Michael Phillips': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.0,
'Superman Returns': 3.5, 'The Night Listener': 4.0},
'Claudia Puig': {'Snakes on a Plane': 3.5, 'Just My Luck': 3.0,
'The Night Listener': 4.5, 'Superman Returns': 4.0,
'You, Me and Dupree': 2.5},
'Mick LaSalle': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
'Just My Luck': 2.0, 'Superman Returns': 3.0, 'The Night Listener': 3.0,
'You, Me and Dupree': 2.0},
'Jack Matthews': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.5,
'The Night Listener': 4.5, 'Superman Returns': 4.0, 'You, Me and Dupree': 1.0},
'Toby': {'Snakes on a Plane':4.5,'You, Me and Dupree':1.0,'Superman Returns':4.0}}
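I want to compute the Euclidean distance between two critics across the movies. What is the best way to compute this value for every pair of critics, without duplicates? I came up with this: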
# Map every critic to a list of all the other critics.
names = dict([(critic, list(critics.keys())) for critic in critics.keys()])
for critic in names.keys():
    if critic in names[critic]:
        names[critic].remove(critic)
actual_distance = []
for base_critic in names.keys():
    for critic in names[base_critic]:
        actual_distance.append(euclidean_distance(critics, base_critic, critic))
The problem with this code is that it produces duplicate values, because names['Jack Matthews'] contains 'Toby' and vice versa.
Answer 0 (score: 1)
>>> import itertools
>>> names = ['Jack Matthews', 'Mick LaSalle', 'Claudia Puig', 'Lisa Rose', 'Toby', 'Gene Seymour']
>>> combos = itertools.combinations(names, 2)
>>> for name1, name2 in combos:
...     print((name1, name2))
...
('Jack Matthews', 'Mick LaSalle')
('Jack Matthews', 'Claudia Puig')
('Jack Matthews', 'Lisa Rose')
('Jack Matthews', 'Toby')
('Jack Matthews', 'Gene Seymour')
('Mick LaSalle', 'Claudia Puig')
('Mick LaSalle', 'Lisa Rose')
('Mick LaSalle', 'Toby')
('Mick LaSalle', 'Gene Seymour')
('Claudia Puig', 'Lisa Rose')
('Claudia Puig', 'Toby')
('Claudia Puig', 'Gene Seymour')
('Lisa Rose', 'Toby')
('Lisa Rose', 'Gene Seymour')
('Toby', 'Gene Seymour')
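To tie this back to the function from the question, here is a minimal sketch that feeds each pair from `itertools.combinations` into `euclidean_distance` and stores the results in a dict keyed by the pair. The function body is a condensed rewrite of the one in the question (same formula), the `critics` dict is trimmed to three critics to keep the example short, and the names `pairs` and `scores` are only illustrative.

```python
from itertools import combinations
from math import sqrt

# Trimmed sample of the question's ratings (three critics only).
critics = {
    'Lisa Rose': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5,
                  'Just My Luck': 3.0, 'Superman Returns': 3.5,
                  'You, Me and Dupree': 2.5, 'The Night Listener': 3.0},
    'Gene Seymour': {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5,
                     'Just My Luck': 1.5, 'Superman Returns': 5.0,
                     'The Night Listener': 3.0, 'You, Me and Dupree': 3.5},
    'Toby': {'Snakes on a Plane': 4.5, 'You, Me and Dupree': 1.0,
             'Superman Returns': 4.0},
}


def euclidean_distance(preferences_dict, person_1, person_2):
    """Condensed version of the question's function (same formula)."""
    shared = [item for item in preferences_dict[person_1]
              if item in preferences_dict[person_2]]
    if not shared:
        return None  # nothing in common to compare
    distance = sqrt(sum((preferences_dict[person_1][item]
                         - preferences_dict[person_2][item]) ** 2
                        for item in shared))
    return 1 / (1 + distance)


# combinations() yields each unordered pair exactly once, so there is
# no ('Toby', 'Lisa Rose') duplicate of ('Lisa Rose', 'Toby').
pairs = list(combinations(critics, 2))
scores = {pair: euclidean_distance(critics, *pair) for pair in pairs}

for (name1, name2), score in scores.items():
    print(name1, name2, score)
```

Because each pair appears only once, the function is called n*(n-1)/2 times rather than n*(n-1) times as in the nested loops from the question.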
Answer 1 (score: 1)
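Update: now that you have updated your question, things look a bit different. Here is a quick snippet thrown together with pandas and numpy (for simplicity, missing ratings are replaced with zero):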
import numpy as np
import pandas as pd
from itertools import combinations

# Rows are critics, columns are movies; missing ratings become 0.
df = pd.DataFrame(critics).T.fillna(0)
distances = []
for critic1, critic2 in combinations(df.index, 2):
    ratings1 = df.loc[critic1].values
    ratings2 = df.loc[critic2].values
    dist = np.sqrt(np.sum((ratings1 - ratings2) ** 2))  # Euclidean distance
    distances.append((dist, critic1, critic2))
pd.DataFrame(distances, columns=['distance', 'critic1', 'critic2']).sort_values('distance', ascending=False).head(5)
There you have it. With missing ratings counted as zero, Claudia Puig and Toby disagree most strongly in their ratings.
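Filling missing ratings with zero tends to push critics with few ratings far away from everyone else. As a possible alternative, the sketch below compares each pair only on the movies both critics rated, which is closer to what the question's function does. It assumes a reasonably recent pandas (`to_numpy`, `sort_values`), uses a trimmed three-critic sample of the data, and the names `rows` and `result` are illustrative only.

```python
from itertools import combinations

import numpy as np
import pandas as pd

# Trimmed sample of the question's ratings (three critics only).
critics = {
    'Lisa Rose': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5,
                  'Just My Luck': 3.0, 'Superman Returns': 3.5,
                  'You, Me and Dupree': 2.5, 'The Night Listener': 3.0},
    'Gene Seymour': {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5,
                     'Just My Luck': 1.5, 'Superman Returns': 5.0,
                     'The Night Listener': 3.0, 'You, Me and Dupree': 3.5},
    'Toby': {'Snakes on a Plane': 4.5, 'You, Me and Dupree': 1.0,
             'Superman Returns': 4.0},
}

# Rows are critics, columns are movies; unrated movies stay NaN.
df = pd.DataFrame(critics).T

rows = []
for critic1, critic2 in combinations(df.index, 2):
    # Keep only the movies that both critics rated.
    shared = df.loc[[critic1, critic2]].dropna(axis=1)
    diff = shared.loc[critic1].to_numpy() - shared.loc[critic2].to_numpy()
    rows.append((float(np.sqrt(np.sum(diff ** 2))), critic1, critic2))

result = pd.DataFrame(rows, columns=['distance', 'critic1', 'critic2'])
result = result.sort_values('distance', ascending=False)
print(result)
```

Note that a pair with no movies in common would get a distance of 0 here, so such pairs may need to be filtered out separately.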