I am trying to compute the Euclidean distance with scipy.spatial.distance. Until now I have usually written it by hand, like this:
from math import sqrt
critics = {'Lisa Rose': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5,
'Just My Luck': 3.0, 'Superman Returns': 3.5, 'You, Me and Dupree': 2.5,
'The Night Listener': 3.0},
'Gene Seymour': {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5,
'Just My Luck': 1.5, 'Superman Returns': 5.0, 'The Night Listener': 3.0,
'You, Me and Dupree': 3.5},
'Michael Phillips': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.0,
'Superman Returns': 3.5, 'The Night Listener': 4.0},
'Claudia Puig': {'Snakes on a Plane': 3.5, 'Just My Luck': 3.0,
'The Night Listener': 4.5, 'Superman Returns': 4.0,
'You, Me and Dupree': 2.5},
'Mick LaSalle': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
'Just My Luck': 2.0, 'Superman Returns': 3.0, 'The Night Listener': 3.0,
'You, Me and Dupree': 2.0},
'Jack Matthews': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
'The Night Listener': 3.0, 'Superman Returns': 5.0, 'You, Me and Dupree': 3.5},
'Toby': {'Snakes on a Plane': 4.5, 'You, Me and Dupree': 1.0, 'Superman Returns': 4.0}}
def sim_distance(preference, person1, person2):
    si = {}
    for item in preference[person1]:
        if item in preference[person2]:
            si[item] = 1
    if len(si) == 0:
        return 0
    sum_of_scores = sum([pow(preference[person1][item] - preference[person2][item], 2)
                         for item in preference[person1] if item in preference[person2]])
    return 1 / (1 + sum_of_scores)

a = sim_distance(critics, 'Lisa Rose', 'Mick LaSalle')
print(a)  # 0.333
The current implementation works fine, but when I try to use the scipy module I cannot work out what kind of input it expects. This is what I tried:
from scipy.spatial.distance import euclidean
a = euclidean(critics['Lisa Rose'], critics['Mick LaSalle'])
print(a)
Traceback:
Traceback (most recent call last):
  File "C:/Users/Ajay/PycharmProjects/SO/new.py", line 22, in <module>
    a = euclidean(critics['Lisa Rose'], critics['Mick LaSalle'])
  File "C:\Python33\lib\site-packages\scipy\spatial\distance.py", line 224, in euclidean
    dist = norm(u - v)
TypeError: unsupported operand type(s) for -: 'dict' and 'dict'
Looking at the implementation of euclidean, it seems the input should be tuples, but I cannot work out how to handle that.
def euclidean(u, v):
    """
    Computes the Euclidean distance between two 1-D arrays.

    The Euclidean distance between 1-D arrays `u` and `v`, is defined as

    .. math::

       {||u-v||}_2

    Parameters
    ----------
    u : (N,) array_like
        Input array.
    v : (N,) array_like
        Input array.

    Returns
    -------
    euclidean : double
        The Euclidean distance between vectors `u` and `v`.

    """
    u = _validate_vector(u)
    v = _validate_vector(v)
    dist = norm(u - v)
    return dist
Please advise.
Answer 0 (score: 1)
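Euclidean distance is defined as the L2 norm of the difference between two vectors, as you can see from the line dist = norm(u - v) in the euclidean function. Your critics['Lisa Rose'] and critics['Mick LaSalle'] are dictionaries, and the - (subtraction) operation is not defined for the dict data type. In addition, norm is defined for array-like data types.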
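As a minimal sketch of that definition, the snippet below computes the L2 norm of a difference by hand for two short lists and shows that subtracting one dict from another raises TypeError. The vectors u and v are made-up illustrative values, and the name dict_subtraction_works is introduced only for this example.

```python
from math import sqrt

# Two equal-length vectors (illustrative values only).
u = [2.5, 3.5, 3.0]
v = [3.0, 4.0, 2.0]

# L2 norm of the element-wise difference: sqrt(sum((u_i - v_i)^2)).
l2 = sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
print(l2)

# dict does not define subtraction, which is what the traceback reports.
try:
    {'a': 1.0} - {'a': 2.0}
    dict_subtraction_works = True
except TypeError:
    dict_subtraction_works = False
print(dict_subtraction_works)
```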
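So, if you really need to use scipy.spatial.distance.euclidean for your case, you would need to create a class for your critics and, in that class, overload the - operator by defining the __sub__ method so that it returns an array-like data type.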
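A rough sketch of that idea follows. The Ratings class is hypothetical, and restricting the difference to items both people rated is an assumption carried over from sim_distance in the question. The sketch does not assume that euclidean accepts a Ratings object directly. Instead it passes the plain list returned by __sub__, together with a zero vector, so that the result is the L2 norm of the difference.

```python
from scipy.spatial.distance import euclidean


class Ratings:
    """Hypothetical wrapper around one person's {item: score} dict."""

    def __init__(self, scores):
        self.scores = dict(scores)

    def __sub__(self, other):
        # Differences over the items both people rated, in sorted item order.
        shared = sorted(set(self.scores) & set(other.scores))
        return [self.scores[item] - other.scores[item] for item in shared]


lisa = Ratings({'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5,
                'Just My Luck': 3.0, 'Superman Returns': 3.5,
                'You, Me and Dupree': 2.5, 'The Night Listener': 3.0})
mick = Ratings({'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
                'Just My Luck': 2.0, 'Superman Returns': 3.0,
                'The Night Listener': 3.0, 'You, Me and Dupree': 2.0})

diff = lisa - mick
# The distance from diff to the zero vector is the L2 norm of diff.
distance = euclidean(diff, [0.0] * len(diff))
print(distance)

# Same similarity score as sim_distance in the question.
similarity = 1 / (1 + distance ** 2)
print(similarity)
```

If the class is more than the problem needs, the same two aligned lists can be built with a list comprehension over the shared keys and passed to euclidean as its two arguments.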
Answer 1 (score: 0)
I built a nice API on top of ugly code:
http://vectordict.readthedocs.org/en/latest/vector.html#metrics
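Important: I recommend against using this code (I am rewriting it properly). Just look at how it works, and perhaps keep to the API without using the code.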
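You may be interested in how to use a dict the "mathematical" way by overriding + / - / * / / in the "linear algebra" way, how to implement that, and how it makes life easier.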
Here is the implementation of the L2 norm: sqrt(self.dot(self))
https://github.com/jul/ADictAdd_iction/blob/master/vector_dict/VectorDict.py#L972
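The expression sqrt(self.dot(self)) can be sketched on plain dicts as follows. The functions dot and norm here are hypothetical stand-ins written for illustration and are not the vector_dict API. The sketch assumes that an item missing from one dict contributes zero to the dot product.

```python
from math import sqrt


def dot(u, v):
    # Sum of products over the keys present in both dicts;
    # a key missing from either side contributes nothing.
    return sum(u[key] * v[key] for key in u if key in v)


def norm(u):
    # L2 norm expressed through the dot product: sqrt(u . u).
    return sqrt(dot(u, u))


gene = {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5,
        'Just My Luck': 1.5, 'Superman Returns': 5.0,
        'The Night Listener': 3.0, 'You, Me and Dupree': 3.5}

gene_norm = norm(gene)
print(gene_norm)  # about 8.3516, i.e. sqrt(69.75)
```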
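I advocate using an API that is consistent with the linear algebra definitions for the objects involved, so that what you read is easier to understand.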
from vector_dict.VectorDict import cos
from vector_dict.VectorDict import convert_tree, VectorDict
crit = {'Lisa Rose': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5,
'Just My Luck': 3.0, 'Superman Returns': 3.5, 'You, Me and Dupree': 2.5,
'The Night Listener': 3.0},
'Gene Seymour': {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5,
'Just My Luck': 1.5, 'Superman Returns': 5.0, 'The Night Listener': 3.0,
'You, Me and Dupree': 3.5},
'Michael Phillips': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.0,
'Superman Returns': 3.5, 'The Night Listener': 4.0},
'Claudia Puig': {'Snakes on a Plane': 3.5, 'Just My Luck': 3.0,
'The Night Listener': 4.5, 'Superman Returns': 4.0,
'You, Me and Dupree': 2.5},
'Mick LaSalle': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
'Just My Luck': 2.0, 'Superman Returns': 3.0, 'The Night Listener': 3.0,
'You, Me and Dupree': 2.0},
'Jack Matthews': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
'The Night Listener': 3.0, 'Superman Returns': 5.0, 'You, Me and Dupree': 3.5},
'Toby': {'Snakes on a Plane': 4.5, 'You, Me and Dupree': 1.0, 'Superman Returns': 4.0}}
dd = convert_tree(crit)
print("cos")
print(cos(dd['Gene Seymour'], dd['Toby']))
# 0.770024275094
print("L2 distance")
print(dd['Gene Seymour'].norm())
# 8.35164654425
print("jaccard similarities")
print(dd['Gene Seymour'].jaccard(dd['Toby']))
# 0.579335793358
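For readers without the library, the cosine figure printed above can be approximated with plain dicts. This is a sketch under the assumption that items missing from one person's ratings count as zero; the helper names dot and cosine are hypothetical and not part of vector_dict.

```python
from math import sqrt


def dot(u, v):
    # Products over shared keys only; missing items are treated as zero.
    return sum(u[key] * v[key] for key in u if key in v)


def cosine(u, v):
    # Cosine similarity: (u . v) / (|u| * |v|).
    return dot(u, v) / (sqrt(dot(u, u)) * sqrt(dot(v, v)))


gene = {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5,
        'Just My Luck': 1.5, 'Superman Returns': 5.0,
        'The Night Listener': 3.0, 'You, Me and Dupree': 3.5}
toby = {'Snakes on a Plane': 4.5, 'You, Me and Dupree': 1.0,
        'Superman Returns': 4.0}

gene_toby_cos = cosine(gene, toby)
print(gene_toby_cos)  # about 0.77
```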
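PS: I assumed that if you are computing norms, it is in order to make comparisons, so I jumped to the conclusion that you want a similarity measure.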