Euclidean distance error: unsupported operand type(s)

Time: 2014-04-18 17:09:27

Tags: python scipy

I am trying to compute the Euclidean distance using scipy.spatial.distance. Until now I have written it by hand like this.

from math import sqrt

critics = {'Lisa Rose': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5,
                         'Just My Luck': 3.0, 'Superman Returns': 3.5, 'You, Me and Dupree': 2.5,
                         'The Night Listener': 3.0},
           'Gene Seymour': {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5,
                            'Just My Luck': 1.5, 'Superman Returns': 5.0, 'The Night Listener': 3.0,
                            'You, Me and Dupree': 3.5},
           'Michael Phillips': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.0,
                                'Superman Returns': 3.5, 'The Night Listener': 4.0},
           'Claudia Puig': {'Snakes on a Plane': 3.5, 'Just My Luck': 3.0,
                            'The Night Listener': 4.5, 'Superman Returns': 4.0,
                            'You, Me and Dupree': 2.5},
           'Mick LaSalle': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
                            'Just My Luck': 2.0, 'Superman Returns': 3.0, 'The Night Listener': 3.0,
                            'You, Me and Dupree': 2.0},
           'Jack Matthews': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
                             'The Night Listener': 3.0, 'Superman Returns': 5.0, 'You, Me and Dupree': 3.5},
           'Toby': {'Snakes on a Plane': 4.5, 'You, Me and Dupree': 1.0, 'Superman Returns': 4.0}}


def sim_distance(preference, person1, person2):
    si = {}
    for item in preference[person1]:
        if item in preference[person2]:
            si[item] = 1

    if len(si) == 0: return 0

    sum_of_scores = sum([pow(preference[person1][item] - preference[person2][item], 2)
                         for item in preference[person1] if item in preference[person2]])

    return 1 / (1 + sum_of_scores)


a = sim_distance(critics, 'Lisa Rose','Mick LaSalle')
print(a) #0.333

The current implementation works fine, but when I try to use the scipy module I cannot work out what type of input it expects. This is what I tried.

from scipy.spatial.distance import euclidean


a = euclidean(critics['Lisa Rose'], critics['Mick LaSalle'])
print(a)

Traceback

Traceback (most recent call last):
  File "C:/Users/Ajay/PycharmProjects/SO/new.py", line 22, in <module>
    a = euclidean(critics['Lisa Rose'], critics['Mick LaSalle'])
  File "C:\Python33\lib\site-packages\scipy\spatial\distance.py", line 224, in euclidean
    dist = norm(u - v)
TypeError: unsupported operand type(s) for -: 'dict' and 'dict'

Looking at the implementation of euclidean, it seems the input should be tuples, but I cannot work out how to get my data into that form.

def euclidean(u, v):
    """
    Computes the Euclidean distance between two 1-D arrays.

    The Euclidean distance between 1-D arrays `u` and `v`, is defined as

    .. math::

       {||u-v||}_2

    Parameters
    ----------
    u : (N,) array_like
        Input array.
    v : (N,) array_like
        Input array.

    Returns
    -------
    euclidean : double
        The Euclidean distance between vectors `u` and `v`.

    """
    u = _validate_vector(u)
    v = _validate_vector(v)
    dist = norm(u - v)
    return dist

Please advise.

2 answers:

Answer 0: (score: 1)

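Euclidean distance is defined as the L2 norm of the difference between two vectors, which you can see in the line dist = norm(u - v) of the euclidean function. Your critics['Lisa Rose'] and critics['Mick LaSalle'] are dictionaries, and the - (subtraction) operation is not defined for the dictionary data type. In addition, norm is defined for array-like data types.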
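To make the array-like requirement concrete, here is a minimal sketch that pulls the ratings for the films both critics rated into two equal-length lists and passes those to euclidean. The helper name shared_vectors is hypothetical, the data is cut down to the two critics from the question, and the try/except only provides a plain-Python stand-in so the sketch runs where SciPy is not installed.

```python
from math import sqrt

try:
    from scipy.spatial.distance import euclidean
except ImportError:
    # Plain-Python stand-in so the sketch still runs without SciPy installed.
    def euclidean(u, v):
        return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Two critics taken from the question's data set.
critics = {
    'Lisa Rose': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5,
                  'Just My Luck': 3.0, 'Superman Returns': 3.5,
                  'You, Me and Dupree': 2.5, 'The Night Listener': 3.0},
    'Mick LaSalle': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
                     'Just My Luck': 2.0, 'Superman Returns': 3.0,
                     'The Night Listener': 3.0, 'You, Me and Dupree': 2.0},
}


def shared_vectors(preference, person1, person2):
    """Hypothetical helper: ratings for the items both people rated,
    returned as two lists in the same item order."""
    shared = sorted(set(preference[person1]) & set(preference[person2]))
    u = [preference[person1][item] for item in shared]
    v = [preference[person2][item] for item in shared]
    return u, v


u, v = shared_vectors(critics, 'Lisa Rose', 'Mick LaSalle')
dist = euclidean(u, v)            # plain Euclidean distance
similarity = 1 / (1 + dist ** 2)  # the quantity sim_distance returns
print(dist, similarity)
```

Note that sim_distance in the question returns 1 / (1 + sum of squared differences), not the distance itself, so the last step is needed to compare the two results. If the two people share no items, both lists are empty, and the caller should handle that case before calling euclidean, as sim_distance does by returning 0.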

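So, if you really need to use scipy.spatial.distance.euclidean for your case, you would need to create a class for the critics data and, in that class, overload the - operator by defining the __sub__ method so that it returns an array-like data type.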
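As a rough illustration of the operator-overloading idea only, the sketch below wraps one critic's ratings in a hypothetical RatingVector class whose __sub__ returns a list of differences over the shared items. It computes the L2 norm of that list by hand instead of calling SciPy, because the euclidean source quoted in the question validates u and v before subtracting them, and whether a custom object survives that step is not verified here.

```python
from math import sqrt


class RatingVector:
    """Hypothetical wrapper around one critic's ratings dict."""

    def __init__(self, ratings):
        self.ratings = dict(ratings)

    def __sub__(self, other):
        # Subtract only over items both critics rated, in a fixed order,
        # and return an array-like (a plain list of differences).
        shared = sorted(set(self.ratings) & set(other.ratings))
        return [self.ratings[item] - other.ratings[item] for item in shared]


lisa = RatingVector({'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5,
                     'Just My Luck': 3.0, 'Superman Returns': 3.5,
                     'You, Me and Dupree': 2.5, 'The Night Listener': 3.0})
mick = RatingVector({'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
                     'Just My Luck': 2.0, 'Superman Returns': 3.0,
                     'The Night Listener': 3.0, 'You, Me and Dupree': 2.0})

diff = lisa - mick                        # calls RatingVector.__sub__
distance = sqrt(sum(d * d for d in diff)) # L2 norm of the difference
print(distance)
```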

Answer 1: (score: 0)

I made a nice API with ugly code:

http://vectordict.readthedocs.org/en/latest/vector.html#metrics

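Important: I advise against using this code (I am rewriting it properly). Just look at how it works, and perhaps keep to the API without using the code.

You may be interested in how overriding + / - / * / / in the "linear algebra" way lets you get back to "math" with dicts, how it is implemented, and how it makes life easier.

Here is the implementation of the L2 norm: sqrt(self.dot(self))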

https://github.com/jul/ADictAdd_iction/blob/master/vector_dict/VectorDict.py#L972
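The sqrt(self.dot(self)) idea can be sketched with plain dicts. This is not the VectorDict code: the functions dot and norm below are hypothetical stand-ins, and the assumption is that a key missing from either dict counts as 0.

```python
from math import sqrt


def dot(a, b):
    """Dot product of two dict vectors; a key missing from either
    side contributes 0 (an assumption of this sketch)."""
    return sum(value * b[key] for key, value in a.items() if key in b)


def norm(a):
    """L2 norm written as sqrt(a . a)."""
    return sqrt(dot(a, a))


gene = {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5,
        'Just My Luck': 1.5, 'Superman Returns': 5.0,
        'The Night Listener': 3.0, 'You, Me and Dupree': 3.5}

print(norm(gene))
```

For Gene Seymour's ratings the squares sum to 69.75, so the result is sqrt(69.75), which agrees with the norm() value shown in this answer's own example.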

I advocate an API that is consistent with the linear algebra definitions for the objects, so that what you read is easier to understand.

from vector_dict.VectorDict import cos
from vector_dict.VectorDict import convert_tree, VectorDict
crit = {'Lisa Rose': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5,
                         'Just My Luck': 3.0, 'Superman Returns': 3.5, 'You, Me and Dupree': 2.5,
                         'The Night Listener': 3.0},
           'Gene Seymour': {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5,
                            'Just My Luck': 1.5, 'Superman Returns': 5.0, 'The Night Listener': 3.0,
                            'You, Me and Dupree': 3.5},
           'Michael Phillips': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.0,
                                'Superman Returns': 3.5, 'The Night Listener': 4.0},
           'Claudia Puig': {'Snakes on a Plane': 3.5, 'Just My Luck': 3.0,
                            'The Night Listener': 4.5, 'Superman Returns': 4.0,
                            'You, Me and Dupree': 2.5},
           'Mick LaSalle': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
                            'Just My Luck': 2.0, 'Superman Returns': 3.0, 'The Night Listener': 3.0,
                            'You, Me and Dupree': 2.0},
           'Jack Matthews': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
                             'The Night Listener': 3.0, 'Superman Returns': 5.0, 'You, Me and Dupree': 3.5},
           'Toby': {'Snakes on a Plane': 4.5, 'You, Me and Dupree': 1.0, 'Superman Returns': 4.0}}
dd = convert_tree(crit)
print("cos")
print(cos(dd['Gene Seymour'], dd['Toby']))
# 0.770024275094
print("L2 distance")
print(dd['Gene Seymour'].norm())
# 8.35164654425
print("jaccard similarities")
print(dd['Gene Seymour'].jaccard(dd['Toby']))
# 0.579335793358
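PS: I guess that if you are computing norms, it is in order to make comparisons, so I jumped to the conclusion that you want a similarity measure.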
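The cosine and Jaccard figures printed in the example can be reproduced with plain dicts. The sketch below is not the VectorDict implementation: dot, cosine and tanimoto are hypothetical names, missing keys are assumed to count as 0, and reading the library's jaccard as the Tanimoto form a.b / (|a|^2 + |b|^2 - a.b) is an inference from the printed number, not something checked against the library source.

```python
from math import sqrt


def dot(a, b):
    """Dot product of two dict vectors; missing keys count as 0."""
    return sum(value * b[key] for key, value in a.items() if key in b)


def cosine(a, b):
    """Cosine similarity; undefined (division by zero) for an all-zero vector."""
    return dot(a, b) / sqrt(dot(a, a) * dot(b, b))


def tanimoto(a, b):
    """Tanimoto (extended Jaccard) similarity for real-valued vectors."""
    ab = dot(a, b)
    return ab / (dot(a, a) + dot(b, b) - ab)


gene = {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5,
        'Just My Luck': 1.5, 'Superman Returns': 5.0,
        'The Night Listener': 3.0, 'You, Me and Dupree': 3.5}
toby = {'Snakes on a Plane': 4.5, 'You, Me and Dupree': 1.0,
        'Superman Returns': 4.0}

cos_value = cosine(gene, toby)
jac_value = tanimoto(gene, toby)
print(cos_value, jac_value)
```

Unlike sim_distance in the question, which only looks at films both critics rated, these measures treat an unrated film as a rating of 0, so the two approaches are not interchangeable.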