Question

I am trying to compute the Euclidean distance with scipy.spatial.distance. Until now I have usually written it like this:

from math import sqrt

critics = {'Lisa Rose': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5,
                         'Just My Luck': 3.0, 'Superman Returns': 3.5, 'You, Me and Dupree': 2.5,
                         'The Night Listener': 3.0},
           'Gene Seymour': {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5,
                            'Just My Luck': 1.5, 'Superman Returns': 5.0, 'The Night Listener': 3.0,
                            'You, Me and Dupree': 3.5},
           'Michael Phillips': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.0,
                                'Superman Returns': 3.5, 'The Night Listener': 4.0},
           'Claudia Puig': {'Snakes on a Plane': 3.5, 'Just My Luck': 3.0,
                            'The Night Listener': 4.5, 'Superman Returns': 4.0,
                            'You, Me and Dupree': 2.5},
           'Mick LaSalle': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
                            'Just My Luck': 2.0, 'Superman Returns': 3.0, 'The Night Listener': 3.0,
                            'You, Me and Dupree': 2.0},
           'Jack Matthews': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
                             'The Night Listener': 3.0, 'Superman Returns': 5.0, 'You, Me and Dupree': 3.5},
           'Toby': {'Snakes on a Plane': 4.5, 'You, Me and Dupree': 1.0, 'Superman Returns': 4.0}}


def sim_distance(preference, person1, person2):
    si = {}
    for item in preference[person1]:
        if item in preference[person2]:
            si[item] = 1

    if len(si) == 0: return 0

    sum_of_scores = sum([pow(preference[person1][item] - preference[person2][item], 2)
                         for item in preference[person1] if item in preference[person2]])

    return 1 / (1 + sum_of_scores)


a = sim_distance(critics, 'Lisa Rose','Mick LaSalle')
print(a) #0.333

The current implementation works fine, but when I try to use the scipy module I cannot work out what type of input it expects. Here is what I tried:

from scipy.spatial.distance import euclidean


a = euclidean(critics['Lisa Rose'], critics['Mick LaSalle'])
print(a)

Traceback

Traceback (most recent call last):
  File "C:/Users/Ajay/PycharmProjects/SO/new.py", line 22, in <module>
    a = euclidean(critics['Lisa Rose'], critics['Mick LaSalle'])
  File "C:\Python33\lib\site-packages\scipy\spatial\distance.py", line 224, in euclidean
    dist = norm(u - v)
TypeError: unsupported operand type(s) for -: 'dict' and 'dict'

Looking at the implementation of euclidean, it seems the inputs should be tuples, but I cannot figure out how to handle that.

def euclidean(u, v):
    """
    Computes the Euclidean distance between two 1-D arrays.

    The Euclidean distance between 1-D arrays `u` and `v`, is defined as

    .. math::

       {||u-v||}_2

    Parameters
    ----------
    u : (N,) array_like
        Input array.
    v : (N,) array_like
        Input array.

    Returns
    -------
    euclidean : double
        The Euclidean distance between vectors `u` and `v`.

    """
    u = _validate_vector(u)
    v = _validate_vector(v)
    dist = norm(u - v)
    return dist

Please advise.

Answer 1

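The Euclidean distance is defined as the L2 norm of the difference between two vectors, as you can see in the line dist = norm(u - v) of the euclidean function. Your critics['Lisa Rose'] and critics['Mick LaSalle'] are dictionaries, and the - (subtraction) operation is not defined for the dictionary data type. In addition, norm is defined for array-like data types.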
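One way around this, as a minimal sketch: line up the ratings that both critics share into two equal-length tuples and pass those to euclidean. The helper name shared_vectors is made up for illustration, and the critics dict is trimmed to the two people being compared. The fallback to math.dist (Python 3.8+) is only there so the snippet still runs where SciPy is not installed.

```python
import math

try:
    from scipy.spatial.distance import euclidean
except ImportError:
    # Assumption: use the stdlib equivalent when SciPy is unavailable.
    euclidean = math.dist

critics = {
    'Lisa Rose': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5,
                  'Just My Luck': 3.0, 'Superman Returns': 3.5,
                  'You, Me and Dupree': 2.5, 'The Night Listener': 3.0},
    'Mick LaSalle': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
                     'Just My Luck': 2.0, 'Superman Returns': 3.0,
                     'The Night Listener': 3.0, 'You, Me and Dupree': 2.0},
}


def shared_vectors(prefs, person1, person2):
    """Hypothetical helper: aligned rating tuples for items both people rated."""
    shared = sorted(set(prefs[person1]) & set(prefs[person2]))
    u = tuple(prefs[person1][item] for item in shared)
    v = tuple(prefs[person2][item] for item in shared)
    return u, v


u, v = shared_vectors(critics, 'Lisa Rose', 'Mick LaSalle')
distance = euclidean(u, v)

# The question's sim_distance uses the sum of squared differences,
# i.e. the squared distance, so square the result to compare.
similarity = 1 / (1 + distance ** 2)
print(distance)
print(similarity)
```

For this pair the distance is the square root of 2 (about 1.414), and the similarity comes out at about 0.333, the same value the question's sim_distance prints.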

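So, if you really need to use scipy.spatial.distance.euclidean for your case, you need to create a class for critics, and in that class overload the - operator by defining a __sub__ method that returns an array-like data type.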
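A rough sketch of the operator-overloading idea follows. The class name Ratings and the function l2_norm are hypothetical, and the subtraction returns a plain list (which is array-like) of differences over the items both critics rated. The norm is computed by hand here rather than by passing Ratings objects to euclidean, because the source quoted in the question runs _validate_vector on its inputs before subtracting, and it is not clear that a custom object would pass through that step unchanged.

```python
import math


class Ratings:
    """Hypothetical wrapper around one critic's {item: score} dict."""

    def __init__(self, scores):
        self.scores = dict(scores)

    def __sub__(self, other):
        # Subtract item by item, using only items rated by both critics.
        shared = sorted(set(self.scores) & set(other.scores))
        return [self.scores[item] - other.scores[item] for item in shared]


def l2_norm(values):
    """L2 norm of a flat sequence of numbers."""
    return math.sqrt(sum(x * x for x in values))


lisa = Ratings({'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5,
                'Just My Luck': 3.0, 'Superman Returns': 3.5,
                'You, Me and Dupree': 2.5, 'The Night Listener': 3.0})
mick = Ratings({'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
                'Just My Luck': 2.0, 'Superman Returns': 3.0,
                'The Night Listener': 3.0, 'You, Me and Dupree': 2.0})

diff = lisa - mick          # calls Ratings.__sub__
print(diff)
print(l2_norm(diff))        # Euclidean distance between the two critics
```

Because the result of lisa - mick is an ordinary list, it could also be handed to any function that accepts array-like input, such as numpy.linalg.norm.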

Answer 2

I built a nice API with ugly code:

http://vectordict.readthedocs.org/en/latest/vector.html#metrics

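Important: I advise against using this code (I am rewriting it properly). Just look at how it works; you may want to follow the API without using the code.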

You may be interested in how to get "maths" back on dicts by overriding + - * / the linear-algebra way, how this can be implemented, and how it makes life easier.

Here is the implementation of the L2 norm: sqrt(self.dot(self))

https://github.com/jul/ADictAdd_iction/blob/master/vector_dict/VectorDict.py#L972
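To illustrate the sqrt(self.dot(self)) idea without the library, here is a small sketch over plain dicts. The function names dot, norm, and cosine are made up for this example and are not the vector_dict API. Keys missing from one dict are treated as zeros, which is an assumption about how sparse dict vectors should behave.

```python
import math


def dot(u, v):
    """Dot product of two dict vectors; keys missing from v count as 0."""
    return sum(value * v[key] for key, value in u.items() if key in v)


def norm(u):
    """L2 norm, written as sqrt(u . u)."""
    return math.sqrt(dot(u, u))


def cosine(u, v):
    """Cosine similarity built from the same dot product."""
    return dot(u, v) / (norm(u) * norm(v))


gene = {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5,
        'Just My Luck': 1.5, 'Superman Returns': 5.0,
        'The Night Listener': 3.0, 'You, Me and Dupree': 3.5}
toby = {'Snakes on a Plane': 4.5, 'You, Me and Dupree': 1.0,
        'Superman Returns': 4.0}

print(norm(gene))
print(cosine(gene, toby))
```

For these two critics the results agree with the norm and cos values printed by the library example in this answer (about 8.3516 and 0.7700).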

I advocate using an API that is consistent with the linear-algebra definitions on objects, so that what you read is easier to understand.

from vector_dict.VectorDict import cos
from vector_dict.VectorDict import convert_tree, VectorDict
crit = {'Lisa Rose': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5,
                         'Just My Luck': 3.0, 'Superman Returns': 3.5, 'You, Me and Dupree': 2.5,
                         'The Night Listener': 3.0},
           'Gene Seymour': {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5,
                            'Just My Luck': 1.5, 'Superman Returns': 5.0, 'The Night Listener': 3.0,
                            'You, Me and Dupree': 3.5},
           'Michael Phillips': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.0,
                                'Superman Returns': 3.5, 'The Night Listener': 4.0},
           'Claudia Puig': {'Snakes on a Plane': 3.5, 'Just My Luck': 3.0,
                            'The Night Listener': 4.5, 'Superman Returns': 4.0,
                            'You, Me and Dupree': 2.5},
           'Mick LaSalle': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
                            'Just My Luck': 2.0, 'Superman Returns': 3.0, 'The Night Listener': 3.0,
                            'You, Me and Dupree': 2.0},
           'Jack Matthews': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
                             'The Night Listener': 3.0, 'Superman Returns': 5.0, 'You, Me and Dupree': 3.5},
           'Toby': {'Snakes on a Plane': 4.5, 'You, Me and Dupree': 1.0, 'Superman Returns': 4.0}}
dd = convert_tree(crit)
print("cos")
print(cos(dd['Gene Seymour'], dd['Toby']))
# 0.770024275094
print("L2 distance")
print(dd['Gene Seymour'].norm())
# 8.35164654425
print("jaccard similarities")
print(dd['Gene Seymour'].jaccard(dd['Toby']))
# 0.579335793358

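PS: I assume that if you are computing norms it is in order to make comparisons, so I jumped to the conclusion that you want a similarity measure.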

Euclidean distance error: unsupported operand type(s)
