Computing Manhattan distance in Python gives no result

Time: 2019-02-17 23:31:47

Tags: python pandas dataframe

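I have these two data frames in Python and I want to calculate the Manhattan distance, and later the Euclidean distance, but I am stuck on the Manhattan distance and cannot figure out what is going wrong.
This is what I have tried so far: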

import pandas as pd

ratings = pd.read_csv("toy_ratings.csv", sep=",")
person1 = ratings[ratings['Person'] == 1]['Rating']
person2 = ratings[ratings['Person'] == 2]['Rating']

ratings.head()
    Person Movie Rating
0   1      11   2.5
1   1      12   3.5
2   1      15   2.5
3   3      14   3.5
4   2      12   3.5

This is the data in person1 and person2:

print("*****person1*****")
print(person1)

*****person1*****
0     2.5
1     3.5
2     2.5
5     3.0
22    3.5
23    3.0
36    5.0

print("*****person2*****")
print(person2)

*****person2*****
4     3.5
6     3.0
8     1.5
9     5.0
11    3.0
24    3.5

This is the function I tried to build, without any luck:

def ManhattanDist(person1, person2):
    distance = 0
    for rating in person1:
        if rating in person2:
            distance += abs(person1[rating] - person2[rating])
            return distance

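The problem is that the function returns 0, which is incorrect, and when I debug I can see that it never enters the second loop (the if block). How can I check that both rows have a value and then loop over them?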

2 answers:

Answer 0 (score: 1)

I think the function should return the distance in any case: either the distance is zero, as initialized, or it is something else. So the function should look like this:

def ManhattanDist(person1, person2):
    distance = 0
    for rating in person1:
        if rating in person2:
            distance += abs(person1[rating] - person2[rating])
    return distance
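
A note on why this version can still return 0: iterating over a pandas Series yields its values, while `in` on a Series tests its index labels, so `rating in person2` asks whether a rating such as 2.5 is a row label of person2. The following is only a minimal sketch, assuming the intent is to compare the two people's ratings of the same movie. The small DataFrame is made-up sample data (not the asker's toy_ratings.csv), and `manhattan_dist` is a hypothetical name. Each person's ratings are indexed by Movie so that both the loop and the membership test work on movie IDs.

```python
import pandas as pd

# Hypothetical sample data, shaped like the question's ratings table.
ratings = pd.DataFrame({
    "Person": [1, 1, 1, 2, 2, 2],
    "Movie":  [11, 12, 15, 12, 15, 16],
    "Rating": [2.5, 3.5, 2.5, 3.5, 1.0, 4.0],
})

# Index each person's ratings by movie ID so lookups are by movie.
# Assumes each person rates a given movie at most once.
person1 = ratings[ratings["Person"] == 1].set_index("Movie")["Rating"]
person2 = ratings[ratings["Person"] == 2].set_index("Movie")["Rating"]

def manhattan_dist(a, b):
    """Sum of |a - b| over the movies that both people rated."""
    distance = 0.0
    for movie in a.index:        # iterate over movie IDs (index labels)
        if movie in b.index:     # membership test against the index
            distance += abs(a[movie] - b[movie])
    return distance

# `in` on a Series checks index labels, not values:
values_hit = 3.5 in person1      # 3.5 is a rating value, not a movie ID
label_hit = 12 in person1        # 12 is a movie ID in the index

result = manhattan_dist(person1, person2)
print(values_hit, label_hit, result)
```

In this sample only movies 12 and 15 are shared, so only those two pairs contribute to the sum.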

I think the distance should be computed from two vectors of the same length (at least I cannot imagine anything else). If that is the case, you can do the following without your function:

import numpy as np

p1 = np.array(person1)
p2 = np.array(person2)

#--- scalar product as similarity indicator
dist1 = np.dot(p1,p2)

#--- Euclidean distance
dist2 = np.linalg.norm(p1-p2)

#--- Manhattan distance
dist3 = np.sum(np.abs(p1-p2))
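
These NumPy expressions need p1 and p2 to have the same length and the same ordering. The person1 and person2 printed in the question have 7 and 6 entries, so `p1-p2` would fail on them as they stand. One way to get aligned vectors, sketched here under the assumption that only movies rated by both people should count, is an inner merge on Movie. The data below is invented for illustration, and the column suffixes `_p1` and `_p2` are arbitrary choices.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, shaped like the question's ratings table.
ratings = pd.DataFrame({
    "Person": [1, 1, 1, 2, 2, 2],
    "Movie":  [11, 12, 15, 12, 15, 16],
    "Rating": [2.5, 3.5, 2.5, 3.0, 1.0, 4.0],
})

r1 = ratings.loc[ratings["Person"] == 1, ["Movie", "Rating"]]
r2 = ratings.loc[ratings["Person"] == 2, ["Movie", "Rating"]]

# The default inner join keeps only movies rated by both people,
# with one row per shared movie.
common = r1.merge(r2, on="Movie", suffixes=("_p1", "_p2"))

# Equal-length vectors whose positions refer to the same movie.
p1 = common["Rating_p1"].to_numpy()
p2 = common["Rating_p2"].to_numpy()

manhattan = np.sum(np.abs(p1 - p2))
euclidean = np.linalg.norm(p1 - p2)
print(len(common), manhattan, euclidean)
```

Whether unshared movies should be dropped or treated as some default rating is a modelling decision the question does not settle; the inner join simply drops them.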

Answer 1 (score: 0)

Your function is returning 1 value... (I think) it should return a list of values.