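I have these two data frames in Python and I want to compute the Manhattan distance, and later the Euclidean distance, but I am stuck on the Manhattan distance and cannot figure out what is going wrong.
Here is what I have tried so far: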
import pandas as pd
ratings = pd.read_csv("toy_ratings.csv", sep=",")
person1 = ratings[ratings['Person'] == 1]['Rating']
person2 = ratings[ratings['Person'] == 2]['Rating']
ratings.head()
   Person  Movie  Rating
0       1     11     2.5
1       1     12     3.5
2       1     15     2.5
3       3     14     3.5
4       2     12     3.5
This is the data in person1 and person2:
print("*****person1*****")
print(person1)
*****person1*****
0 2.5
1 3.5
2 2.5
5 3.0
22 3.5
23 3.0
36 5.0
print("*****person2*****")
print(person2)
*****person2*****
4 3.5
6 3.0
8 1.5
9 5.0
11 3.0
24 3.5
This is the function I tried to build, without any luck:
def ManhattanDist(person1, person2):
    distance = 0
    for rating in person1:
        if rating in person2:
            distance += abs(person1[rating] - person2[rating])
    return distance
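The problem is that the function returns 0, which is incorrect, and when I debug it I can see that it never gets past the `if` check inside the loop. How can I check that both people have a value for the same row and loop over those?
Answer 0 (score: 1)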
I think the function should return the distance in every case: either it is still zero, as initialized, or it is something else. So the function should look like this:
def ManhattanDist(person1, person2):
    distance = 0
    for rating in person1:
        if rating in person2:
            distance += abs(person1[rating] - person2[rating])
    return distance
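As a side note on why the `if` branch is never reached: iterating over a pandas Series yields its values, while `in` on a Series tests the index labels. The sketch below illustrates this with a hypothetical stand-in for person2, rebuilt by hand from the index labels and ratings printed in the question (the real data comes from the CSV, which is not available here).

```python
import pandas as pd

# Hypothetical stand-in for person2, copied from the printed output in the question
person2 = pd.Series([3.5, 3.0, 1.5, 5.0, 3.0, 3.5], index=[4, 6, 8, 9, 11, 24])

# Iterating a Series yields the ratings, not the row labels
values_seen = list(person2)

# `in` on a Series checks the index labels, so a rating value is not found
rating_is_label = 3.5 in person2

# A row label is found
label_is_label = 4 in person2

# To test membership among the ratings themselves, look at .values
rating_in_values = 3.5 in person2.values

print(values_seen, rating_is_label, label_is_label, rating_in_values)
```

So a loop like `for rating in person1` ends up comparing ratings such as 2.5 against row labels such as 4 or 6, which is why the distance stays at 0.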
I think the distance should be computed from two vectors of the same length (at least I cannot imagine anything else). If that is the case, you can do the following without your function:
import numpy as np
p1 = np.array(person1)
p2 = np.array(person2)
#--- scalar product as similarity indicator
dist1 = np.dot(p1,p2)
#--- Euclidean distance
dist2 = np.linalg.norm(p1-p2)
#--- Manhattan distance
dist3 = np.sum(np.abs(p1-p2))
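The NumPy version above only works when both vectors have the same length, and the printed person1 and person2 in the question do not (7 and 6 ratings). One possible way around that, sketched here under the assumption that ratings should be compared per movie, is to index each person's ratings by Movie and keep only the movies both have rated. The data frame below is made-up toy data in the same shape as the question's CSV, and `ratings_by_movie` is a hypothetical helper, not something from the original post.

```python
import numpy as np
import pandas as pd

# Made-up toy data with the same columns as toy_ratings.csv (values are illustrative)
ratings = pd.DataFrame({
    "Person": [1, 1, 1, 2, 2, 2],
    "Movie":  [11, 12, 15, 12, 15, 16],
    "Rating": [2.5, 3.5, 2.5, 3.5, 1.5, 5.0],
})

def ratings_by_movie(df, person):
    """Hypothetical helper: one person's ratings as a Series indexed by Movie."""
    rows = df[df["Person"] == person]
    return rows.set_index("Movie")["Rating"]

p1 = ratings_by_movie(ratings, 1)
p2 = ratings_by_movie(ratings, 2)

# Keep only the movies that both people rated
common = p1.index.intersection(p2.index)
diff = p1.loc[common] - p2.loc[common]

manhattan = diff.abs().sum()            # sum of absolute differences
euclidean = np.sqrt((diff ** 2).sum())  # square root of summed squared differences

print(sorted(common), manhattan, euclidean)
```

Whether movies rated by only one person should be dropped or penalized is a design choice. This sketch simply drops them.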
Answer 1 (score: 0)
Your function is returning a single value... (I think) it should return a list of values.