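I am writing a program that computes the Euclidean distance between movie reviewers' ratings. I want to calculate it between one given reviewer and another given reviewer, and also between a given reviewer and everyone else. I have the data in a dictionary, which I load into a DataFrame, like this: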
{
'Nancy Pollock': {
'Lawrence of Arabia': 2.5,
'Gravity': 3.5,
'The Godfather': 3.0,
'Prometheus': 3.5,
'For a Few Dollars More': 2.5,
'The Guns of Navarone': 3.0
},
'Jack Holmes': {
'Lawrence of Arabia': 3.0,
'Gravity': 3.5,
'The Godfather': 1.5,
'Prometheus': 5.0,
'The Guns of Navarone': 3.0,
'For a Few Dollars More': 3.5
},
'Mary Doyle': {
'Lawrence of Arabia': 2.5,
'Gravity': 3.0,
'Prometheus': 3.5,
'The Guns of Navarone': 4.0
},
'Doug Redpath': {
'Gravity': 3.5,
'The Godfather': 3.0,
'The Guns of Navarone': 4.5,
'Prometheus': 4.0,
'For a Few Dollars More': 2.5
},
'Jill Brown': {
'Lawrence of Arabia': 3.0,
'Gravity': 4.0,
'The Godfather': 2.0,
'Prometheus': 3.0,
'The Guns of Navarone': 3.0,
'For a Few Dollars More': 2.0
},
'Trevor Chappell': {
'Lawrence of Arabia': 3.0,
'Gravity': 4.0,
'The Guns of Navarone': 3.0,
'Prometheus': 5.0,
'For a Few Dollars More': 3.5
},
'Peter': {
'Gravity': 4.5,
'For a Few Dollars More': 1.0,
'Prometheus': 4.0
}
}
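For reference, here is a minimal sketch of how a nested dictionary like this lands in a DataFrame. It uses only two of the reviewers above, and the names `sample`, `frame` and `by_reviewer` are illustrative rather than part of my program. Films that a reviewer has not rated show up as NaN, which matters for any distance calculation.

```python
import pandas as pd

# Two reviewers copied from the data above; `sample` is a hypothetical name.
sample = {
    'Mary Doyle': {
        'Lawrence of Arabia': 2.5,
        'Gravity': 3.0,
        'Prometheus': 3.5,
        'The Guns of Navarone': 4.0,
    },
    'Peter': {
        'Gravity': 4.5,
        'For a Few Dollars More': 1.0,
        'Prometheus': 4.0,
    },
}

# A dict of dicts becomes a frame with films as rows and reviewers as columns.
frame = pd.DataFrame(sample)

# Transposing gives one row per reviewer, the "one observation per row"
# layout that pdist works on.
by_reviewer = frame.T
print(by_reviewer)

# Count the films each reviewer has not rated (stored as NaN).
missing = by_reviewer.isna().sum(axis=1)
print(missing)
```

Because the reviewers have not all rated the same films, the rows cannot be compared directly without deciding what to do about the NaN entries.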
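I am fairly lost here, but what I want to know is how to write a function that puts each dictionary into a format that pdist can use. After that I can work out how to iterate over it. My code so far is below: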
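import ast
import pandas as pd
from scipy.spatial.distance import pdist, squareform

# Read the ratings dictionary from disk; literal_eval parses the literal
# without executing arbitrary code, and the with block closes the file
with open("reviews.txt") as f:
    d = ast.literal_eval(f.read())
# print(d)
df = pd.DataFrame(d)  # films as rows, reviewers as columns
print(df)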
def getSimilarity():
    EcDist = pd.DataFrame(index=df.index)  # container for results
    movieArray = df.values
    # some way of turning it into a format pdist can use
    EcDist = pdist  # etc
    return EcDist

def getSimilarities():
    EcDist2 = pd.DataFrame(index=df.index)
    movieArrays = df.values
    # some way of turning it into a format pdist can use
    EcDist2 = pdist  # etc
    return EcDist2
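One possible direction, as a sketch rather than a definitive answer: transpose the DataFrame so each reviewer is a row, and for each pair keep only the films both reviewers rated before calling pdist. The data is a trimmed copy of the ratings above (three reviewers, four films), and the names `ratings`, `to_reviewer_matrix`, `get_similarity` and `get_similarities` are hypothetical. Restricting to shared films is an assumption about how missing ratings should be handled; other choices, such as filling in a default score, would give different distances.

```python
import pandas as pd
from scipy.spatial.distance import pdist

# Trimmed copy of the ratings in the question (assumption: a subset is
# enough to illustrate the idea).
ratings = {
    'Nancy Pollock': {'Gravity': 3.5, 'Prometheus': 3.5,
                      'For a Few Dollars More': 2.5, 'The Godfather': 3.0},
    'Jack Holmes': {'Gravity': 3.5, 'Prometheus': 5.0,
                    'For a Few Dollars More': 3.5, 'The Godfather': 1.5},
    'Peter': {'Gravity': 4.5, 'Prometheus': 4.0,
              'For a Few Dollars More': 1.0},
}


def to_reviewer_matrix(data):
    """Return a frame with one row per reviewer and one column per film."""
    return pd.DataFrame(data).T


def get_similarity(frame, a, b):
    """Euclidean distance between reviewers a and b over films both rated."""
    # Drop every film (column) where either reviewer has no rating.
    pair = frame.loc[[a, b]].dropna(axis=1)
    if pair.shape[1] == 0:
        return float('nan')  # no films in common, distance is undefined
    # pdist on a two-row array returns a one-element condensed vector.
    return float(pdist(pair.values, metric='euclidean')[0])


def get_similarities(frame, a):
    """Distances from reviewer a to every other reviewer, as a Series."""
    others = [name for name in frame.index if name != a]
    return pd.Series({name: get_similarity(frame, a, name) for name in others},
                     name=a)


df = to_reviewer_matrix(ratings)
one_pair = get_similarity(df, 'Nancy Pollock', 'Jack Holmes')
all_pairs = get_similarities(df, 'Nancy Pollock')
print(one_pair)
print(all_pairs)
```

Note that the distance depends on how many films a pair has in common, so values for pairs with different overlaps are not directly comparable without some normalisation.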