Python - scipy pdist between columns in a DataFrame of dictionaries

Time: 2015-03-18 03:00:42

Tags: python dictionary pandas scipy dataframe

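I am writing a program that calculates the Euclidean distance between movie reviews. I want to calculate it between one given reviewer and another given reviewer, and also between a given reviewer and all the others. I have the data in a DataFrame built from a dictionary of dictionaries, like this: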

{
    'Nancy Pollock': {
        'Lawrence of Arabia': 2.5,
        'Gravity': 3.5,
        'The Godfather': 3.0,
        'Prometheus': 3.5,
        'For a Few Dollars More': 2.5,
        'The Guns of Navarone': 3.0
    },
    'Jack Holmes': {
        'Lawrence of Arabia': 3.0,
        'Gravity': 3.5,
        'The Godfather': 1.5,
        'Prometheus': 5.0,
        'The Guns of Navarone': 3.0,
        'For a Few Dollars More': 3.5
    },
    'Mary Doyle': {
        'Lawrence of Arabia': 2.5,
        'Gravity': 3.0,
        'Prometheus': 3.5,
        'The Guns of Navarone': 4.0
    },
    'Doug Redpath': {
        'Gravity': 3.5,
        'The Godfather': 3.0,
        'The Guns of Navarone': 4.5,
        'Prometheus': 4.0,
        'For a Few Dollars More': 2.5
    },
    'Jill Brown': {
        'Lawrence of Arabia': 3.0,
        'Gravity': 4.0,
        'The Godfather': 2.0,
        'Prometheus': 3.0,
        'The Guns of Navarone': 3.0,
        'For a Few Dollars More': 2.0
    },
    'Trevor Chappell': {
        'Lawrence of Arabia': 3.0,
        'Gravity': 4.0,
        'The Guns of Navarone': 3.0,
        'Prometheus': 5.0,
        'For a Few Dollars More': 3.5
    },
    'Peter': {
        'Gravity': 4.5,
        'For a Few Dollars More': 1.0,
        'Prometheus': 4.0
    }
}
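
For reference, here is a minimal sketch of how pandas lays out a dictionary of dictionaries like the one above, using a two-reviewer excerpt. The names `reviews` and `by_reviewer` are illustrative and not from the original code. The outer keys (reviewers) become columns, the inner keys (movies) become the row index, and any movie a reviewer did not rate shows up as NaN. Since pdist compares rows, the frame would have to be transposed so that each reviewer is one row.

```python
import pandas as pd

# Two-reviewer excerpt of the data above, for illustration only.
reviews = {
    'Mary Doyle': {'Lawrence of Arabia': 2.5, 'Gravity': 3.0,
                   'Prometheus': 3.5, 'The Guns of Navarone': 4.0},
    'Peter': {'Gravity': 4.5, 'For a Few Dollars More': 1.0,
              'Prometheus': 4.0},
}

df = pd.DataFrame(reviews)  # outer keys -> columns, inner keys -> row index
by_reviewer = df.T          # one row per reviewer, the layout pdist compares

print(by_reviewer.shape)               # (2, 5): 2 reviewers, 5 distinct movies
print(by_reviewer.isna().sum(axis=1))  # number of unrated movies per reviewer
```

The NaN entries matter: the built-in Euclidean metric does not skip them, so missing ratings have to be handled explicitly before or during the distance calculation.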

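I'm fairly lost here, but what I'd like to know is how to write a function that puts each dictionary into a format that pdist can use. After that I can work out how to iterate over it. The code I have so far is: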

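import ast

import pandas as pd
from scipy.spatial.distance import pdist, squareform

# reviews.txt holds the dictionary literal shown above;
# ast.literal_eval parses it without executing arbitrary code.
with open("reviews.txt") as f:
    d = ast.literal_eval(f.read())
# print(d)
df = pd.DataFrame(d)
print(df)


def getSimilarity():
    EcDist = pd.DataFrame(index=df.index)  # container for results
    movieArray = df.values
    # some way of turning it into a format pdist can use
    EcDist = pdist  # etc. (placeholder: the pdist call is not written yet)
    return EcDist


def getSimilarities():
    EcDist2 = pd.DataFrame(index=df.index)
    movieArrays = df.values
    # some way of turning it into a format pdist can use
    EcDist2 = pdist  # etc. (placeholder: the pdist call is not written yet)
    return EcDist2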
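
One possible way to fill in the placeholders, offered as a sketch rather than a definitive answer: transpose the frame so each reviewer is a row, and pass pdist a custom metric that only compares the movies both reviewers rated. The helper names `shared_euclidean` and `distance_matrix` are hypothetical, and restricting the distance to shared movies is an assumption about what is wanted; filling or dropping the missing ratings would be other options. The example uses a three-reviewer excerpt of the data instead of reading reviews.txt.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist, squareform


def shared_euclidean(u, v):
    """Euclidean distance over the movies that both reviewers rated."""
    both = ~np.isnan(u) & ~np.isnan(v)
    if not both.any():
        return np.nan  # no movie in common, so the distance is undefined
    return np.sqrt(np.sum((u[both] - v[both]) ** 2))


def distance_matrix(df):
    """Reviewer-by-reviewer distances; df has reviewers as columns."""
    ratings = df.T  # pdist compares rows, so put one reviewer per row
    condensed = pdist(ratings.to_numpy(dtype=float), metric=shared_euclidean)
    # squareform expands the condensed vector into a symmetric matrix
    return pd.DataFrame(squareform(condensed),
                        index=ratings.index, columns=ratings.index)


# Three-reviewer excerpt of the data above, for illustration only.
d = {
    'Mary Doyle': {'Lawrence of Arabia': 2.5, 'Gravity': 3.0,
                   'Prometheus': 3.5, 'The Guns of Navarone': 4.0},
    'Doug Redpath': {'Gravity': 3.5, 'The Godfather': 3.0,
                     'The Guns of Navarone': 4.5, 'Prometheus': 4.0,
                     'For a Few Dollars More': 2.5},
    'Peter': {'Gravity': 4.5, 'For a Few Dollars More': 1.0,
              'Prometheus': 4.0},
}
dist = distance_matrix(pd.DataFrame(d))

print(dist.loc['Mary Doyle', 'Peter'])  # one reviewer against another
print(dist['Peter'].drop('Peter'))      # one reviewer against all the others
```

Because the full matrix is computed once, both cases from the question become simple lookups: a single cell for a pair of reviewers, or a column for one reviewer against everyone else. Two caveats: distances based on different numbers of shared movies are not directly comparable, and a Python callable as the metric is slower than SciPy's built-in metrics, which should not matter for a dataset of this size.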

0 answers:

No answers