Cosine similarity from a pandas DataFrame

Time: 2021-04-18 10:23:35

Tags: python pandas cosine-similarity

I want to compute cosine similarity by hand from a DataFrame I built.

df_SimC = pd.DataFrame(df, columns = ['reviewerName','overall','Anger','Disgust','Fear','Joy','Sadness','Surprise'])

Output:

overall  Anger     Disgust   Fear      Joy       Sadness   Surprise
1        0.229007  0.489583  0.190617  0.006204  0.075759  0.008829
4        0.001024  0.000020  0.052685  0.945093  0.000062  0.001116

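Now I want to write a function that loops over all rows of the DataFrame and computes the cosine similarity. I wrote the function below, but it raises an error. I think the problem is in the function itself: either what I am returning is wrong, or the function cannot read the row values from the DataFrame.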

def SimC(nominator, denominator_Anger, denominator_Disgust, denominator_Fear, denominator_Joy, 
    denominator_Sadness, denominator_Surprise):

    (nominator, denominator_overall, denominator_Anger, denominator_Disgust, denominator_Fear,
     denominator_Joy, denominator_Sadness, denominator_Surprise) = 0, 0, 0, 0, 0, 0, 0, 0

    for i in range(len(df_SimC)):
        overall_sim = df_SimC.iloc[i]["overall"]
        print(overall_sim)
        Anger_sim = df_SimC.iloc[i]["Anger_sim"]
        Disgust_sim = df_SimC.iloc[i]["Disgust_sim"]
        Fear_sim = df_SimC.iloc[i]["Fear_sim"]
        Joy_sim = df_SimC.iloc[i]["Joy_sim"]
        Sadness_sim = df_SimC.iloc[i]["Sadness"]
        Surprise_sim = df_SimC.iloc[i]["Surprise"]

        denominator_overall += overall_sim * overall_sim
        print(denominator_overall)
        denominator_Anger += Anger_sim * Anger_sim
        denominator_Disgust += Disgust_sim * Disgust_sim
        denominator_Fear += Fear_sim * Fear_sim
        denominator_Joy += Joy_sim * Joy_sim
        denominator_Sadness += Sadness_sim * Sadness_sim
        denominator_Surprise += Surprise_sim * Surprise_sim

        nominator += (denominator_overall * denominator_Anger * denominator_Disgust * denominator_Fear *
                      denominator_Joy * denominator_Sadness * denominator_Surprise)

        return (nominator / sqrt(denominator_overall * denominator_Anger * denominator_Disgust * 
        denominator_Fear * denominator_Joy * denominator_Sadness * denominator_Surprise))

The error I get:

TypeError: ("SimC() missing 6 required positional arguments: 'denominator_Anger', 'denominator_Disgust', 'denominator_Fear', 'denominator_Joy', 'denominator_Sadness', and 'denominator_Surprise'", 'occurred at index overall')
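A note on the traceback, followed by a sketch. The `occurred at index overall` suffix suggests that `SimC` was handed to `df_SimC.apply(...)`, which calls the function with a single argument (one column at a time), so the other six parameters are never supplied. Separately, the loop looks up `"Anger_sim"`, `"Disgust_sim"`, `"Fear_sim"` and `"Joy_sim"`, which are not column names in `df_SimC`, and the `return` sits inside the `for` loop, so it would exit after the first row. The block below is a minimal sketch of one way to compute cosine similarity, not necessarily what the function above was meant to do. It assumes that each row's six emotion scores form a vector, that rows are compared pairwise, and that `overall` is left out. The helpers `cosine_similarity` and `pairwise_cosine` are hypothetical names introduced here, and the sample rows are copied from the output above.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the output shown in the question.
emotion_cols = ["Anger", "Disgust", "Fear", "Joy", "Sadness", "Surprise"]
df_SimC = pd.DataFrame(
    [
        [1, 0.229007, 0.489583, 0.190617, 0.006204, 0.075759, 0.008829],
        [4, 0.001024, 0.000020, 0.052685, 0.945093, 0.000062, 0.001116],
    ],
    columns=["overall"] + emotion_cols,
)


def cosine_similarity(a, b):
    """Hypothetical helper: dot(a, b) / (||a|| * ||b||) for two 1-D vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    norm_product = np.linalg.norm(a) * np.linalg.norm(b)
    if norm_product == 0:
        return float("nan")  # undefined when either vector is all zeros
    return float(np.dot(a, b) / norm_product)


def pairwise_cosine(frame, cols):
    """Hypothetical helper: cosine similarity between every pair of rows."""
    values = frame[cols].to_numpy(dtype=float)
    n = len(values)
    sim = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            sim[i, j] = cosine_similarity(values[i], values[j])
    # Rows and columns of the result are labelled with the original row index.
    return pd.DataFrame(sim, index=frame.index, columns=frame.index)


sim_matrix = pairwise_cosine(df_SimC, emotion_cols)
print(sim_matrix)
```

The function takes the DataFrame and the column list as explicit arguments and is called directly rather than through `apply`, so no positional arguments go missing. If the intent was instead to compare columns, or to include `overall` in the vector, the `cols` list and the orientation of `values` would need to change accordingly.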

0 answers:

No answers