I want to compute a similarity measure by hand from a DataFrame I built.
df_SimC = pd.DataFrame(df, columns = ['reviewerName','overall','Anger','Disgust','Fear','Joy','Sadness','Surprise'])
Output:
   overall     Anger   Disgust      Fear       Joy   Sadness  Surprise
         1  0.229007  0.489583  0.190617  0.006204  0.075759  0.008829
         4  0.001024  0.000020  0.052685  0.945093  0.000062  0.001116
and so on.
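Now I want to write a function that loops over all rows of the df and computes the cosine similarity. I wrote the function below, but for some reason I get an error, and I think the problem is in the function itself (either in what I return, or the function cannot read the row values from the DataFrame).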
def SimC(nominator, denominator_Anger, denominator_Disgust, denominator_Fear, denominator_Joy,
         denominator_Sadness, denominator_Surprise):
    nominator, denominator_overall, denominator_Anger, denominator_Disgust, denominator_Fear, \
        denominator_Joy, denominator_Sadness, denominator_Surprise = 0, 0, 0, 0, 0, 0, 0, 0
    for i in range(len(df_SimC)):
        overall_sim = df_SimC.iloc[i]["overall"]
        print(overall_sim)
        Anger_sim = df_SimC.iloc[i]["Anger_sim"]
        Disgust_sim = df_SimC.iloc[i]["Disgust_sim"]
        Fear_sim = df_SimC.iloc[i]["Fear_sim"]
        Joy_sim = df_SimC.iloc[i]["Joy_sim"]
        Sadness_sim = df_SimC.iloc[i]["Sadness"]
        Surprise_sim = df_SimC.iloc[i]["Surprise"]
        denominator_overall += overall_sim * overall_sim
        print(denominator_overall)
        denominator_Anger += Anger_sim * Anger_sim
        denominator_Disgust += Disgust_sim * Disgust_sim
        denominator_Fear += Fear_sim * Fear_sim
        denominator_Joy += Joy_sim * Joy_sim
        denominator_Sadness += Sadness_sim * Sadness_sim
        denominator_Surprise += Surprise_sim * Surprise_sim
        nominator += denominator_overall * denominator_Anger * denominator_Disgust * denominator_Fear * \
            denominator_Joy * denominator_Sadness * denominator_Surprise
    return (nominator / sqrt(denominator_overall * denominator_Anger * denominator_Disgust *
                             denominator_Fear * denominator_Joy * denominator_Sadness * denominator_Surprise))
The error I get:
TypeError: ("SimC() missing 6 required positional arguments: 'denominator_Anger', 'denominator_Disgust', 'denominator_Fear', 'denominator_Joy', 'denominator_Sadness', and 'denominator_Surprise'", 'occurred at index overall')
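Some context that may help narrow this down. The "occurred at index overall" suffix suggests the function is being called through `DataFrame.apply`, which passes a single Series per call, so the other six parameters are never supplied. Separately, lookups such as `"Anger_sim"` do not match the column names shown above (`Anger`, `Disgust`, and so on). For comparison, here is a minimal sketch of cosine similarity between rows. It assumes the vectors being compared are the six emotion scores of each row, which the post does not state explicitly. The names `EMOTIONS`, `cosine_similarity`, `pairwise_cosine`, and `df_demo` are hypothetical, and the two demo rows are copied from the output above.

```python
import numpy as np
import pandas as pd

# Assumption: each row's vector is its six emotion scores.
EMOTIONS = ["Anger", "Disgust", "Fear", "Joy", "Sadness", "Surprise"]


def cosine_similarity(u, v):
    """Cosine similarity of two 1-D vectors: dot(u, v) / (|u| * |v|)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    norm = np.linalg.norm(u) * np.linalg.norm(v)
    if norm == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return float(np.dot(u, v) / norm)


def pairwise_cosine(frame, columns=EMOTIONS):
    """Row-by-row cosine similarity matrix over the given columns."""
    X = frame[columns].to_numpy(dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for all-zero rows
    unit = X / norms         # scale every row to unit length
    return pd.DataFrame(unit @ unit.T, index=frame.index, columns=frame.index)


# The two sample rows shown in the post.
df_demo = pd.DataFrame(
    {
        "overall": [1, 4],
        "Anger": [0.229007, 0.001024],
        "Disgust": [0.489583, 0.000020],
        "Fear": [0.190617, 0.052685],
        "Joy": [0.006204, 0.945093],
        "Sadness": [0.075759, 0.000062],
        "Surprise": [0.008829, 0.001116],
    }
)

sim = pairwise_cosine(df_demo)
print(sim)
```

The function takes the DataFrame as its only required argument and keeps the running sums internal, so nothing has to be passed in as an accumulator. If the goal is instead to compare whole columns (for example `overall` against each emotion), `cosine_similarity(df_demo["overall"], df_demo["Joy"])` works the same way on two columns.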