I have the following pandas DataFrame df:
user item rating
1 1 1
1 2 1
1 3 3
2 1 2
2 2 2
2 3 1
...
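I want to normalize the ratings so that every rating value lies between 0 and 1. The method is simple: divide each of a user's ratings by that user's maximum rating.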
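Concretely, for a rating $r$ given by user $u$, the normalized value is that rating divided by the largest value in $R_u$, the set of all ratings given by $u$:

$$
r_{\text{norm}} = \frac{r}{\max_{r' \in R_u} r'}
$$

For user 1 in the table above (ratings 1, 1, 3), this gives 1/3, 1/3 and 1.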
I wrote the following code:
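import pandas as pd

ratingNormalised = []
# Note: this assumes the rows are grouped by user, in order of first appearance;
# otherwise the collected values will not line up with df.index.
for user in df['user'].unique():
    dfUser = df[df['user'] == user]
    userNormalised = (dfUser['rating'] / dfUser['rating'].max()).tolist()
    ratingNormalised.extend(userNormalised)
df['ratingNorm'] = pd.Series(ratingNormalised, index=df.index)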
Is there a better, perhaps more Pythonic, solution?
Answer (score: 1):
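Group by user and apply a lambda. On pandas 2.0 and later, pass group_keys=False; otherwise apply prepends the group key to the result's index, and the result can no longer be assigned back as a column of df:
In [73]:
df['norm rating'] = df.groupby('user', group_keys=False)['rating'].apply(lambda x: x / x.max())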
df
Out[73]:
user item rating norm rating
0 1 1 1 0.333333
1 1 2 1 0.333333
2 1 3 3 1.000000
3 2 1 2 1.000000
4 2 2 2 1.000000
5 2 3 1 0.500000
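A minimal sketch of a fully vectorized alternative, assuming the sample data shown in the question: `GroupBy.transform('max')` broadcasts each user's maximum rating back onto every row belonging to that user, aligned with the DataFrame's original index, so the division needs no lambda and no `group_keys` setting.

```python
import pandas as pd

# Sample data from the question (the "..." rows are omitted)
df = pd.DataFrame({
    "user":   [1, 1, 1, 2, 2, 2],
    "item":   [1, 2, 3, 1, 2, 3],
    "rating": [1, 1, 3, 2, 2, 1],
})

# transform("max") returns a Series the same length as df, holding each
# row's per-user maximum, indexed exactly like df.
user_max = df.groupby("user")["rating"].transform("max")

# Element-wise division gives each rating as a fraction of its user's maximum.
df["norm rating"] = df["rating"] / user_max

print(df)
```

Unlike the loop in the question, this does not depend on the rows being sorted or grouped by user, because the alignment is done by index rather than by position.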