Normalizing ratings

Date: 2014-10-28 08:12:27

Tags: python for-loop pandas

I have the following pandas DataFrame df:

user item rating
1    1    1
1    2    1
1    3    3
2    1    2
2    2    2
2    3    1
...

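I want to normalize the ratings so that every value lies between 0 and 1. The method is simple: divide each of a user's ratings by that user's maximum rating.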

I wrote the following code:

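import pandas as pd

ratingNormalised = []

# Results are appended user by user, so this assumes the rows of df
# are already grouped by user (as in the sample above).
for user in df['user'].unique():
    dfUser = df[df['user'] == user]
    # Divide each of this user's ratings by the user's maximum rating.
    userNormalised = (dfUser['rating']/max(dfUser['rating'])).tolist()
    ratingNormalised.extend(userNormalised)

df['ratingNorm'] = pd.Series(ratingNormalised, index=df.index)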

Is there a better solution, perhaps a more pythonic one?

1 answer:

Answer 0 (score: 1)

Group by user and apply a lambda:

In [73]:

df['norm rating'] = df.groupby('user')['rating'].apply(lambda x: x/x.max())
df

Out[73]:
   user  item  rating  norm rating
0     1     1       1     0.333333
1     1     2       1     0.333333
2     1     3       3     1.000000
3     2     1       2     1.000000
4     2     2       2     1.000000
5     2     3       1     0.500000
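
A possible variation on the same idea uses `transform` instead of `apply`. In recent pandas releases, `groupby(...).apply` may prepend the group key to the index of its result, in which case the assignment above might no longer align with `df`; `transform("max")` returns a Series indexed like the original frame, so the division stays row-aligned regardless of row order. This is only a minimal sketch: the frame is rebuilt here from the question's sample rows so the snippet runs on its own.

```python
import pandas as pd

# Sample rows copied from the question, rebuilt so the snippet is self-contained.
df = pd.DataFrame({
    "user":   [1, 1, 1, 2, 2, 2],
    "item":   [1, 2, 3, 1, 2, 3],
    "rating": [1, 1, 3, 2, 2, 1],
})

# transform("max") broadcasts each user's maximum rating back onto that
# user's rows, keeping df's original index.
user_max = df.groupby("user")["rating"].transform("max")

# Element-wise division: each rating divided by its own user's maximum.
df["norm rating"] = df["rating"] / user_max

print(df)
```

Because no Python-level lambda is called per group, this form also tends to be faster on large frames, though that depends on the data.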