Question

I'm working with a book-ratings dataset of the following form:

userID | ISBN | Rating
23413    1232     2.5
12321    2311     3.2
23413    2532     1.7
23413    7853     3.8

Now I need to add a fourth column containing the number of ratings each user has in the whole dataset:

userID | ISBN | Rating | Ratings_per_user
23413    1232     2.5         3
12321    2311     3.2         1
23413    2532     1.7         3
23413    7853     3.8         3

I tried:

df_new['Ratings_per_user'] = df_new['userID'].value_counts()

but I get the following warning:

A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead

and the entire new column is filled with NaN.
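A minimal reproduction of the attempt, built from the sample rows above, may help show what goes wrong. Two assumptions: the data is rebuilt by hand with pd.DataFrame, and df_new is taken to be a filtered slice of a larger frame (the filter condition is hypothetical). Assigning a Series to a column aligns on index labels; value_counts() is indexed by userID rather than by the row labels 0..3, so no label matches and every cell becomes NaN. Taking an explicit .copy() of the slice is one way to avoid the SettingWithCopyWarning in pandas versions that emit it.

```python
import pandas as pd

# Sample rows from the question, rebuilt by hand (assumption).
df = pd.DataFrame({
    "userID": [23413, 12321, 23413, 23413],
    "ISBN": [1232, 2311, 2532, 7853],
    "Rating": [2.5, 3.2, 1.7, 3.8],
})

# Hypothetical filter: df_new is a slice of df. .copy() makes it an
# independent frame, so assigning a new column does not warn.
df_new = df[df["Rating"] > 0].copy()

counts = df_new["userID"].value_counts()
print(counts.index.tolist())  # [23413, 12321]: user IDs, not row labels

# Column assignment aligns on the index: labels 0..3 are not in
# counts.index, so every row receives NaN.
df_new["Ratings_per_user"] = counts
print(df_new["Ratings_per_user"].isna().all())  # True
```

The answers below avoid the alignment problem by producing a value for every row (a lookup keyed on userID, or a group-wise transform) instead of assigning the per-user Series directly.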

Answer 1

Convert the result of value_counts to a dict, then use replace to create the new column holding each user's number of ratings:

x = df['userID'].value_counts().to_dict()

df['rating_per_user'] = df['userID'].replace(x)
print(df)

Output:

   userID  ISBN  rating  rating_per_user
0   23413  1232     2.5                3
1   12321  2311     3.2                1
2   23413  2532     1.7                3
3   23413  7853     3.8                3
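A self-contained version of this answer, assuming the question's sample rows as input (rebuilt by hand here). It also checks that Series.map with the same dict gives the identical column; map returns NaN for any value missing from the dict while replace leaves such values unchanged, but that case cannot arise here because the dict is built from the same column.

```python
import pandas as pd

# Sample rows from the question, rebuilt by hand (assumption).
df = pd.DataFrame({
    "userID": [23413, 12321, 23413, 23413],
    "ISBN": [1232, 2311, 2532, 7853],
    "rating": [2.5, 3.2, 1.7, 3.8],
})

# {userID: number of rows for that user}
x = df["userID"].value_counts().to_dict()

# replace substitutes each userID with its count from the dict
df["rating_per_user"] = df["userID"].replace(x)

# map performs the same dictionary lookup, row by row
via_map = df["userID"].map(x)
assert via_map.tolist() == df["rating_per_user"].tolist()

print(df)
```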

Answer 2

Use:

df_new['Ratings_per_user']=df_new.groupby('userID')['userID'].transform('count')

   userID  ISBN  rating  Ratings_per_user
0   23413  1232     2.5                 3
1   12321  2311     3.2                 1
2   23413  2532     1.7                 3
3   23413  7853     3.8                 3
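A note on the aggregation name passed to transform: 'count' counts the non-missing values of the selected column, while 'size' counts rows. The two agree on the question's data; they differ only when the column contains NaN. A sketch with one hypothetical missing rating added to the sample rows:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "userID": [23413, 12321, 23413, 23413],
    "ISBN": [1232, 2311, 2532, 7853],
    "rating": [2.5, 3.2, np.nan, 3.8],  # hypothetical missing rating
})

g = df.groupby("userID")

# 'size': number of rows per user, missing ratings included
df["rows_per_user"] = g["rating"].transform("size")

# 'count': number of non-missing ratings per user
df["ratings_per_user"] = g["rating"].transform("count")

print(df)
```

Which one is "the number of ratings" depends on whether a row with a missing rating should count as a rating.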

Answer 3

You can use map:

df['Rating per user'] = df['userID'].map(df.groupby('userID')['Rating'].count())
print(df)

   userID  ISBN  Rating  Rating per user
0   23413  1232     2.5                3
1   12321  2311     3.2                1
2   23413  2532     1.7                3
3   23413  7853     3.8                3
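map also accepts a Series as the mapping, looking each userID up in that Series' index. The sketch below (sample rows rebuilt by hand as an assumption) runs this answer and then the same lookup with value_counts() as the mapping Series, which skips the groupby. value_counts() counts rows per userID, while groupby(...)['Rating'].count() counts non-missing ratings, so the two agree whenever no rating is missing, as here.

```python
import pandas as pd

# Sample rows from the question, rebuilt by hand (assumption).
df = pd.DataFrame({
    "userID": [23413, 12321, 23413, 23413],
    "ISBN": [1232, 2311, 2532, 7853],
    "Rating": [2.5, 3.2, 1.7, 3.8],
})

# This answer: per-user count of non-missing ratings, looked up by userID
df["Rating per user"] = df["userID"].map(df.groupby("userID")["Rating"].count())

# Same lookup, using per-user row counts from value_counts() as the mapping
via_value_counts = df["userID"].map(df["userID"].value_counts())

print(df)
```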

Add column with number of ratings per user (pandas)
