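Context:
I have a large dataset with three columns: user ID, movie ID, and movie rating.
Problem:
Each row records one user's rating of one movie.
A single user can rate several different movies, one per row.
I want to filter the data to keep only users who have rated at least 3 movies (that is, users whose ID appears in 3 or more rows of the userId column) and drop users who have rated only 1 or 2 movies.
I'll illustrate with an example below.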
df.head()
userId movieId rating
0 1 307 3.5
1 1 481 3.5
2 1 1091 1.5
3 1 1257 4.5
4 1 1449 4.5
# For example, userId 1 above is a user I would like to keep because
# they have rated at least 3 movies (5 in this case).
userId movieId rating
5 5 645 3.5
6 5 5678 3.5
7 6 5346 1.5
8 6 1434 4.5
9 7 7421 4.5
# In the above example, users 5, 6, and 7 are users I would like
# to drop, since they have not rated a minimum of 3 movies (2, 2, and 1 in this case).
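
For reference, here is a minimal sketch of the filter I am after, assuming the data is a pandas DataFrame with the column names shown in the output above. The sample frame is rebuilt from the two excerpts, and `MIN_RATINGS`, `ratings_per_user`, and `filtered` are illustrative names of my own, not anything from the real dataset.

```python
import pandas as pd

# Sample data rebuilt from the two excerpts above.
df = pd.DataFrame({
    "userId":  [1, 1, 1, 1, 1, 5, 5, 6, 6, 7],
    "movieId": [307, 481, 1091, 1257, 1449, 645, 5678, 5346, 1434, 7421],
    "rating":  [3.5, 3.5, 1.5, 4.5, 4.5, 3.5, 3.5, 1.5, 4.5, 4.5],
})

MIN_RATINGS = 3  # illustrative threshold: keep users with at least this many rows

# Count the rows for each user and broadcast that count back onto every row,
# so the result lines up with df's index and can be used as a boolean mask.
ratings_per_user = df.groupby("userId")["movieId"].transform("size")

# Keep only the rows that belong to users with enough ratings.
filtered = df[ratings_per_user >= MIN_RATINGS]

print(filtered)
```

On this sample only the five rows for userId 1 should remain. `df.groupby("userId").filter(lambda g: len(g) >= MIN_RATINGS)` expresses the same idea more directly, though it calls a Python function once per group and may be slower on a large dataset.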