Given:
import pandas as pd

# Ten category labels; 'apple' occurs only twice.
lis1 = ('apple', 'orange', 'strawberry', 'strawberry', 'strawberry',
        'apple', 'orange', 'orange', 'orange', 'strawberry')
# One placeholder review per category label.
lis2 = ('lorem ipsum review',) * len(lis1)

# Assign the frame to df so it can be filtered afterwards.
df = pd.DataFrame({'category': lis1, 'review': lis2})
df
     category              review
0       apple  lorem ipsum review
1      orange  lorem ipsum review
2  strawberry  lorem ipsum review
3  strawberry  lorem ipsum review
4  strawberry  lorem ipsum review
5       apple  lorem ipsum review
6      orange  lorem ipsum review
7      orange  lorem ipsum review
8      orange  lorem ipsum review
9  strawberry  lorem ipsum review
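I need code that counts the unique categories (nunique()) and drops every category that appears fewer than 3 times. In this example, apple is the only category that appears just twice, so its rows are removed (listwise deletion).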
Answer 0 (score: 1)
You can filter on the result of groupby and transform:
df[df.groupby('category')['category'].transform('count').gt(2)]
     category              review
1      orange  lorem ipsum review
2  strawberry  lorem ipsum review
3  strawberry  lorem ipsum review
4  strawberry  lorem ipsum review
6      orange  lorem ipsum review
7      orange  lorem ipsum review
8      orange  lorem ipsum review
9  strawberry  lorem ipsum review
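
To make it easier to see why the groupby/transform one-liner works, here is a minimal sketch that splits it into intermediate steps. The names categories, counts, mask and filtered are introduced only for illustration and do not appear in the original answer.

```python
import pandas as pd

# Rebuild the example frame from the question.
categories = ['apple', 'orange', 'strawberry', 'strawberry', 'strawberry',
              'apple', 'orange', 'orange', 'orange', 'strawberry']
df = pd.DataFrame({'category': categories,
                   'review': ['lorem ipsum review'] * len(categories)})

# transform('count') returns a Series aligned with df's index:
# every row receives the number of rows in its own category.
counts = df.groupby('category')['category'].transform('count')

# Boolean mask: True where the row's category occurs more than twice.
mask = counts.gt(2)

# Boolean indexing keeps only the rows where the mask is True.
filtered = df[mask]
print(filtered)
```

Because transform keeps the original index, the mask lines up with df row by row, which is what makes the boolean indexing valid.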
Another solution is value_counts + map:
df[df.category.map(df['category'].value_counts()).gt(2)]
     category              review
1      orange  lorem ipsum review
2  strawberry  lorem ipsum review
3  strawberry  lorem ipsum review
4  strawberry  lorem ipsum review
6      orange  lorem ipsum review
7      orange  lorem ipsum review
8      orange  lorem ipsum review
9  strawberry  lorem ipsum review
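
If the threshold or the column name needs to vary, the value_counts + map approach can be wrapped in a small helper. This is only a sketch: drop_rare_categories and its parameters frame, column and min_count are hypothetical names, not part of pandas or of the original answer.

```python
import pandas as pd


def drop_rare_categories(frame, column, min_count):
    """Keep rows whose value in `column` occurs at least `min_count` times.

    Hypothetical helper, written for illustration only.
    """
    # Frequency of each distinct value, indexed by the value itself.
    freq = frame[column].value_counts()
    # Look up each row's value in the frequency table.
    per_row = frame[column].map(freq)
    return frame[per_row.ge(min_count)]


categories = ['apple', 'orange', 'strawberry', 'strawberry', 'strawberry',
              'apple', 'orange', 'orange', 'orange', 'strawberry']
df = pd.DataFrame({'category': categories,
                   'review': ['lorem ipsum review'] * len(categories)})

# "Fewer than 3 times" is dropped, so keep categories with at least 3 rows.
result = drop_rare_categories(df, 'category', 3)
print(result)
```

Using ge(min_count) with min_count=3 is equivalent to the gt(2) comparison in the answers above.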