Count the number of categories and delete listwise if the criterion is not met

Time: 2018-09-24 21:45:29

Tags: python pandas dataframe

Given:

["a","b","c"]

Needed:

import pandas as pd

lis1 = ('apple', 'orange', 'strawberry', 'strawberry', 'strawberry',
        'apple', 'orange', 'orange', 'orange', 'strawberry')
lis2 = ('lorem ipsum review',) * 10  # the same review text in all 10 rows

# Assign the frame to df so the answers below can refer to it
df = pd.DataFrame({'category': lis1, 'review': lis2})
print(df)

     category              review
0       apple  lorem ipsum review
1      orange  lorem ipsum review
2  strawberry  lorem ipsum review
3  strawberry  lorem ipsum review
4  strawberry  lorem ipsum review
5       apple  lorem ipsum review
6      orange  lorem ipsum review
7      orange  lorem ipsum review
8      orange  lorem ipsum review
9  strawberry  lorem ipsum review

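I need code that counts the occurrences of each unique category and drops every category that appears fewer than 3 times (nunique() on its own only returns the number of distinct categories, not a count per category). The example shows listwise deletion being applied: apple is the only category that appears just twice, so its rows are removed.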
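To make the distinction between the two counting methods concrete, here is a small sketch on the same example data. It assumes only standard pandas; the variable names `n_distinct` and `counts` are illustrative, not from the question.

```python
import pandas as pd

# Rebuild the example frame from the question (same data).
df = pd.DataFrame({
    'category': ['apple', 'orange', 'strawberry', 'strawberry', 'strawberry',
                 'apple', 'orange', 'orange', 'orange', 'strawberry'],
    'review': ['lorem ipsum review'] * 10,
})

# nunique() returns a single number: how many distinct categories exist (3 here).
n_distinct = df['category'].nunique()

# value_counts() returns one count per category, which is what the
# "fewer than 3 occurrences" rule actually needs.
counts = df['category'].value_counts()

print(n_distinct)
print(counts)
```

Here `counts` holds 4 for orange, 4 for strawberry and 2 for apple, so apple is the only category below the threshold.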

1 answer:

Answer 0 (score: 1)

You can filter on the result of groupby + transform:

df[df.groupby('category')['category'].transform('count').gt(2)]

     category              review
1      orange  lorem ipsum review
2  strawberry  lorem ipsum review
3  strawberry  lorem ipsum review
4  strawberry  lorem ipsum review
6      orange  lorem ipsum review
7      orange  lorem ipsum review
8      orange  lorem ipsum review
9  strawberry  lorem ipsum review
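The one-liner works because `transform('count')` returns a Series aligned with the original index, so each row carries the size of its own category group. The sketch below splits it into steps to show the intermediate values; the names `group_sizes` and `mask` are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'category': ['apple', 'orange', 'strawberry', 'strawberry', 'strawberry',
                 'apple', 'orange', 'orange', 'orange', 'strawberry'],
    'review': ['lorem ipsum review'] * 10,
})

# One value per row: the number of rows in that row's category group.
group_sizes = df.groupby('category')['category'].transform('count')

# True where the category occurs more than twice, i.e. at least 3 times.
mask = group_sizes.gt(2)

result = df[mask]
print(group_sizes.tolist())  # [2, 4, 4, 4, 4, 2, 4, 4, 4, 4]
print(result)
```

Because the mask has the same index as `df`, boolean indexing keeps the original row labels (0 and 5 are dropped), matching the output above.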

Another solution is value_counts + map:

df[df['category'].map(df['category'].value_counts()).gt(2)]

     category              review
1      orange  lorem ipsum review
2  strawberry  lorem ipsum review
3  strawberry  lorem ipsum review
4  strawberry  lorem ipsum review
6      orange  lorem ipsum review
7      orange  lorem ipsum review
8      orange  lorem ipsum review
9  strawberry  lorem ipsum review
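If the threshold needs to change, the value_counts + map approach can be wrapped in a small function. This is a sketch only: `drop_rare_categories` and its `min_count` parameter are hypothetical names, not part of pandas.

```python
import pandas as pd


def drop_rare_categories(frame: pd.DataFrame, column: str, min_count: int = 3) -> pd.DataFrame:
    """Keep only rows whose value in `column` occurs at least `min_count` times.

    Hypothetical helper wrapping the value_counts + map approach.
    """
    counts = frame[column].value_counts()
    # map() looks up each row's category in `counts`, giving a per-row count.
    per_row = frame[column].map(counts)
    return frame[per_row >= min_count]


df = pd.DataFrame({
    'category': ['apple', 'orange', 'strawberry', 'strawberry', 'strawberry',
                 'apple', 'orange', 'orange', 'orange', 'strawberry'],
    'review': ['lorem ipsum review'] * 10,
})

kept = drop_rare_categories(df, 'category')       # drops the 2 apple rows
none_left = drop_rare_categories(df, 'category', 5)  # empty: the largest count is 4
print(kept)
print(none_left)
```

`per_row >= min_count` is equivalent to `.gt(2)` when `min_count` is 3, but states the "at least N occurrences" rule directly.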