Question

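I have a pandas DataFrame with three columns, and I want to delete every row whose combination of df['person'], df['id'] and df['day'] occurs only twice or fewer. Is there a simple way to do this in pandas?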

[In]:
person  id  day
1       2   1
1       2   1
1       2   1
1       2   1
1       1   1
1       1   1
1       1   1
1       0   1
1       2   2
2       2   2
2       2   2
2       2   2
1       3   1
1       3   1
1       3   1
1       0   1
2       2   2

[Out]:
person  id  day
1       2   1
1       2   1
1       2   1
1       2   1
1       1   1
1       1   1
1       1   1
2       2   2
2       2   2
2       2   2
1       3   1
1       3   1
1       3   1
2       2   2

Answer 1

df.groupby(['person', 'id', 'day']).filter(lambda x: x.shape[0] > 2)

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.core.groupby.DataFrameGroupBy.filter.html
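
For a self-contained check, the sketch below rebuilds the sample frame from the question and applies the same filter. The names `rows`, `df`, and `kept` are illustrative and not part of the original answer, and `len(g)` is used in place of `x.shape[0]`, which should be equivalent here.

```python
import pandas as pd

# Sample data copied from the question, one (person, id, day) tuple per row.
rows = [
    (1, 2, 1), (1, 2, 1), (1, 2, 1), (1, 2, 1),
    (1, 1, 1), (1, 1, 1), (1, 1, 1),
    (1, 0, 1),
    (1, 2, 2),
    (2, 2, 2), (2, 2, 2), (2, 2, 2),
    (1, 3, 1), (1, 3, 1), (1, 3, 1),
    (1, 0, 1),
    (2, 2, 2),
]
df = pd.DataFrame(rows, columns=["person", "id", "day"])

# Keep only the groups that contain more than two rows.
# filter() returns the surviving rows in their original order and index.
kept = df.groupby(["person", "id", "day"]).filter(lambda g: len(g) > 2)

print(len(df), len(kept))
```

On the sample data this should keep 14 of the 17 rows: the two (1, 0, 1) rows and the single (1, 2, 2) row are dropped.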

Answer 2

We can use transform to build a new helper column, Info, that holds the number of rows in each (person, id, day) group:

df['Info']=df.groupby(list(df)).id.transform('count')
df
Out[444]: 
    person  id  day  Info
0        1   2    1     4
1        1   2    1     4
2        1   2    1     4
3        1   2    1     4
4        1   1    1     3
5        1   1    1     3
6        1   1    1     3
7        1   0    1     2
8        1   2    2     1
9        2   2    2     4
10       2   2    2     4
11       2   2    2     4
12       1   3    1     3
13       1   3    1     3
14       1   3    1     3
15       1   0    1     2
16       2   2    2     4

Then you can filter on that column and drop it again:

df[df.Info>2].drop(columns='Info')
Out[447]: 
    person  id  day
0        1   2    1
1        1   2    1
2        1   2    1
3        1   2    1
4        1   1    1
5        1   1    1
6        1   1    1
9        2   2    2
10       2   2    2
11       2   2    2
12       1   3    1
13       1   3    1
14       1   3    1
16       2   2    2
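
If the helper column is not needed afterwards, the same idea can be sketched with a boolean mask instead, so the original frame is never modified. This is a variation on the answer above, not the answerer's own code: it uses `transform('size')` rather than `transform('count')` (size counts rows, count skips missing values, and the two agree here because the sample has none), and the names `keys`, `counts`, and `result` are assumptions for illustration.

```python
import pandas as pd

# Sample data from the question, written column by column.
df = pd.DataFrame(
    {
        "person": [1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 1, 1, 1, 1, 2],
        "id":     [2, 2, 2, 2, 1, 1, 1, 0, 2, 2, 2, 2, 3, 3, 3, 0, 2],
        "day":    [1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 1, 1, 1, 1, 2],
    }
)

keys = ["person", "id", "day"]

# Per-row group size, aligned with df's index (one value per original row).
counts = df.groupby(keys)["person"].transform("size")

# Boolean mask: keep rows whose combination occurs more than twice.
result = df[counts > 2]

print(result)
```

Because `counts` is a separate Series, `df` keeps its original three columns and no `drop` call is needed.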

Remove rows in pandas based on the count of a three-column combination
