select df.id, count(distinct airport) as num
from df
group by df.id
having count(distinct airport) > 3
I'm trying to do the above in Python pandas. I've tried different combinations of filter, nunique, and agg, but nothing has worked. Any suggestions?
For example, given df:
id airport
1 lax
1 ohare
2 phl
3 lax
2 mdw
2 lax
2 sfw
2 tpe
So I'd like the result to be:
id num
2 5
Answer 0 (score: 2)
You can use SeriesGroupBy.nunique with boolean indexing or query:
s = df.groupby('id')['airport'].nunique()
print (s)
id
1 2
2 5
3 1
Name: airport, dtype: int64
df1 = s[s > 3].reset_index()
print (df1)
id airport
0 2 5
Or:
df1 = df.groupby('id')['airport'].nunique().reset_index().query('airport > 3')
print (df1)
id airport
1 2 5
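Both variants above leave the count in a column called airport, while the question asks for num. A minimal sketch of one way to get that name, assuming the sample data from the question (the variable names counts and result are only illustrative), is to pass name= to Series.reset_index:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "id": [1, 1, 2, 3, 2, 2, 2, 2],
    "airport": ["lax", "ohare", "phl", "lax", "mdw", "lax", "sfw", "tpe"],
})

# Distinct airports per id; name= sets the label of the count column
counts = df.groupby("id")["airport"].nunique().reset_index(name="num")

# Rough equivalent of: HAVING count(distinct airport) > 3
result = counts[counts["num"] > 3].reset_index(drop=True)
print(result)
```

With the sample data this should leave a single row, id 2 with num 5.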
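Answer 1 (score: 0)
Use groupby and count. Note that count counts non-null rows rather than distinct values, so it matches the SQL query only when no id has a repeated airport: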
df_new = df.groupby('id').count()
Then filter:
df_new = df_new[(df_new['airport'] > 3)]
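On the question's sample data, count and nunique happen to agree, because no id repeats an airport. The sketch below uses made-up data (the dup frame is hypothetical, not from the question) to show where the two diverge:

```python
import pandas as pd

# Hypothetical data: id 1 has four rows but only one distinct airport
dup = pd.DataFrame({
    "id": [1, 1, 1, 1, 2, 2, 2, 2],
    "airport": ["lax", "lax", "lax", "lax", "phl", "mdw", "sfw", "tpe"],
})

row_counts = dup.groupby("id")["airport"].count()         # rows per id
distinct_counts = dup.groupby("id")["airport"].nunique()  # distinct airports per id

by_count = row_counts[row_counts > 3].index.tolist()
by_nunique = distinct_counts[distinct_counts > 3].index.tolist()

print(by_count)    # [1, 2]
print(by_nunique)  # [2]
```

If the goal is to mirror count(distinct ...), nunique is probably the safer choice.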