What is the Python pandas equivalent of SQL "select ... group by ... having count(1) > 1"?

Time: 2014-12-31 08:57:48

Tags: python sql pandas dataframe

I am having trouble filtering groupby results in pandas. I want to do the equivalent of

select email, count(1) as cnt 
from customers 
group by email 
having count(email) > 1 
order by cnt desc

I tried

customers.groupby('Email')['CustomerID'].size()

which correctly gives me the list of emails and their respective counts, but I cannot implement the having count(email) > 1 part.

email_cnt[email_cnt.size > 1]

returns 1, and

email_cnt = customers.groupby('Email')
email_dup = email_cnt.filter(lambda x: len(x) > 1)

gives the full records of the customers whose email appears more than once, but I want the aggregated table.

2 answers:

Answer 0: (score: 3)

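Instead of writing email_cnt[email_cnt.size > 1], write email_cnt[email_cnt > 1] (there is no need to call .size again). This uses the boolean Series email_cnt > 1 to return only the relevant values of email_cnt.

For example: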

>>> import pandas as pd
>>> customers = pd.DataFrame({'Email':['foo','bar','foo','foo','baz','bar'],
                              'CustomerID':[1,2,1,2,1,1]})
>>> email_cnt = customers.groupby('Email')['CustomerID'].size()
>>> email_cnt[email_cnt > 1]
Email
bar      2
foo      3
dtype: int64
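
The original attempt fails because Series.size is an attribute holding the number of elements, so email_cnt.size > 1 evaluates to a single boolean rather than a per-group mask. A minimal sketch of the difference, reusing the sample data above (the names scalar_test, mask and dups are only for illustration):

```python
import pandas as pd

customers = pd.DataFrame({'Email': ['foo', 'bar', 'foo', 'foo', 'baz', 'bar'],
                          'CustomerID': [1, 2, 1, 2, 1, 1]})
email_cnt = customers.groupby('Email')['CustomerID'].size()

# Series.size is a plain integer: the number of groups (3 here).
scalar_test = email_cnt.size > 1   # one bool for the whole Series, not a mask

# Comparing the Series itself yields one boolean per group.
mask = email_cnt > 1

# Indexing with the boolean Series keeps only the groups where it is True.
dups = email_cnt[mask]
```

Indexing with mask selects rows; indexing with a lone boolean does not, which is why the first attempt did not filter anything.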

Answer 1: (score: 2)

Two more solutions (using the modern "method chaining" approach):

Using selection by callable

customers.groupby('Email').size().loc[lambda x: x>1].sort_values()

Using the query method

(customers.groupby('Email')['CustomerID'].
    agg([len]).query('len > 1').sort_values('len'))
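
Both chains sort in ascending order, while the SQL in the question asks for order by cnt desc and a column named cnt. One possible way to mirror the whole query clause by clause is sketched below; it assumes the sample customers frame from the first answer, and the name result is only for illustration.

```python
import pandas as pd

customers = pd.DataFrame({'Email': ['foo', 'bar', 'foo', 'foo', 'baz', 'bar'],
                          'CustomerID': [1, 2, 1, 2, 1, 1]})

result = (
    customers.groupby('Email')            # group by email
    .size()                               # count(1) per group
    .reset_index(name='cnt')              # ... as cnt, back to a DataFrame
    .query('cnt > 1')                     # having count(email) > 1
    .sort_values('cnt', ascending=False)  # order by cnt desc
)
```

With the sample data this leaves two rows, foo with cnt 3 followed by bar with cnt 2.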