I have a transactions table with the following columns: customer_id, transaction_id, month
I want to write a query equivalent to the following SQL:
SELECT min(month) as first_month, max(month) as last_month
FROM transactions
GROUP BY customer_id
In pandas, it seems I can only aggregate each column once; for example, the following query returns only a single Month column:
transactions.groupby('customer_id').aggregate({ 'Month' : 'min', 'Month' : 'max'})
Any ideas on how I can do this?
Answer 0: (score: 1)
You can pass a list of aggregation functions for the column:
transactions.groupby('customer_id').aggregate({ 'Month' : ['min', 'max']})
Sample:
import pandas as pd

transactions = pd.DataFrame({'customer_id':[1,2,3,1,2,1],
                             'Month': [4,5,6,1,1,3]})
print (transactions)
   Month  customer_id
0      4            1
1      5            2
2      6            3
3      1            1
4      1            2
5      3            1
df = transactions.groupby('customer_id').aggregate({ 'Month' : ['min', 'max']})
print (df)
            Month
              min max
customer_id
1               1   4
2               1   5
3               6   6
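If you want flat column names that match the SQL aliases (first_month, last_month) instead of a two-level column header, one option is named aggregation. This is a minimal sketch, assuming pandas 0.25 or newer (where the keyword form of agg is available) and reusing the sample data above; the output names are simply taken from the question's SQL.

```python
import pandas as pd

# Same sample data as in the answer above.
transactions = pd.DataFrame({'customer_id': [1, 2, 3, 1, 2, 1],
                             'Month': [4, 5, 6, 1, 1, 3]})

# Named aggregation: output_column=(input_column, aggregation_function).
# Assumes pandas >= 0.25.
result = transactions.groupby('customer_id').agg(
    first_month=('Month', 'min'),
    last_month=('Month', 'max'),
)
print(result)
```

The result is indexed by customer_id with two ordinary columns, so no flattening of a MultiIndex header is needed afterwards.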
A faster solution is:
g = transactions.groupby('customer_id')['Month']
print (pd.concat([g.min(), g.max()], axis=1, keys=['min','max']))
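Whether the concat version is actually faster may depend on the pandas version and the data, so it is worth measuring on your own table. The following is a rough sketch for doing that: the random test frame `big`, its size, and the helper names `with_aggregate` and `with_concat` are made up for illustration and are not part of the original answer.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data: 100,000 rows spread over roughly 1,000 customers.
rng = np.random.default_rng(0)
big = pd.DataFrame({'customer_id': rng.integers(0, 1000, size=100_000),
                    'Month': rng.integers(1, 13, size=100_000)})

def with_aggregate():
    # Dict-of-lists form; yields two-level columns ('Month', 'min'/'max').
    return big.groupby('customer_id').aggregate({'Month': ['min', 'max']})

def with_concat():
    # Separate min and max on the grouped column, glued together side by side.
    g = big.groupby('customer_id')['Month']
    return pd.concat([g.min(), g.max()], axis=1, keys=['min', 'max'])

a = with_aggregate()
b = with_concat()

# Both approaches should produce the same numbers; only the header differs.
same = bool((a['Month'].values == b.values).all())
print(same)

# Timings in seconds for 20 runs each; the numbers vary by machine and version.
print(timeit.timeit(with_aggregate, number=20))
print(timeit.timeit(with_concat, number=20))
```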