Aggregating over a partition - pandas DataFrame

Posted: 2016-03-10 00:22:40

Tags: python pandas group-by dataframe partition

I'm looking for the best way to aggregate values over a specific partition, i.e. the pandas equivalent of

SUM(TotalCost) OVER (PARTITION BY ShopName) AS Earnings   (SQL Server)

I can do this in pandas with the following steps, but I'm looking for a native way to do it, which I'm sure must exist:

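import pandas as pd

# per-shop total of TotalCost (one row per ShopName)
TempDF = DF.groupby(by=['ShopName'])['TotalCost'].sum()

# turn the Series back into a DataFrame; name the total 'Earnings' so it doesn't clash with TotalCost
TempDF = TempDF.reset_index(name='Earnings')

# attach each shop's total to every original row
NewDF = pd.merge(DF, TempDF, how='inner', on='ShopName')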
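What both the SQL window function and the three groupby/merge steps compute can be sketched in plain Python: one pass accumulates a total per partition key, and a second pass attaches that total to every row, so no rows are collapsed (unlike a plain GROUP BY). The list-of-dicts row layout and the helper name `sum_over_partition` are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def sum_over_partition(rows, partition_key, value_key, out_key):
    """Hypothetical helper mimicking SUM(value) OVER (PARTITION BY key).

    Returns new rows, each with an extra `out_key` field holding the
    total of `value_key` over all rows that share its `partition_key`.
    """
    # Pass 1: accumulate one total per partition
    totals = defaultdict(int)
    for row in rows:
        totals[row[partition_key]] += row[value_key]
    # Pass 2: attach the partition total to every row (row count unchanged)
    return [{**row, out_key: totals[row[partition_key]]} for row in rows]


rows = [
    {'ShopName': 'North', 'TotalCost': 10},
    {'ShopName': 'North', 'TotalCost': 5},
    {'ShopName': 'South', 'TotalCost': 7},
]
for r in sum_over_partition(rows, 'ShopName', 'TotalCost', 'Earnings'):
    print(r)
# {'ShopName': 'North', 'TotalCost': 10, 'Earnings': 15}
# {'ShopName': 'North', 'TotalCost': 5, 'Earnings': 15}
# {'ShopName': 'South', 'TotalCost': 7, 'Earnings': 7}
```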

Thanks a lot for reading!

1 answer:

Answer 0 (score: 16)

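You can use the pandas transform() method to get the equivalent of an SQL windowed aggregate such as "OVER (PARTITION BY ...)". Unlike agg(), transform() returns a result with the same length and index as the original rows, so it can be assigned straight back as a new column: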

import pandas as pd

# create a DataFrame with sample data
df = pd.DataFrame({'group': ['A', 'A', 'A', 'B', 'B', 'B'],
                   'value': [1, 2, 3, 4, 5, 6]})

# equivalent of AVG(value) OVER (PARTITION BY group):
# each row receives the mean of its own group
df['mean_value'] = df.groupby('group')['value'].transform('mean')

df:
group  value  mean_value
A      1      2.0
A      2      2.0
A      3      2.0
B      4      5.0
B      5      5.0
B      6      5.0
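Applied to the question's own columns, the same approach gives SUM(TotalCost) OVER (PARTITION BY ShopName) AS Earnings in a single line. The sample data below is made up for illustration; the sketch also cross-checks the result against the groupby + merge steps from the question.

```python
import pandas as pd

# hypothetical sample data using the question's column names
DF = pd.DataFrame({
    'ShopName': ['North', 'North', 'South', 'South', 'South'],
    'TotalCost': [10.0, 5.0, 7.0, 3.0, 1.0],
})

# SUM(TotalCost) OVER (PARTITION BY ShopName) AS Earnings
DF['Earnings'] = DF.groupby('ShopName')['TotalCost'].transform('sum')
print(DF)

# cross-check against the groupby + merge approach from the question
TempDF = DF.groupby('ShopName')['TotalCost'].sum().reset_index(name='Earnings')
NewDF = pd.merge(DF.drop(columns='Earnings'), TempDF, how='inner', on='ShopName')
assert NewDF['Earnings'].tolist() == DF['Earnings'].tolist()
```

Compared with the merge, transform() needs no intermediate frame and keeps the original index and row order, so the new column lines up with the existing rows without a join.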