Using groupby in pandas to compute aggregated data

Time: 2017-09-05 07:10:14

Tags: python pandas dataframe group-by aggregate

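I have some SAP HANA SQL code that I need to convert to pandas. Is this possible in pandas? I have not found any examples for this case. This is dummy code, so please ignore the indentation and naming conventions.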

   select distinct
     "A", "B", "C", "D",
     to_nvarchar(sum(to_decimal("Column2")) / to_decimal(max("Column3"))) as "Column2",
     to_nvarchar(min(to_date("Date", 'YYYYMMDD')), 'YYYYMMDD') as "Date"
   from :Var1
   group by
     "A", "B", "C", "D";

I tried:

df4["Column2"] = df4.Column2.astype(int)
df4["Column2"] = df4["Column2"] / df4["Column3"].groupby(["A", "B", "C", "D"]).agg({'Column2': 'sum', 'Column3': 'max'}).reset_index()
df5 = df4[["A", "B", "C", "D", "Column3"]]

I get KeyError: 'A'.

INPUT TABLE:
A    B    C      D      Column2  Column3  date
BOE  MT1  TYPE1  50000  45       5        20111231
BOE  MT1  TYPE1  50000  35       1        20101201
BOE  MT1  TYPE1  50001  85       5        20110721
BOE  MT1  TYPE4  50000  25       5        20110718
BOE  MT1  TYPE4  50001  90       5        20111212





EXPECTED OUTPUT:
A    B    C      D      Column2           date
BOE  MT1  TYPE1  50000  16  <- (45+35)/5  20101201
BOE  MT1  TYPE1  50001  17  <- 85/5       20110721
BOE  MT1  TYPE4  50000  5   <- 25/5       20110718
BOE  MT1  TYPE4  50001  18  <- 90/5       20111212

1 answer:

Answer 0 (score: 0)

IIUC, groupby + apply should do it:

out = df1.groupby(['A', 'B', 'C', 'D'])\
           .apply(lambda x: x.Column2.sum() / x.Column3.max())\
           .reset_index()
print(out)

     A    B      C      D     0
0  BOE  MT1  TYPE1  50000  16.0
1  BOE  MT1  TYPE1  50001  17.0
2  BOE  MT1  TYPE4  50000   5.0
3  BOE  MT1  TYPE4  50001  18.0
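
The result above leaves the ratio in a column named 0 and does not include the minimum date that the SQL also selects. The KeyError in the question most likely comes from calling groupby on the single column df4["Column3"], which is a Series and has no columns A to D to group by. A minimal sketch of one way to cover both points is shown below. It assumes pandas 0.25 or later (for named aggregation) and that the date column holds YYYYMMDD strings; the frame is rebuilt from the question's input table, and the intermediate names col2_sum and col3_max are made up for illustration.

```python
import pandas as pd

# Sample data copied from the question's input table.
df = pd.DataFrame({
    "A": ["BOE"] * 5,
    "B": ["MT1"] * 5,
    "C": ["TYPE1", "TYPE1", "TYPE1", "TYPE4", "TYPE4"],
    "D": [50000, 50000, 50001, 50000, 50001],
    "Column2": [45, 35, 85, 25, 90],
    "Column3": [5, 1, 5, 5, 5],
    "date": ["20111231", "20101201", "20110721", "20110718", "20111212"],
})

# Parse the YYYYMMDD strings so that min() compares real dates.
df["date"] = pd.to_datetime(df["date"], format="%Y%m%d")

keys = ["A", "B", "C", "D"]

# Group the whole DataFrame (not a single column) and aggregate each
# column with its own function. col2_sum and col3_max are hypothetical
# helper names used only in this sketch.
out = df.groupby(keys, as_index=False).agg(
    col2_sum=("Column2", "sum"),
    col3_max=("Column3", "max"),
    date=("date", "min"),
)

# sum(Column2) / max(Column3), as in the SQL expression.
out["Column2"] = out["col2_sum"] / out["col3_max"]

# Format the earliest date back to YYYYMMDD text.
out["date"] = out["date"].dt.strftime("%Y%m%d")

out = out[keys + ["Column2", "date"]]
print(out)
```

For the sample data this yields Column2 values of 16.0, 17.0, 5.0 and 18.0 together with the earliest date per group, which matches the expected output in the question. Using agg with built-in reducers is usually faster than apply with a lambda on large frames, at the cost of one extra step for the division.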