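I have some SAP HANA SQL code that I need to convert to pandas. Is this possible in pandas? I have not found any examples covering this case. This is dummy code, so please ignore the indentation and naming conventions.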
select distinct
"A","B","C","D",
to_nvarchar(sum(to_decimal("Column2"))/to_decimal(max("Column3"))) as "Column2",
to_nvarchar(min(to_date("Date",'YYYYMMDD')),'YYYYMMDD') as "Date"
from :Var1
group by
"A","B","C","D";
I tried:
df4["Column2"]=df4.Column2.astype(int)
df4["Column2"]=df4["Column2"]/df4["Column3"].groupby(["A","B","C","D"]).agg({'Column2': 'sum','Column3':'max'}).reset_index()
df5=df4[["A","B","C","D","Column3"]]
I get KeyError: 'A'.
INPUT TABLE:
A    B    C      D      Column2  Column3  date
BOE  MT1  TYPE1  50000  45       5        20111231
BOE  MT1  TYPE1  50000  35       1        20101201
BOE  MT1  TYPE1  50001  85       5        20110721
BOE  MT1  TYPE4  50000  25       5        20110718
BOE  MT1  TYPE4  50001  90       5        20111212
EXPECTED OUTPUT:
A    B    C      D      Column2           date
BOE  MT1  TYPE1  50000  16 <- (45+35)/5   20101201
BOE  MT1  TYPE1  50001  17 <- 85/5        20110721
BOE  MT1  TYPE4  50000  5  <- 25/5        20110718
BOE  MT1  TYPE4  50001  18 <- 90/5        20111212
Answer 0 (score: 0)
IIUC, groupby and apply should do it:
out = df1.groupby(['A', 'B', 'C', 'D'])\
.apply(lambda x: x.Column2.sum() / x.Column3.max())\
.reset_index()
print(out)
     A    B      C      D     0
0  BOE  MT1  TYPE1  50000  16.0
1  BOE  MT1  TYPE1  50001  17.0
2  BOE  MT1  TYPE4  50000   5.0
3  BOE  MT1  TYPE4  50001  18.0
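
The apply version leaves the ratio in a column named 0 and does not carry over the minimum date that the SQL also selects. As for the KeyError in the question, it most likely comes from calling groupby on the single column df4["Column3"], which is a Series and has no "A" to group by. Below is a minimal sketch of a fuller translation using named aggregation, assuming pandas 0.25 or later. The frame is rebuilt from the question's input table, and the helper column names col2_sum and col3_max are made up for illustration.

```python
import pandas as pd

# Sample data copied from the question's input table.
df1 = pd.DataFrame({
    "A": ["BOE"] * 5,
    "B": ["MT1"] * 5,
    "C": ["TYPE1", "TYPE1", "TYPE1", "TYPE4", "TYPE4"],
    "D": [50000, 50000, 50001, 50000, 50001],
    "Column2": [45, 35, 85, 25, 90],
    "Column3": [5, 1, 5, 5, 5],
    "date": ["20111231", "20101201", "20110721", "20110718", "20111212"],
})

keys = ["A", "B", "C", "D"]

# Parse the YYYYMMDD strings so that min() compares real dates.
df1["date"] = pd.to_datetime(df1["date"], format="%Y%m%d")

# Named aggregation: one output column per (input column, function) pair.
# col2_sum and col3_max are hypothetical helper names.
agg = df1.groupby(keys, as_index=False).agg(
    col2_sum=("Column2", "sum"),
    col3_max=("Column3", "max"),
    date=("date", "min"),
)

# sum(Column2) / max(Column3), then format the date back to YYYYMMDD text.
agg["Column2"] = agg["col2_sum"] / agg["col3_max"]
agg["date"] = agg["date"].dt.strftime("%Y%m%d")

out = agg[keys + ["Column2", "date"]]
print(out)
```

If Column2 has to be text, as to_nvarchar makes it in the SQL, adding out["Column2"].astype(str) as a last step should cover that. The select distinct needs no separate step here, since groupby already returns one row per key combination.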