DataFrame:
ProName ProCat Price
EJBR45 EL 5432.00
XYCK23 MH 656.00
RMX57 EL 8787.00
FG567 CO 56548.00
GHK245 EC 56456.00
EJBR45 EL 6665.00
XYCK23 MH 6576.00
RMX57 EL 15465.00
FG567 CO 78887.00
GHK245 EC 54654.00
EJBR45 EL 43556.00
XYCK23 MH 98445.00
FG567 CO 65436.00
GHK245 EC 654365.00
In SQL I use the following query:
select ProName, ProCat, max(Price) as Price
from Dtatatatata
group by ProName,ProCat
Result:
ProName ProCat Price
FG567 CO 78887.00
GHK245 EC 654365.00
EJBR45 EL 43556.00
RMX57 EL 15465.00
XYCK23 MH 98445.00
Can we do this in Python?
I tried "Python : Getting the Row which has the max value in groups using groupby" but could not understand it. Please guide me.
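One way to express the SQL query above in pandas is `groupby` followed by `max`. The sketch below rebuilds the sample rows by hand; the variable names `df`, `result`, and `top_rows` are assumptions chosen for illustration, not names from the original data.

```python
import pandas as pd

# Sample rows copied from the DataFrame shown in the question.
df = pd.DataFrame({
    "ProName": ["EJBR45", "XYCK23", "RMX57", "FG567", "GHK245",
                "EJBR45", "XYCK23", "RMX57", "FG567", "GHK245",
                "EJBR45", "XYCK23", "FG567", "GHK245"],
    "ProCat": ["EL", "MH", "EL", "CO", "EC",
               "EL", "MH", "EL", "CO", "EC",
               "EL", "MH", "CO", "EC"],
    "Price": [5432.00, 656.00, 8787.00, 56548.00, 56456.00,
              6665.00, 6576.00, 15465.00, 78887.00, 54654.00,
              43556.00, 98445.00, 65436.00, 654365.00],
})

# Equivalent of: SELECT ProName, ProCat, MAX(Price) ... GROUP BY ProName, ProCat
# as_index=False keeps the group keys as ordinary columns.
result = df.groupby(["ProName", "ProCat"], as_index=False)["Price"].max()
print(result)

# Variant: keep the whole original row that holds each group's maximum.
# idxmax returns the index label of the max Price within every group.
top_rows = df.loc[df.groupby(["ProName", "ProCat"])["Price"].idxmax()]
print(top_rows)
```

`as_index=False` makes the output look like the SQL result set, with ProName and ProCat as regular columns. The `idxmax` variant is mainly useful when the frame has additional columns that should be carried along with the maximum row.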
Applied to a DataFrame with 1.5 trillion records, the following is too slow:
distData = dataAll.set_index(['Donor', 'Recipient', 'Commodity Aggregation Type', 'Aid Category', 'Measure', 'Unit', 'Frequency', 'Date']).max(level=[0,1,2,3,4,5,6,7]).reset_index()
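A possible alternative to the `set_index(...).max(level=...)` line is to group on the key columns directly, which avoids building an eight-level MultiIndex first. Note also that the `level` argument of `DataFrame.max` was deprecated in pandas 1.3 and removed in pandas 2.0, so the line above does not run on current versions. The sketch below is only an illustration under stated assumptions: the helper `max_per_group`, the tiny `sample` frame, and its `Value` column are hypothetical, and whether it is fast enough depends on the real data.

```python
import pandas as pd

# Key columns taken from the set_index call in the question.
KEY_COLS = ["Donor", "Recipient", "Commodity Aggregation Type",
            "Aid Category", "Measure", "Unit", "Frequency", "Date"]


def max_per_group(data_all: pd.DataFrame) -> pd.DataFrame:
    """Return the column-wise maximum of every non-key column per key group."""
    # sort=False skips sorting the group keys, which can save time on
    # large frames; groups appear in order of first occurrence instead.
    return data_all.groupby(KEY_COLS, sort=False, as_index=False).max()


# Hypothetical three-row frame, only to show the call; "Value" is an
# assumed measurement column, not a name from the original data.
sample = pd.DataFrame({
    "Donor": ["A", "A", "B"],
    "Recipient": ["X", "X", "Y"],
    "Commodity Aggregation Type": ["T", "T", "T"],
    "Aid Category": ["C", "C", "C"],
    "Measure": ["M", "M", "M"],
    "Unit": ["U", "U", "U"],
    "Frequency": ["Annual", "Annual", "Annual"],
    "Date": [2000, 2000, 2001],
    "Value": [1.0, 3.0, 2.0],
})

dist_data = max_per_group(sample)
print(dist_data)
```

If the key columns are strings with few distinct values, converting them to the `category` dtype before grouping (and passing `observed=True` to `groupby`) may reduce memory use and grouping time further, though that is worth measuring on the real data.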