Question

DataFrame:

ProName ProCat  Price
EJBR45  EL  5432.00
XYCK23  MH  656.00
RMX57   EL  8787.00
FG567   CO  56548.00
GHK245  EC  56456.00
EJBR45  EL  6665.00
XYCK23  MH  6576.00
RMX57   EL  15465.00
FG567   CO  78887.00
GHK245  EC  54654.00
EJBR45  EL  43556.00
XYCK23  MH  98445.00
FG567   CO  65436.00
GHK245  EC  654365.00

In SQL I use the following query:

select ProName, ProCat, max(Price) as Price 
from Dtatatatata
group by ProName,ProCat

Result:

ProName ProCat  Price
FG567   CO  78887.00
GHK245  EC  654365.00
EJBR45  EL  6665.00
RMX57   EL  8787.00
XYCK23  MH  98445.00
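
For reference, a minimal pandas sketch of the same aggregation. It assumes the rows above are loaded into a DataFrame named `df` (a hypothetical name) and that `Price` is a numeric column. With numeric prices, the maximum is 43556.00 for EJBR45 and 15465.00 for RMX57. The 6665.00 and 8787.00 listed above are what a text comparison of the same values would return, which may mean `Price` is stored as text in the SQL table, though the question does not say.

```python
import pandas as pd

# Rows copied from the question; Price is assumed to be numeric (float).
df = pd.DataFrame({
    "ProName": ["EJBR45", "XYCK23", "RMX57", "FG567", "GHK245",
                "EJBR45", "XYCK23", "RMX57", "FG567", "GHK245",
                "EJBR45", "XYCK23", "FG567", "GHK245"],
    "ProCat": ["EL", "MH", "EL", "CO", "EC",
               "EL", "MH", "EL", "CO", "EC",
               "EL", "MH", "CO", "EC"],
    "Price": [5432.00, 656.00, 8787.00, 56548.00, 56456.00,
              6665.00, 6576.00, 15465.00, 78887.00, 54654.00,
              43556.00, 98445.00, 65436.00, 654365.00],
})

# Counterpart of:
#   select ProName, ProCat, max(Price) as Price ... group by ProName, ProCat
# as_index=False keeps the grouping keys as ordinary columns.
result = df.groupby(["ProName", "ProCat"], as_index=False)["Price"].max()
print(result)
```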

Can we do this in Python?

I tried "Python : Getting the Row which has the max value in groups using groupby" but did not understand it. Please advise.
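
The technique named in that title can be sketched as follows. This is an illustration on four of the rows above, not necessarily the exact code from that thread; the names `df`, `idx` and `rows` are assumptions. `idxmax` returns, for each group, the index label of the row holding the largest `Price`, and `loc` then selects those complete rows.

```python
import pandas as pd

# A four-row subset of the data in the question.
df = pd.DataFrame({
    "ProName": ["EJBR45", "XYCK23", "EJBR45", "XYCK23"],
    "ProCat": ["EL", "MH", "EL", "MH"],
    "Price": [5432.00, 656.00, 6665.00, 6576.00],
})

# For each (ProName, ProCat) group, find the index label of the max Price.
idx = df.groupby(["ProName", "ProCat"])["Price"].idxmax()

# Select the full rows at those labels; every column of the row is kept.
rows = df.loc[idx].reset_index(drop=True)
print(rows)
```

Unlike a plain `max()`, this keeps any other columns from the winning row, which matters when the DataFrame has more columns than the grouping keys and the value.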

Applied to a DataFrame with 1.5 trillion records, the performance is too slow:

distData = dataAll.set_index(['Donor', 'Recipient', 'Commodity Aggregation Type', 'Aid Category', 'Measure', 'Unit', 'Frequency', 'Date']).max(level=[0,1,2,3,4,5,6,7]).reset_index()
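
A possible alternative, sketched under assumptions: grouping directly on the key columns avoids building an eight-level index first. The `level` argument of `DataFrame.max` was also removed in pandas 2.0. The two-row `dataAll` below is a hypothetical stand-in, and the `Value` column and all cell values are invented for illustration. Whether this is faster on the real data has not been measured here.

```python
import pandas as pd

keys = ["Donor", "Recipient", "Commodity Aggregation Type", "Aid Category",
        "Measure", "Unit", "Frequency", "Date"]

# Hypothetical stand-in for dataAll: "Value" and every cell value are invented.
dataAll = pd.DataFrame(
    [["A", "B", "Total", "Food", "Qty", "t", "Annual", "2017", 1.0],
     ["A", "B", "Total", "Food", "Qty", "t", "Annual", "2017", 3.0]],
    columns=keys + ["Value"],
)

# Group on the key columns and take the max of the remaining columns.
# sort=False skips sorting the group keys; as_index=False returns the
# keys as ordinary columns, so no reset_index() is needed.
distData = dataAll.groupby(keys, as_index=False, sort=False).max()
print(distData)
```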

Get the max value of duplicated data from other columns, Python 3.6

0 answers: