Question

I have a pandas DataFrame df that contains a source, a destination, and the price of going from the source to the destination.

SRCLAT SRCLONG DESTLAT DESTLONG PRICE
43.5   47.5    103.5   104      50                
43.5   47.5    103.5   104      100                  
43.5   47.5    103.5   104      100               
43.5   30      90      80       300                 
43.5   30      90      80       400
               90      80

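I'm trying to compute a percentile rank of PRICE among rows that share the same source and destination coordinates, where the highest percentile corresponds to the lowest price, while ignoring NaNs.

My desired output: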

SRCLAT SRCLONG DESTLAT DESTLONG PRICE  PERCENTILE
43.5   47.5    103.5   104      50       100% (best price out of 3)         
43.5   47.5    103.5   104      100      67% (tied for 2nd out of 3)            
43.5   47.5    103.5   104      100      67% (tied for 2nd out of 3)        
43.5   30      90      80       300      100% (best out of 2)          
43.5   30      90      80       400      50% (worst out of 2)
               90      80

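How can I do this?

I tried grouping by the four columns

df.groupby(['SRCLAT', 'SRCLONG', 'DESTLAT', 'DESTLONG']).size()

to get the size of each unique group, but I'm not sure where to go from there.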

Answer 1

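Use groupby(...).rank with method='max', together with pct=True and ascending=False so that the lowest price in each group gets the highest percentile: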

c = ['SRCLAT', 'SRCLONG', 'DESTLAT', 'DESTLONG']
d = {'pct': True, 'ascending': False, 'method': 'max'}

df.assign(PERCENTILE=df.groupby(c)['PRICE'].rank(**d))

   SRCLAT  SRCLONG  DESTLAT  DESTLONG  PRICE  PERCENTILE
0    43.5     47.5    103.5       104     50    1.000000
1    43.5     47.5    103.5       104    100    0.666667
2    43.5     47.5    103.5       104    100    0.666667
3    43.5     30.0     90.0        80    300    1.000000
4    43.5     30.0     90.0        80    400    0.500000
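
For anyone who wants to reproduce this, here is a minimal self-contained sketch. It rebuilds the sample frame from the question (including the last row with missing values, which I am assuming are NaN), applies the same rank call, and checks it against a by-hand version that divides the within-group rank by the group's non-null count. The names result, manual, and PERCENTILE_LABEL are just illustrative, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data from the question; the last row is assumed to hold NaN
# for the missing source coordinates and price.
df = pd.DataFrame({
    'SRCLAT':   [43.5, 43.5, 43.5, 43.5, 43.5, np.nan],
    'SRCLONG':  [47.5, 47.5, 47.5, 30.0, 30.0, np.nan],
    'DESTLAT':  [103.5, 103.5, 103.5, 90.0, 90.0, 90.0],
    'DESTLONG': [104, 104, 104, 80, 80, 80],
    'PRICE':    [50, 100, 100, 300, 400, np.nan],
})

c = ['SRCLAT', 'SRCLONG', 'DESTLAT', 'DESTLONG']
grouped = df.groupby(c)['PRICE']

# Same call as in the answer: rank descending so the cheapest price ranks
# last, give tied prices the largest rank of the tie, and return a fraction.
result = df.assign(
    PERCENTILE=grouped.rank(pct=True, ascending=False, method='max')
)

# By-hand equivalent: within-group rank divided by the non-null group count.
manual = grouped.rank(ascending=False, method='max') / grouped.transform('count')

# Optional display column (illustrative name) formatted like the desired output.
result['PERCENTILE_LABEL'] = result['PERCENTILE'].map(
    lambda p: f'{p:.0%}' if pd.notna(p) else ''
)

print(result)
```

The row with missing coordinates and price is left as NaN in PERCENTILE rather than being ranked, which matches the "ignore NaNs" requirement. The label column is only for display; keep the numeric PERCENTILE if you need to sort or filter on it.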
