Question

I have a DataFrame with many duplicates (I need the Type / StrikePrice pairs to be unique), like this:

                   Pos  AskPrice
Type  StrikePrice
C     1500.0       10    281.6
C     1500.0       11    281.9
C     1500.0       12    281.7     <- I need this one
P     1400.0       30    1200.5
P     1400.0       31    1250.2    <- I need this one

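How can I group by Type + StrikePrice and apply some logic (my own function) to decide which row to keep from each group (say, the one with the largest Pos)?

The expected result is: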

                   Pos  AskPrice
Type  StrikePrice
C     1500.0       12    281.7
P     1400.0       31    1250.2

Thanks a lot!

Answer 1

First call reset_index so that every row has a unique index, then use groupby with idxmax to get the index of the maximum Pos in each group, select those rows with loc, and finally call set_index to restore the MultiIndex:

df = df.reset_index()
df = (df.loc[df.groupby(['Type','StrikePrice'])['Pos'].idxmax()]
        .set_index(['Type','StrikePrice']))

Or use sort_values with drop_duplicates:

df = (df.reset_index()
       .sort_values(['Type','StrikePrice', 'Pos'])
       .drop_duplicates(['Type','StrikePrice'], keep='last')
       .set_index(['Type','StrikePrice']))
print (df)

                  Pos  AskPrice
Type StrikePrice               
C    1500.0        12     281.7
P    1400.0        31    1250.2

But if you need a custom function, use GroupBy.apply.
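
For the custom-function route with GroupBy.apply, a minimal sketch might look like the following. The function name pick_row and its rule (keep the row with the largest Pos) are assumptions for illustration only; swap in your own logic. It groups on the index levels directly, so no reset_index is needed, and the frame is rebuilt from the question's sample data so the snippet runs on its own:

```python
import pandas as pd

# Rebuild the sample frame from the question, with (Type, StrikePrice) as a MultiIndex.
df = pd.DataFrame(
    {
        "Type": ["C", "C", "C", "P", "P"],
        "StrikePrice": [1500.0, 1500.0, 1500.0, 1400.0, 1400.0],
        "Pos": [10, 11, 12, 30, 31],
        "AskPrice": [281.6, 281.9, 281.7, 1200.5, 1250.2],
    }
).set_index(["Type", "StrikePrice"])


def pick_row(group):
    """Hypothetical selection rule: keep the single row with the largest Pos."""
    # nlargest returns a one-row DataFrame, so the columns and index are preserved.
    return group.nlargest(1, "Pos")


# group_keys=False avoids prepending the group keys to the index a second time.
result = (
    df.groupby(level=["Type", "StrikePrice"], group_keys=False)
      .apply(pick_row)
)
print(result)
```

nlargest is used here rather than idxmax because the index labels inside each group are duplicated, so looking up a label with loc would return every row of the group. For a simple "largest Pos" rule the idxmax or drop_duplicates approaches are usually faster; apply is mainly worth it when the selection logic cannot be expressed that way.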
