Question

I have a dataframe with 3 columns and n rows.

Before grouping, my dataframe looks like this:

Index    Max_Mass (kg/m)    Max_Diameter (m)
1             10                   1
2             20                   2
3             30                   3

200           5                    4
201           60                   3
202           20                   2

300           90                   1
301           3                    1
302           10                   1

400           100                  1
401           10                   1
402           10                   1

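I group the dataframe by cutting it every 100 rows, so that I can find the maximum of a specific column within each block of 100 rows, using the following: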

groups = output_df.groupby(pd.cut(output_df.index, range(0,len(output_df), 100)))

I am using the following to find the maximum of the "Max Mass (kg/m)" column:

groups.max()['Max Mass (kg/m)']

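I now want to build another df that contains each maximum found together with the index of that value. How do I retrieve the index? I tried the following, but as I understand it, it only works for a single value, whereas the line above returns a whole column of maxima.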

(groups.max()['Max Mass (kg/m)']).getidx()

My expected output for the DataFrame above, that is, the new dataframe I want to create, should look like this:

Index    Max_Mass (kg/m)    Max_Diameter (m)
3             30                   3
201           60                   3
300           90                   1
400           100                  1

Answer 1

Comments inline.


# Initialise the grouper.
grouper = df.Index // 100
# Get list of indices corresponding to the max using `apply`.
idx = df.groupby(grouper).apply(
          lambda x: x.set_index('Index')['Max_Mass (kg/m)'].idxmax())
# Compute the max and update the other columns based on `idx` computed previously.
v = df.groupby(grouper, as_index=False)['Max_Mass (kg/m)'].max()
v['Index'] = idx.values
v['Max_Diameter (m)'] = df.loc[df.Index.isin(v.Index), 'Max_Diameter (m)'].values

Answer 2

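Instead of groups.max(), you can use groups.idxmax() to get the index label of each group's maximum, and then use those labels to look up the maximum values. At that point you have everything you need.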
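A minimal sketch of the idxmax approach, run on the sample data from the question. It assumes that Index is an ordinary column (as in Answer 1) and that groups are formed by integer-dividing Index by 100; the names grouper, max_rows and result are only illustrative.

```python
import pandas as pd

# Sample data copied from the question; "Index" is kept as an ordinary column.
df = pd.DataFrame({
    "Index": [1, 2, 3, 200, 201, 202, 300, 301, 302, 400, 401, 402],
    "Max_Mass (kg/m)": [10, 20, 30, 5, 60, 20, 90, 3, 10, 100, 10, 10],
    "Max_Diameter (m)": [1, 2, 3, 4, 3, 2, 1, 1, 1, 1, 1, 1],
})

# Group label (assumption): 0 for Index 0-99, 1 for Index 100-199, and so on.
grouper = df["Index"] // 100

# For each group, idxmax returns the row label of the first maximum.
max_rows = df.groupby(grouper)["Max_Mass (kg/m)"].idxmax()

# Selecting those rows keeps every column aligned with the maximum.
result = df.loc[max_rows].reset_index(drop=True)
print(result)
```

Because whole rows are selected by label, Max_Diameter (m) comes along automatically and does not need to be filled in as a separate step. If several rows in a group tie for the maximum, idxmax keeps only the first one.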
