Question

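I have a DataFrame with a MultiIndex. I want to sort it by a specific column and extract the top n rows for each group of the first index level, where n differs from group to group.
For example: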

| Index1| Index2| Sort_In_descending_order |  How_manyRows_toChoose   |
-----------------------------------------------------------------------
|   1   |  20   |           3              |            2             |
|       |  40   |           2              |            2             |
|       |  10   |           1              |            2             |
|   2   |  20   |           2              |            1             |
|       |  50   |           1              |            1             |

The result should look like this:

| Index1| Index2| Sort_In_descending_order |  How_manyRows_toChoose   |
-----------------------------------------------------------------------
|   1   |  20   |           3              |            2             |
|       |  40   |           2              |            2             |
|   2   |  20   |           2              |            1             |

This is how far I have got:
df.groupby(level=[0,1]).sum().sort_values(['Index1','Sort_In_descending_order'],ascending=False).groupby('Index1').head(2) However, .head(2) selects 2 rows from every group, regardless of the number in the How_manyRows_toChoose column.

Some code would be great!
Thanks!

Answer 1

Use a lambda function with head in GroupBy.apply, and add the parameter group_keys=False to avoid duplicated index values:

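# original code from the question (level=[0, 1] groups by both index levels)
df = (df.groupby(level=[0, 1])
        .sum()
        .sort_values(['Index1','Sort_In_descending_order'],ascending=False))

# for each Index1 group, keep as many leading rows as the group's
# How_manyRows_toChoose value says (read from the group's first row)
df = (df.groupby('Index1', group_keys=False)
        .apply(lambda x: x.head(x['How_manyRows_toChoose'].iat[0])))
print(df)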
               Sort_In_descending_order  How_manyRows_toChoose
Index1 Index2                                                 
1      20                             3                      2
       40                             2                      2
2      20                             2                      1
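
For anyone who wants to reproduce this, here is a minimal self-contained sketch. It rebuilds the example table from the question and applies the same apply/head idea. The variable names (sorted_df, by_apply, by_mask, rank_in_group) are my own and not from the original post. I skip the .sum() step because the (Index1, Index2) pairs in the example are already unique, and I sort with ascending=[True, False] so that only the sort column is descending. The second variant is a different technique, not the one from the answer: it numbers the rows within each group with cumcount and keeps those whose position is below the group's limit, which avoids apply. It assumes a pandas version where sort_values accepts index level names (0.23 or later).

```python
import pandas as pd

# Rebuild the example table from the question.
index = pd.MultiIndex.from_tuples(
    [(1, 20), (1, 40), (1, 10), (2, 20), (2, 50)],
    names=["Index1", "Index2"],
)
df = pd.DataFrame(
    {
        "Sort_In_descending_order": [3, 2, 1, 2, 1],
        "How_manyRows_toChoose": [2, 2, 2, 1, 1],
    },
    index=index,
)

# Sort by the Index1 level (ascending), then by the sort column (descending).
sorted_df = df.sort_values(
    ["Index1", "Sort_In_descending_order"], ascending=[True, False]
)

# Variant 1 (the answer's idea): take head(n) per group, where n is read
# from the first row of each group.
by_apply = sorted_df.groupby(level="Index1", group_keys=False).apply(
    lambda g: g.head(int(g["How_manyRows_toChoose"].iat[0]))
)

# Variant 2 (alternative, not from the answer): cumcount gives each row its
# 0-based position within its group; keep rows whose position is below n.
rank_in_group = sorted_df.groupby(level="Index1").cumcount()
by_mask = sorted_df[rank_in_group < sorted_df["How_manyRows_toChoose"]]

print(by_apply)
print(by_mask)
```

Both variants assume that How_manyRows_toChoose is constant within each Index1 group, as it is in the example.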

Python: select a different number of rows for each group of a MultiIndex