Pandas DataFrame: resetting the count in a MultiIndex level

Time: 2020-09-08 16:04:04

Tags: python pandas dataframe multi-index

I build a DataFrame by concatenating several DataFrames, using the keys [a, b, c] as the outer index level:

+-------+----------+----------+
| Index | IndexPos | SomeData |
+-------+----------+----------+
| a     |        1 | some1    |
|       |        2 | some2    |
|       |        3 | some3    |
| b     |        1 | some1    |
|       |        2 | some2    |
|       |        3 | some3    |
| c     |        1 | some1    |
|       |        2 | some2    |
|       |        3 | some3    |
+-------+----------+----------+
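For anyone who wants to reproduce this, a frame of this shape can probably be built along the following lines. This is only a sketch: the helper `make_piece` and the sample values are assumptions made for illustration, since the question does not show how the individual frames are created.

```python
import pandas as pd


def make_piece():
    # Hypothetical input frame: three rows indexed by position 1..3.
    return pd.DataFrame(
        {"SomeData": ["some1", "some2", "some3"]},
        index=pd.Index([1, 2, 3], name="IndexPos"),
    )


# keys= adds an outer index level; names= labels both resulting levels.
df = pd.concat(
    [make_piece() for _ in range(3)],
    keys=["a", "b", "c"],
    names=["Index", "IndexPos"],
)
print(df)
```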

Now I slice it down to the last two elements of each group, e.g.:

df.groupby(df.index.levels[0].name).tail(2)

After that, I want to renumber IndexPos for the remaining elements so that it starts at 1 again within each group:

+-------+----------+----------+
| Index | IndexPos | SomeData |
+-------+----------+----------+
| a     |        1 | some2    |
|       |        2 | some3    |
| b     |        1 | some2    |
|       |        2 | some3    |
| c     |        1 | some2    |
|       |        2 | some3    |
+-------+----------+----------+

Is there a way to do this, or do I have to slice the frames before concatenating them?

1 answer:

Answer 0: (score: 2)

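First use groupby with level=0 and tail to take the last two rows of each group. Then use groupby + cumcount on the sliced DataFrame to create a sequential counter within each group, and set that counter as the new level=1 index in place of the old one: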

d = df.groupby(level=0).tail(2)
d = d.droplevel(1).set_index(d.groupby(level=0).cumcount().add(1), append=True)
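Broken into separate steps, the same idea might look like the sketch below. The frame is a hypothetical reconstruction of the one in the question, and the `counter` variable and the `rename("IndexPos")` call are additions for illustration: naming the Series is one way to keep a name on the new index level, which otherwise ends up unnamed.

```python
import pandas as pd

# Hypothetical reconstruction of the question's frame.
df = pd.DataFrame(
    {"SomeData": ["some1", "some2", "some3"] * 3},
    index=pd.MultiIndex.from_product(
        [["a", "b", "c"], [1, 2, 3]], names=["Index", "IndexPos"]
    ),
)

# Step 1: keep the last two rows of every outer-level group.
d = df.groupby(level=0).tail(2)

# Step 2: number the rows within each group starting at 1; the Series
# name becomes the name of the new index level.
counter = d.groupby(level=0).cumcount().add(1).rename("IndexPos")

# Step 3: drop the stale inner level and append the counter in its place.
d = d.droplevel(1).set_index(counter, append=True)
print(d)
```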

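Alternatively, inspired by @anky's solution, use factorize instead of groupby + cumcount (this relies on every group keeping the same IndexPos values, as is the case here):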

d = df.groupby(level=0).tail(2)
d = d.droplevel(1).set_index(d.index.get_level_values(1).factorize()[0] + 1, append=True)

Result:

print(d)

        SomeData
Index           
a     1    some2
      2    some3
b     1    some2
      2    some3
c     1    some2
      2    some3