Pandas: How to reindex a MultiIndex level?

Time: 2018-04-17 20:49:01

Tags: python python-3.x pandas indexing

After sorting by one of the levels, how do I renumber the other level of a MultiIndex? This is the DataFrame after sorting:

+--------+---+------+
|        |   | text |
+--------+---+------+
| letter |   |      |
+--------+---+------+
| a      | 0 | blah |
+--------+---+------+
|        | 3 | blah |
+--------+---+------+
|        | 6 | blah |
+--------+---+------+
| b      | 1 | blah |
+--------+---+------+
|        | 4 | blah |
+--------+---+------+
|        | 7 | blah |
+--------+---+------+
| c      | 2 | blah |
+--------+---+------+
|        | 5 | blah |
+--------+---+------+
|        | 8 | blah |
+--------+---+------+

This is what I want (though possibly keeping the original index in its own column):

+--------+---+------+
|        |   | text |
+--------+---+------+
| letter |   |      |
+--------+---+------+
| a      | 0 | blah |
+--------+---+------+
|        | 1 | blah |
+--------+---+------+
|        | 2 | blah |
+--------+---+------+
| b      | 0 | blah |
+--------+---+------+
|        | 1 | blah |
+--------+---+------+
|        | 2 | blah |
+--------+---+------+
| c      | 0 | blah |
+--------+---+------+
|        | 1 | blah |
+--------+---+------+
|        | 2 | blah |
+--------+---+------+

I've tried searching for an answer and experimenting with different approaches, but I'm stumped.

Code to reproduce the first table above:

import pandas as pd
df = pd.DataFrame({'letter': ['a', 'b', 'c'] * 3, 'text': ['blah'] * 9})
df.set_index(keys='letter', append=True, inplace=True)
df = df.reorder_levels(order=[1, 0])
df.sort_index(level=0, inplace=True)
print(df)

2 Answers:

Answer 0: (score: 2)

You can take a look at cumcount, which numbers the rows within each group starting from 0:

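# In the question's df, 'letter' is an index level and set_index only accepts columns, so move it back to a column first
df = df.reset_index(level='letter')
df = df.assign(yourindex=df.groupby('letter').cumcount()).set_index(['letter', 'yourindex']).sort_index(level=[0, 1])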
df
Out[861]: 
                  text
letter yourindex      
a      0          blah
       1          blah
       2          blah
b      0          blah
       1          blah
       2          blah
c      0          blah
       1          blah
       2          blah
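
If it helps, here is a minimal sketch of the same idea that never moves 'letter' out of the index. It assumes the sorted df from the question; the level name yourindex is just the placeholder used above, and renumbered is a made-up variable name. It groups on the index level directly and builds a fresh MultiIndex from the letters and the per-group counter:

```python
import pandas as pd

# Rebuild the sorted frame from the question.
df = pd.DataFrame({'letter': ['a', 'b', 'c'] * 3, 'text': ['blah'] * 9})
df = df.set_index('letter', append=True).reorder_levels([1, 0]).sort_index(level=0)

# cumcount numbers the rows inside each group, starting at 0,
# in the order the rows currently appear.
counter = df.groupby(level='letter').cumcount()

# Replace the index with (letter, per-group counter).
renumbered = df.copy()
renumbered.index = pd.MultiIndex.from_arrays(
    [df.index.get_level_values('letter'), counter.to_numpy()],
    names=['letter', 'yourindex'],
)
print(renumbered)
```

Note that this version discards the original integer level, the same as the one-liner above.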

Answer 1: (score: 1)

Here is what I did:

df["new_index"] = df.groupby("letter").cumcount()
df

This gives you:

          text  new_index
letter                   
a      0  blah          0
       3  blah          1
       6  blah          2
b      1  blah          0
       4  blah          1
       7  blah          2
c      2  blah          0
       5  blah          1
       8  blah          2

Then you can reset the index and set the new one, which keeps the original index as a column:

df.reset_index().set_index(["letter","new_index"])

                  level_1  text
letter new_index               
a      0                0  blah
       1                3  blah
       2                6  blah
b      0                1  blah
       1                4  blah
       2                7  blah
c      0                2  blah
       1                5  blah
       2                8  blah
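
The level_1 column is the old integer index; pandas picks that name automatically because the level had no name. As a rough sketch, assuming you would rather choose the column name yourself (orig_index and result are made-up names for this example), you could name the level before resetting it:

```python
import pandas as pd

# Rebuild the sorted frame from the question.
df = pd.DataFrame({'letter': ['a', 'b', 'c'] * 3, 'text': ['blah'] * 9})
df = df.set_index('letter', append=True).reorder_levels([1, 0]).sort_index(level=0)

# Name the unnamed integer level so it becomes a readable column later.
# 'orig_index' is a hypothetical name chosen for this example.
df.index = df.index.set_names(['letter', 'orig_index'])

# Per-letter counter, then swap it in as the second index level.
df['new_index'] = df.groupby(level='letter').cumcount()
result = df.reset_index(level='orig_index').set_index('new_index', append=True)
print(result)
```

Only the integer level is reset here, so 'letter' stays in the index the whole time.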