Index order after the sort_index method

Date: 2017-09-11 15:45:36

Tags: python pandas indexing

Here is the head of my DataFrame:

            McDonald's  Python  CSS  Microsoft Office        day  week day
Jour                                                                      
2017-06-11          87      22   12                31     Sunday         6
2017-06-12          63      38   24                55     Monday         0
2017-06-13          63      41   25                56    Tuesday         1
2017-06-14          73      41   25                55  Wednesday         2
2017-06-15          72      39   24                53   Thursday         3

I did a groupby on the DataFrame and got:

df_week = df.groupby(["day", "week day"]).mean()
df_week

                    McDonald's     Python        CSS  Microsoft Office
day       week day                                                    
Friday    4          76.076923  36.615385  22.384615         51.769231
Monday    0          68.230769  37.000000  22.230769         54.230769
Saturday  5          87.416667  21.500000  11.416667         30.750000
Sunday    6          90.000000  21.615385  11.000000         30.538462
Thursday  3          69.923077  40.076923  24.615385         55.846154
Tuesday   1          66.230769  39.461538  24.153846         57.000000
Wednesday 2          68.923077  40.000000  24.846154         56.538462

Then I sorted my DataFrame by the "week day" index level:

df_week.sort_index(level="week day", inplace=True)

Afterwards, the DataFrame appears to be sorted correctly:

                    McDonald's     Python        CSS  Microsoft Office
day       week day                                                    
Monday    0          68.230769  37.000000  22.230769         54.230769
Tuesday   1          66.230769  39.461538  24.153846         57.000000
Wednesday 2          68.923077  40.000000  24.846154         56.538462
Thursday  3          69.923077  40.076923  24.615385         55.846154
Friday    4          76.076923  36.615385  22.384615         51.769231
Saturday  5          87.416667  21.500000  11.416667         30.750000
Sunday    6          90.000000  21.615385  11.000000         30.538462

But now, if I try to use the index values, they are still not sorted:

print(df_week.index.levels[0])
print(df_week.index.levels[1])

Index(['Friday', 'Monday', 'Saturday', 'Sunday', 'Thursday', 'Tuesday',
       'Wednesday'],
      dtype='object', name='day')
Int64Index([0, 1, 2, 3, 4, 5, 6], dtype='int64', name='week day')

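If I look at the whole MultiIndex object, it is clear that the unique level values and the per-row positions (labels) are stored separately: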

MultiIndex(levels=[['Friday', 'Monday', 'Saturday', 'Sunday', 'Thursday', 'Tuesday', 'Wednesday'], [0, 1, 2, 3, 4, 5, 6]],
           labels=[[1, 5, 6, 4, 0, 2, 3], [0, 1, 2, 3, 4, 5, 6]],
           names=['day', 'week day'])
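
As a rough illustration of that separation, here is a minimal sketch using a hypothetical index rebuilt by hand to match the one printed above. It assumes a recent pandas, where the attribute printed as labels above is named codes. The levels hold the unique values, and the per-row integer positions select from them:

```python
import pandas as pd

# Hypothetical index rebuilt by hand, rows already in week-day order.
idx = pd.MultiIndex.from_arrays(
    [
        ["Monday", "Tuesday", "Wednesday", "Thursday",
         "Friday", "Saturday", "Sunday"],
        [0, 1, 2, 3, 4, 5, 6],
    ],
    names=["day", "week day"],
)

# levels: the unique values of each level (sorted here, independent of row order).
day_levels = idx.levels[0]

# codes (called "labels" in older pandas): for each row, a position into levels.
day_codes = idx.codes[0]

# Combining the two recovers the values in row order.
days_in_row_order = day_levels.take(day_codes).tolist()
print(days_in_row_order)
```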

So how can I access the index values in the correct order?

1 Answer:

Answer 0: (score: 1)

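This is because the levels of a MultiIndex are stored as a FrozenList that seems to always be sorted; they are kept only as a reference lookup, not as the row order. So if you need the order, do not read it from the levels; convert the index itself to a list instead. For example, df_week.index.tolist() shows the actual order as it appears in the DataFrame:

df_week.index.tolist()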

[('Monday', 0),
 ('Tuesday', 1),
 ('Wednesday', 2),
 ('Thursday', 3),
 ('Friday', 4),
 ('Saturday', 5),
 ('Sunday', 6)]
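
If only one level is needed, get_level_values may be a convenient alternative; this is not part of the original answer, just a sketch. The small df_week below is a hypothetical stand-in with a single column of rounded values, built only so the example runs on its own:

```python
import pandas as pd

# Hypothetical stand-in for df_week: one column, values rounded for illustration.
df_week = pd.DataFrame(
    {"Python": [36.6, 37.0, 21.5, 21.6, 40.1, 39.5, 40.0]},
    index=pd.MultiIndex.from_arrays(
        [
            ["Friday", "Monday", "Saturday", "Sunday",
             "Thursday", "Tuesday", "Wednesday"],
            [4, 0, 5, 6, 3, 1, 2],
        ],
        names=["day", "week day"],
    ),
)

# Sort rows by the "week day" level, as in the question.
df_week = df_week.sort_index(level="week day")

# get_level_values returns one value per row, in row order,
# unlike index.levels, which only lists the unique values.
days = df_week.index.get_level_values("day").tolist()
week_days = df_week.index.get_level_values("week day").tolist()
print(days)
print(week_days)
```

Both lists follow the row order of the sorted DataFrame, so they can be used directly as axis labels or for iteration.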