Reordering a stacked DataFrame

Asked: 2015-04-28 20:49:27

Tags: python pandas

I am trying to reorder a stacked DataFrame. For example, I have:

import numpy as np
import pandas as pd
testdf = pd.DataFrame(np.random.randn(5,4), index=range(1,6), columns = ['Eric','Jane','Mary','Don'])
testdf.stack()

My output looks like this:

1  Eric   -0.301206
   Jane    1.327379
   Mary    1.066828
   Don    -0.429380
2  Eric    0.196671
   Jane   -1.232447
   Mary    1.139221
   Don     1.441183
3  Eric   -0.912282
   Jane   -0.204741
   Mary   -0.802078
   Don     0.149269
4  Eric   -0.168387
   Jane    1.608617
   Mary    2.237823
   Don     0.973450
5  Eric   -0.290492
   Jane   -0.374205
   Mary    0.986653
   Don     1.584820
dtype: float64

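Is there a way to change the order of these names without rearranging the columns of the original DataFrame? My end goal is to tell pandas that Eric, Don, Mary, Jane is the desired order for all of my later output, even though it is not alphabetical, similar to the levels function in R.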

Is what I want to do possible? Thanks!

1 answer:

Answer 0 (score: 2)

Use set_levels on the index to set the names of the inner level in the desired order:

In [67]:

t = testdf.stack()  # t is the stacked Series from the question
t.index.set_levels([[1,2,3,4,5],['Eric', 'Don', 'Mary', 'Jane']], inplace=True)
t
Out[67]:
1  Eric    1.139358
   Don    -0.368389
   Mary   -1.907364
   Jane    0.444930
2  Eric   -0.113019
   Don    -0.823055
   Mary   -1.397237
   Jane    0.268164
3  Eric   -1.246184
   Don     0.356804
   Mary   -0.286919
   Jane    0.845538
4  Eric   -0.674448
   Don     0.903695
   Mary    0.873403
   Jane   -1.321770
5  Eric    1.308402
   Don    -1.901295
   Mary    0.122430
   Jane    0.110339
dtype: float64
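
One thing worth checking with the call above: set_levels replaces the level labels and leaves the underlying codes untouched, so the rows are relabeled where they sit rather than moved together with their values. If each value has to stay attached to its original name, a minimal sketch could look like the following. It assumes the testdf from the question; the names order, target, reordered and alt are made up for illustration, and the seed is only there to make the run repeatable.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # fixed seed so the example is repeatable
testdf = pd.DataFrame(np.random.randn(5, 4), index=range(1, 6),
                      columns=['Eric', 'Jane', 'Mary', 'Don'])

order = ['Eric', 'Don', 'Mary', 'Jane']  # hypothetical name for the desired order
stacked = testdf.stack()

# Build the full target index (every row label paired with the names in the
# desired order) and reindex the stacked Series onto it.
target = pd.MultiIndex.from_product([testdf.index, order])
reordered = stacked.reindex(target)

# Alternative: select the columns in the desired order, then stack.
# testdf[order] is a new object, so testdf itself keeps its column order.
alt = testdf[order].stack()

print(reordered.head(4))
```

Both variants leave testdf.columns as it was; which one reads better probably depends on whether the stacked Series already exists at the point where the order is needed.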

From the docstring (there is also a brief explanation online):

Signature: t.index.set_levels(levels, level=None, inplace=False, verify_integrity=True)
Docstring:
Set new levels on MultiIndex. Defaults to returning
new index.

Parameters
----------
levels : sequence or list of sequence
    new level(s) to apply
level : int or level name, or sequence of int / level names (default None)
    level(s) to set (None for all levels)
inplace : bool
    if True, mutates in place
verify_integrity : bool (default True)
    if True, checks that levels and labels are compatible

Returns
-------
new index (of same type and class...etc)


Examples
--------
>>> idx = MultiIndex.from_tuples([(1, u'one'), (1, u'two'),
                                  (2, u'one'), (2, u'two')],
                                  names=['foo', 'bar'])
>>> idx.set_levels([['a','b'], [1,2]])
MultiIndex(levels=[[u'a', u'b'], [1, 2]],
           labels=[[0, 0, 1, 1], [0, 1, 0, 1]],
           names=[u'foo', u'bar'])
>>> idx.set_levels(['a','b'], level=0)
MultiIndex(levels=[[u'a', u'b'], [u'one', u'two']],
           labels=[[0, 0, 1, 1], [0, 1, 0, 1]],
           names=[u'foo', u'bar'])
>>> idx.set_levels(['a','b'], level='bar')
MultiIndex(levels=[[1, 2], [u'a', u'b']],
           labels=[[0, 0, 1, 1], [0, 1, 0, 1]],
           names=[u'foo', u'bar'])
>>> idx.set_levels([['a','b'], [1,2]], level=[0,1])
MultiIndex(levels=[[u'a', u'b'], [1, 2]],
           labels=[[0, 0, 1, 1], [0, 1, 0, 1]],
           names=[u'foo', u'bar'])
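
The docstring output above comes from an older pandas on Python 2 (note the u'' prefixes and the labels attribute). As a rough sketch of the third docstring example on a more recent pandas, where to my knowledge the same integer positions are exposed as codes, the following shows that set_levels swaps the labels while the codes stay the same. The variable name relabelled is an assumption for illustration.

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples([(1, 'one'), (1, 'two'),
                                 (2, 'one'), (2, 'two')],
                                names=['foo', 'bar'])

# Replace only the labels of level 'bar'; a new MultiIndex is returned.
relabelled = idx.set_levels(['a', 'b'], level='bar')

print(relabelled.tolist())                 # tuples now carry 'a'/'b'
print([list(c) for c in idx.codes])        # integer positions before
print([list(c) for c in relabelled.codes]) # same positions after
```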

Update

If your pandas version is 0.15.0 or later, set_levels accepts a level argument, which makes it cleaner to adjust just one of the levels:

In [244]:

t = testdf.stack()  # set_levels needs the MultiIndex of the stacked Series
t.index.set_levels(['Eric', 'Don', 'Mary', 'Jane'], level=1, inplace=True)
t
Out[244]:
1  Eric   -0.026484
   Don     0.223672
   Mary    0.266461
   Jane    1.121323
2  Eric   -0.250781
   Don    -1.079661
   Mary    0.525879
   Jane    1.692250
3  Eric   -1.337944
   Don     0.765228
   Mary   -1.297232
   Jane    1.121497
4  Eric    2.611441
   Don     0.805786
   Mary   -0.174193
   Jane   -0.371906
5  Eric   -0.084597
   Don     1.794861
   Mary    0.766524
   Jane    0.150359
dtype: float64
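
The inplace=True form above matches the 2015-era API. In newer releases the inplace argument of set_levels was deprecated and, as far as I know, removed in pandas 2.0. Since the docstring says the method returns a new index by default, assigning the result back should work on old and new versions alike. A minimal sketch, assuming the testdf from the question and using t for the stacked Series:

```python
import numpy as np
import pandas as pd

testdf = pd.DataFrame(np.random.randn(5, 4), index=range(1, 6),
                      columns=['Eric', 'Jane', 'Mary', 'Don'])
t = testdf.stack()

# set_levels returns a new MultiIndex; assign it back instead of
# passing inplace=True.
t.index = t.index.set_levels(['Eric', 'Don', 'Mary', 'Jane'], level=1)

print(t.index.levels[1])
print(testdf.columns)  # the original DataFrame is not modified
```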