I am trying to reorder a stacked DataFrame. For example, I have:
import numpy as np
import pandas as pd
testdf = pd.DataFrame(np.random.randn(5, 4), index=range(1, 6), columns=['Eric', 'Jane', 'Mary', 'Don'])
testdf.stack()
My output looks like this:
1  Eric   -0.301206
   Jane    1.327379
   Mary    1.066828
   Don    -0.429380
2  Eric    0.196671
   Jane   -1.232447
   Mary    1.139221
   Don     1.441183
3  Eric   -0.912282
   Jane   -0.204741
   Mary   -0.802078
   Don     0.149269
4  Eric   -0.168387
   Jane    1.608617
   Mary    2.237823
   Don     0.973450
5  Eric   -0.290492
   Jane   -0.374205
   Mary    0.986653
   Don     1.584820
dtype: float64
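Is there a way to change the order of these names without rearranging the columns of the original DataFrame? My end goal is to tell pandas that Eric, Don, Mary, Jane is the desired order for all my future output, even though it is not alphabetical, similar to the levels function in R.
Is this possible? Thanks!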
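pandas' nearest equivalent of an R factor with explicit levels is an ordered Categorical. A minimal sketch, assuming a long-format DataFrame is acceptable in place of the stacked Series; the column names `row`, `name` and `value` are hypothetical labels chosen for illustration, and testdf's own column order is left alone.

```python
import numpy as np
import pandas as pd

testdf = pd.DataFrame(np.random.randn(5, 4), index=range(1, 6),
                      columns=['Eric', 'Jane', 'Mary', 'Don'])

# Desired order, similar to levels= in R's factor(); deliberately not alphabetical
order = ['Eric', 'Don', 'Mary', 'Jane']

# Long format: one row per (row label, name) pair; testdf itself is untouched
long_df = testdf.stack().reset_index()
long_df.columns = ['row', 'name', 'value']  # hypothetical column names

# An ordered Categorical carries the custom order into every later sort
long_df['name'] = pd.Categorical(long_df['name'], categories=order, ordered=True)
long_df = long_df.sort_values(['row', 'name'], ignore_index=True)
print(long_df.head(8))
```

Because the order lives in the column's dtype rather than in the data layout, any later `sort_values` on `name` reuses it without repeating the list.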
Answer (score: 2)
Use set_levels on the index to replace the level values with the desired order:
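In [67]:
t = testdf.stack()
# set_levels returns a new MultiIndex, so assign it back (inplace= was removed in pandas 2.0)
t.index = t.index.set_levels([[1, 2, 3, 4, 5], ['Eric', 'Don', 'Mary', 'Jane']])
t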
Out[67]:
1  Eric    1.139358
   Don    -0.368389
   Mary   -1.907364
   Jane    0.444930
2  Eric   -0.113019
   Don    -0.823055
   Mary   -1.397237
   Jane    0.268164
3  Eric   -1.246184
   Don     0.356804
   Mary   -0.286919
   Jane    0.845538
4  Eric   -0.674448
   Don     0.903695
   Mary    0.873403
   Jane   -1.321770
5  Eric    1.308402
   Don    -1.901295
   Mary    0.122430
   Jane    0.110339
dtype: float64
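One thing worth checking before relying on set_levels for this: it replaces the values stored in a level but leaves the integer codes, and therefore the rows, where they are. The sketch below uses integer-valued data (an assumption, in place of random numbers) so it is easy to see which value ends up under which name.

```python
import numpy as np
import pandas as pd

# Integer-valued data so it is obvious which number belonged to whom
testdf = pd.DataFrame(np.arange(20.0).reshape(5, 4), index=range(1, 6),
                      columns=['Eric', 'Jane', 'Mary', 'Don'])
t = testdf.stack()
print(t.index.levels[1])  # the level values before the change

# Replace the names stored in level 1; the codes (row positions) are unchanged
relabeled = t.copy()
relabeled.index = t.index.set_levels(['Eric', 'Don', 'Mary', 'Jane'], level=1)

# Row 1 of testdf is Eric=0.0, Jane=1.0, Mary=2.0, Don=3.0
print(testdf.loc[1, 'Don'])       # 3.0
print(relabeled.loc[(1, 'Don')])  # not 3.0: another row now carries the name 'Don'
```

So the listing order changes only because the names were swapped between rows; the values themselves did not move.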
From the docstring (there is also a short explanation online):
Signature: t.index.set_levels(levels, level=None, inplace=False, verify_integrity=True)
Docstring:
Set new levels on MultiIndex. Defaults to returning
new index.
Parameters
----------
levels : sequence or list of sequence
    new level(s) to apply
level : int or level name, or sequence of int / level names (default None)
    level(s) to set (None for all levels)
inplace : bool
    if True, mutates in place
verify_integrity : bool (default True)
    if True, checks that levels and labels are compatible
Returns
-------
new index (of same type and class...etc)
Examples
--------
>>> idx = MultiIndex.from_tuples([(1, u'one'), (1, u'two'),
...                               (2, u'one'), (2, u'two')],
...                               names=['foo', 'bar'])
>>> idx.set_levels([['a','b'], [1,2]])
MultiIndex(levels=[[u'a', u'b'], [1, 2]],
           labels=[[0, 0, 1, 1], [0, 1, 0, 1]],
           names=[u'foo', u'bar'])
>>> idx.set_levels(['a','b'], level=0)
MultiIndex(levels=[[u'a', u'b'], [u'one', u'two']],
           labels=[[0, 0, 1, 1], [0, 1, 0, 1]],
           names=[u'foo', u'bar'])
>>> idx.set_levels(['a','b'], level='bar')
MultiIndex(levels=[[1, 2], [u'a', u'b']],
           labels=[[0, 0, 1, 1], [0, 1, 0, 1]],
           names=[u'foo', u'bar'])
>>> idx.set_levels([['a','b'], [1,2]], level=[0,1])
MultiIndex(levels=[[u'a', u'b'], [1, 2]],
           labels=[[0, 0, 1, 1], [0, 1, 0, 1]],
           names=[u'foo', u'bar'])
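This docstring comes from an older pandas release: current versions call the `labels` attribute `codes` (renamed in 0.24) and no longer accept `inplace` (removed in 2.0), so the returned index has to be assigned. A minimal sketch of the `level='bar'` example written that way:

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples([(1, 'one'), (1, 'two'), (2, 'one'), (2, 'two')],
                                names=['foo', 'bar'])

# Replace only the 'bar' level; a new MultiIndex is returned, idx is unchanged
new_idx = idx.set_levels(['a', 'b'], level='bar')

# The codes are identical; only the values they point at differ
print(new_idx.levels)
print(new_idx.codes)
print(list(new_idx))  # [(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')]
```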
Update
If your pandas version is 0.15.0 or later, set_levels accepts a level argument, which makes it cleaner to adjust just one of the levels:
In [244]:
t = testdf.stack()
t.index = t.index.set_levels(['Eric', 'Don', 'Mary', 'Jane'], level=1)
t
Out[244]:
1  Eric   -0.026484
   Don     0.223672
   Mary    0.266461
   Jane    1.121323
2  Eric   -0.250781
   Don    -1.079661
   Mary    0.525879
   Jane    1.692250
3  Eric   -1.337944
   Don     0.765228
   Mary   -1.297232
   Jane    1.121497
4  Eric    2.611441
   Don     0.805786
   Mary   -0.174193
   Jane   -0.371906
5  Eric   -0.084597
   Don     1.794861
   Mary    0.766524
   Jane    0.150359
dtype: float64
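If the aim is to move each value together with its name rather than rename entries, `Series.reindex` with the `level=` argument matches by label. A minimal sketch, assuming the new order lists exactly the stacked column names; a name left out of the list would have its rows dropped from the result.

```python
import numpy as np
import pandas as pd

testdf = pd.DataFrame(np.random.randn(5, 4), index=range(1, 6),
                      columns=['Eric', 'Jane', 'Mary', 'Don'])
t = testdf.stack()

order = ['Eric', 'Don', 'Mary', 'Jane']

# reindex with level=1 matches by label, so every value keeps its name
# while the entries inside each outer group follow `order`
reordered = t.reindex(order, level=1)

# Spot check: (1, 'Don') still holds testdf's Don value for row 1
print(reordered.loc[(1, 'Don')] == testdf.loc[1, 'Don'])
print(reordered.head(8))
```

Unlike set_levels, this rearranges the rows themselves, so the original Series and the reordered one agree on every (row, name) pair.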