Question

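I have a multi-indexed DataFrame with names attached to the column levels. I'd like to be able to easily shuffle the columns around so that they match an order specified by the user. Since this happens further down the pipeline, I can't use this recommended solution and order them properly at creation time.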

I have a data table that looks something like this:

Experiment           BASE           IWWGCW         IWWGDW
Lead Time                24     48      24     48      24     48
2010-11-27 12:00:00   0.997  0.991   0.998  0.990   0.998  0.990
2010-11-28 12:00:00   0.998  0.987   0.997  0.990   0.997  0.990
2010-11-29 12:00:00   0.997  0.992   0.997  0.992   0.997  0.992
2010-11-30 12:00:00   0.997  0.987   0.997  0.987   0.997  0.987
2010-12-01 12:00:00   0.996  0.986   0.996  0.986   0.996  0.986
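For anyone who wants to reproduce the frame above, here is a minimal sketch of how it could be built. The construction through MultiIndex.from_product and date_range is an assumption made for illustration; the question does not show how the frame was actually created.

```python
import pandas as pd

# Two named column levels: "Experiment" on top, "Lead Time" below.
columns = pd.MultiIndex.from_product(
    [["BASE", "IWWGCW", "IWWGDW"], [24, 48]],
    names=["Experiment", "Lead Time"],
)

# Five daily timestamps at 12:00, matching the rows shown above.
index = pd.date_range("2010-11-27 12:00:00", periods=5, freq="D")

values = [
    [0.997, 0.991, 0.998, 0.990, 0.998, 0.990],
    [0.998, 0.987, 0.997, 0.990, 0.997, 0.990],
    [0.997, 0.992, 0.997, 0.992, 0.997, 0.992],
    [0.997, 0.987, 0.997, 0.987, 0.997, 0.987],
    [0.996, 0.986, 0.996, 0.986, 0.996, 0.986],
]

df = pd.DataFrame(values, index=index, columns=columns)
print(df)
```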

I want to take in a list such as ['IWWGCW', 'IWWGDW', 'BASE'] and reorder the frame to be:

Experiment           IWWGCW         IWWGDW         BASE           
Lead Time                24     48      24     48      24     48  
2010-11-27 12:00:00   0.998  0.990   0.998  0.990   0.997  0.991  
2010-11-28 12:00:00   0.997  0.990   0.997  0.990   0.998  0.987  
2010-11-29 12:00:00   0.997  0.992   0.997  0.992   0.997  0.992  
2010-11-30 12:00:00   0.997  0.987   0.997  0.987   0.997  0.987  
2010-12-01 12:00:00   0.996  0.986   0.996  0.986   0.996  0.986

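The caveat is that I don't always know which level "Experiment" will be at. I tried the following (where df is the multi-indexed frame shown above):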

df2 = df.reindex_axis(['IWWGCW', 'IWWGDW', 'BASE'], axis=1, level='Experiment')

but that didn't seem to work: it completed successfully, yet the column order of the returned DataFrame was unchanged.

My workaround is to have a function like this:

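def reorder_columns(frame, column_name, new_order):
    """Shuffle the specified columns of the frame to match new_order."""

    # Position of the named level inside each column tuple.
    index_level  = frame.columns.names.index(column_name)
    # Rank each column tuple by where its label appears in new_order.
    new_position = lambda t: new_order.index(t[index_level])
    # sorted() is stable, so the order within each group is preserved.
    new_index    = sorted(frame.columns, key=new_position)
    # reindex_axis is gone from recent pandas releases; reindex(columns=...)
    # is the equivalent call.
    new_frame    = frame.reindex(columns=new_index)
    return new_frame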

Calling reorder_columns(df, 'Experiment', ['IWWGCW', 'IWWGDW', 'BASE']) does what I expect, but it feels like I'm doing extra work. Is there an easier way?

Answer 1

There is a very simple way: just create a new DataFrame based on the original one, with the multi-index columns in the correct order:

multi_tuples = [('IWWGCW',24), ('IWWGCW',48), ('IWWGDW',24), ('IWWGDW',48)
    , ('BASE',24), ('BASE',48)]

multi_cols = pd.MultiIndex.from_tuples(multi_tuples, names=['Experiment', 'Lead Time'])

df_ordered_multi_cols = pd.DataFrame(df_ori, columns=multi_cols)
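Since the asker does not know the column labels in advance, the tuples above probably should not be hard-coded. A sketch of one way to derive them from the user's list follows. It assumes every experiment has every lead time (a full product); the sample data and the names order and lead_times are made up for illustration.

```python
import numpy as np
import pandas as pd

# Stand-in for the original frame (values are arbitrary).
df_ori = pd.DataFrame(
    np.arange(12.0).reshape(2, 6),
    columns=pd.MultiIndex.from_product(
        [["BASE", "IWWGCW", "IWWGDW"], [24, 48]],
        names=["Experiment", "Lead Time"],
    ),
)

order = ["IWWGCW", "IWWGDW", "BASE"]  # user-specified order

# Lead times in order of first appearance: 24, then 48.
lead_times = df_ori.columns.get_level_values("Lead Time").unique()

# Build the target columns instead of typing out each tuple.
multi_cols = pd.MultiIndex.from_product(
    [order, lead_times], names=list(df_ori.columns.names)
)

df_ordered_multi_cols = pd.DataFrame(df_ori, columns=multi_cols)
print(df_ordered_multi_cols)
```

If some experiment lacks a lead time, the product creates a column that is not in the original frame, and that column comes back filled with NaN.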

Answer 2

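This is the simplest approach that worked for me:

1- Create a list for your chosen level, with the columns in the desired order;

2- Reindex your columns against that list to get a MultiIndex object; keep in mind that this call returns a tuple;

3- Use the MultiIndex object to reorder your DataFrame.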

cols = ['IWWGCW', 'IWWGDW', 'BASE']

new_cols = df.columns.reindex(cols, level = 0)

df.reindex(columns= new_cols[0]) #new_cols is a (new_index, indexer) tuple; [0] is the MultiIndex

As a one-liner:

df.reindex(columns= df.columns.reindex(['IWWGCW', 'IWWGDW', 'BASE'], 
level = 0)[0])

Voilà.
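The three steps can be wrapped in a small helper that takes the level by name, which may help when the position of "Experiment" is not known ahead of time. This is only a sketch: reorder_level and the sample data are hypothetical, and it relies on Index.reindex returning a (new_index, indexer) pair.

```python
import numpy as np
import pandas as pd


def reorder_level(frame, level, new_order):
    """Return frame with its columns reordered so `level` follows new_order."""
    # Index.reindex returns (new_index, indexer); only the index is needed here.
    new_index, _indexer = frame.columns.reindex(new_order, level=level)
    return frame.reindex(columns=new_index)


# Stand-in frame with arbitrary values.
df = pd.DataFrame(
    np.arange(12.0).reshape(2, 6),
    columns=pd.MultiIndex.from_product(
        [["BASE", "IWWGCW", "IWWGDW"], [24, 48]],
        names=["Experiment", "Lead Time"],
    ),
)

reordered = reorder_level(df, "Experiment", ["IWWGCW", "IWWGDW", "BASE"])
print(reordered)
```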

Answer 3

I don't know of anything offhand. I created an enhancement ticket about it:

http://github.com/pydata/pandas/issues/1864

Answer 4

The comment by andrew_reece should be the accepted answer. Just use reindex().

Copied and pasted from the GitHub issue:
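The snippet from the issue is not reproduced here. As a rough illustration of what "just use reindex()" can look like, the sketch below passes the desired order together with the level name to DataFrame.reindex. The sample frame and the variable names are assumptions, and this is not necessarily the exact code from the issue.

```python
import numpy as np
import pandas as pd

# Stand-in frame with arbitrary values.
df = pd.DataFrame(
    np.arange(12.0).reshape(2, 6),
    columns=pd.MultiIndex.from_product(
        [["BASE", "IWWGCW", "IWWGDW"], [24, 48]],
        names=["Experiment", "Lead Time"],
    ),
)

order = ["IWWGCW", "IWWGDW", "BASE"]

# level= tells reindex which column level the labels in `order` refer to.
result = df.reindex(columns=order, level="Experiment")
print(result)
```

Because the level is given by name, the call does not depend on where "Experiment" sits among the column levels.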
