How to change the order/grouping/level of pandas MultiIndex columns?

Date: 2015-04-24 23:44:22

Tags: python pandas

I'm trying to reorder/swaplevel/pivot/something the columns of a pandas DataFrame. The columns are a MultiIndex, but I can't find the right recipe to get what I want.

The fastest-varying level of my column MultiIndex is the month, but I want it to be the slowest-varying level.

If you want to try it yourself, I have an nbviewer notebook: http://nbviewer.ipython.org/gist/flamingbear/4cfac24c80fe34a67474

What I have:

        weight                  extent                  rank
month   '1Jan'   'Feb'   'Mar'  '1Jan'   'Feb'   'Mar'  '1Jan'   'Feb'   'Mar'
year
2000      45.1    46.1    25.1  13.442   14.94   15.02      13      17      14
2001      85.0    16.0    49.0  13.380   14.81   15.14      12      15      17
2002      90.0    33.0    82.0  13.590   15.13   14.88      15      22      10
2003      47.0    34.0    78.0  13.640   14.83   15.27      17      16      22

What I want:

month   1Jan                    Feb                     Mar
        weight  extent    rank  weight  extent    rank  weight  extent    rank
year
2000      45.1  13.442      13    46.1   14.94      17    25.1   15.02      14
2001      85.0  13.380      12    16.0   14.81      15    49.0   15.14      17
2002      90.0  13.590      15    33.0   15.13      22    82.0   14.88      10
2003      47.0  13.640      17    34.0   14.83      16    78.0   15.27      22

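Any help would be greatly appreciated. I can work with my original DataFrame, but being able to write a CSV with the columns in the desired order would be great.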

Thanks in advance, Matt

2 answers:

Answer 0 (score: 22)

Your columns are a MultiIndex. You need to reassign the DataFrame's columns with a new MultiIndex created by swapping the existing levels:

df.columns = df.columns.swaplevel(0, 1)
df.sortlevel(0, axis=1, inplace=True)
>>> df

month   '1Jan'                 'Feb'                 'Mar'              
        weight  extent  rank  weight  extent  rank  weight  extent  rank
year                                                                    
2000      45.1  13.442    13    46.1   14.94    17    25.1   15.02    14
2001      85.0  13.380    12    16.0   14.81    15    49.0   15.14    17
2002      90.0  13.590    15    33.0   15.13    22    82.0   14.88    10
2003      47.0  13.640    17    34.0   14.83    16    78.0   15.27    22

Then you can export to CSV:

df.to_csv(filename)

Edit

Per @Silas's comment below, sortlevel is deprecated. Use sort_index instead:

df.sort_index(axis=1, level=0, inplace=True)
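A self-contained sketch of this answer for current pandas, rebuilding the frame from the question's table. It assumes the column labels are plain strings (1Jan, Feb, Mar) and that the quotes in the printed output are only display. One hedge: sort_index also sorts the remaining levels by default (sort_remaining=True), which would put the inner level in alphabetical order (extent, rank, weight); passing sort_remaining=False keeps weight/extent/rank as in the desired layout. The CSV goes to an in-memory buffer here only so the example runs without touching a file.

```python
import io

import pandas as pd

# Rebuild the question's "What I have" frame (labels assumed to be plain strings).
fields = ["weight", "extent", "rank"]
months = ["1Jan", "Feb", "Mar"]
columns = pd.MultiIndex.from_product([fields, months], names=[None, "month"])
data = [
    [45.1, 46.1, 25.1, 13.442, 14.94, 15.02, 13, 17, 14],
    [85.0, 16.0, 49.0, 13.380, 14.81, 15.14, 12, 15, 17],
    [90.0, 33.0, 82.0, 13.590, 15.13, 14.88, 15, 22, 10],
    [47.0, 34.0, 78.0, 13.640, 14.83, 15.27, 17, 16, 22],
]
df = pd.DataFrame(
    data, index=pd.Index([2000, 2001, 2002, 2003], name="year"), columns=columns
)

# Swap the levels so month becomes the outer (slowest-varying) level.
df.columns = df.columns.swaplevel(0, 1)

# Group the columns by month. sort_remaining=False keeps the inner level in its
# existing order (weight, extent, rank) instead of sorting it alphabetically.
df = df.sort_index(axis=1, level=0, sort_remaining=False)

# Write the CSV to a buffer; with a real file this would be df.to_csv(filename).
buf = io.StringIO()
df.to_csv(buf)
csv_text = buf.getvalue()
print(csv_text)
```

With MultiIndex columns, to_csv writes one header row per column level, so the month row and the weight/extent/rank row both appear at the top of the file.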

Answer 1 (score: 0)

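Since the level arguments are no longer mandatory (by default swaplevel swaps the two innermost levels), you can swap the levels of a MultiIndex DataFrame more simply: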

df = df.swaplevel(axis='columns')
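Note that swaplevel on its own only relabels the levels. It does not move any columns, so afterwards the months are still interleaved (1Jan, Feb, Mar under weight, then again under extent, and so on). The sketch below assumes the same plain-string labels as the question and uses only the first two rows for brevity. It regroups the columns in an explicit month order with reindex, which also avoids depending on the alphabetical order of the month labels.

```python
import pandas as pd

# Rebuild part of the question's frame (first two years only, labels assumed plain strings).
fields = ["weight", "extent", "rank"]
months = ["1Jan", "Feb", "Mar"]
columns = pd.MultiIndex.from_product([fields, months], names=[None, "month"])
df = pd.DataFrame(
    [
        [45.1, 46.1, 25.1, 13.442, 14.94, 15.02, 13, 17, 14],
        [85.0, 16.0, 49.0, 13.380, 14.81, 15.14, 12, 15, 17],
    ],
    index=pd.Index([2000, 2001], name="year"),
    columns=columns,
)

# With no level arguments, swaplevel swaps the two innermost levels (i=-2, j=-1).
swapped = df.swaplevel(axis="columns")
# Only the labels change; the first three columns are now
# ('1Jan', 'weight'), ('Feb', 'weight'), ('Mar', 'weight').
print(list(swapped.columns)[:3])

# Regroup by month in an explicit order rather than relying on sorting.
wanted = pd.MultiIndex.from_product([months, fields], names=["month", None])
grouped = swapped.reindex(columns=wanted)
print(grouped)
```

Listing the target columns explicitly is more verbose than sorting, but it lets you choose any month order, for example Jan, Feb, Mar without the 1 prefix on the label.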