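I'm trying to reorder/swaplevel/pivot/something the columns in a pandas DataFrame. The columns are a MultiIndex, but I can't find the right recipe to do what I want.
The fastest-varying level in my MultiIndex is the month, but I would like it to be the slowest-varying level.
If you would like to try it yourself, I have an nbviewer notebook: http://nbviewer.ipython.org/gist/flamingbear/4cfac24c80fe34a67474
What I have: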
+-------+--------------------------+--------------------------+--------------------------+
|       | weight                   | extent                   | rank                     |
+-------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| month | '1Jan' | 'Feb'  | 'Mar'  | '1Jan' | 'Feb'  | 'Mar'  | '1Jan' | 'Feb'  | 'Mar'  |
+-------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| year  |        |        |        |        |        |        |        |        |        |
+-------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| 2000  | 45.1   | 46.1   | 25.1   | 13.442 | 14.94  | 15.02  | 13     | 17     | 14     |
| 2001  | 85.0   | 16.0   | 49.0   | 13.380 | 14.81  | 15.14  | 12     | 15     | 17     |
| 2002  | 90.0   | 33.0   | 82.0   | 13.590 | 15.13  | 14.88  | 15     | 22     | 10     |
| 2003  | 47.0   | 34.0   | 78.0   | 13.640 | 14.83  | 15.27  | 17     | 16     | 22     |
+-------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
What I want:
+-------+--------------------------+--------------------------+--------------------------+
| month | 1Jan                     | Feb                      | Mar                      |
+-------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
|       | weight | extent | rank   | weight | extent | rank   | weight | extent | rank   |
+-------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| year  |        |        |        |        |        |        |        |        |        |
+-------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| 2000  | 45.1   | 13.442 | 13     | 46.1   | 14.94  | 17     | 25.1   | 15.02  | 14     |
| 2001  | 85.0   | 13.380 | 12     | 16.0   | 14.81  | 15     | 49.0   | 15.14  | 17     |
| 2002  | 90.0   | 13.590 | 15     | 33.0   | 15.13  | 22     | 82.0   | 14.88  | 10     |
| 2003  | 47.0   | 13.640 | 17     | 34.0   | 14.83  | 16     | 78.0   | 15.27  | 22     |
+-------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
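Any help would be appreciated. I can work with my original DataFrame, but being able to write a CSV with the desired column order would be fantastic.
Thanks in advance, Matt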
Answer 0 (score: 22)
Your columns are a MultiIndex. You need to reassign the DataFrame's columns with a new MultiIndex created by swapping the levels of the existing one:
df.columns = df.columns.swaplevel(0, 1)
df.sortlevel(0, axis=1, inplace=True)
>>> df
month  '1Jan'                   'Feb'                   'Mar'
       weight  extent    rank  weight  extent    rank  weight  extent    rank
year
2000     45.1  13.442      13    46.1   14.94      17    25.1   15.02      14
2001     85.0  13.380      12    16.0   14.81      15    49.0   15.14      17
2002     90.0  13.590      15    33.0   15.13      22    82.0   14.88      10
2003     47.0  13.640      17    34.0   14.83      16    78.0   15.27      22
You can then export to CSV:
df.to_csv(filename)
Edit
Per the comment from @Silas below, sortlevel has been deprecated. Instead, use sort_index:
df.sort_index(axis=1, level=0, inplace=True)
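For readers who want to try this end to end, here is a minimal sketch that rebuilds the sample data from the question, swaps the column levels, sorts with sort_index, and writes the CSV to an in-memory buffer. The way the frame is constructed (MultiIndex.from_product, unquoted month labels, the variable names) is an assumption made for illustration and is not taken from the asker's notebook. Note that sort_index also sorts the remaining levels by default, so in recent pandas versions the inner labels may come out alphabetically within each month rather than in the weight, extent, rank order shown above.

```python
import io

import pandas as pd

# Sample values copied from the question; the construction is an assumption.
fields = ["weight", "extent", "rank"]
months = ["1Jan", "Feb", "Mar"]
columns = pd.MultiIndex.from_product([fields, months], names=[None, "month"])
rows = [
    [45.1, 46.1, 25.1, 13.442, 14.94, 15.02, 13, 17, 14],
    [85.0, 16.0, 49.0, 13.380, 14.81, 15.14, 12, 15, 17],
    [90.0, 33.0, 82.0, 13.590, 15.13, 14.88, 15, 22, 10],
    [47.0, 34.0, 78.0, 13.640, 14.83, 15.27, 17, 16, 22],
]
years = pd.Index([2000, 2001, 2002, 2003], name="year")
df = pd.DataFrame(rows, index=years, columns=columns)

# Swap the two column levels so that month becomes the outer level.
df.columns = df.columns.swaplevel(0, 1)

# Group the columns by month; sort_index replaces the deprecated sortlevel.
df = df.sort_index(axis=1, level=0)

# Write the CSV to an in-memory buffer; pass a file path to write to disk instead.
buffer = io.StringIO()
df.to_csv(buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

With MultiIndex columns, to_csv emits one header row per column level, so the file starts with several header lines before the data rows.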
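Answer 1 (score: 0)
Since the level arguments are no longer mandatory (swaplevel defaults to swapping the two innermost levels), there is a simpler way to swap the levels of a multi-index DataFrame: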
df = df.swaplevel(axis='columns')
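One thing worth checking with this shorter form is that swaplevel only relabels the columns and leaves their positions alone, so the months are still interleaved afterwards. The sketch below illustrates that and then reindexes against an explicitly spelled-out column order to get the exact layout asked for in the question. The sample frame and the names fields, months, wanted, and result are assumptions introduced for illustration.

```python
import pandas as pd

# Sample values copied from the question; the construction is an assumption.
fields = ["weight", "extent", "rank"]
months = ["1Jan", "Feb", "Mar"]
columns = pd.MultiIndex.from_product([fields, months], names=[None, "month"])
rows = [
    [45.1, 46.1, 25.1, 13.442, 14.94, 15.02, 13, 17, 14],
    [85.0, 16.0, 49.0, 13.380, 14.81, 15.14, 12, 15, 17],
    [90.0, 33.0, 82.0, 13.590, 15.13, 14.88, 15, 22, 10],
    [47.0, 34.0, 78.0, 13.640, 14.83, 15.27, 17, 16, 22],
]
years = pd.Index([2000, 2001, 2002, 2003], name="year")
df = pd.DataFrame(rows, index=years, columns=columns)

# DataFrame.swaplevel returns a new frame with the column levels exchanged.
swapped = df.swaplevel(axis="columns")

# Positions are unchanged: the first three columns are still the weight columns.
print(list(swapped.columns[:3]))

# Spell out the wanted order (month outermost, fields in their original order)
# and reindex the columns against it.
wanted = pd.MultiIndex.from_product([months, fields], names=["month", None])
result = swapped.reindex(columns=wanted)
print(result)
```

Reindexing against an explicit MultiIndex avoids relying on sort order at all, which helps when the labels do not sort the way you want (the question's '1Jan' label looks like a workaround for exactly that).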