Pandas pivot table sorts categorical data alphabetically (incorrectly) when a columns argument is added

Time: 2014-11-04 17:20:41

Tags: python pandas

I'm having trouble with the pandas pivot functionality. I'm trying to pivot sales data by month and year. The dataset looks like this:

Customer - Sales - Month Name   - Year
a        - 100   - january      - 2013
a        - 120   - january      - 2014
b        - 220   - january      - 2013

To sort the month names correctly, I added a column holding the month names as categorical data.

dataset['Month'] = dataset['Month Name'].astype('category')
dataset['Month'].cat.set_categories(['January', 'February', 'March', 'April', 'May', 'June',      'July', 'August', 'September', 'October', 'November', 'December'],inplace=True)
dataset.pop('Month Name')
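For readers on a recent pandas version, here is a minimal sketch of the same idea that builds the ordered categorical in one step instead of using `inplace=True`. The sample rows and the `MONTHS` constant are made up for illustration, and the `str.capitalize()` call is an assumption on my part: the sample data above shows lowercase names such as `january`, and values that do not match a category label would become NaN.

```python
import pandas as pd

# Calendar order for the category labels (hypothetical helper constant).
MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]

# Made-up sample rows in the same shape as the question's dataset.
dataset = pd.DataFrame({
    "Customer": ["a", "a", "b", "c"],
    "Sales": [100, 120, 220, 50],
    "Month Name": ["march", "january", "january", "february"],
    "Year": [2013, 2014, 2013, 2013],
})

# Capitalize so the values match the category labels exactly;
# anything that does not match a label would turn into NaN.
month_names = dataset.pop("Month Name").str.capitalize()
dataset["Month"] = pd.Categorical(month_names, categories=MONTHS, ordered=True)

# Sorting now follows calendar order rather than alphabetical order.
sorted_months = dataset.sort_values("Month")["Month"].astype(str).tolist()
print(sorted_months)
```

Checking `dataset["Month"].isna().sum()` right after the conversion is a cheap way to catch labels that did not match any category.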

When I call pivot_table like this:

pt = dataset.pivot_table(values="Sales", index="Month")

I get the expected result:

Month
January      3620302.79
February     3775507.25
March        4543839.69

However, when I pivot across both years and months, the months are sorted alphabetically.

print dataset.pivot_table(values='Sales', index="Month", columns="Year", aggfunc="sum")
Year            2011        2012        2013        2014
Month                                                   
April      833692.19   954483.28  1210847.85  1210926.61
August     722604.75   735078.52   879905.23  1207211.00
December   779873.51  1053441.71  1243745.73         NaN

Any help with getting the month names sorted correctly in the last code example would be appreciated.

Thanks,

1 Answer:

Answer 0 (score: 0)

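You're right: pivot_table reindexes "Month" after pivoting, which is why it ends up sorted alphabetically. Fortunately, you can always convert dataset['Month'] to datetime with pd.to_datetime, and convert it back to a string once pivot_table has done its reindexing.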

It's not the best solution, but this should do the trick (I used some random dummy data):

import pandas as pd
...
# convert dataset['Month'] to datetime at pivot time
# the pivot then indexes by datetime, so chronological order is kept
pivoted = dataset.pivot_table(index=pd.to_datetime(dataset['Month']), columns='Year',
                              values='Sales', aggfunc='sum')
pivoted
Year        2012  2013  2014
Month                       
2014-01-04   151   295   NaN
2014-02-04   279   128   NaN
2014-03-04   218   244   NaN
2014-04-04   274   152   NaN
2014-05-04   276   NaN   138
2014-06-04   223   NaN   209
...

# then re-set the index back to Month string, "%B" means month string "January" etc.
pivoted.index = [m.strftime('%B') for m in pivoted.index]

pivoted
Year       2012  2013  2014
January     151   295   NaN
February    279   128   NaN
March       218   244   NaN
April       274   152   NaN
May         276   NaN   138
June        223   NaN   209
...

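But you will lose the "Month" index label. If you need it, you can copy dataset['Month'] to another column (call it M), convert that to datetime, and then set a multi-level index in pivot_table, like: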

dataset.pivot_table(index=['M', 'Month'], ...)
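
To make the multi-index idea concrete, here is a rough sketch under a few assumptions: the sample data is invented, the month names are plain capitalized strings, and parsing them with `format="%B"` is my choice for getting a sortable helper column `M` (the placeholder year and day that pandas fills in do not matter, only the ordering does). Dropping the helper level at the end is optional.

```python
import pandas as pd

# Invented sample data; month names are deliberately out of calendar order.
dataset = pd.DataFrame({
    "Month": ["March", "January", "February", "January", "March"],
    "Year": [2013, 2013, 2013, 2014, 2014],
    "Sales": [30, 10, 20, 40, 60],
})

# Helper column M: the month name parsed as a datetime, used only for ordering.
dataset["M"] = pd.to_datetime(dataset["Month"], format="%B")

# Pivot on both levels: M drives the sort order, Month keeps the readable label.
pivoted = dataset.pivot_table(index=["M", "Month"], columns="Year",
                              values="Sales", aggfunc="sum")

# Drop the helper level so only the "Month" label remains on the index.
pivoted = pivoted.droplevel("M")
print(pivoted)
```

With this, the index keeps its "Month" name and the rows come out in calendar order, at the cost of one extra column in the source frame.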