I'm having trouble with the Pandas pivot function. I'm trying to pivot sales data by month and year. The dataset looks like this:
Customer - Sales - Month Name - Year
a - 100 - january - 2013
a - 120 - january - 2014
b - 220 - january - 2013
To get the month names sorted correctly, I added a column that holds the month name as categorical data.
dataset['Month'] = dataset['Month Name'].astype('category')
dataset['Month'] = dataset['Month'].cat.set_categories(['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December'])
dataset.pop('Month Name')
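The same setup can also be built in one step with pd.CategoricalDtype and ordered=True. This is a minimal sketch assuming a recent pandas version; the small DataFrame and the MONTHS helper list are hypothetical stand-ins for the dataset above.

```python
import pandas as pd

# Calendar order for the month categories (helper list introduced for this sketch)
MONTHS = ['January', 'February', 'March', 'April', 'May', 'June',
          'July', 'August', 'September', 'October', 'November', 'December']

# Hypothetical sample data shaped like the dataset in the question
dataset = pd.DataFrame({
    'Customer': ['a', 'a', 'b', 'b'],
    'Sales': [100, 120, 220, 80],
    'Month Name': ['March', 'January', 'January', 'February'],
    'Year': [2013, 2014, 2013, 2014],
})

# Ordered categorical: sorting follows MONTHS, not the alphabet.
# pop() removes 'Month Name' and returns it for the conversion.
month_type = pd.CategoricalDtype(categories=MONTHS, ordered=True)
dataset['Month'] = dataset.pop('Month Name').astype(month_type)

print(dataset.sort_values('Month')['Month'].tolist())
# ['January', 'January', 'February', 'March']
```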
When I call:
pt = dataset.pivot_table(values="Sales", index="Month")
I get the expected result:
Month
January 3620302.79
February 3775507.25
March 4543839.69
However, when I pivot across both years and months, the months are sorted alphabetically.
print(dataset.pivot_table(values='Sales', index="Month", columns="Year", aggfunc="sum"))
Year 2011 2012 2013 2014
Month
April 833692.19 954483.28 1210847.85 1210926.61
August 722604.75 735078.52 879905.23 1207211.00
December 779873.51 1053441.71 1243745.73 NaN
Any help getting the month names sorted correctly in the last example would be appreciated.
Thanks,
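If the pivoted month index does come back in alphabetical order, one version-independent workaround (not taken from the answer below) is to reindex the result with the month names in calendar order. A minimal sketch, assuming plain month-name strings in 'Month'; the sample data and the MONTHS helper list are hypothetical.

```python
import pandas as pd

# Calendar order for month names (helper list introduced for this sketch)
MONTHS = ['January', 'February', 'March', 'April', 'May', 'June',
          'July', 'August', 'September', 'October', 'November', 'December']

# Hypothetical sample data; 'Month' holds plain month-name strings here
dataset = pd.DataFrame({
    'Sales': [100, 120, 220, 80, 50],
    'Month': ['March', 'January', 'January', 'February', 'March'],
    'Year': [2013, 2014, 2013, 2014, 2014],
})

pt = dataset.pivot_table(values='Sales', index='Month', columns='Year', aggfunc='sum')
# The string index is sorted alphabetically: February, January, March

# Reorder rows into calendar order, keeping only months that actually occur
pt = pt.reindex([m for m in MONTHS if m in pt.index])
print(pt)
```

Filtering the list before reindexing avoids adding empty rows for months with no sales; pass the full MONTHS list instead if those rows are wanted.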
Answer 0 (score: 0)
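pivot_table re-indexes "Month" right away, which is why the months end up sorted alphabetically. Fortunately, you can always convert dataset['Month'] to a datetime (pandas.to_datetime) for the pivot, and then convert the index back to strings after pivot_table has re-indexed it.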
It's not the best solution, but this should do the trick (I used some random dummy data):
import pandas as pd
...
# convert dataset['Month'] to datetime for the pivot ("%B" parses full month names such as "January");
# the result is then indexed and sorted by datetime, so calendar order is kept
pivoted = dataset.pivot_table(index=pd.to_datetime(dataset['Month'].astype(str), format='%B'),
                              columns='Year', values='Sales', aggfunc='sum')
pivoted
Year 2012 2013 2014
Month
1900-01-01 151 295 NaN
1900-02-01 279 128 NaN
1900-03-01 218 244 NaN
1900-04-01 274 152 NaN
1900-05-01 276 NaN 138
1900-06-01 223 NaN 209
...
# then set the index back to month-name strings; "%B" gives the full month name ("January" etc.)
pivoted.index = [m.strftime('%B') for m in pivoted.index]
pivoted
Year 2012 2013 2014
January 151 295 NaN
February 279 128 NaN
March 218 244 NaN
April 274 152 NaN
May 276 NaN 138
June 223 NaN 209
...
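For anyone who wants to run the round trip above end to end, here is a self-contained sketch with a few hypothetical dummy rows. It assumes 'Month' holds full English month names, so format='%B' can parse them (the year then defaults to 1900, which does not matter for ordering).

```python
import pandas as pd

# Hypothetical dummy data; 'Month' holds full month-name strings
dataset = pd.DataFrame({
    'Sales': [151, 295, 279, 128, 138],
    'Month': ['January', 'January', 'February', 'February', 'May'],
    'Year': [2012, 2013, 2012, 2013, 2014],
})

# Parse month names explicitly; the parsed Series keeps the name 'Month'
month_dates = pd.to_datetime(dataset['Month'], format='%B')

# Pivot on the datetime values so the rows come out in calendar order
pivoted = dataset.pivot_table(index=month_dates, columns='Year',
                              values='Sales', aggfunc='sum')

# Convert the datetime index back to month names (this drops the 'Month' index name)
pivoted.index = [m.strftime('%B') for m in pivoted.index]
print(pivoted)
```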
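But you will lose the "Month" index label. If you need it, you can copy dataset['Month'] into another column (call it M), convert that column to datetime, and then set a multi-level index in pivot_table, like: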
dataset.pivot_table(index=['M', 'Month'], ...)
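A runnable sketch of that multi-level variant, using hypothetical dummy data. The helper column M sorts the rows; dropping it afterwards (an optional extra step, not part of the answer) leaves an index still named "Month".

```python
import pandas as pd

# Hypothetical dummy data; 'Month' holds full month-name strings
dataset = pd.DataFrame({
    'Sales': [151, 295, 279, 128, 138],
    'Month': ['May', 'January', 'February', 'January', 'February'],
    'Year': [2012, 2013, 2012, 2012, 2014],
})

# Helper column 'M': the month name parsed as a datetime (%B = full month name)
dataset['M'] = pd.to_datetime(dataset['Month'], format='%B')

# The datetime level 'M' drives the row order; the 'Month' level keeps the label
pivoted = dataset.pivot_table(index=['M', 'Month'], columns='Year',
                              values='Sales', aggfunc='sum')

# Optionally drop the helper level, leaving a single index named 'Month'
pivoted = pivoted.droplevel('M')
print(pivoted)
```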