I am using Pandas groupby to get the top n items for each month of each year.
month_gr = df.groupby(by=[df.index.year, df.index.month_name(), df['Item Name']])
month_gr['Total'].sum().groupby(level=[0,1], group_keys=False).nlargest(5).sort_index(level=1)
This gives me the following output:
Order Datee  Order Datee  Item Name
2020         August       12oz w/ lids          10097.50
                          8oz cup / lids        10246.50
                          Full fat Milk         32507.00
                          Grilled Chic WRAP     94166.58
                          Special Blend Beans   81855.00
             July         8oz cup / lids         4801.50
                          Arwa500ml              6700.41
                          Full fat Milk         13430.00
                          Spanish Latte ( R )    6480.00
                          Special Blend 500g    29880.00
             June         Full fat Milk          4740.00
                          MANAEESH CHEESE        3576.24
                          Marble cake            4810.65
                          NUTELLA CHEESECAKE     3350.90
                          Special Blend Beans    5652.00
             September    CLUB SANDWICH          1040.10
                          Cappuccino (Regular)   1404.80
                          Flat White (Regular)   1162.40
                          Ginger shot big        2016.00
                          Spanish Latte ( R )     926.40
Name: Total, dtype: float64
If I use sort_index(level=1), it sorts the values alphabetically, which gives the same output. However, I want the months sorted in calendar order, as follows:
cats = ['January', 'February', 'March', 'April','May','June', 'July', 'August','September', 'October', 'November', 'December']
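I found a solution that uses pd.CategoricalIndex to sort the months, but I don't know how to apply it to a MultiIndex.
Please explain how to sort the data above by month (level 1) or, more specifically, by year and month (levels 0 and 1).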
Answer 0 (score: 1)
Here is an example with a short DataFrame:
import pandas as pd

df = pd.DataFrame({
'year': [2020, 2020, 2020, 2020, 2020, 2020],
'month_name': ['August', 'August', 'August', 'July', 'July', 'September'],
'Item Name': ['a', 'b', 'c', 'd', 'e', 'f'],
'Total': [1, 2, 3, 4, 5, 6]
})
month_gr = df.groupby(by=['year', 'month_name', 'Item Name'])['Total'].sum()
print(month_gr)
Prints:
year  month_name  Item Name
2020  August      a            1
                  b            2
                  c            3
      July        d            4
                  e            5
      September   f            6
Name: Total, dtype: int64
Then you can reset the index, convert the month column to an ordered categorical, sort the values, and set the index again:
month_gr = month_gr.reset_index()
cats = ['January', 'February', 'March', 'April','May','June', 'July', 'August','September', 'October', 'November', 'December']
month_gr['month_name'] = pd.Categorical(month_gr['month_name'], cats, ordered=True)
print(month_gr.sort_values(by=['year', 'month_name']).set_index(['year', 'month_name', 'Item Name']))
Prints:
                           Total
year month_name Item Name
2020 July       d              4
                e              5
     August     a              1
                b              2
                c              3
     September  f              6
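If you would rather keep the MultiIndex and skip the reset_index round trip, one possible alternative is the key argument of sort_index. This is only a sketch and assumes pandas 1.1 or later, where sort_index accepts key. The names month_order and month_key are my own helpers, not part of pandas. For a MultiIndex the key function is called once per sorted level, so it maps month names to their calendar position and leaves any other value (such as the year) unchanged.

```python
import pandas as pd

cats = ['January', 'February', 'March', 'April', 'May', 'June',
        'July', 'August', 'September', 'October', 'November', 'December']

# Hypothetical helper: month name -> calendar position (0 = January).
month_order = {name: pos for pos, name in enumerate(cats)}

df = pd.DataFrame({
    'year': [2020, 2020, 2020, 2020, 2020, 2020],
    'month_name': ['August', 'August', 'August', 'July', 'July', 'September'],
    'Item Name': ['a', 'b', 'c', 'd', 'e', 'f'],
    'Total': [1, 2, 3, 4, 5, 6],
})
month_gr = df.groupby(by=['year', 'month_name', 'Item Name'])['Total'].sum()


def month_key(level_values):
    # Called separately for each level being sorted.
    # Month names become integers; other values (years) pass through as-is.
    return level_values.map(lambda v: month_order.get(v, v))


# Sort by year (level 0) and month (level 1); the key only affects the
# comparison, so the index labels in the result stay as month names.
sorted_gr = month_gr.sort_index(level=[0, 1], key=month_key)
print(sorted_gr)
```

The same call should apply to the Series from the question, since its levels 0 and 1 are also year and month name; only the level names differ.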