Question

I am trying to sort grouped data using Pandas. My code:

 import pandas as pd

 df = pd.read_csv("./data3.txt")
 grouped = df.groupby(['cust','year','month'])['price'].count()
 print(grouped)

My data:

cust,year,month,price
astor,2015,Jan,100
astor,2015,Jan,122
astor,2015,Feb,200
astor,2016,Feb,234
astor,2016,Feb,135
astor,2016,Mar,169
astor,2017,Mar,321
astor,2017,Apr,245
tor,2015,Jan,100
tor,2015,Feb,122
tor,2015,Feb,200
tor,2016,Mar,234
tor,2016,Apr,135
tor,2016,May,169
tor,2017,Mar,321
tor,2017,Apr,245

This is my result:

cust   year  month
astor  2015  Feb      1
             Jan      2
       2016  Feb      2
             Mar      1
       2017  Apr      1
             Mar      1
tor    2015  Feb      2
             Jan      1
       2016  Apr      1
             Mar      1
             May      1
       2017  Apr      1
             Mar      1

How can I get the output sorted by month (in calendar order)?

Answer 1

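Add the parameter sort=False to groupby. By default groupby sorts the group keys, and month names are strings, so they sort alphabetically (Feb before Jan). With sort=False the groups keep the order in which they first appear in the data: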

grouped = df.groupby(['cust','year','month'], sort=False)['price'].count()
print(grouped)
cust   year  month
astor  2015  Jan      2
             Feb      1
       2016  Feb      2
             Mar      1
       2017  Mar      1
             Apr      1
tor    2015  Jan      1
             Feb      2
       2016  Mar      1
             Apr      1
             May      1
       2017  Mar      1
             Apr      1
Name: price, dtype: int64
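Note that sort=False only gives calendar order when the rows are already in calendar order. The following minimal sketch illustrates this with a small made-up sample (the inline CSV is an assumption for illustration, not the asker's file) in which a Feb row comes before the Jan rows:

```python
import io
import pandas as pd

# Hypothetical sample: the Feb row appears before the Jan rows.
csv = """cust,year,month,price
astor,2015,Feb,200
astor,2015,Jan,100
astor,2015,Jan,122
"""

df = pd.read_csv(io.StringIO(csv))

# sort=False keeps groups in order of first appearance, so Feb stays first.
counts = df.groupby(['cust', 'year', 'month'], sort=False)['price'].count()
months_in_result = list(counts.index.get_level_values('month'))
print(counts)
```

Here the result lists Feb before Jan, because that is the order in the input, so this approach depends on the file being pre-sorted.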

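If the first solution is not possible (for example, because the rows are not already in calendar order), convert the months to datetimes, group, and finally convert the month level back to abbreviations: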

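# Parse month abbreviations ('Jan', 'Feb', ...) as datetimes so they sort chronologically
df['month'] = pd.to_datetime(df['month'], format='%b')
# Convert the datetime labels of the month level back to abbreviations
f = lambda x: x.strftime('%b')
grouped = df.groupby(['cust','year','month'])['price'].count().rename(f, level=2)
print(grouped)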
cust   year  month
astor  2015  Jan      2
             Feb      1
       2016  Feb      2
             Mar      1
       2017  Mar      1
             Apr      1
tor    2015  Jan      1
             Feb      2
       2016  Mar      1
             Apr      1
             May      1
       2017  Mar      1
             Apr      1
Name: price, dtype: int64
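As an alternative that is not part of the original answer, the month column could also be turned into an ordered categorical, so that groupby sorts by calendar position rather than alphabetically. This is only a sketch: the MONTHS list and the inline sample data are assumptions for illustration, and observed=True is passed so that only month combinations present in the data show up in the result.

```python
import io
import pandas as pd

# Calendar order of the month abbreviations (assumed to match the data's spelling).
MONTHS = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

# Hypothetical sample with rows deliberately out of calendar order.
csv = """cust,year,month,price
astor,2015,Feb,200
astor,2015,Jan,100
astor,2015,Jan,122
tor,2016,May,169
tor,2016,Mar,234
tor,2016,Apr,135
"""

df = pd.read_csv(io.StringIO(csv))

# An ordered categorical sorts by its category order, not alphabetically.
df['month'] = pd.Categorical(df['month'], categories=MONTHS, ordered=True)

# observed=True keeps only the month values that actually occur in each group.
counts = df.groupby(['cust', 'year', 'month'], observed=True)['price'].count()
months_in_result = list(counts.index.get_level_values('month'))
print(counts)
```

Unlike sort=False, this does not rely on the input rows being pre-sorted, and the month labels stay as readable abbreviations without a conversion back from datetime.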

Sorting grouped data in pandas