Python crosstab with sales data

Time: 2019-02-07 07:39:58

Tags: python pandas crosstab

I have this data:

Date        Month    ProductCategory        Sales
1/1/2009    2009-Jan    Clothing            1755
1/1/2009    2009-Jan    Grossery            524
1/1/2009    2009-Jan    Toys                936
2/1/2009    2009-Feb    Clothing            1729
2/1/2009    2009-Feb    Grossery            496
2/1/2009    2009-Feb    Toys

I want to turn it into this table:

 Date      Month     Clothing Sales Grossery Sales  Toys Sales  Total Sales
 1/1/2009   2009-Jan    1755            524             936         3215
 2/1/2009   2009-Feb    1729            496                         2225

I tried the following code:

train_cross = (pd.crosstab([df_train.Date, df_train.Sales],
                           df_train.ProductCategory, margins=True)
                 .rename_axis(None, axis=1)
                 .reset_index())
train_cross.head()

I got these results:

  Date          Sales   Grossery    Toys    Clothing    All
  1/1/2009      524     1           0           0       1
  1/1/2009      936     0           1           0       1
  1/1/2009      1755    0           0           1       1
  2/1/2009      496     1           0           0       1
  2/1/2009      1729    0           0           1       1

Where did I go wrong?

2 answers:

Answer 0 (score: 3)

You can use df.pivot_table():

df_new = (df.pivot_table(index=['Date','Month'], columns='ProductCategory', values='Sales')
            .reset_index().rename_axis(None, axis=1))
df_new['Total_Sales']=df_new.iloc[:,2:].sum(axis=1)
print(df_new)

       Date     Month  Clothing  Grossery   Toys  Total_Sales
0  1/1/2009  2009-Jan    1755.0     524.0  936.0       3215.0
1  2/1/2009  2009-Feb    1729.0     496.0    NaN       2225.0
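
For reference, here is a self-contained sketch of this approach. The DataFrame is rebuilt by hand from the question's sample, and the blank Toys value for February is assumed to be NaN. The names df, wide and category_cols are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; the missing Toys value is assumed to be NaN.
df = pd.DataFrame({
    "Date": ["1/1/2009"] * 3 + ["2/1/2009"] * 3,
    "Month": ["2009-Jan"] * 3 + ["2009-Feb"] * 3,
    "ProductCategory": ["Clothing", "Grossery", "Toys"] * 2,
    "Sales": [1755, 524, 936, 1729, 496, np.nan],
})

# One row per (Date, Month), one column per product category.
wide = (
    df.pivot_table(index=["Date", "Month"],
                   columns="ProductCategory",
                   values="Sales")
      .reset_index()
      .rename_axis(None, axis=1)
)

# Sum only the category columns, selected by name rather than by position.
category_cols = [c for c in wide.columns if c not in ("Date", "Month")]
wide["Total_Sales"] = wide[category_cols].sum(axis=1)
print(wide)
```

Selecting the category columns by name instead of iloc[:, 2:] is a small design choice that keeps the total correct if the number of index columns changes.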

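Answer 1 (score: 3)

Change the first list so that the columns Date and Month form the new index, pass Sales to values, add an aggregation function, and specify a name for the totals column: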

df = pd.crosstab(index=[df_train.Date,df_train.Month],
                 columns=df_train.ProductCategory, 
                 values=df_train.Sales, 
                 aggfunc='sum', 
                 margins=True,
                 margins_name='Total Sales')
print (df)
ProductCategory       Clothing  Grossery   Toys  Total Sales
Date        Month                                           
1/1/2009    2009-Jan    1755.0     524.0  936.0       3215.0
2/1/2009    2009-Feb    1729.0     496.0    0.0       2225.0
Total Sales             3484.0    1020.0  936.0       5440.0

If necessary, remove the last row and convert the MultiIndex to columns:

df = df.iloc[:-1].reset_index().rename_axis(None, axis=1)
print (df)

       Date     Month  Clothing  Grossery   Toys  Total Sales
0  1/1/2009  2009-Jan    1755.0     524.0  936.0       3215.0
1  2/1/2009  2009-Feb    1729.0     496.0    0.0       2225.0
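
The difference from the attempt in the question can be seen in a small sketch: when values and aggfunc are omitted, pd.crosstab builds a frequency table, so every cell is a count rather than a sales figure. The data below is rebuilt by hand from the question's sample (the blank Toys value is assumed to be NaN), and the names counts and sums are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; the missing Toys value is assumed to be NaN.
df_train = pd.DataFrame({
    "Date": ["1/1/2009"] * 3 + ["2/1/2009"] * 3,
    "Month": ["2009-Jan"] * 3 + ["2009-Feb"] * 3,
    "ProductCategory": ["Clothing", "Grossery", "Toys"] * 2,
    "Sales": [1755, 524, 936, 1729, 496, np.nan],
})

# Without values/aggfunc, crosstab only counts how often each combination occurs.
counts = pd.crosstab(index=[df_train.Date, df_train.Month],
                     columns=df_train.ProductCategory)

# With values and aggfunc, each cell holds the aggregated Sales instead.
sums = pd.crosstab(index=[df_train.Date, df_train.Month],
                   columns=df_train.ProductCategory,
                   values=df_train.Sales,
                   aggfunc="sum",
                   margins=True,
                   margins_name="Total Sales")

# Drop the margins row and turn the MultiIndex back into ordinary columns.
sums = sums.iloc[:-1].reset_index().rename_axis(None, axis=1)

print(counts)
print(sums)
```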

A solution with pivot_table without margins:

df = df_train.pivot_table(index=['Date','Month'], 
                          columns='ProductCategory', 
                          values='Sales', aggfunc='sum')
df['Total Sales'] = df.sum(axis=1)
df = df.reset_index().rename_axis(None, axis=1)
print (df)
       Date     Month  Clothing  Grossery   Toys  Total Sales
0  1/1/2009  2009-Jan    1755.0     524.0  936.0       3215.0
1  2/1/2009  2009-Feb    1729.0     496.0    0.0       2225.0

And a solution with margins:

df = df_train.pivot_table(index=['Date','Month'],
                          columns='ProductCategory', 
                          values='Sales', 
                          aggfunc='sum', 
                          margins=True,
                          margins_name='Total Sales')
print (df)
ProductCategory       Clothing  Grossery   Toys  Total Sales
Date        Month                                           
1/1/2009    2009-Jan    1755.0     524.0  936.0       3215.0
2/1/2009    2009-Feb    1729.0     496.0    0.0       2225.0
Total Sales             3484.0    1020.0  936.0       5440.0

df = df.iloc[:-1].reset_index().rename_axis(None, axis=1)
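
As a variation, the margins row can also be dropped by its label instead of by position, which may be a little more robust if the row order ever changes. This is only a sketch: the data is rebuilt by hand from the question's sample (the blank Toys value is assumed to be NaN), and the names table, is_margin and result are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; the missing Toys value is assumed to be NaN.
df_train = pd.DataFrame({
    "Date": ["1/1/2009"] * 3 + ["2/1/2009"] * 3,
    "Month": ["2009-Jan"] * 3 + ["2009-Feb"] * 3,
    "ProductCategory": ["Clothing", "Grossery", "Toys"] * 2,
    "Sales": [1755, 524, 936, 1729, 496, np.nan],
})

table = df_train.pivot_table(index=["Date", "Month"],
                             columns="ProductCategory",
                             values="Sales",
                             aggfunc="sum",
                             margins=True,
                             margins_name="Total Sales")

# Flag rows whose first index level equals the margins label, then keep the rest.
is_margin = table.index.get_level_values(0) == "Total Sales"
result = table.loc[~is_margin].reset_index().rename_axis(None, axis=1)
print(result)
```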