我有此数据:
Date Month ProductCategory Sales
1/1/2009 2009-Jan Clothing 1755
1/1/2009 2009-Jan Grossery 524
1/1/2009 2009-Jan Toys 936
2/1/2009 2009-Feb Clothing 1729
2/1/2009 2009-Feb Grossery 496
2/1/2009 2009-Feb Toys
我想要它为这张桌子:
Date Month Clothing Sales Grossery Sales Toys Sales Total Sales
1/1/2009 2009-Jan 1755 524 936 3215
2/1/2009 2009-Feb 1729 496 2225
我尝试了以下代码:
train_cross =pd.crosstab([df_train.Date,df_train.Sales],
df_train.ProductCategory, margins=True)
.rename_axis(None,1)
.reset_index()train_cross
.head()
我得到了这些结果:
Date Sales Grossery Toys Clothing All
1/1/2009 524 1 0 0 1
1/1/2009 936 0 1 0 1
1/1/2009 1755 0 0 1 1
2/1/2009 496 1 0 0 1
2/1/2009 1729 0 0 1 1
我哪里错了?
答案 0 :(得分:3)
您可以使用df.pivot_table()
:
df_new= df.pivot_table(index=['Date','Month'],columns='ProductCategory',values='Sales').\
reset_index().rename_axis(None,1)
df_new['Total_Sales']=df_new.iloc[:,2:].sum(axis=1)
print(df_new)
Date Month Clothing Grossery Toys Total_Sales
0 1/1/2009 2009-Jan 1755.0 524.0 936.0 3215.0
1 2/1/2009 2009-Feb 1729.0 496.0 NaN 2225.0
答案 1 :(得分:3)
按列Date
和Month
将第一个列表更改为新索引,将Sales
添加到values
,添加聚合函数并指定总列的列名称:
df = pd.crosstab(index=[df_train.Date,df_train.Month],
columns=df_train.ProductCategory,
values=df_train.Sales,
aggfunc='sum',
margins=True,
margins_name='Total Sales')
print (df)
ProductCategory Clothing Grossery Toys Total Sales
Date Month
1/1/2009 2009-Jan 1755.0 524.0 936.0 3215.0
2/1/2009 2009-Feb 1729.0 496.0 0.0 2225.0
Total Sales 3484.0 1020.0 936.0 5440.0
如有必要,请删除最后一行并将MultiIndex
转换为列:
df = df.iloc[:-1].reset_index().rename_axis(None, axis=1)
print (df)
Date Month Clothing Grossery Toys Total Sales
0 1/1/2009 2009-Jan 1755.0 524.0 936.0 3215.0
1 2/1/2009 2009-Feb 1729.0 496.0 0.0 2225.0
没有margins
的{{3}}的解决方案:
df = df_train.pivot_table(index=['Date','Month'],
columns='ProductCategory',
values='Sales', aggfunc='sum')
df['Total Sales'] = df.sum(axis=1)
df = df.reset_index().rename_axis(None, axis=1)
print (df)
Date Month Clothing Grossery Toys Total Sales
0 1/1/2009 2009-Jan 1755.0 524.0 936.0 3215.0
1 2/1/2009 2009-Feb 1729.0 496.0 0.0 2225.0
以及margins
的解决方案:
df = df_train.pivot_table(index=['Date','Month'],
columns='ProductCategory',
values='Sales',
aggfunc='sum',
margins=True,
margins_name='Total Sales')
print (df)
ProductCategory Clothing Grossery Toys Total Sales
Date Month
1/1/2009 2009-Jan 1755.0 524.0 936.0 3215.0
2/1/2009 2009-Feb 1729.0 496.0 0.0 2225.0
Total Sales 3484.0 1020.0 936.0 5440.0
df = df.iloc[:-1].reset_index().rename_axis(None, axis=1)