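pandas: include all groups in groupby output

Time: 2019-04-17 12:22:55

Tags: pandas pandas-groupby

I have the following problem with a groupby aggregation: I want to add groups that do not appear in the dataframe but are required in the desired output. An example: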

|---------------------|------------------|------------------|
|      A              |     C            |     H            |
|---------------------|------------------|------------------|
|      16.04.19       |     34           |     53           |
|---------------------|------------------|------------------|
|      17.04.19       |     40           |     23           |

How can I get the following? Thanks!

import pandas as pd
from io import StringIO  # pandas.compat.StringIO was removed in later pandas versions

csvdata = StringIO("""day,sale
1,1
2,4
2,10
4,7
5,2.3
7,4.4
2,3.4""")
# days 3 and 6 are intentionally missing here, but I'd like them to appear in the output

df = pd.read_csv(csvdata, sep=",")
df1 = df.groupby(['day'])['sale'].agg('sum').reset_index().rename(columns={'sale': 'dailysale'})

df1

1 answer:

Answer 0 (score: 1)

You can add reindex with the desired range after aggregating with sum:

df1 = (df.groupby(['day'])['sale']
         .sum()
         .reindex(range(1, 8), fill_value=0)
         .reset_index(name='dailysale'))
print(df1)

   day  dailysale
0    1        1.0
1    2       17.4
2    3        0.0
3    4        7.0
4    5        2.3
5    6        0.0
6    7        4.4
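
If the range of days should not be hard-coded, it can be derived from the data. The following is a minimal sketch, assuming that every integer day between the smallest and largest observed day belongs in the output; the names all_days and daily are illustrative and not part of the original answer.

```python
from io import StringIO

import pandas as pd

csvdata = StringIO("""day,sale
1,1
2,4
2,10
4,7
5,2.3
7,4.4
2,3.4""")
df = pd.read_csv(csvdata)

# Assumption: the full set of groups is every integer day from min to max.
all_days = range(int(df['day'].min()), int(df['day'].max()) + 1)

daily = (df.groupby('day')['sale']
           .sum()
           .reindex(all_days, fill_value=0)  # missing days get a total of 0
           .rename_axis('day')               # keep the index name explicit
           .reset_index(name='dailysale'))
print(daily)
```

If the output must also cover days before the first or after the last observed day, the range has to be supplied explicitly, as in the answer above.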

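Another idea is to use an ordered categorical, so that aggregating with sum adds the missing rows:

df['day'] = pd.Categorical(df['day'], categories=range(1, 8), ordered=True)
# observed=False keeps categories that have no rows; passing it explicitly avoids depending on the version default
df1 = df.groupby(['day'], observed=False)['sale'].sum().reset_index(name='dailysale')
print(df1)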
  day  dailysale
0   1        1.0
1   2       17.4
2   3        0.0
3   4        7.0
4   5        2.3
5   6        0.0
6   7        4.4
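
One thing to keep in mind with the categorical approach is that the day column in the result has the category dtype rather than an integer dtype. A minimal sketch of converting it back, assuming an integer day column is wanted downstream; df_cat and by_cat are illustrative names, and observed=False is passed explicitly so that unobserved categories are kept regardless of the pandas version default:

```python
from io import StringIO

import pandas as pd

csvdata = StringIO("""day,sale
1,1
2,4
2,10
4,7
5,2.3
7,4.4
2,3.4""")
df = pd.read_csv(csvdata)

# Work on a copy so the original integer column stays untouched.
df_cat = df.assign(
    day=pd.Categorical(df['day'], categories=range(1, 8), ordered=True)
)

by_cat = (df_cat.groupby('day', observed=False)['sale']
                .sum()
                .reset_index(name='dailysale'))

# The grouped key is still categorical here; cast it back to integers.
by_cat['day'] = by_cat['day'].astype('int64')
print(by_cat.dtypes)
```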