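I have the following problem with groupby aggregation: I want to add groups that don't appear in the DataFrame but are needed in the desired output. An example: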
| A | C | H |
|---|---|---|
| 16.04.19 | 34 | 53 |
| 17.04.19 | 40 | 23 |
How can I achieve this? Thanks!
import pandas as pd
from io import StringIO  # pandas.compat.StringIO was removed from pandas
csvdata = StringIO("""day,sale
1,1
2,4
2,10
4,7
5,2.3
7,4.4
2,3.4""")
# days 3 and 6 are intentionally missing here, but I'd like them in the output
df = pd.read_csv(csvdata, sep=",")
df1=df.groupby(['day'])['sale'].agg('sum').reset_index().rename(columns={'sale':'dailysale'})
df1
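Running the question's code as-is shows the gap: groupby only creates groups for keys that actually occur in the data. A minimal check, using io.StringIO for the inline CSV (the same data as above):

```python
from io import StringIO

import pandas as pd

csvdata = StringIO("""day,sale
1,1
2,4
2,10
4,7
5,2.3
7,4.4
2,3.4""")
df = pd.read_csv(csvdata)

# groupby produces one group per distinct 'day' value present in df
daily = df.groupby('day')['sale'].sum()
print(daily.index.tolist())  # [1, 2, 4, 5, 7]  (days 3 and 6 are absent)
```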
Answer (score: 1)
You can add Series.reindex with the specified range after aggregating with sum:
df1 = (df.groupby(['day'])['sale']
.sum()
.reindex(range(1, 8), fill_value=0)
.reset_index(name='dailysale'))
print (df1)
day dailysale
0 1 1.0
1 2 17.4
2 3 0.0
3 4 7.0
4 5 2.3
5 6 0.0
6 7 4.4
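If the full set of days is not known in advance, one option (an assumption about the use case, not part of the original answer) is to derive the range from the smallest and largest observed day before reindexing. A sketch:

```python
from io import StringIO

import pandas as pd

df = pd.read_csv(StringIO("""day,sale
1,1
2,4
2,10
4,7
5,2.3
7,4.4
2,3.4"""))

# Every day between the smallest and largest observed day, inclusive
all_days = range(df['day'].min(), df['day'].max() + 1)

df1 = (df.groupby('day')['sale']
         .sum()
         .reindex(all_days, fill_value=0)  # days with no sales get 0
         .rename_axis('day')               # name the index so reset_index yields a 'day' column
         .reset_index(name='dailysale'))
print(df1)
```

This only fills gaps between the observed minimum and maximum; days outside that span (for example a day 8) still need an explicit range, as in the answer above.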
Another idea is to use an ordered categorical, so that aggregating with sum adds the missing rows:
df['day'] = pd.Categorical(df['day'], categories=range(1, 8), ordered=True)
# observed=False keeps categories that have no rows (days 3 and 6)
df1 = df.groupby(['day'], observed=False)['sale'].sum().reset_index(name='dailysale')
print (df1)
day dailysale
0 1 1.0
1 2 17.4
2 3 0.0
3 4 7.0
4 5 2.3
5 6 0.0
6 7 4.4
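The categorical approach depends on the observed argument of groupby: recent pandas versions warn that its default is changing, so passing it explicitly is safer. This sketch contrasts observed=False and observed=True and checks that the categorical result matches the reindex approach from the first answer:

```python
from io import StringIO

import pandas as pd

raw = pd.read_csv(StringIO("""day,sale
1,1
2,4
2,10
4,7
5,2.3
7,4.4
2,3.4"""))

df = raw.copy()
# Declare every expected day as a category, including 3 and 6 which never occur
df['day'] = pd.Categorical(df['day'], categories=range(1, 8), ordered=True)

# observed=False: one row per category, empty ones sum to 0
full = df.groupby('day', observed=False)['sale'].sum()
# observed=True: only categories that actually appear in the data
seen = df.groupby('day', observed=True)['sale'].sum()
print(len(full), len(seen))  # 7 5

# Same numbers as the reindex approach on the original integer column
by_reindex = raw.groupby('day')['sale'].sum().reindex(range(1, 8), fill_value=0)
```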