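I have the following code, which produces a data frame showing the average sale price per month and year. I would like to add a total row for each year and a total row for each pid (person). Sample code and data: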
import pandas as pd
from io import StringIO  # Python 3; the Python 2 StringIO module no longer exists
s = StringIO("""pid,year,month,price
1,2017,4,2000
1,2017,4,2900
1,2018,4,2000
1,2018,4,2300
1,2018,5,2000
1,2018,5,1990
1,2018,6,2200
1,2018,6,2400
1,2018,6,2250
1,2018,7,2150
""")
df = pd.read_csv(s)
maths = {'price': 'mean'}
gb = df.groupby(['pid','year','month'])
counts = gb.size().to_frame(name='n')
out = counts.join(gb.agg(maths)).reset_index()
print(out)
Which yields:
   pid  year  month  n        price
0    1  2017      4  2  2450.000000
1    1  2018      4  2  2150.000000
2    1  2018      5  2  1995.000000
3    1  2018      6  3  2283.333333
4    1  2018      7  1  2150.000000
I would like the added per-year rows to look like:
   pid  year month  n        price
0    1  2017   all  2  2450.000000
1    1  2018   all  8  2161.250000
And then a summary row for each pid, as follows:
   pid year month   n        price
0    1  all   all  10  2219.000000
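I'm having trouble grouping/aggregating tidily for the last two frames, where I essentially want the value all in place of month (and, for the per-pid summary, in place of year as well), and then combining all of these data frames into one that I can write to CSV or a database table.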
Answer 0 (score: 1)
Use pd.concat:
df1=df.groupby(['pid','year','month']).price.agg(['size','mean']).reset_index()
df2=df.groupby(['pid','year']).price.agg(['size','mean']).assign(month='all').reset_index()
df3=df.groupby(['pid']).price.agg(['size','mean']).assign(**{'month':'all','year':'all'}).reset_index()
pd.concat([df1,df2,df3])
Out[484]:
          mean month  pid  size  year
0  2450.000000     4    1     2  2017
1  2150.000000     4    1     2  2018
2  1995.000000     5    1     2  2018
3  2283.333333     6    1     3  2018
4  2150.000000     7    1     1  2018
0  2450.000000   all    1     2  2017
1  2161.250000   all    1     8  2018
0  2219.000000   all    1    10   all
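The result above names its columns size and mean and repeats index values, whereas the question asks for n and price in a single frame ready for export. A minimal sketch of one way to get there follows. It assumes pandas 0.25 or later (for named aggregation), and the helper name summarize is hypothetical, introduced only for this illustration.

```python
from io import StringIO

import pandas as pd

csv_text = """pid,year,month,price
1,2017,4,2000
1,2017,4,2900
1,2018,4,2000
1,2018,4,2300
1,2018,5,2000
1,2018,5,1990
1,2018,6,2200
1,2018,6,2400
1,2018,6,2250
1,2018,7,2150
"""
df = pd.read_csv(StringIO(csv_text))


def summarize(frame, keys):
    """Hypothetical helper: row count `n` and mean `price` per group of `keys`."""
    return (
        frame.groupby(keys)
        .agg(n=("price", "size"), price=("price", "mean"))
        .reset_index()
    )


monthly = summarize(df, ["pid", "year", "month"])
# Fill the grouping levels that were aggregated away with the label "all".
yearly = summarize(df, ["pid", "year"]).assign(month="all")
per_pid = summarize(df, ["pid"]).assign(year="all", month="all")

columns = ["pid", "year", "month", "n", "price"]
# ignore_index=True gives the combined frame a fresh 0..N-1 index.
out = pd.concat([monthly, yearly, per_pid], ignore_index=True)[columns]
print(out)

# With no path argument, to_csv returns the CSV text; pass a file name to write to disk.
csv_out = out.to_csv(index=False)
```

Because year and month now mix integers with the string all, pandas stores them as object columns. That is fine for CSV output, but a database table would likely need those columns typed as text.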