Question

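I have the following code, which produces a data frame showing the average sale price per month and year. I would like to add summary rows with the totals for each year and for each pid (person). Sample code and data: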

import pandas as pd
from io import StringIO

s = StringIO("""pid,year,month,price
    1,2017,4,2000
    1,2017,4,2900
    1,2018,4,2000
    1,2018,4,2300
    1,2018,5,2000
    1,2018,5,1990
    1,2018,6,2200
    1,2018,6,2400
    1,2018,6,2250
    1,2018,7,2150
    """)

df = pd.read_csv(s)
maths = {'price': 'mean'}
gb = df.groupby(['pid','year','month'])
counts = gb.size().to_frame(name='n')
out = counts.join(gb.agg(maths)).reset_index()
print(out)

Which yields:

   pid  year  month  n        price
0    1  2017      4  2  2450.000000
1    1  2018      4  2  2150.000000
2    1  2018      5  2  1995.000000
3    1  2018      6  3  2283.333333
4    1  2018      7  1  2150.000000

I would like the rows added for each year to look like:

   pid  year  month  n        price
0    1  2017    all  2  2450.000000
0    1  2018    all  8  2161.250000

And then a summary per pid, like this:

   pid  year  month   n        price
0    1   all    all  10  2219.000000

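I'm having trouble working out the aggregation for the last two frames, where I essentially want an "all" value in place of the month and year splits, and then combining all of these data frames into one that can be written to a CSV file or a database table.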

Answer 1

Use pd.concat

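# Row count and mean price per pid/year/month
df1=df.groupby(['pid','year','month']).price.agg(['size','mean']).reset_index()
# Same per pid/year, with month filled in as 'all'
df2=df.groupby(['pid','year']).price.agg(['size','mean']).assign(month='all').reset_index()
# Same per pid, with both month and year filled in as 'all'
df3=df.groupby(['pid']).price.agg(['size','mean']).assign(**{'month':'all','year':'all'}).reset_index()
# Stack the three levels of detail into one frame
pd.concat([df1,df2,df3])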
Out[484]: 
          mean month  pid  size  year
0  2450.000000     4    1     2  2017
1  2150.000000     4    1     2  2018
2  1995.000000     5    1     2  2018
3  2283.333333     6    1     3  2018
4  2150.000000     7    1     1  2018
0  2450.000000   all    1     2  2017
1  2161.250000   all    1     8  2018
0  2219.000000   all    1    10   all
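
If the column names and order requested in the question (pid, year, month, n, price) matter, the same concat idea can be sketched as below. This is only one possible variant, not part of the original answer: the helper `summarize`, the list `levels`, and the file name `summary.csv` are made-up names for illustration, and the keyword form of `agg` assumes pandas 0.25 or later.

```python
import pandas as pd

# Same sample data as in the question, built directly as a DataFrame.
df = pd.DataFrame({
    "pid":   [1] * 10,
    "year":  [2017, 2017] + [2018] * 8,
    "month": [4, 4, 4, 4, 5, 5, 6, 6, 6, 7],
    "price": [2000, 2900, 2000, 2300, 2000, 1990, 2200, 2400, 2250, 2150],
})

def summarize(frame, keys):
    """Row count and mean price per group; grouping columns not in `keys` become 'all'."""
    out = (frame.groupby(keys)
                .agg(n=("price", "size"), price=("price", "mean"))
                .reset_index())
    for col in ("year", "month"):
        if col not in keys:
            out[col] = "all"
    # Fix the column order to the one asked for in the question.
    return out[["pid", "year", "month", "n", "price"]]

# One frame per level of detail, stacked with a fresh 0..n-1 index.
levels = [["pid", "year", "month"], ["pid", "year"], ["pid"]]
result = pd.concat([summarize(df, keys) for keys in levels], ignore_index=True)
print(result)

# Hypothetical output path; uncomment to write the combined frame to disk.
# result.to_csv("summary.csv", index=False)
```

Because the year and month columns then hold both integers and the string 'all', they end up with object dtype, which is fine for CSV output but may need a text column type in a database table.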
