I have a data frame like the one shown below:
Contract_ID Place Contract_Date Price
1 Bangalore 2018-10-25 100
2 Bangalore 2018-08-25 200
3 Bangalore 2019-10-25 300
4 Bangalore 2019-11-25 200
5 Bangalore 2019-10-25 400
6 Chennai 2018-10-25 100
7 Chennai 2018-10-25 200
8 Chennai 2018-10-25 100
9 Chennai 2018-10-25 300
10 Chennai 2019-10-25 400
11 Chennai 2019-10-25 600
From this, I would like to generate the following table using pandas.
Expected output:
Place Year Number_of_Contracts Average_Price
Bangalore 2018 2 150
Bangalore 2019 3 300
Chennai 2018 4 175
Chennai 2019 2 500
Answer 0 (score: 4)
Use GroupBy.agg, grouping by Place and by the year extracted with Series.dt.year, and pass (name, function) tuples to set the new column names:
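# Parse the date strings so the .dt accessor is available
df['Contract_Date'] = pd.to_datetime(df['Contract_Date'])
# Group by Place and contract year; each (name, function) tuple names an output column
df1 = (df.groupby(['Place', df['Contract_Date'].dt.year.rename('Year')])['Price']
         .agg([('Number_of_Contracts', 'size'), ('Average_Price', 'mean')])
         .reset_index())
print(df1)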
Place Year Number_of_Contracts Average_Price
0 Bangalore 2018 2 150
1 Bangalore 2019 3 300
2 Chennai 2018 4 175
3 Chennai 2019 2 500
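To see what the aggregation is doing, the same table can be assembled step by step. The following is only a minimal, self-contained sketch that rebuilds the sample data from the question; the variable names `year`, `grouped`, `counts`, `means` and `summary` are illustrative and not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Contract_ID": range(1, 12),
    "Place": ["Bangalore"] * 5 + ["Chennai"] * 6,
    "Contract_Date": [
        "2018-10-25", "2018-08-25", "2019-10-25", "2019-11-25", "2019-10-25",
        "2018-10-25", "2018-10-25", "2018-10-25", "2018-10-25", "2019-10-25",
        "2019-10-25",
    ],
    "Price": [100, 200, 300, 200, 400, 100, 200, 100, 300, 400, 600],
})
df["Contract_Date"] = pd.to_datetime(df["Contract_Date"])

# A Series aligned with df can serve as a grouping key;
# renaming it sets the name of the resulting index level.
year = df["Contract_Date"].dt.year.rename("Year")
grouped = df.groupby(["Place", year])["Price"]

# size() counts the rows in each group, mean() averages Price in each group.
counts = grouped.size().rename("Number_of_Contracts")
means = grouped.mean().rename("Average_Price")

# Combine both results side by side and turn the group keys back into columns.
summary = pd.concat([counts, means], axis=1).reset_index()
print(summary)
```

Here `agg` in the answer performs the `size` and `mean` steps in a single call, which is why it is usually the more convenient form.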
Alternatively, use named aggregation, which requires pandas 0.25 or later:
df['Contract_Date'] = pd.to_datetime(df['Contract_Date'])
# Each keyword names an output column and maps to a (source column, function) pair
df1 = (df.groupby(['Place', df['Contract_Date'].dt.year.rename('Year')])
         .agg(Number_of_Contracts=('Contract_ID', 'size'),
              Average_Price=('Price', 'mean'))
         .reset_index())
print(df1)
Place Year Number_of_Contracts Average_Price
0 Bangalore 2018 2 150
1 Bangalore 2019 3 300
2 Chennai 2018 4 175
3 Chennai 2019 2 500
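Because only the Price column is really being aggregated, named aggregation can also be applied after selecting that column, in which case each keyword maps to just a function rather than a (column, function) pair. This is a minimal sketch of that variant, assuming pandas 0.25 or later and rebuilding the sample data from the question so that it runs on its own:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Contract_ID": range(1, 12),
    "Place": ["Bangalore"] * 5 + ["Chennai"] * 6,
    "Contract_Date": [
        "2018-10-25", "2018-08-25", "2019-10-25", "2019-11-25", "2019-10-25",
        "2018-10-25", "2018-10-25", "2018-10-25", "2018-10-25", "2019-10-25",
        "2019-10-25",
    ],
    "Price": [100, 200, 300, 200, 400, 100, 200, 100, 300, 400, 600],
})
df["Contract_Date"] = pd.to_datetime(df["Contract_Date"])

# Select Price first, then name each output column with a keyword argument.
df1 = (
    df.groupby(["Place", df["Contract_Date"].dt.year.rename("Year")])["Price"]
      .agg(Number_of_Contracts="size", Average_Price="mean")
      .reset_index()
)
print(df1)
```

Note that `mean` returns floating-point values, so Average_Price may be displayed as 150.0 rather than 150.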