I have a DataFrame like the one below:
Contract_ID Place Contract_Date Price
1 Bangalore 2018-10-25 100
2 Bangalore 2018-08-25 200
3 Bangalore 2019-10-25 300
4 Bangalore 2019-11-25 200
5 Bangalore 2019-10-25 400
6 Chennai 2018-10-25 100
7 Chennai 2018-10-25 200
8 Chennai 2018-10-25 100
9 Chennai 2018-10-25 300
10 Chennai 2019-10-25 400
11 Chennai 2019-10-25 600
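To reproduce the example, the table above can be rebuilt as a DataFrame. This is a minimal sketch: the values are copied from the question, and Contract_Date is deliberately kept as strings (as it would be after reading a CSV), which is why the pd.to_datetime conversion is needed.

```python
import pandas as pd

# Sample data from the question; Contract_Date is left as plain strings
df = pd.DataFrame({
    'Contract_ID': range(1, 12),
    'Place': ['Bangalore'] * 5 + ['Chennai'] * 6,
    'Contract_Date': ['2018-10-25', '2018-08-25', '2019-10-25', '2019-11-25',
                      '2019-10-25', '2018-10-25', '2018-10-25', '2018-10-25',
                      '2018-10-25', '2019-10-25', '2019-10-25'],
    'Price': [100, 200, 300, 200, 400, 100, 200, 100, 300, 400, 600],
})
print(df)
```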
From this, I want to produce the following table using pandas.
Expected output:
Place Year Number_of_Contracts Average_Price
Bangalore 2018 2 150
Bangalore 2019 3 300
Chennai 2018 4 175
Chennai 2019 2 500
I tried the code below and it works fine, but I would like to turn it into a function. Any help would be appreciated.
import pandas as pd

df['Contract_Date'] = pd.to_datetime(df['Contract_Date'])
df1 = (df.groupby(['Place', df['Contract_Date'].dt.year.rename('Year')])
         .agg(Number_of_Contracts=('Contract_ID', 'size'),
              Average_Price=('Price', 'mean'))
         .reset_index())
Answer:
Wrap the same code in a function:
def func(df):
    # Note: this overwrites df['Contract_Date'] in the caller's DataFrame
    df['Contract_Date'] = pd.to_datetime(df['Contract_Date'])
    return (df.groupby(['Place', df['Contract_Date'].dt.year.rename('Year')])
              .agg(Number_of_Contracts=('Contract_ID', 'size'),
                   Average_Price=('Price', 'mean'))
              .reset_index())
Then call the function:
df1 = func(df)
Or use DataFrame.pipe:
df1 = df.pipe(func)
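Because func assigns back into df['Contract_Date'], calling it (directly or through pipe) also changes the caller's DataFrame. If that side effect is unwanted, one option is to build the converted column on a copy with DataFrame.assign. This is a minimal sketch; the name summarize_contracts and the three-row sample data are hypothetical.

```python
import pandas as pd

def summarize_contracts(df):
    # Hypothetical variant of func: assign() returns a new DataFrame,
    # so the caller's 'Contract_Date' column is left untouched
    out = df.assign(Contract_Date=pd.to_datetime(df['Contract_Date']))
    return (out.groupby(['Place', out['Contract_Date'].dt.year.rename('Year')])
               .agg(Number_of_Contracts=('Contract_ID', 'size'),
                    Average_Price=('Price', 'mean'))
               .reset_index())

# Small hypothetical sample in the question's layout
df = pd.DataFrame({
    'Contract_ID': [1, 2, 3],
    'Place': ['Bangalore', 'Bangalore', 'Chennai'],
    'Contract_Date': ['2018-10-25', '2018-08-25', '2019-10-25'],
    'Price': [100, 200, 400],
})

df1 = df.pipe(summarize_contracts)
print(df1)
# False: the input column was not converted in place
print(pd.api.types.is_datetime64_any_dtype(df['Contract_Date']))
```

The trade-off is one extra DataFrame copy in exchange for a function with no side effects, which is usually what you want inside a pipe chain.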
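Edit: to make the column names configurable, pass them as parameters:
def func(df, dates, place, id1, price):
    # Column names are arguments, so the function works with other schemas
    df[dates] = pd.to_datetime(df[dates])
    return (df.groupby([place, df[dates].dt.year.rename('Year')])
              .agg(Number_of_Contracts=(id1, 'size'),
                   Average_Price=(price, 'mean'))
              .reset_index())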
df1 = func(df, 'Contract_Date', 'Place', 'Contract_ID', 'Price')
print(df1)
Place Year Number_of_Contracts Average_Price
0 Bangalore 2018 2 150.0
1 Bangalore 2019 3 300.0
2 Chennai 2018 4 175.0
3 Chennai 2019 2 500.0
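The parameterized version also works with DataFrame.pipe, which forwards any extra positional or keyword arguments to the function. Passing the column names as keywords makes the call self-documenting. This is a sketch that reuses the answer's func unchanged; the small sample DataFrame is hypothetical.

```python
import pandas as pd

def func(df, dates, place, id1, price):
    # Parameterized function from the answer (converts df[dates] in place)
    df[dates] = pd.to_datetime(df[dates])
    return (df.groupby([place, df[dates].dt.year.rename('Year')])
              .agg(Number_of_Contracts=(id1, 'size'),
                   Average_Price=(price, 'mean'))
              .reset_index())

# Hypothetical subset of the question's data
df = pd.DataFrame({
    'Contract_ID': [6, 7, 10],
    'Place': ['Chennai'] * 3,
    'Contract_Date': ['2018-10-25', '2018-10-25', '2019-10-25'],
    'Price': [100, 200, 400],
})

# pipe passes the keyword arguments through to func
df1 = df.pipe(func, dates='Contract_Date', place='Place',
              id1='Contract_ID', price='Price')
print(df1)
```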