I want to derive daily sales from average sales using the following function:
def derive_daily_sales(avg_sales_series, period, first_day_sales):
    """
    derive the daily sales from previous_avg_sales start date to current_avg_sales end date
    for detail formula, please refer to README.md
    @avg_sales_series: an array of avg sales (e.g. 2020-08-04 to 2020-08-06)
    @period: the averaging period in days (e.g. 30 days, 90 days)
    @first_day_sales: the sales at the first day of previous_avg_sales
    """
    x_n1 = avg_sales_series[-1]*period - avg_sales_series[0]*period + first_day_sales
    return x_n1
avg_sales_series should be a pandas Series.
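As a quick sanity check of the formula, here is a minimal sketch that calls the function on a single two-value window. The values 30 and 40 and the choice of first_day_sales=30 are illustrative assumptions. Note that on a Series with a default integer index, `[-1]` is treated as a label lookup rather than "last element", so the sketch converts the window to a NumPy array first.

```python
import numpy as np
import pandas as pd


def derive_daily_sales(avg_sales_series, period, first_day_sales):
    # Same formula as above: period * (latest average - earliest average) + first_day_sales.
    return avg_sales_series[-1] * period - avg_sales_series[0] * period + first_day_sales


# Two consecutive 30-day averages (illustrative values).
window = pd.Series([30.0, 40.0])

# Convert to a NumPy array so that [0] and [-1] are positional.
result = derive_daily_sales(window.to_numpy(), period=30, first_day_sales=30)
print(result)  # 330.0
```

Here 40 * 30 - 30 * 30 + 30 = 330.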
The DataFrame looks like this:
date, customer_id, avg_30_day_sales
12/08/2020, 1, 30
13/08/2020, 1, 40
14/08/2020, 1, 40
12/08/2020, 2, 20
13/08/2020, 2, 40
14/08/2020, 2, 30
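To make the question reproducible, the sample rows can be built as a DataFrame roughly as follows. Parsing the dates as day/month/year is an assumption based on the values shown. Converting them matters because dd/mm/yyyy strings sort lexicographically, which gives the wrong order once the data spans more than one month.

```python
import pandas as pd

# Sample rows copied from the table above.
df_sales = pd.DataFrame({
    "date": ["12/08/2020", "13/08/2020", "14/08/2020",
             "12/08/2020", "13/08/2020", "14/08/2020"],
    "customer_id": [1, 1, 1, 2, 2, 2],
    "avg_30_day_sales": [30, 40, 40, 20, 40, 30],
})

# Assumption: dates are day/month/year, so parse them with an explicit format.
df_sales["date"] = pd.to_datetime(df_sales["date"], format="%d/%m/%Y")

print(df_sales.dtypes)
```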
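I would like to first group by customer_id and sort by date, then take a rolling window of size 2 and apply the custom function derive_daily_sales, assuming that period = 30 and first_day_sales is equal to the first avg_30_day_sales.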
I tried:
df_sales_grouped = df_sales.sort_values('date').groupby(['customer_id','date'])
df_daily_sales['daily_sales'] = df_sales_grouped['avg_30_day_sales'].rolling(2).apply(derive_daily_sales, axis=1, period=30, first_day_sales= df_sales['avg_30_day_sales'][0])
Answer 0 (score: 1)
You should not group by date, because that is the column you want to roll over. The grouping should therefore be:
df_sales_grouped = df_sales.sort_values('date').groupby('customer_id', group_keys=False)
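Next, what you actually want is to apply a rolling window to each group in the DataFrame. You therefore need to use apply twice: once on the grouped data and once on each rolling window. This can be done as follows: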
rolling_arguments = {'period': 30, 'first_day_sales': df_sales['avg_30_day_sales'][0]}
# raw=True passes each window as a NumPy array, so [0] and [-1] index by position
df_sales['daily_sales'] = df_sales_grouped['avg_30_day_sales'].apply(
    lambda g: g.rolling(2).apply(derive_daily_sales, raw=True, kwargs=rolling_arguments))
For the given input data, the result is:
date customer_id avg_30_day_sales daily_sales
12/08/2020 1 30 NaN
13/08/2020 1 40 330.0
14/08/2020 1 40 30.0
12/08/2020 2 20 NaN
13/08/2020 2 40 630.0
14/08/2020 2 30 -270.0
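The table above can be reproduced end to end with a self-contained sketch like the one below. It makes a few assumptions that are not part of the original answer: dates are parsed as day/month/year, `group_keys=False` is passed so the result keeps the original row index on recent pandas versions, and `raw=True` is used so each window arrives as a NumPy array. The second column is a swapped-in vectorised cross-check, not the answer's method: for a window of size 2 the formula reduces to period * (difference of consecutive averages) + first_day_sales, which `groupby(...).diff()` can compute directly.

```python
import numpy as np
import pandas as pd


def derive_daily_sales(avg_sales_series, period, first_day_sales):
    # Formula from the question; expects positional indexing (e.g. a NumPy array).
    return avg_sales_series[-1] * period - avg_sales_series[0] * period + first_day_sales


df_sales = pd.DataFrame({
    "date": pd.to_datetime(
        ["12/08/2020", "13/08/2020", "14/08/2020"] * 2, format="%d/%m/%Y"),
    "customer_id": [1, 1, 1, 2, 2, 2],
    "avg_30_day_sales": [30, 40, 40, 20, 40, 30],
})

period = 30
# As in the answer: the first value of the whole column (30), not one per customer.
first_day_sales = df_sales["avg_30_day_sales"].iloc[0]

# group_keys=False keeps the original row index so the result aligns with df_sales.
grouped = df_sales.sort_values("date").groupby("customer_id", group_keys=False)

# Outer apply runs once per customer; inner apply runs once per rolling window.
df_sales["daily_sales"] = grouped["avg_30_day_sales"].apply(
    lambda g: g.rolling(2).apply(
        derive_daily_sales, raw=True,
        kwargs={"period": period, "first_day_sales": first_day_sales}))

# Vectorised cross-check: period * (current - previous average) + first_day_sales.
df_sales["daily_sales_diff"] = (
    df_sales.sort_values("date").groupby("customer_id")["avg_30_day_sales"].diff() * period
    + first_day_sales)

print(df_sales)
```

Both columns hold the same values as the table above. If first_day_sales should instead be the first average of each customer, that value would have to be computed per group; the answer as written uses a single global value.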