How to apply a custom rolling function to a pandas groupby?

Date: 2020-09-04 08:04:37

Tags: python pandas

I want to derive daily sales from average sales using the following function:

def derive_daily_sales(avg_sales_series, period, first_day_sales):
    """
    derive the daily sales from previous_avg_sales start date to current_avg_sales end date
    for detail formula, please refer to README.md

    @avg_sales_series: a Series of average sales (e.g. 2020-08-04 to 2020-08-06)
    @period: the averaging period in days (e.g. 30 days, 90 days)
    @first_day_sales: the sales at the first day of previous_avg_sales
    """

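    # .iloc is positional; avg_sales_series[-1] is a label lookup and raises KeyError on an integer index
    x_n1 = avg_sales_series.iloc[-1]*period - avg_sales_series.iloc[0]*period + first_day_sales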

    return x_n1
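Because only the first and last values of the window are used, the formula simplifies to `period * (last - first) + first_day_sales`. A minimal check on one two-day window (the values are customer 1's first two rows, used for illustration); the function is repeated so the block runs on its own:

```python
import pandas as pd


def derive_daily_sales(avg_sales_series, period, first_day_sales):
    # Positional access via .iloc works whatever the Series index is
    return (avg_sales_series.iloc[-1] * period
            - avg_sales_series.iloc[0] * period
            + first_day_sales)


# Yesterday's 30-day average was 30, today's is 40
window = pd.Series([30, 40])
result = derive_daily_sales(window, period=30, first_day_sales=30)
print(result)  # 330
```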

avg_sales_series should be a pandas Series.
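Whether the function actually receives a Series depends on the `raw` argument of `Rolling.apply`: with `raw=False` (the default in current pandas) each window is passed as a Series, with `raw=True` as a NumPy ndarray. A small sketch that records the type of each window; `record_type` is a hypothetical helper for illustration only:

```python
import pandas as pd

s = pd.Series([30, 40, 40])
seen_types = []


def record_type(window):
    # Remember what kind of object rolling().apply passed in, then return a number
    seen_types.append(type(window).__name__)
    return float(len(window))


s.rolling(2).apply(record_type)            # raw=False: each window is a Series
s.rolling(2).apply(record_type, raw=True)  # raw=True: each window is an ndarray

print(sorted(set(seen_types)))
```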

The DataFrame looks like this:

date, customer_id, avg_30_day_sales
12/08/2020, 1, 30
13/08/2020, 1, 40
14/08/2020, 1, 40
12/08/2020, 2, 20
13/08/2020, 2, 40
14/08/2020, 2, 30
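To experiment, the sample can be built directly. A sketch that types the rows in by hand instead of reading the CSV; it also parses `date` with `pd.to_datetime` and an explicit `format`, assuming the dates are day/month/year (13/08 and 14/08 cannot be month/day). Sorting the raw strings happens to work for this sample but would put, for example, 01/09/2020 before 12/08/2020.

```python
import pandas as pd

df_sales = pd.DataFrame(
    {
        "date": ["12/08/2020", "13/08/2020", "14/08/2020",
                 "12/08/2020", "13/08/2020", "14/08/2020"],
        "customer_id": [1, 1, 1, 2, 2, 2],
        "avg_30_day_sales": [30, 40, 40, 20, 40, 30],
    }
)

# Parse day/month/year strings into real datetimes so sorting is chronological
df_sales["date"] = pd.to_datetime(df_sales["date"], format="%d/%m/%Y")
print(df_sales.dtypes)
```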

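I want to first group by customer_id and then sort by date. Then I want to take a rolling window of size 2 and apply the custom function derive_daily_sales to avg_30_day_sales, with period = 30 and first_day_sales equal to the first avg_30_day_sales value.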

I tried:

df_sales_grouped = df_sales.sort_values('date').groupby(['customer_id','date'])

df_daily_sales['daily_sales'] = df_sales_grouped['avg_30_day_sales'].rolling(2).apply(derive_daily_sales, axis=1, period=30, first_day_sales= df_sales['avg_30_day_sales'][0])
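Two things go wrong in this attempt. `Rolling.apply` has no `axis` parameter and forwards extra arguments only through `args=` and `kwargs=`, so the call would raise a `TypeError`. And grouping by both `customer_id` and `date` puts every row in its own group, so a window of 2 can never fill. A sketch of the second point, using `.sum()` as a stand-in for the custom function:

```python
import pandas as pd

df_sales = pd.DataFrame(
    {
        "date": ["12/08/2020", "13/08/2020", "14/08/2020",
                 "12/08/2020", "13/08/2020", "14/08/2020"],
        "customer_id": [1, 1, 1, 2, 2, 2],
        "avg_30_day_sales": [30, 40, 40, 20, 40, 30],
    }
)

# Grouping by customer AND date: every (customer_id, date) pair is its own group
sizes_by_day = df_sales.groupby(["customer_id", "date"]).size().tolist()
print(sizes_by_day)  # [1, 1, 1, 1, 1, 1]

# rolling(2) needs 2 observations, so inside one-row groups every result is NaN
by_day = df_sales.groupby(["customer_id", "date"])["avg_30_day_sales"].rolling(2).sum()
print(by_day.isna().all())  # True

# Grouping by customer only keeps each customer's three days together
sizes_by_customer = df_sales.groupby("customer_id").size().tolist()
print(sizes_by_customer)  # [3, 3]
```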

1 Answer

Answer 0 (score: 1)

You shouldn't group by date, because date is the column you are rolling over. The grouping should be:

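# group_keys=False stops apply() from adding customer_id as an extra index level
# (the default behaviour changed in pandas 2.0), so the result still aligns with df_sales
df_sales_grouped = df_sales.sort_values('date').groupby('customer_id', group_keys=False)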

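Next, what you actually want is to apply a rolling window to each group of the DataFrame. That means using apply twice: once on the grouped DataFrame and once on each rolling window. It can be done like this: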

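# Extra arguments for derive_daily_sales; rolling().apply forwards them via kwargs=
rolling_arguments = {'period': 30, 'first_day_sales': df_sales['avg_30_day_sales'].iloc[0]}
# Outer apply runs once per customer's Series; inner apply runs once per 2-row window
df_sales['daily_sales'] = df_sales_grouped['avg_30_day_sales'].apply(
    lambda g: g.rolling(2).apply(derive_daily_sales, kwargs=rolling_arguments))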
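An alternative that the answer does not use: `groupby(...).rolling(...)` builds the per-group windows in one step and its `apply` accepts the same `kwargs=`. Its result is indexed by `(customer_id, original row label)`, so the group level is dropped before assigning back. A sketch on the sample data, with the same `period` and `first_day_sales` as above:

```python
import pandas as pd


def derive_daily_sales(avg_sales_series, period, first_day_sales):
    # Positional access via .iloc, independent of the index labels
    return (avg_sales_series.iloc[-1] * period
            - avg_sales_series.iloc[0] * period
            + first_day_sales)


df_sales = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["12/08/2020", "13/08/2020", "14/08/2020",
             "12/08/2020", "13/08/2020", "14/08/2020"],
            format="%d/%m/%Y",
        ),
        "customer_id": [1, 1, 1, 2, 2, 2],
        "avg_30_day_sales": [30, 40, 40, 20, 40, 30],
    }
)
rolling_arguments = {"period": 30, "first_day_sales": df_sales["avg_30_day_sales"].iloc[0]}

# Sort chronologically, then roll a 2-row window inside each customer's rows
rolled = (
    df_sales.sort_values("date")
    .groupby("customer_id")["avg_30_day_sales"]
    .rolling(2)
    .apply(derive_daily_sales, kwargs=rolling_arguments)
)
# Drop the customer_id level so the index matches df_sales' row labels again
df_sales["daily_sales"] = rolled.droplevel(0)
print(df_sales)
```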

For the given input data, the result is:

      date  customer_id  avg_30_day_sales  daily_sales
12/08/2020            1                30          NaN
13/08/2020            1                40        330.0
14/08/2020            1                40         30.0
12/08/2020            2                20          NaN
13/08/2020            2                40        630.0
14/08/2020            2                30       -270.0
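Each value can be checked by hand; for customer 1 on 13/08/2020 it is 40*30 - 30*30 + 30 = 330. Because the window has size 2, "last minus first" is just the day-over-day change, so the same column can be computed without a custom function using `groupby(...).diff()`. A sketch that reproduces the table above; it only holds for a window of 2, and, as in the answer, `first_day_sales` is the first row of the whole frame for both customers:

```python
import pandas as pd

df_sales = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["12/08/2020", "13/08/2020", "14/08/2020",
             "12/08/2020", "13/08/2020", "14/08/2020"],
            format="%d/%m/%Y",
        ),
        "customer_id": [1, 1, 1, 2, 2, 2],
        "avg_30_day_sales": [30, 40, 40, 20, 40, 30],
    }
)
period = 30
first_day_sales = df_sales["avg_30_day_sales"].iloc[0]

# Order rows by customer, then chronologically within each customer
df_sales = df_sales.sort_values(["customer_id", "date"])
# diff() = today's average minus yesterday's within each customer (NaN on the first day)
df_sales["daily_sales"] = (
    df_sales.groupby("customer_id")["avg_30_day_sales"].diff() * period + first_day_sales
)
print(df_sales)
```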