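I am trying to calculate each retailer's sales ratio per SKU over a number of weeks, and then average that ratio per retailer and SKU.
So far I have been able to calculate the total sales per SKU, and then group the sales by retailer within each SKU.
What I cannot work out is how to calculate the retailer/SKU sales ratio over 'N' weeks.
Here is my code: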
score_period = [
    [201636, 201643],
    [201640, 201647],
    [201645, 201652],
    [201649, 201704],
    [201701, 201708]
]

sku_group = df.groupby('Sku', as_index=False)
sku_list = sku_group.groups.keys()

for sku in sku_list:
    df_sku = df[df['Sku'] == sku]
    for period in score_period:
        df_period = df_sku[(df_sku['Week'] >= period[0]) &
                           (df_sku['Week'] <= period[1])]
        # sales of each week in period
        df_sum = df_period.groupby(['Week'], as_index=False)['WeekSales'].sum()
        # retailer sales sum per week
        sums = df_period.groupby(['Week', 'RetailerCode'], as_index=False)['WeekSales'].sum()
        for index, rows in sums.iterrows():
            sums['ratio'] = sums['WeekSales'] / df_sum[(df_sum['Week'])]['WeekSales']
Data:
import pandas as pd

sales = [
    {'RetailerCode': 'RET001', 'Sku': 'SKU001', 'Week': 201636, 'WeekSales': 10},
    {'RetailerCode': 'RET002', 'Sku': 'SKU002', 'Week': 201636, 'WeekSales': 20},
    {'RetailerCode': 'RET003', 'Sku': 'SKU003', 'Week': 201636, 'WeekSales': 0},
    {'RetailerCode': 'RET004', 'Sku': 'SKU004', 'Week': 201636, 'WeekSales': 10},
    {'RetailerCode': 'RET001', 'Sku': 'SKU001', 'Week': 201637, 'WeekSales': 5},
    {'RetailerCode': 'RET002', 'Sku': 'SKU002', 'Week': 201637, 'WeekSales': 10},
    {'RetailerCode': 'RET003', 'Sku': 'SKU003', 'Week': 201637, 'WeekSales': 20},
    {'RetailerCode': 'RET004', 'Sku': 'SKU004', 'Week': 201637, 'WeekSales': 3},
]
df = pd.DataFrame(sales)
Expected result:
RET001 avg ratio = (Ratio of first week + Ratio of second week) / 2
RET002 avg ratio = (Ratio of first week + Ratio of second week) / 2
Answer 0 (score: 0)
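In the last for loop you should work on the current row, not on sums (the whole table).
Because you are iterating over the table, you cannot add the column with just sums['ratio']. You have to use sums.loc[index, 'ratio'] (an explanation can be found here).
To match the week between df_sum and sums, you need df_sum[df_sum['Week'] == rows['Week']]. This returns the WeekSales value in df_sum whose Week matches the Week of the current row. Please check whether the following code does what you want.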
score_period = [
    [201636, 201643],
    [201640, 201647],
    [201645, 201652],
    [201649, 201704],
    [201701, 201708]
]

sku_group = df.groupby('Sku', as_index=False)
sku_list = sku_group.groups.keys()

#for sku in sku_list:
#    df_sku = df[df['Sku'] == sku]
for period in score_period:
    df_period = df[(df['Week'] >= period[0]) & (df['Week'] <= period[1])]
    # sales of each week in period
    df_sum = df_period.groupby(['Week'], as_index=False)['WeekSales'].sum()
    # retailer sales sum per week
    sums = df_period.groupby(['Week', 'RetailerCode'], as_index=False)['WeekSales'].sum()
    for index, rows in sums.iterrows():
        # total sales of the week that the current row belongs to (a single value)
        week_total = df_sum[df_sum['Week'] == rows['Week']]['WeekSales'].values[0]
        sums.loc[index, 'ratio'] = rows['WeekSales'] / week_total
Result:
Week RetailerCode WeekSales ratio
0 201636 RET001 10 0.250000
1 201636 RET002 20 0.500000
2 201636 RET003 0 0.000000
3 201636 RET004 10 0.250000
4 201637 RET001 5 0.131579
5 201637 RET002 10 0.263158
6 201637 RET003 20 0.526316
7 201637 RET004 3 0.078947
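
The loop above stops at the weekly ratios, while the question also asks for the average ratio per retailer. As a rough sketch of one way to get there without iterrows, the weekly totals can be broadcast with groupby(...).transform('sum') and the ratios averaged with a second groupby. This is an alternative, not the method used in the answer. It assumes that "ratio" means a retailer's share of all sales in a week across every SKU (which is what the result table shows), and it leaves out the score_period filtering for brevity. The names week_total and avg_ratio are made up for this illustration.

```python
import pandas as pd

# Sample data from the question.
sales = [
    {'RetailerCode': 'RET001', 'Sku': 'SKU001', 'Week': 201636, 'WeekSales': 10},
    {'RetailerCode': 'RET002', 'Sku': 'SKU002', 'Week': 201636, 'WeekSales': 20},
    {'RetailerCode': 'RET003', 'Sku': 'SKU003', 'Week': 201636, 'WeekSales': 0},
    {'RetailerCode': 'RET004', 'Sku': 'SKU004', 'Week': 201636, 'WeekSales': 10},
    {'RetailerCode': 'RET001', 'Sku': 'SKU001', 'Week': 201637, 'WeekSales': 5},
    {'RetailerCode': 'RET002', 'Sku': 'SKU002', 'Week': 201637, 'WeekSales': 10},
    {'RetailerCode': 'RET003', 'Sku': 'SKU003', 'Week': 201637, 'WeekSales': 20},
    {'RetailerCode': 'RET004', 'Sku': 'SKU004', 'Week': 201637, 'WeekSales': 3},
]
df = pd.DataFrame(sales)

# Retailer sales per week, same grouping as in the loop version.
sums = df.groupby(['Week', 'RetailerCode'], as_index=False)['WeekSales'].sum()

# Total sales of each week, repeated on every row of that week (hypothetical name).
week_total = sums.groupby('Week')['WeekSales'].transform('sum')
sums['ratio'] = sums['WeekSales'] / week_total

# Average of the weekly ratios for each retailer (hypothetical name).
avg_ratio = sums.groupby('RetailerCode')['ratio'].mean()
print(avg_ratio)
```

For RET001 this averages 10/40 and 5/38, matching the "(Ratio of first week + Ratio of second week) / 2" form in the question. A week whose total sales are zero would need separate handling, since the division is then undefined.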