How to calculate a ratio from two different pandas DataFrames

Time: 2018-04-03 04:51:14

Tags: python pandas

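I am trying to calculate the sales ratio of each retailer SKU over a number of weeks, and then the average of that ratio per retailer SKU.

So far I have been able to compute the total sales per SKU, and then I grouped the sales by retailer SKU.

What I cannot figure out is how to calculate the retailer SKU sales ratio over 'N' weeks.

Here is my code: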

score_period = [
    [201636, 201643],
    [201640, 201647],
    [201645, 201652],
    [201649, 201704],
    [201701, 201708]
]


sku_group = df.groupby('Sku', as_index=False)
sku_list = sku_group.groups.keys()

for sku in sku_list:

    df_sku = df[df['Sku'] == sku]
    for period in score_period:
        df_period = df_sku[(df_sku['Week'] >= period[0]) &
                           (df_sku['Week'] <= period[1])]

        # sales of each week in period
        df_sum = df_period.groupby(['Week'], as_index=False)['WeekSales'].sum()
        # retailer sales sum per week
        sums = df_period.groupby(['Week', 'RetailerCode'], as_index=False)['WeekSales'].sum()

        for index, rows in sums.iterrows():
            sums['ratio'] = sums['WeekSales'] / df_sum[(df_sum['Week'])]['WeekSales']

Data

import pandas as pd

sales = [
    {'RetailerCode': 'RET001', 'Sku': 'SKU001', 'Week': 201636, 'WeekSales': 10},
    {'RetailerCode': 'RET002', 'Sku': 'SKU002', 'Week': 201636, 'WeekSales': 20},
    {'RetailerCode': 'RET003', 'Sku': 'SKU003', 'Week': 201636, 'WeekSales': 0},
    {'RetailerCode': 'RET004', 'Sku': 'SKU004', 'Week': 201636, 'WeekSales': 10},
    {'RetailerCode': 'RET001', 'Sku': 'SKU001', 'Week': 201637, 'WeekSales': 5},
    {'RetailerCode': 'RET002', 'Sku': 'SKU002', 'Week': 201637, 'WeekSales': 10},
    {'RetailerCode': 'RET003', 'Sku': 'SKU003', 'Week': 201637, 'WeekSales': 20},
    {'RetailerCode': 'RET004', 'Sku': 'SKU004', 'Week': 201637, 'WeekSales': 3},
]

df = pd.DataFrame(sales)

Expected result:

RET001 avg ratio = (Ratio of first week + Ratio of second week) / 2
RET002 avg ratio = (Ratio of first week + Ratio of second week) / 2

1 Answer:

Answer 0: (score: 0)

Explanation

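  • In the last for loop you should access rows (the current row), not sums (the whole table).

  • Because you are iterating over the table, you cannot add the column simply through sums['ratio'], which assigns to the whole column on every pass. You have to write to a single cell with sums.loc[index, 'ratio'] (an explanation can be found here).

  • To match the week between df_sum and sums, use df_sum[df_sum['Week'] == rows['Week']]['WeekSales']. This returns the WeekSales value from df_sum whose Week matches the Week of the current row.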
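The three points above can be sketched on a tiny example. The two frames below are made-up stand-ins for df_sum and sums (their values are illustrative assumptions, not the asker's full data), and the loop variable is named row here purely for readability.

```python
import pandas as pd

# Illustrative stand-ins for df_sum (weekly totals) and sums (retailer totals).
df_sum = pd.DataFrame({'Week': [201636, 201637], 'WeekSales': [40, 38]})
sums = pd.DataFrame({
    'Week': [201636, 201637],
    'RetailerCode': ['RET001', 'RET001'],
    'WeekSales': [10, 5],
})

for index, row in sums.iterrows():
    # The boolean mask keeps only the df_sum rows for this row's week;
    # .iloc[0] turns the one-element result into a scalar.
    week_total = df_sum.loc[df_sum['Week'] == row['Week'], 'WeekSales'].iloc[0]
    # Write into a single cell instead of reassigning the whole column.
    sums.loc[index, 'ratio'] = row['WeekSales'] / week_total

print(sums)
```

Using .iloc[0] assumes df_sum has exactly one row per week, which holds when it comes from a groupby on Week.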

Please check whether the following code does what you need.

score_period = [
    [201636, 201643],
    [201640, 201647],
    [201645, 201652],
    [201649, 201704],
    [201701, 201708]
]
sku_group = df.groupby('Sku', as_index=False)
sku_list = sku_group.groups.keys()


#for sku in sku_list:
#  df_sku = df[df['Sku'] == sku]
for period in score_period:
    df_period = df[(df['Week'] >= period[0]) & (df['Week'] <= period[1])]

    # sales of each week in period
    df_sum = df_period.groupby(['Week'], as_index=False)['WeekSales'].sum()
    # retailer sales sum per week
    sums = df_period.groupby(['Week', 'RetailerCode'], as_index=False)['WeekSales'].sum()
    for index, rows in sums.iterrows():
        sums.loc[index,'ratio'] = (rows['WeekSales']/df_sum[df_sum['Week']==rows['Week']]['WeekSales']).values

Result:

     Week RetailerCode  WeekSales     ratio
0  201636       RET001         10  0.250000
1  201636       RET002         20  0.500000
2  201636       RET003          0  0.000000
3  201636       RET004         10  0.250000
4  201637       RET001          5  0.131579
5  201637       RET002         10  0.263158
6  201637       RET003         20  0.526316
7  201637       RET004          3  0.078947
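The table stops at the weekly ratios, while the question also asks for the average ratio per retailer. One possible way to get both without iterrows is sketched below. It is an alternative, not the code used in this answer, and it assumes that "ratio" means a retailer's share of all sales in that week, as in the table above. The helper name avg_ratio_per_retailer is hypothetical.

```python
import pandas as pd

sales = [
    {'RetailerCode': 'RET001', 'Sku': 'SKU001', 'Week': 201636, 'WeekSales': 10},
    {'RetailerCode': 'RET002', 'Sku': 'SKU002', 'Week': 201636, 'WeekSales': 20},
    {'RetailerCode': 'RET003', 'Sku': 'SKU003', 'Week': 201636, 'WeekSales': 0},
    {'RetailerCode': 'RET004', 'Sku': 'SKU004', 'Week': 201636, 'WeekSales': 10},
    {'RetailerCode': 'RET001', 'Sku': 'SKU001', 'Week': 201637, 'WeekSales': 5},
    {'RetailerCode': 'RET002', 'Sku': 'SKU002', 'Week': 201637, 'WeekSales': 10},
    {'RetailerCode': 'RET003', 'Sku': 'SKU003', 'Week': 201637, 'WeekSales': 20},
    {'RetailerCode': 'RET004', 'Sku': 'SKU004', 'Week': 201637, 'WeekSales': 3},
]
df = pd.DataFrame(sales)


def avg_ratio_per_retailer(df, first_week, last_week):
    """Hypothetical helper: mean weekly sales share per retailer in a period."""
    # Keep only the weeks inside the period (both bounds inclusive).
    in_period = df[df['Week'].between(first_week, last_week)]
    # Retailer sales per week.
    sums = in_period.groupby(['Week', 'RetailerCode'], as_index=False)['WeekSales'].sum()
    # Total sales of each week, aligned row by row with sums.
    week_total = sums.groupby('Week')['WeekSales'].transform('sum')
    sums['ratio'] = sums['WeekSales'] / week_total
    # Average of the weekly ratios for each retailer.
    return sums.groupby('RetailerCode')['ratio'].mean()


avg = avg_ratio_per_retailer(df, 201636, 201643)
print(avg)
```

For the sample data, RET001 comes out at (0.25 + 0.131579) / 2, roughly 0.1908. A week whose total sales are zero would produce NaN ratios with this sketch, so such weeks may need to be filtered out first.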