I have a pandas DataFrame "high":
segment sales
Milk 10
Chocolate 30
and another DataFrame "low":
segment sku sales
Milk m2341 2
Milk m235 3
Chocolate c132 2
Chocolate c241 5
Chocolate c891 3
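I want to disaggregate "high" using the ratios in "low", so that each SKU keeps its share of its segment's sales. The result I'm after is: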
segment sku sales
Milk m2341 4
Milk m235 6
Chocolate c132 6
Chocolate c241 15
Chocolate c891 9
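Answer 0 (score: 0)
First, I would find the scale factor by which each product's sales need to be multiplied. It is computed once per segment.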
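# After the merge, sales_x is the per-segment total from df_low and sales_y is the value from df_high.
df_agg = df_low[["segment", "sales"]].groupby(by=["segment"]).sum().merge(df_high, on="segment")
df_agg["scale"] = df_agg["sales_y"] / df_agg["sales_x"]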
Then apply the scale:
df_disagg_high = df_low.merge(df_agg[["segment", "scale"]])
df_disagg_high["adjusted_sale"] = df_disagg_high["sales"] * df_disagg_high["scale"]
You can drop the extra columns if you don't need them.
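For reference, the two steps above could be put together into a self-contained sketch like the one below. The DataFrame construction (taken from the sample data in the question), the explicit `suffixes`, the `how="left"` merge, and the final column cleanup are additions for illustration, not part of the original snippet.

```python
import pandas as pd

# Sample data from the question.
df_high = pd.DataFrame({"segment": ["Milk", "Chocolate"], "sales": [10, 30]})
df_low = pd.DataFrame({
    "segment": ["Milk", "Milk", "Chocolate", "Chocolate", "Chocolate"],
    "sku": ["m2341", "m235", "c132", "c241", "c891"],
    "sales": [2, 3, 2, 5, 3],
})

# Step 1: per-segment totals of the low-level data, joined to the high-level totals.
# Explicit suffixes (an assumption of this sketch) make the two sales columns easier to tell apart.
df_agg = (
    df_low.groupby("segment", as_index=False)["sales"].sum()
    .merge(df_high, on="segment", suffixes=("_low", "_high"))
)
df_agg["scale"] = df_agg["sales_high"] / df_agg["sales_low"]

# Step 2: attach the scale to every SKU row and apply it.
# how="left" keeps the row order of df_low.
df_disagg_high = df_low.merge(df_agg[["segment", "scale"]], on="segment", how="left")
df_disagg_high["adjusted_sale"] = df_disagg_high["sales"] * df_disagg_high["scale"]

# Optional cleanup: keep only the adjusted values under the original column name.
result = (
    df_disagg_high.drop(columns=["sales", "scale"])
    .rename(columns={"adjusted_sale": "sales"})
)
print(result)
```

Note that the adjusted values come out as floats here, since the scale is computed by division.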
Answer 1 (score: 0)
Try:
df_low["sales"] = df_low.sales.mul(
    df_low.merge(
        df_high.set_index("segment")["sales"].div(
            df_low.groupby("segment")["sales"].sum()
        ),
        on="segment",
    )["sales_y"]
).astype(int)
print(df_low)
Prints:
segment sku sales
0 Milk m2341 4
1 Milk m235 6
2 Chocolate c132 6
3 Chocolate c241 15
4 Chocolate c891 9
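One thing to keep in mind with the snippet above is that `.astype(int)` truncates rather than rounds, so a ratio that is not an exact whole number can lose a unit. As a hedged alternative (not from the answer itself), the same per-segment ratio could be computed without a merge by using `Series.map` and `groupby(...).transform("sum")`, and rounded explicitly. The variable names `high_total`, `low_total`, and `out` are made up for this sketch, and the sample data is the one from the question.

```python
import pandas as pd

# Sample data from the question.
df_high = pd.DataFrame({"segment": ["Milk", "Chocolate"], "sales": [10, 30]})
df_low = pd.DataFrame({
    "segment": ["Milk", "Milk", "Chocolate", "Chocolate", "Chocolate"],
    "sku": ["m2341", "m235", "c132", "c241", "c891"],
    "sales": [2, 3, 2, 5, 3],
})

# High-level total for each SKU row, looked up by segment.
high_total = df_low["segment"].map(df_high.set_index("segment")["sales"])

# Low-level total of the segment each SKU row belongs to.
low_total = df_low.groupby("segment")["sales"].transform("sum")

# Each SKU gets its share of the high-level total; round before casting to int.
out = df_low.assign(
    sales=(df_low["sales"] * high_total / low_total).round().astype(int)
)
print(out)
```

With data where the shares are not whole numbers, rounding each row independently may leave the rows of a segment summing to slightly more or less than the high-level total, so keeping the values as floats may be preferable in that case.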