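I'm currently looping through a pandas DataFrame of orders so that I can remove the ordered items from inventory and keep track of which orders might not be fillable (this is part of a reservation system). I'd like to avoid the loop and do this in a more Pythonic / pandas-esque way, but I can't come up with anything that gets me the level of granularity I want. Any ideas would be greatly appreciated!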
Here is a very simplified version.
Sample input looks like this:
import pandas as pd
import random

def get_inventory():
    df_inv = pd.DataFrame([{'sku': 'A1', 'remaining': 1000},
                           {'sku': 'A2', 'remaining': 600},
                           {'sku': 'A3', 'remaining': 180},
                           {'sku': 'B1', 'remaining': 800},
                           {'sku': 'B2', 'remaining': 500},
                           ], columns=['sku', 'remaining']).set_index('sku')
    df_inv.loc[:, 'allocated'] = 0
    df_inv.loc[:, 'reserved'] = 0
    df_inv.loc[:, 'missed'] = 0
    return df_inv

def get_reservations():
    skus = ['A1', 'A2', 'A3', 'B1', 'B2']
    res = []
    for i in range(0, 1000, 1):
        res.append({'order_id': i,
                    'sku': random.choice(skus),
                    'number_of_items_reserved': 1})
    df_res = pd.DataFrame(res,
                          columns=['order_id', 'sku', 'number_of_items_reserved'])
    return df_res
Inventory:
df_inv = get_inventory()
print(df_inv)
remaining allocated reserved missed
sku
A1 1000 0 0 0
A2 600 0 0 0
A3 180 0 0 0
B1 800 0 0 0
B2 500 0 0 0
Reservations:
df_res = get_reservations()
print(df_res.head(10))
order_id sku number_of_items_reserved
0 0 A3 1
1 1 B1 1
2 2 A3 1
3 3 A1 1
4 4 B1 1
5 5 B1 1
6 6 B1 1
7 7 B1 1
8 8 A3 1
9 9 B1 1
The logic that allocates reservations against inventory looks roughly like this (this is the part I want to replace):
"""
df_inv: inventory grouped (indexed) by sku (style and size)
df_res: reservations by order id for a style and size
"""
df_inv = get_inventory()
df_res = get_reservations()
for i, res in df_res.iterrows():
    sku = res['sku']
    n_items = res['number_of_items_reserved']
    inv = df_inv[df_inv.index == sku]['remaining'].values[0]
    df_inv.loc[(df_inv.index == sku), 'reserved'] += n_items
    if (inv - n_items) >= 0:
        df_inv.loc[(df_inv.index == sku), 'allocated'] += n_items
        df_inv.loc[(df_inv.index == sku), 'remaining'] -= n_items
    else:
        df_inv.loc[(df_inv.index == sku), 'missed'] += n_items
Result:
remaining allocated reserved missed
sku
A1 817 183 183 0
A2 390 210 210 0
A3 0 180 210 30
B1 613 187 187 0
B2 290 210 210 0
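If you keep a loop, you can make it much cheaper. The following is a minimal sketch that uses the same first-come-first-served logic. It reads and writes single cells with `.at` and iterates with `itertuples()`, so it no longer builds the boolean mask `df_inv.index == sku` for every row. The function name `allocate_loop` and the small sample data are hypothetical. The sketch assumes the column names used in the question.

```python
import pandas as pd


def allocate_loop(df_inv: pd.DataFrame, df_res: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: same per-order logic as the question's loop."""
    df_inv = df_inv.copy()  # leave the caller's inventory untouched
    for row in df_res.itertuples(index=False):
        sku, n_items = row.sku, row.number_of_items_reserved
        df_inv.at[sku, 'reserved'] += n_items
        # Fill the whole order only if enough stock is left for it
        if df_inv.at[sku, 'remaining'] >= n_items:
            df_inv.at[sku, 'allocated'] += n_items
            df_inv.at[sku, 'remaining'] -= n_items
        else:
            df_inv.at[sku, 'missed'] += n_items
    return df_inv


# Small made-up example: A1 has 2 in stock but receives 3 one-item orders
inv = pd.DataFrame({'remaining': [2, 5], 'allocated': 0, 'reserved': 0, 'missed': 0},
                   index=pd.Index(['A1', 'A2'], name='sku'))
res = pd.DataFrame({'order_id': [0, 1, 2, 3],
                    'sku': ['A1', 'A1', 'A1', 'A2'],
                    'number_of_items_reserved': [1, 1, 1, 1]})
out = allocate_loop(inv, res)
print(out)
```

This version is still a Python-level loop, but it avoids one full-index comparison per row.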
Answer:
Thanks to pandas' intrinsic data alignment, you can compute the per-sku totals without a loop.
df_inv = get_inventory()
df_res = get_reservations()
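Create a Series indexed by 'sku'. Reindexing it to the inventory gives a sku with no reservations a count of 0 instead of NaN:
n_items = df_res.groupby('sku')['number_of_items_reserved'].sum().reindex(df_inv.index, fill_value=0)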
shortage = df_inv['remaining'] - n_items  # remaining minus demand; negative means not enough stock
enough_inv = shortage >= 0
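Pandas aligns data by index, and df_inv is indexed by 'sku' just like the Series created above. As a result, these calculations are done per 'sku'. Use boolean indexing to decide which skus have enough inventory to increase allocated and decrease remaining, and which should increase missed instead.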
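To see what the alignment does, here is a small illustrative example with made-up numbers. Arithmetic between two Series matches rows by label, not by position. A label that appears on only one side produces NaN. A sku with no reservations would therefore end up as NaN, and NaN compares as False. Reindexing the grouped counts with `fill_value=0` prevents this.

```python
import pandas as pd

# Stock per sku (illustrative values)
remaining = pd.Series({'A1': 10, 'A2': 5, 'A3': 3})
remaining.index.name = 'sku'

# Demand per sku: different order, and A3 has no reservations at all
demand = pd.Series({'A2': 7, 'A1': 4})

# Matched on labels: A1 -> 10 - 4, A2 -> 5 - 7, A3 -> NaN (no demand label)
raw = remaining - demand

# Filling the missing label with 0 first keeps A3 as a real number
safe = remaining - demand.reindex(remaining.index, fill_value=0)

print(raw)
print(raw >= 0)   # A3 comes out False only because of the NaN
print(safe >= 0)
```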
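df_inv['reserved'] += n_items
# enough stock: allocate everything that was reserved
df_inv.loc[enough_inv, 'allocated'] += n_items
df_inv.loc[enough_inv, 'remaining'] -= n_items
# not enough stock: shortage is negative, so this adds the unfilled quantity to missed
df_inv.loc[~enough_inv, 'missed'] -= shortage
# ...and allocates whatever was left (n_items + shortage == remaining)
df_inv.loc[~enough_inv, 'allocated'] += n_items + shortage
df_inv.loc[~enough_inv, 'remaining'] = 0
print(df_inv)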
Output:
remaining allocated reserved missed
sku
A1 815.0 185.0 185 0.0
A2 410.0 190.0 190 0.0
A3 0.0 180.0 200 20.0
B1 586.0 214.0 214 0.0
B2 289.0 211.0 211 0.0
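The sku-level answer above produces totals, but it does not say which orders went unfilled. The following is a minimal sketch that keeps per-order granularity without a loop. It assumes orders are served first-come-first-served in `order_id` order. It takes a running total of reserved items per sku with `groupby(...).cumsum()` and compares that total against the sku's starting `remaining`.

With one item per order, as in the question, this flags exactly the orders the loop would miss. With larger orders, the two policies differ. The loop skips an order that does not fit and may still fill a smaller order later. The running total instead marks every order after the first shortfall as unfilled. The helper name `allocate_orders`, the `cum_reserved` and `filled` columns, and the sample data are hypothetical.

```python
import pandas as pd


def allocate_orders(df_inv: pd.DataFrame, df_res: pd.DataFrame):
    """Hypothetical helper: flag each order as filled or not, then roll up per sku.

    Assumes first-come-first-served by order_id and that every sku in df_res
    exists in df_inv (an unknown sku maps to NaN and is never filled).
    """
    res = df_res.sort_values('order_id').copy()
    # Running total of items requested for each sku, in order_id order
    res['cum_reserved'] = res.groupby('sku')['number_of_items_reserved'].cumsum()
    # Starting stock of each order's sku, looked up by label
    start = res['sku'].map(df_inv['remaining'])
    res['filled'] = res['cum_reserved'] <= start

    # Split each order's quantity into an allocated part and a missed part
    qty = res['number_of_items_reserved']
    res['allocated'] = qty.where(res['filled'], 0)
    res['missed'] = qty.where(~res['filled'], 0)

    totals = (res.groupby('sku')[['number_of_items_reserved', 'allocated', 'missed']]
                 .sum()
                 .reindex(df_inv.index, fill_value=0))

    summary = df_inv.copy()
    summary['reserved'] += totals['number_of_items_reserved']
    summary['allocated'] += totals['allocated']
    summary['missed'] += totals['missed']
    summary['remaining'] -= totals['allocated']
    return res, summary


# Made-up example: A1 has 2 in stock and receives 3 one-item orders
inv = pd.DataFrame({'remaining': [2, 5], 'allocated': 0, 'reserved': 0, 'missed': 0},
                   index=pd.Index(['A1', 'A2'], name='sku'))
orders = pd.DataFrame({'order_id': [0, 1, 2, 3, 4],
                       'sku': ['A1', 'A2', 'A1', 'A1', 'A2'],
                       'number_of_items_reserved': [1, 1, 1, 1, 1]})
per_order, summary = allocate_orders(inv, orders)
print(per_order[['order_id', 'sku', 'filled']])
print(summary)
```

In this example, order 3 is the one that cannot be filled. The per-sku summary matches what the question's loop produces for the same one-item orders.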