Avoid looping over a pandas DataFrame to keep track of remaining inventory

Time: 2017-06-26 18:06:10

Tags: pandas

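I'm currently looping over a pandas DataFrame containing orders so that I can remove the ordered items from inventory and keep track of which orders may not be filled (this is part of a reservation system). I'd like to avoid the loop and do this in a more pythonic / panda-esque way, but I haven't been able to come up with anything that gives me the level of granularity I want. Any ideas would be much appreciated!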

This is a very simplified version.

Example input looks like this:

import pandas as pd
import random

def get_inventory():

    df_inv = pd.DataFrame([{'sku': 'A1', 'remaining': 1000},
         {'sku': 'A2', 'remaining': 600},
         {'sku': 'A3', 'remaining': 180},
         {'sku': 'B1', 'remaining': 800},
         {'sku': 'B2', 'remaining': 500},
         ], columns=['sku', 'remaining']).set_index('sku')

    df_inv.loc[:, 'allocated'] = 0
    df_inv.loc[:, 'reserved'] = 0
    df_inv.loc[:, 'missed'] = 0

    return df_inv


def get_reservations():
    skus = ['A1', 'A2', 'A3', 'B1', 'B2']
    res = []
    for i in range(0, 1000, 1):
        res.append({'order_id': i, 
                    'sku': random.choice(skus),
                    'number_of_items_reserved': 1})    

    df_res = pd.DataFrame(res, 
         columns=['order_id', 'sku', 'number_of_items_reserved'])

    return df_res

Inventory:

df_inv = get_inventory()
print(df_inv)


     remaining  allocated  reserved  missed
sku                                        
A1        1000          0         0       0
A2         600          0         0       0
A3         180          0         0       0
B1         800          0         0       0
B2         500          0         0       0

Reservations:

df_res = get_reservations()
print(df_res.head(10))

   order_id sku  number_of_items_reserved
0         0  A3                         1
1         1  B1                         1
2         2  A3                         1
3         3  A1                         1
4         4  B1                         1
5         5  B1                         1
6         6  B1                         1
7         7  B1                         1
8         8  A3                         1
9         9  B1                         1

The logic for allocating reservations to inventory looks roughly like this (this is the part I want to replace):

"""
df_inv: inventory grouped (indexed) by sku (style and size)
df_res: reservations by order id for a style and size
"""
df_inv = get_inventory()
df_res = get_reservations()

for i, res in df_res.iterrows():

    sku = res['sku']
    n_items = res['number_of_items_reserved']

    inv = df_inv[df_inv.index == sku]['remaining'].values[0]

    df_inv.loc[(df_inv.index == sku), 'reserved'] += n_items

    if (inv-n_items) >= 0:
        df_inv.loc[(df_inv.index == sku), 'allocated'] += n_items
        df_inv.loc[(df_inv.index == sku), 'remaining'] -= n_items
    else:
        df_inv.loc[(df_inv.index == sku), 'missed'] += n_items

Result:

     remaining  allocated  reserved  missed
sku                                        
A1         817        183       183       0
A2         390        210       210       0
A3           0        180       210      30
B1         613        187       187       0
B2         290        210       210       0

1 Answer:

Answer 0: (score: 0)

Thanks to the intrinsic data alignment in pandas, you can do this without a loop.

df_inv = get_inventory()
df_res = get_reservations()

Create a Series indexed by 'sku':

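# Total number of items reserved per sku
n_items = df_res.groupby('sku')['number_of_items_reserved'].sum()
# Stock left after all reservations; a negative value means a shortfall
shortage = df_inv['remaining'] - n_items 
# True where stock strictly exceeds demand; an exact match gives the same result either way
enough_inv = shortage > 0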
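
The subtraction above depends on pandas matching rows by index label instead of by position. A minimal sketch of that behaviour, using made-up values that are not part of the original data:

```python
import pandas as pd

# Hypothetical values, chosen only to illustrate alignment.
remaining = pd.Series({'A1': 10, 'A2': 5, 'A3': 2})
# Deliberately listed in a different order than `remaining`.
reserved = pd.Series({'A3': 4, 'A1': 3, 'A2': 5})

# Rows are paired by label: A1 with A1, A2 with A2, A3 with A3.
diff = remaining - reserved
print(diff)

# A label missing on one side produces NaN, which upcasts the result to float.
partial = remaining - pd.Series({'A1': 1})
print(partial)
```

The second subtraction is probably also why some columns print as floats after assignments through a boolean mask: the masked selection and the full Series do not share all labels, so the intermediate result contains NaN.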

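Because pandas aligns data on the index, and both df_inv and the Series created above are indexed by 'sku', these calculations are carried out per 'sku'. Use boolean indexing to determine which skus have enough inventory, then either increase allocated and decrease remaining, or increase missed: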

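df_inv['reserved'] += n_items
# Enough stock: every reserved item is allocated
df_inv.loc[enough_inv,'allocated'] += n_items
df_inv.loc[enough_inv,'remaining'] -= n_items
# Not enough stock: shortage is <= 0 here, so subtracting it adds the shortfall to missed
df_inv.loc[~enough_inv,'missed'] -= shortage
# Allocate only what was in stock (n_items + shortage equals the old remaining)
df_inv.loc[~enough_inv,'allocated'] += n_items + shortage
df_inv.loc[~enough_inv,'remaining'] = 0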

print(df_inv)

Output (the reservations are drawn at random, so the counts differ from the result in the question):

     remaining  allocated  reserved  missed
sku                                        
A1       815.0      185.0       185     0.0
A2       410.0      190.0       190     0.0
A3         0.0      180.0       200    20.0
B1       586.0      214.0       214     0.0
B2       289.0      211.0       211     0.0
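
One possible variant, offered as a sketch and not as the answerer's method, caps the allocation with `np.minimum` so that no boolean masks are needed and the columns stay integers. It assumes every reservation is for a single item, as in the sample data; with multi-item orders the original loop can skip a large order and still fill later smaller ones, which a per-sku total cannot reproduce. The fixed seed and the `requested` and `stock` names are assumptions introduced here for illustration.

```python
import random

import numpy as np
import pandas as pd

random.seed(0)  # assumption: fixed seed so the run is repeatable

skus = ['A1', 'A2', 'A3', 'B1', 'B2']
df_inv = pd.DataFrame(
    {'remaining': [1000, 600, 180, 800, 500]},
    index=pd.Index(skus, name='sku'),
)
df_res = pd.DataFrame({
    'order_id': range(1000),
    'sku': [random.choice(skus) for _ in range(1000)],
    'number_of_items_reserved': 1,
})

# Total requested per sku; a sku with no reservations gets 0 instead of NaN.
requested = (df_res.groupby('sku')['number_of_items_reserved'].sum()
             .reindex(df_inv.index, fill_value=0))

stock = df_inv['remaining'].copy()
# Allocate what was asked for, capped at the stock on hand.
df_inv['allocated'] = np.minimum(requested, stock)
df_inv['reserved'] = requested
df_inv['missed'] = requested - df_inv['allocated']
df_inv['remaining'] = stock - df_inv['allocated']

print(df_inv)
```

The `reindex(..., fill_value=0)` step is there so that a sku with no reservations does not turn into NaN when the Series is combined with `df_inv`.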