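I wrote a function that iterates over 3 parameters to detect out-of-stock data points. To illustrate, here is an example:
Suppose product 10 in store 2, under the offer named "super5", follows this trend: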
day | qty
1   | 50
2   | 70
3   | 55
4   | 67
5   | 13
6   | 0
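The offer ends on day 6, and the product was potentially out of stock on day 5. To detect and verify this, I find the index of the 0 quantity and look back 2 days, taking the mean of those two days: (67 + 13) / 2 = 40. If that mean is greater than the first decile, I flag the point as "out_of_stock"; otherwise I flag it as "KO".
Here is what I tried: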
import numpy as np
import pandas as pd
from tqdm import tqdm

def flag_out_of_stock(dataframe, sku, store, offer):
    ffl = []
    sku_store_offer = dataframe[["id_sku", "id_store", "id_offer"]].drop_duplicates(["id_sku", "id_store", "id_offer"])
    for sku, store, offer in tqdm(zip(sku_store_offer["id_sku"], sku_store_offer["id_store"], sku_store_offer["id_offer"])):
        cond1 = dataframe["id_sku"] == sku
        cond2 = dataframe["id_store"] == store
        cond3 = dataframe["id_offer"] == offer
        timeseries = dataframe[np.logical_and.reduce((cond1, cond2, cond3))][["f_qty_recalc", "id_day"]]\
            .set_index("id_day")\
            .sort_index()
        mu = timeseries.mean().iloc[0]
        if mu >= 6:
            sigma = timeseries.std().iloc[0]
            q1 = timeseries.quantile(0.1).iloc[0]
            # positional indices where qty == 0
            likely_out_of_stock_index = np.where(timeseries == 0)[0]
            # if there is more than one value where qty == 0
            if len(likely_out_of_stock_index) > 1:
                # for each index where qty == 0
                for i in likely_out_of_stock_index:
                    # if the mean of the two days before is at least the
                    # first decile, then flag out of stock
                    day_before_2 = timeseries.iloc[i-2:i].mean().iloc[0]
                    if day_before_2 >= q1:
                        ffl.append("out_of_stock")
                    elif day_before_2 >= mu - sigma:
                        ffl.append("likely_out_of_stock")
                    else:
                        ffl.append("KO")
            else:
                try:
                    # at most one zero: take its scalar position
                    i = likely_out_of_stock_index[0]
                    day_before_2 = timeseries.iloc[i-2:i].mean().iloc[0]
                    if day_before_2 >= q1:
                        ffl.append("out_of_stock")
                    elif day_before_2 >= mu - sigma:
                        ffl.append("out_of_stock")
                    else:
                        ffl.append("KO")
                except IndexError:
                    # no zero quantity in this series
                    ffl.append("KO")
        else:
            ffl.append("KO")
    return pd.Series(ffl)
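The code runs over roughly 600k parameter rows and takes about 1.3 seconds per iteration (the log shows 0.9 sec). The bottleneck is this operation: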
timeseries = dataframe[np.logical_and.reduce((cond1, cond2, cond3))][["f_qty_recalc", "id_day"]]\
    .set_index("id_day")\
    .sort_index()
So I would like to find another way to do this, or suggestions for speeding it up. Thanks.
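One direction that might help, sketched under assumptions rather than measured on the real data: the three boolean masks are rebuilt over the whole frame on every iteration, so sorting once and letting `groupby` hand out each (sku, store, offer) series could avoid that repeated scan. The function name `lookback_means`, its `window` parameter, and the output columns `lookback_mean` and `first_decile` are hypothetical names introduced for illustration. The sample frame rebuilds the example table from above, and the sketch only covers the simple rule (look-back mean versus first decile), not the `mu >= 6` gate or the `mu - sigma` tier.

```python
import numpy as np
import pandas as pd


def lookback_means(dataframe, window=2):
    """Mean quantity over the `window` days before each zero-quantity day.

    Hypothetical helper: groups once by (id_sku, id_store, id_offer)
    instead of masking the full frame on every iteration.
    """
    keys = ["id_sku", "id_store", "id_offer"]
    rows = []
    # Sort once so every group is already in day order.
    ordered = dataframe.sort_values(keys + ["id_day"])
    for (sku, store, offer), group in ordered.groupby(keys, sort=False):
        qty = group["f_qty_recalc"].to_numpy(dtype=float)
        days = group["id_day"].to_numpy()
        q1 = np.quantile(qty, 0.1)  # first decile of this series
        for i in np.flatnonzero(qty == 0):
            if i < window:
                continue  # not enough history to look back
            mean_before = qty[i - window:i].mean()
            rows.append((sku, store, offer, days[i], mean_before, q1))
    cols = keys + ["id_day", "lookback_mean", "first_decile"]
    return pd.DataFrame(rows, columns=cols)


# The example series: product 10, store 2, offer "super5".
sample = pd.DataFrame({
    "id_sku": 10,
    "id_store": 2,
    "id_offer": "super5",
    "id_day": [1, 2, 3, 4, 5, 6],
    "f_qty_recalc": [50, 70, 55, 67, 13, 0],
})

result = lookback_means(sample)
# Simple rule from the description: compare against the first decile.
result["flag"] = np.where(
    result["lookback_mean"] >= result["first_decile"], "out_of_stock", "KO"
)
print(result)
```

On the sample this yields a single row for day 6 with a look-back mean of 40, matching the hand calculation. Whether it is actually faster on 600k parameter rows would need to be timed on the real frame.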