Speeding up a Python function

Time: 2019-05-01 20:36:19

Tags: python pandas dataframe

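I wrote a function that iterates over 3 parameters (SKU, store, offer) to detect out-of-stock data points. To illustrate, here is an example.

Suppose product 10, in store 2, under the offer named "super5", follows this trend: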

day | qty

1     50
2     70
3     55
4     67
5     13
6     0

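The offer ended on day 6, and the product was potentially out of stock from day 5. To detect and confirm this, I find the index where the quantity is 0 and look back 2 days, taking the mean of those two days: (67 + 13) / 2 = 40. If that mean is greater than the first decile of the series, the point is flagged "out of stock"; otherwise it is flagged "KO".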
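The rule described above can be checked on the sample series with a few lines of pandas. This is only a minimal sketch: the variable names (`qty`, `q1`, `flags`) are made up for illustration, and the strict `>` comparison follows the description rather than the function posted below.

```python
import pandas as pd

# Sample series from the example above (day -> qty).
qty = pd.Series([50, 70, 55, 67, 13, 0], index=[1, 2, 3, 4, 5, 6], name="qty")

# First decile of the whole series (pandas interpolates linearly by default).
q1 = qty.quantile(0.1)

# Integer positions where the quantity is 0.
zero_positions = (qty == 0).to_numpy().nonzero()[0]

flags = {}
for i in zero_positions:
    # Mean of the (up to) two days preceding the zero; NaN if there are none.
    lookback_mean = qty.iloc[max(i - 2, 0):i].mean()
    flags[int(qty.index[i])] = "out_of_stock" if lookback_mean > q1 else "KO"

print(flags)
```

For day 6 the look-back window is days 4 and 5, so the mean is 40, as in the hand calculation.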

What I tried:

import numpy as np
import pandas as pd
from tqdm import tqdm

def flag_out_of_stock(dataframe, sku, store, offer):
    # Note: the sku, store and offer arguments are overwritten by the loop variables below.
    ffl = []
    sku_store_offer = dataframe[["id_sku", "id_store", "id_offer"]].drop_duplicates(["id_sku", "id_store", "id_offer"])
    for sku, store, offer in tqdm(zip(sku_store_offer["id_sku"], sku_store_offer["id_store"], sku_store_offer["id_offer"])):
        cond1 = dataframe["id_sku"] == sku
        cond2 = dataframe["id_store"] == store
        cond3 = dataframe["id_offer"] == offer
        # time series of quantities for this (sku, store, offer), ordered by day
        timeseries = dataframe[np.logical_and.reduce((cond1, cond2, cond3))][["f_qty_recalc", "id_day"]]\
            .set_index("id_day")\
            .sort_index()
        mu = timeseries.mean().iloc[0]
        if mu >= 6:
            sigma = timeseries.std().iloc[0]
            q1 = timeseries.quantile(0.1).iloc[0]
            # row positions where qty == 0
            likely_out_of_stock_index = np.where(timeseries == 0)[0]
            # if there is more than one value where qty == 0
            if len(likely_out_of_stock_index) > 1:
                # for each position where qty == 0
                for i in likely_out_of_stock_index:
                    # if the mean of the two preceding days is at least the first decile,
                    # then flag out of stock
                    day_before_2 = timeseries.iloc[i-2:i].mean().iloc[0]
                    if day_before_2 >= q1:
                        ffl.append("out_of_stock")
                    elif day_before_2 >= mu - sigma:
                        ffl.append("likely_out_of_stock")
                    else:
                        ffl.append("KO")
            else:
                try:
                    day_before_2 = timeseries.iloc[likely_out_of_stock_index-2:likely_out_of_stock_index].mean().iloc[0]
                    if day_before_2 >= q1:
                        ffl.append("out_of_stock")
                    elif day_before_2 >= mu - sigma:
                        ffl.append("out_of_stock")
                    else:
                        ffl.append("KO")
                except TypeError:
                    ffl.append("KO")
        else:
            ffl.append("KO")
    return pd.Series(ffl)

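The code runs over roughly 600k rows of parameters and takes about 1.3 seconds per iteration. According to the log, about 0.9 seconds of that is spent in this operation: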

timeseries = dataframe[np.logical_and.reduce((cond1, cond2, cond3))][["f_qty_recalc", "id_day"]]\
.set_index("id_day")\
.sort_index()

So I would like to find another way to do this, or suggestions for speeding it up. Thanks.
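For reference, one direction that might be worth trying is to sort the frame once and let `groupby` split it, rather than building three boolean masks over the full frame for every combination. The sketch below assumes the column names from the function above; the helper name `iter_timeseries`, the constant `KEYS`, and the toy data are hypothetical, and it has not been benchmarked on the 600k-row data.

```python
import pandas as pd

# Grouping columns, as used in the question's function.
KEYS = ["id_sku", "id_store", "id_offer"]

def iter_timeseries(dataframe):
    """Yield ((sku, store, offer), qty Series indexed by day) for each combination."""
    # Sort once up front; groupby keeps the row order inside each group,
    # so every group comes out already ordered by day.
    ordered = dataframe.sort_values("id_day")
    # A single groupby pass replaces the per-combination boolean filtering.
    for key, group in ordered.groupby(KEYS, sort=False):
        yield key, group.set_index("id_day")["f_qty_recalc"]

# Toy data (made up) with two (sku, store, offer) combinations, days unsorted.
toy = pd.DataFrame({
    "id_sku": [10, 10, 10, 11, 11],
    "id_store": [2, 2, 2, 2, 2],
    "id_offer": ["super5"] * 5,
    "id_day": [3, 1, 2, 2, 1],
    "f_qty_recalc": [0, 50, 13, 7, 9],
})

series_by_key = dict(iter_timeseries(toy))
for key, ts in series_by_key.items():
    print(key, ts.tolist())
```

Each yielded series could then be passed to the same mean, standard deviation, and decile checks as before; only the way the per-combination time series is extracted changes.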

0 answers:

No answers yet