Is there a way to optimize the code below in pandas?

Time: 2020-10-22 13:37:13

Tags: python pandas iteration

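This is the input data.

The closing stock value is calculated by adding the opening stock, the purchase quantity and the sold quantity, but only where an opening stock value is present.

[Image: Input data]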

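I want to fill in values for "Opening Stock" and "Closing Stock" subject to these conditions:

  1. When the Opening Stock value is 0 or blank, it should be filled with the Closing Stock of the previous record
  2. It should be filled only when the Site and Item Code of this record and the previous record are the same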

for i, row in df.iterrows():
    df['Opening Stock']  = np.where((df['Site'] == df['Site'].shift(1)) & (df['Item Code'] == df['Item Code'].shift(1))& ((df['Opening Stock'] == 0) | (df['Opening Stock'].isna())),df['Closing Stock'].shift(1),df['Opening Stock'])
    df['Closing Stock'][i] = df['Opening Stock'][i]+df['Purchase Qty'][i]+df['Sold Qty'][i]

This is what the output should look like:

[Image: expected output]

The problem is that the dataset is large, so this takes hours to complete.

Is there a way to optimize this code?

2 answers:

Answer 0: (score: 1)

You can do this without any iteration. The first step is to convert the 0 values in Opening Stock to np.nan so that they can be filled in the next step.

import pandas as pd
import numpy as np


df = pd.DataFrame({'Site': ['site 1', 'site 1', 'site 2', 'site 2'],
                   'Item Code': ['A', 'A', 'A', 'A'],
                   'Opening Stock': [1000, 0, 2000, 0],
                   'Closing Stock': [1200, 0, 2250, 0],
                   'Purchase Qty': [500, 100, 400, 300],
                   'Sold Qty': [-300, -200, -150, -100]})

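# Treat 0 as missing so it can be filled in the next step
df['Opening Stock'] = df['Opening Stock'].mask(df['Opening Stock'] == 0)
# Previous row's Closing Stock, taken only from the same Site / Item Code group
prev_closing = df.groupby(['Site', 'Item Code'])['Closing Stock'].shift(1)
df['Opening Stock'] = df['Opening Stock'].fillna(prev_closing)
# Closing Stock = Opening Stock + purchases + sales (Sold Qty is negative)
df['Closing Stock'] = df['Opening Stock'] + df['Purchase Qty'] + df['Sold Qty']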
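The fill above reads the Closing Stock column as it stood before any recalculation. If several consecutive rows of the same Site and Item Code have an empty Opening Stock, only the first of them picks up a meaningful value. A minimal sketch of one way to carry the balance through such runs with grouped cumulative sums follows. The function name fill_stock, the helper columns prefixed with an underscore, and the five-row demo frame are made up for illustration. It also assumes the rows are already in chronological order within each Site / Item Code group, which the question does not state.

```python
import numpy as np
import pandas as pd


def fill_stock(df):
    """Carry the closing balance forward within each Site / Item Code group.

    Assumes rows are already in chronological order within each group.
    """
    keys = ["Site", "Item Code"]
    out = df.copy()
    # Treat 0 and blank as "no opening stock given".
    opening = out["Opening Stock"].astype("float64")
    opening = opening.where(opening != 0)
    out["Opening Stock"] = opening
    out["_net"] = out["Purchase Qty"] + out["Sold Qty"]
    # Every row with a given opening stock starts a new run inside its group.
    out["_start"] = opening.notna().astype(int)
    out["_run"] = out.groupby(keys)["_start"].cumsum()
    # Opening stock of the row that started the run (NaN if there is none yet).
    base = out.groupby(keys)["Opening Stock"].ffill()
    # Net movement accumulated since the start of the run.
    moved = out.groupby(keys + ["_run"])["_net"].cumsum()
    out["Closing Stock"] = base + moved
    out["Opening Stock"] = out["Closing Stock"] - out["_net"]
    return out.drop(columns=["_net", "_start", "_run"])


# Hypothetical data: site 1 has two consecutive rows without an opening stock.
demo = pd.DataFrame({
    "Site": ["site 1", "site 1", "site 1", "site 2", "site 2"],
    "Item Code": ["A", "A", "A", "A", "A"],
    "Opening Stock": [1000, 0, np.nan, 2000, 0],
    "Closing Stock": [0, 0, 0, 0, 0],
    "Purchase Qty": [500, 100, 50, 400, 300],
    "Sold Qty": [-300, -200, -10, -150, -100],
})
result = fill_stock(demo)
print(result)
```

Rows that come before any known opening stock in their group stay NaN, which matches the rule that the closing value is only computed when an opening value exists. Unlike the loop in the question, this groups by key rather than comparing adjacent rows, so the two agree only if each group's records are contiguous.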

Answer 1: (score: 0)

One way to apply a condition to every row of a pandas DataFrame is df.apply()

def my_function(row):
    # note: row is a pandas Series holding one record
    row['Closing Stock'] = row['Opening Stock'] + row['Purchase Qty'] + row['Sold Qty']
    # you can do pretty much whatever you want inside this function
    return row  # the modified row becomes a row of the returned DataFrame

# axis=1 passes each row to my_function; extra arguments can be supplied with args=(...)
df = df.apply(my_function, axis=1)
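Keep in mind that apply with axis=1 still calls the Python function once per row, so on a large dataset it is unlikely to be dramatically faster than iterrows. A small sketch, using a made-up three-row frame and a hypothetical helper named closing_for_row, checks that the row-wise result matches plain column arithmetic, which avoids the per-row call entirely:

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({
    "Opening Stock": [1000, 1200, 2000],
    "Purchase Qty": [500, 100, 400],
    "Sold Qty": [-300, -200, -150],
})


def closing_for_row(row):
    # Called once per row; row is a pandas Series.
    return row["Opening Stock"] + row["Purchase Qty"] + row["Sold Qty"]


row_wise = df.apply(closing_for_row, axis=1)
# Same calculation expressed on whole columns at once.
vectorized = df["Opening Stock"] + df["Purchase Qty"] + df["Sold Qty"]
same = row_wise.equals(vectorized)
print(same)
```

The row-wise function also sees one record at a time, so it cannot by itself look up the previous record's Closing Stock.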