我有两个输入,一个数据框包含所有产品的现有库存,一个数据框列表,其中每个数据框包含多个产品的订单。订单数据框可以包含同一产品的多个订单,并且这些订单可以大于现有的可用库存。
[ prodID orderQTY
0 ABC1 456
1 ABC2 703
2 ABC3 359
3 ABC4 492
4 ABC5 824
.. ... ...
[100 rows x 2 columns],
prodID orderQTY
0 ABC10 805
1 ABC11 860
2 ABC12 651
3 ABC13 662
4 ABC14 802
.. ... ...
[100 rows x 2 columns],
prodID orderQTY
0 ABC100 111
1 ABC101 834
2 ABC102 302
3 ABC103 386
4 ABC104 318
.. ... ...
[100 rows x 2 columns]]
prodID onHand
0 ABC1 37200
1 ABC2 38945
2 ABC3 38085
3 ABC4 43775
4 ABC5 10999
.. ... ...
[300 rows x 2 columns]
我编写了一个函数来遍历列表中每个数据框内的订单并更新现有库存。然后,该函数将输出包含所有订单的数据框,其中显示了开始的库存,订单金额和结束的库存。
def fulfill_orders(order_list, stock):
# create empty dataframe where the results will go
combined = pd.DataFrame(columns=['prodID', 'startSTOCK', 'orderQTY', 'endSTOCK'])
# loop through dataframes in order_list
for orders in order_list:
# create unique list of product ids
ids = orders['prodID'].unique()
# loop through unique list of product ids
for id in ids:
# create dataframe that consists of a single product id
df = orders.groupby('prodID').get_group(id)
# merge stock on hand onto single product id dataframe
df = df.merge(stock, how='left', on='prodID')
# rename stock on hand column
df.rename(columns={'onHAND': 'startSTOCK'}, inplace=True)
# create ending stock column and calculate value for first row
df.loc[0,'endSTOCK'] = df.loc[0, 'startSTOCK'] - df.loc[0, 'orderQTY']
# check if ending stock for first row is less than zero
if df.loc[0, 'endSTOCK'] < 0:
# reset negative first row ending stock to starting stock
df.loc[0, 'endSTOCK'] = df.loc[0, 'startSTOCK']
# loop through remaining rows
for i in range(1, len(df)):
# set starting stock to previous ending stock
df.loc[i, 'startSTOCK'] = df.loc[i-1, 'endSTOCK']
# calculate new ending stock
df.loc[i, 'endSTOCK'] = df.loc[i, 'startSTOCK'] - df.loc[i, 'orderQTY']
# check if ending stock is less than zero
if df.loc[i, 'endSTOCK'] < 0:
# reset negative ending stock to starting stock
df.loc[i, 'endSTOCK'] = df.loc[i, 'startSTOCK']
# append results to output dataframe
combined = combined.append(df, ignore_index=True)
# remove extra columns
df = df[['prodID', 'endSTOCK']]
# rename columns
df.rename(columns={'endSTOCK': 'onHAND'}, inplace=True)
# keep only last value
df = df.iloc[[-1]]
# update the stock on hand dataframe
stock = pd.concat([stock, df]).drop_duplicates(['prodID'], keep='last')
# change column types to integer
combined['startSTOCK'] = combined.startSTOCK.astype(int)
combined['endSTOCK'] = combined.endSTOCK.astype(int)
# reorder columns
combined = combined[['prodID', 'startSTOCK', 'orderQTY', 'endSTOCK']]
# reset index
combined.reset_index(drop=True, inplace=True)
# drop orders not fulfilled
combined.drop(combined[(combined.startSTOCK <= 0) | (combined.orderQTY > combined.startSTOCK)].index, inplace=True)
# reset index
combined.reset_index(drop=True, inplace=True)
return combined.sort_values(['prodID', 'startSTOCK'], ascending=[True, False])
prodID startSTOCK orderQTY endSTOCK
0 ABC1 37200 456 36744
1 ABC2 38945 703 38242
2 ABC3 38085 359 37726
3 ABC4 43775 492 43283
4 ABC5 10999 824 10175
.. ... ... ... ...
[300 rows x 4 columns]
我担心的是,对于较大的输入数据,此功能可能效率不高。有没有更有效的方法来达到我想要的结果?
答案 0 :(得分:0)
设置
df_order
输出
prodID orderQTY
0 ABC1 456
1 ABC2 703
2 ABC3 359
3 ABC4 492
4 ABC5 824
将DataFrames(dfs
)的列表作为您重复显示三次的第一个DataFrame的第一部分。
dfs = [df_order] * 3
df_stock = df_stock.set_index('prodID')
过程
df_orders = pd.concat(dfs).groupby('prodID').sum()
df_all = df_stock.join(df_orders)
df_all['endSTOCK'] = df_all['onHand'] - df_all['orderQTY']
df_all
返回
onHand orderQTY endSTOCK
prodID
ABC1 37200 1368 35832
ABC2 38945 2109 36836
ABC3 38085 1077 37008
ABC4 43775 1476 42299
ABC5 10999 2472 8527